As I continue my study of algorithms and data structures using [Wayne and Sedgewick Algorithms] text, I figured it was time to write some notes about symbol tables.

Symbol tables are a useful ADT for performing quick inserts and lookups given a key-value pair.

Fundamental behaviors of a symbol table:

  • a value is inserted with a specified key; a key is unique (no duplicates)
  • furthermore, when a client puts a key-value pair into a symbol table that contains the key, it updates the value instead of adding a new entry
  • given a key, search for a corresponding value
  • a key cannot be null

The API for a symbol table generally has the following methods:

    void put(Key k, Value v)    // put k-v pair into the table (a[k] = v)
    Value get(Key k)            // value paired with key (a[k])
    void delete(Key k)          // remove key (and value) from table
    boolean contains(Key k)     // is there a value pareid with key?
    boolean isEmpty()           // is the table empty?
    int size()                  // number of k-v pairs in the table
    Iterable<Key> keys()        // all keys in the table

While Java does not offer the a[k] = v syntax for put or a[k] for get, these expressions eloquently display the idea behind how they work.

The symbol table ADT can be implemented with a linked list, array, or BST as its underlying data structure. While the first two examples are not viable options to implement a symbol table, it’s helpful to analyze these implementations to get a firmer grasp of why linked lists and ordered arrays can fall short as the underlying data structures.

Sequential Search Symbol Table (linked list implementation)

public class SymbolTable<Key, Value> {
    private Node first;

    private class Node {
        Key k;
        Value v;
        Node next;

        public Node(Key k, Value v, Node next) {
            this.k = k;
            this.v = v;
            this.next = next;
        }
    }

    // search for key, return associated value
    public Value get(Key k) {
        for (Node x = first; x != null; x = x.next) 
            if (k.equals(x.k)) return x.v;
        // search miss
        return null;
    }

    // search for key. update value if found. grow table if new
    public void put(Key k, Value v) {
        for (Node x = first; x != null; x = x.next) {
            // search hit. update
            if (k.equals(x.k)) {
                x.v = v; 
                return;
            }
        }

        // search miss: add new node
        // ("old" first is now "next" when key is new)
        first = new Node(k, v, first);
    }
}

A linked list symbol table will perform poorly for large datasets. put and get will run in linear time in the worst case, as both methods require traversing the list to perform their respective operations.

Binary Search Symbol Table (ordered array implementation)

Another way to go about implementing a symbol table is with an ordered array. The reason the array is ordered is to utilize binary search for insertions and retrievals.

public class SymbolTable extends Comparable<Key, Val> {
    private Key[] keys;
    private Val[] vals;
    private int N;

    public  BinarySearchST(int capacity) {
        // also would use array resizing code
        keys = (Key[]) new Comparable[capacity];
        vals = (Value[]) new Object[capacity];
    }

    public int size() { return N; }

    public Value get(Key key) {
        if (isEmpty()) reutrn null;

        int i = rank(key);

        if (i < N && keys[i].compareTo(key) == 0) return vals[i];
        else return null;
    }

    // recursive binary search. returns index of key in table
    public int rank(Key key, int lo, int hi) {
        if (hi < lo) return lo;

        int mid = lo + (hi - lo) / 2;
        int cmp = key.compareTo(keys[mid]);

        // search key less than mid value (search left half of subarray)
        if (cmp < 0) return rank(key, lo, mid - 1);

        // search key greater than mid value (search right half of subarray)
        else if (cmp > 0) return rank(key, mid + 1, hi);

        // key in middle and search key are equal
        else return mid;
    }

    // search for key. update value if found. grow table if new
    public void put(Key key, Value val) {
        int i = rank(key);

        // update
        if (i < N && keys[i].compareTo(k) == 0) {
            vals[i] = val;
            return;
        }

        // move items over in parallel arrays to make room for new entry
        for (int j = N; j > i; j--) {
            keys[j] = keys[j-1];
            vals[j] = vals[j-1];
        }

        // perform insertions in the parallel arrays
        keys[i] = key;
        vals[i] = val;
        N++;
    }
}

Note: for the sake of simplicity, I left out some beneficial overloaded constructor methods and array resizing code.

put() in this case is O(n), as items have to be shifted in the parallel arrays to make room for new entries. get() is O(log n) (the same worst-case time complexity as binary search).

Because rank() is simply a binary search, key classes are required to extend Java’s Comparable interface.

Unfortunately inserts take too much time to make this a viable implementation for the symbol table.

BST Implementation

The BST-symbol table does not look much different than the generic BST example in my previous blog post. search has been modified to accept a key and return a value as opposed to a boolean if the value is present, and is renamed to get().

public class BST<Key extends Comparable<Key>, Value> {
    // root of BST
    private Node root;

    private class Node {
        // key
        private Key key;
        // associated value
        private Value val;
        // links to subtrees
        private Node left, right;
        // nodes in subtree rooted here
        private int N;

        public Node(Key key, Value val, int N) {
            this.key = key;
            this.val = val;
            this.N = N;
        }
    }

    public int size() { return size(root); }

    private int size(Node x) {
        if (x == null) return 0;
        else return x.N;
    }

    public Value get(Key key) { return get(root, key); }

    // return value associated with key in the subtree rooted at x; 
    // return null if key not present in subtree rooted at x
    private Value get(Node x, Key key) {
        if (x == null) return null;

        int cmp = key.compareTo(x.key);

        if (cmp < 0) return get(x.left, key);
        else if (cmp > 0) return get(x.right, key);
        else return x.val;
    }

    // search for key. update value if found; grow table if new.
    public void put(Key key, Value val) {
        root = put(root, key, val);
    }

    // add k:v pair or change key's value to `val` if key in subtree rooted at x.
    private Node put(Node x, Key key, Value val) {
        // add new k:v pair
        if (x == null) return new Node(key, val , 1);

        int cmp = key.compareTo(x.key);

        // recursively call `put` on the left or right subtree
        if (cmp < 0) x.left = put(x.left, key, val);
        else if (cmp > 0) x.right = put(x.right, key, val);
        // otherwise update the current value
        else x.val = val;

        // records the size of subtree for each node, for public `size` method
        x.N = size(x.left) + size(x.right) + 1;

        return x;
    }
}

Resources Used