As I continue my study of algorithms and data structures using [Wayne and Sedgewick Algorithms] text, I figured it was time to write some notes about symbol tables.
Symbol tables are a useful ADT for performing quick inserts and lookups given a key-value pair.
Fundamental behaviors of a symbol table:
- a value is inserted with a specified key; a key is unique (no duplicates)
- furthermore, when a client puts a key-value pair into a symbol table that contains the key, it updates the value instead of adding a new entry
- given a key, search for a corresponding value
- a key cannot be
null
The API for a symbol table generally has the following methods:
void put(Key k, Value v) // put k-v pair into the table (a[k] = v)
Value get(Key k) // value paired with key (a[k])
void delete(Key k) // remove key (and value) from table
boolean contains(Key k) // is there a value pareid with key?
boolean isEmpty() // is the table empty?
int size() // number of k-v pairs in the table
Iterable<Key> keys() // all keys in the table
While Java does not offer the a[k] = v
syntax for put
or a[k]
for get
, these expressions eloquently display the idea behind how they work.
The symbol table ADT can be implemented with a linked list, array, or BST as its underlying data structure. While the first two examples are not viable options to implement a symbol table, it’s helpful to analyze these implementations to get a firmer grasp of why linked lists and ordered arrays can fall short as the underlying data structures.
Sequential Search Symbol Table (linked list implementation)
public class SymbolTable<Key, Value> {
private Node first;
private class Node {
Key k;
Value v;
Node next;
public Node(Key k, Value v, Node next) {
this.k = k;
this.v = v;
this.next = next;
}
}
// search for key, return associated value
public Value get(Key k) {
for (Node x = first; x != null; x = x.next)
if (k.equals(x.k)) return x.v;
// search miss
return null;
}
// search for key. update value if found. grow table if new
public void put(Key k, Value v) {
for (Node x = first; x != null; x = x.next) {
// search hit. update
if (k.equals(x.k)) {
x.v = v;
return;
}
}
// search miss: add new node
// ("old" first is now "next" when key is new)
first = new Node(k, v, first);
}
}
A linked list symbol table will perform poorly for large datasets. put
and get
will run in linear time in the worst case, as both methods require traversing the list to perform their respective operations.
Binary Search Symbol Table (ordered array implementation)
Another way to go about implementing a symbol table is with an ordered array. The reason the array is ordered is to utilize binary search for insertions and retrievals.
public class SymbolTable extends Comparable<Key, Val> {
private Key[] keys;
private Val[] vals;
private int N;
public BinarySearchST(int capacity) {
// also would use array resizing code
keys = (Key[]) new Comparable[capacity];
vals = (Value[]) new Object[capacity];
}
public int size() { return N; }
public Value get(Key key) {
if (isEmpty()) reutrn null;
int i = rank(key);
if (i < N && keys[i].compareTo(key) == 0) return vals[i];
else return null;
}
// recursive binary search. returns index of key in table
public int rank(Key key, int lo, int hi) {
if (hi < lo) return lo;
int mid = lo + (hi - lo) / 2;
int cmp = key.compareTo(keys[mid]);
// search key less than mid value (search left half of subarray)
if (cmp < 0) return rank(key, lo, mid - 1);
// search key greater than mid value (search right half of subarray)
else if (cmp > 0) return rank(key, mid + 1, hi);
// key in middle and search key are equal
else return mid;
}
// search for key. update value if found. grow table if new
public void put(Key key, Value val) {
int i = rank(key);
// update
if (i < N && keys[i].compareTo(k) == 0) {
vals[i] = val;
return;
}
// move items over in parallel arrays to make room for new entry
for (int j = N; j > i; j--) {
keys[j] = keys[j-1];
vals[j] = vals[j-1];
}
// perform insertions in the parallel arrays
keys[i] = key;
vals[i] = val;
N++;
}
}
Note: for the sake of simplicity, I left out some beneficial overloaded constructor methods and array resizing code.
put()
in this case is O(n)
, as items have to be shifted in the parallel arrays to make room for new entries. get()
is O(log n)
(the same worst-case time complexity as binary search).
Because rank()
is simply a binary search, key classes are required to extend Java’s Comparable
interface.
Unfortunately inserts take too much time to make this a viable implementation for the symbol table.
BST Implementation
The BST-symbol table does not look much different than the generic BST example in my previous blog post. search
has been modified to accept a key and return a value as opposed to a boolean
if the value is present, and is renamed to get()
.
public class BST<Key extends Comparable<Key>, Value> {
// root of BST
private Node root;
private class Node {
// key
private Key key;
// associated value
private Value val;
// links to subtrees
private Node left, right;
// nodes in subtree rooted here
private int N;
public Node(Key key, Value val, int N) {
this.key = key;
this.val = val;
this.N = N;
}
}
public int size() { return size(root); }
private int size(Node x) {
if (x == null) return 0;
else return x.N;
}
public Value get(Key key) { return get(root, key); }
// return value associated with key in the subtree rooted at x;
// return null if key not present in subtree rooted at x
private Value get(Node x, Key key) {
if (x == null) return null;
int cmp = key.compareTo(x.key);
if (cmp < 0) return get(x.left, key);
else if (cmp > 0) return get(x.right, key);
else return x.val;
}
// search for key. update value if found; grow table if new.
public void put(Key key, Value val) {
root = put(root, key, val);
}
// add k:v pair or change key's value to `val` if key in subtree rooted at x.
private Node put(Node x, Key key, Value val) {
// add new k:v pair
if (x == null) return new Node(key, val , 1);
int cmp = key.compareTo(x.key);
// recursively call `put` on the left or right subtree
if (cmp < 0) x.left = put(x.left, key, val);
else if (cmp > 0) x.right = put(x.right, key, val);
// otherwise update the current value
else x.val = val;
// records the size of subtree for each node, for public `size` method
x.N = size(x.left) + size(x.right) + 1;
return x;
}
}