A universal hash family H mapping universe U to {0,...,m-1} guarantees that for any two distinct keys x, y in U: Pr_{h in H}[h(x) = h(y)] <= 1/m. Why is this strictly stronger than assuming a 'random' hash function?
AIt is not stronger — a truly random function automatically satisfies the universal property
BA truly random function from U to {0,...,m-1} requires O(|U| log m) bits to store, which is impractical, while a universal family can be specified with O(log |U|) bits per function — universality achieves the collision bound with compact representation
CUniversal families guarantee no collisions, while random functions allow collisions
DUniversal hashing works only for integers, while random hashing works for all data types
A truly random hash function assigns each key an independent random value, requiring |U| * log(m) bits — for a 64-bit key universe, this is ~2^64 entries, clearly impractical. Universal hash families like h(x) = ((ax + b) mod p) mod m achieve the same collision probability bound 1/m using only O(1) parameters (a, b). The point is achieving the probabilistic guarantee of random hashing with a compact, efficiently computable function. This is why universality is defined as a property of a FAMILY — the randomness comes from choosing which family member to use.
Question 2 Short Answer
In the FKS perfect hashing scheme, the second-level hash tables use space quadratic in the number of keys hashing to each bucket. Despite this, total space is O(n). Why?
Think about your answer, then reveal below.
Model answer: If n_i keys hash to bucket i at the first level (with m = n buckets), the second-level table for bucket i uses O(n_i^2) space to guarantee no collisions. The total space is sum of n_i^2 over all buckets. By the universal hashing guarantee, the expected number of collisions at the first level is at most n(n-1)/(2m) = (n-1)/2 for m = n. The expected value of sum(n_i^2) = n + 2*(expected collisions) = n + n - 1 = O(n). So the total second-level space is O(n) in expectation. If a random first-level function gives sum(n_i^2) > cn for some constant c, simply re-pick the first-level function — a constant fraction of choices succeed, so expected O(1) trials suffice.
This is a beautiful application of the birthday paradox in reverse: quadratic space at each bucket eliminates collisions at the second level (birthday paradox says O(sqrt(n_i^2)) = O(n_i) keys avoid collisions in n_i^2 slots), while the first-level universal hashing ensures the sum of squares stays linear.
Question 3 True / False
A 2-universal hash family is sufficient to guarantee O(1) expected lookup time in a chaining hash table, but k-wise independence for larger k provides stronger concentration guarantees.
TTrue
FFalse
Answer: True
With 2-universality, the expected chain length at any bucket is at most n/m + 1, giving O(1) expected lookup for m = Theta(n). However, the chain length variance could be high with only pairwise independence. With k-wise independence (k >= c*log n), you can apply Chernoff-like bounds to show the maximum chain length is O(log n / log log n) with high probability, matching fully random hashing. Higher independence costs more to evaluate — O(k) time per hash — creating a tradeoff between hash function cost and tail behavior of the load distribution.
Question 4 True / False
Perfect hashing achieves O(1) worst-case lookup for static sets. It cannot handle insertions without rebuilding the entire structure.
TTrue
FFalse
Answer: True
Classical FKS perfect hashing is designed for static key sets — the hash function is constructed based on knowing all n keys in advance. Inserting a new key could create a collision at the second level, requiring reconstruction of that bucket's hash function or potentially the entire structure. Dynamic perfect hashing (Dietzfelbinger et al.) extends this to handle insertions and deletions with O(1) expected amortized time per operation, but it requires periodic rebuilds when the load factor changes significantly. The static vs. dynamic distinction is important: static perfect hashing is simpler and has worst-case O(1) lookup; dynamic versions trade deterministic guarantees for amortized ones.