
A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Universal and Perfect Hashing

Research Depth 97 in the knowledge graph

5 topics build on this

561 prerequisites beneath it



Core Idea

Universal hashing and perfect hashing provide rigorous, provable guarantees for hash-based data structures. A universal hash family is a collection of hash functions such that, for any two distinct keys, the probability of a collision is at most 1/m (for m buckets) when the function is drawn uniformly at random from the family. This rules out adversarial worst-case inputs without any assumption about the input distribution, as long as the keys are chosen without knowledge of the random draw. Perfect hashing goes further: given a static set of n keys, it constructs a hash function with zero collisions on that set, achieving O(1) worst-case lookup in O(n) space. The FKS (Fredman-Komlós-Szemerédi) scheme achieves this with two levels of universal hashing: each first-level bucket gets a second-level table of size quadratic in the number of keys it holds, which makes a randomly drawn second-level function collision-free with probability greater than 1/2, so a few redraws suffice.

Explainer

You already know that hash tables achieve O(1) average-case operations under the assumption that the hash function distributes keys uniformly. But this assumption is fragile: for any fixed hash function, an adversary can choose keys that all collide, degrading to O(n) per operation. Universal hashing eliminates this vulnerability by randomizing the choice of hash function. A universal hash family guarantees that for any pair of distinct keys, the collision probability is at most 1/m — the same guarantee a truly random function would provide, but using only O(1) parameters to specify the function.
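Stated formally, the guarantee and its main consequence look as follows. The symbols U (key universe), \mathcal{H} (the family), and S (the set of n stored keys) are notation introduced here for illustration, not taken from the text above.

```latex
\[
\forall\, x \neq y \in U:\qquad
\Pr_{h \in \mathcal{H}}\bigl[\, h(x) = h(y) \,\bigr] \;\le\; \frac{1}{m}
\]
\[
\text{For } x \in S:\qquad
\mathbb{E}\Bigl[\bigl|\{\, y \in S \setminus \{x\} : h(y) = h(x) \,\}\bigr|\Bigr]
\;=\; \sum_{y \in S \setminus \{x\}} \Pr\bigl[\, h(x) = h(y) \,\bigr]
\;\le\; \frac{n-1}{m}
\]
```

The second line is just linearity of expectation applied to the first. With m proportional to n, the expected number of keys sharing a bucket with x is O(1) for every fixed key set, which is where the expected O(1) bound for chaining comes from.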

The classic construction is the Carter-Wegman family: h(x) = ((ax + b) mod p) mod m, where p is a prime larger than every key in the universe, a is chosen uniformly at random from {1, ..., p - 1}, and b from {0, ..., p - 1}. For any two distinct keys x and y, the pair ((ax + b) mod p, (ay + b) mod p) is uniformly distributed over the ordered pairs of distinct residues mod p, so the collision probability after the final mod m is at most 1/m. If a is also allowed to be 0, the two values mod p are exactly pairwise independent; universality is the weaker, collision-only form of that property, and it is all the hash table analysis needs. Stronger notions such as k-wise independence give tighter concentration bounds at the cost of more complex hash functions, while weaker ones such as almost-universality allow cheaper functions in exchange for a slightly larger collision bound.
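A minimal sketch of this family in Python, assuming integer keys in range(p) and a drawn from {1, ..., p - 1}. The names `make_hash` and `collision_fraction` are hypothetical helpers chosen for illustration. The second function checks the 1/m bound exhaustively by enumerating every (a, b) pair, which is only feasible for a small prime.

```python
import random
from itertools import product

def make_hash(p, m, rng=random):
    """Draw one function h(x) = ((a*x + b) mod p) mod m from the family.

    Assumes p is prime and every key lies in range(p).
    """
    a = rng.randrange(1, p)   # a in {1, ..., p-1}
    b = rng.randrange(0, p)   # b in {0, ..., p-1}
    return lambda x: ((a * x + b) % p) % m

def collision_fraction(x, y, p, m):
    """Fraction of all (a, b) choices under which keys x and y collide."""
    hits = sum(
        1
        for a, b in product(range(1, p), range(p))
        if ((a * x + b) % p) % m == ((a * y + b) % p) % m
    )
    return hits / ((p - 1) * p)

if __name__ == "__main__":
    p, m = 101, 10
    h = make_hash(p, m)
    print([h(k) for k in (3, 17, 42)])           # bucket indices in range(m)
    print(collision_fraction(3, 17, p, m), "<=", 1 / m)
```

Because the check enumerates the whole family rather than sampling it, the fraction it reports is the exact collision probability for that key pair, not an estimate.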

Perfect hashing, as achieved by the FKS scheme, goes beyond expected-time guarantees to O(1) worst-case lookup; only the construction is randomized, and it runs in expected O(n) time. The construction uses two levels. The first level hashes the n keys into n buckets using a universal hash function. The second level resolves collisions: for each bucket containing n_i keys, it builds a second-level hash table of size n_i² and redraws a universal hash function until one has zero collisions on that bucket. Quadratic space makes this likely (the birthday paradox in reverse): with n_i keys and n_i² slots, the expected number of colliding pairs is at most (n_i choose 2)/n_i² < 1/2, so a function drawn from a universal family is collision-free with probability greater than 1/2. The total space remains O(n) because the universal first-level hash keeps the sum of the n_i² below 2n in expectation; the first-level function is redrawn until that sum is at most a fixed multiple of n, such as 4n.
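The two-level construction can be sketched as follows. This is an illustrative sketch under stated assumptions, not a reference implementation: keys are assumed to be distinct non-negative integers below the Mersenne prime 2^61 - 1, the first level is redrawn until the sum of squared bucket sizes is at most 4n (one common choice of threshold), and `draw_hash`, `build_fks`, and `contains` are hypothetical names.

```python
import random

P = (1 << 61) - 1  # Mersenne prime 2^61 - 1; assumes keys are ints in range(P)

def draw_hash(m, rng):
    """One function x -> ((a*x + b) mod P) mod m with random a != 0 and b."""
    a, b = rng.randrange(1, P), rng.randrange(P)
    return lambda x: ((a * x + b) % P) % m

def build_fks(keys, rng=None):
    """Build a static two-level table for a set of integer keys."""
    if rng is None:
        rng = random.Random()
    keys = list(set(keys))
    n = max(1, len(keys))
    # Level 1: redraw until the sum of squared bucket sizes is at most 4n.
    # The expected sum is below 2n, so by Markov's inequality each draw
    # succeeds with probability at least 1/2.
    while True:
        h1 = draw_hash(n, rng)
        buckets = [[] for _ in range(n)]
        for k in keys:
            buckets[h1(k)].append(k)
        if sum(len(b) ** 2 for b in buckets) <= 4 * n:
            break
    # Level 2: a bucket with n_i keys gets n_i**2 slots; redraw its hash
    # function until no two keys of the bucket share a slot.
    tables = []
    for bucket in buckets:
        size = len(bucket) ** 2
        while True:
            h2 = draw_hash(size, rng) if size else None
            slots = [None] * size
            collision = False
            for k in bucket:
                i = h2(k)
                if slots[i] is not None:
                    collision = True
                    break
                slots[i] = k
            if not collision:
                break
        tables.append((h2, slots))
    return h1, tables

def contains(table, key):
    """Membership test: two hash evaluations and one comparison."""
    h1, tables = table
    h2, slots = tables[h1(key)]
    return bool(slots) and slots[h2(key)] == key

if __name__ == "__main__":
    table = build_fks([3, 17, 42, 1_000_003])
    print(contains(table, 42), contains(table, 43))
```

All retry loops live in `build_fks`; `contains` has no loop at all, which is what turns the expected-time guarantee of ordinary universal hashing into a worst-case one for lookups on a static key set.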

The theoretical significance extends beyond hash tables. Universal hashing is a foundational idea behind derandomization: it shows that limited randomness (pairwise independence, specified by a seed of O(log |U|) random bits for a key universe U) suffices for many applications that seem to require full randomness. A seed that short can even be enumerated exhaustively, which is how pairwise-independent constructions are turned into deterministic algorithms. This principle recurs throughout algorithm design, in streaming algorithms, sketching, and load balancing, wherever probabilistic guarantees with a small random seed are needed.
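To make the "limited randomness" point concrete, the following sketch enumerates every seed of the map g(z) = (a*z + b) mod p and tabulates the joint outputs for two fixed inputs. It assumes a small prime p and lets a range over all of {0, ..., p - 1}, including 0, with no final reduction to a bucket count; `pair_distribution` is a hypothetical helper name.

```python
from collections import Counter
from itertools import product

def pair_distribution(x, y, p):
    """Count how often each output pair (g(x), g(y)) occurs over all seeds.

    g(z) = (a*z + b) mod p, with the seed (a, b) ranging over all p*p
    combinations of residues mod p.
    """
    return Counter(
        ((a * x + b) % p, (a * y + b) % p)
        for a, b in product(range(p), repeat=2)
    )

if __name__ == "__main__":
    p = 7
    counts = pair_distribution(2, 5, p)
    # Pairwise independence: each of the p*p output pairs should be
    # produced by exactly one seed.
    print(len(counts) == p * p, set(counts.values()) == {1})
```

A seed of two residues, roughly 2 log2(p) bits, therefore yields p output values that are uniform and pairwise independent, and the full seed space of p² entries is small enough to enumerate, as the loop above does.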


Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Introduction to Exponents → Order of Operations → Integer Order of Operations → Variable Expressions → The Distributive Property → Variables and Expressions Review → Introduction to Polynomials → Adding and Subtracting Polynomials → Multiplying Polynomials → Factorial → Permutations → Combinations → Counting Principles: Addition and Multiplication Rules → Introduction to Graph Theory → Propositional Logic Foundations → Logical Equivalences → 
Boolean Algebra → Boolean Type and Truth Values → Comparison Operators and Boolean Tests → Logical Operators and Boolean Algebra → Boolean Algebra and Fundamental Laws → Logic Gates Fundamentals → Implementing Boolean Functions with Gates → Karnaugh Map Simplification → Combinational Circuit Design → Flip-Flops and Latches → Finite State Machines (FSMs) → Deterministic Finite Automata (DFA) → Nondeterministic Finite Automata (NFA) → Two-Way Finite Automata → NFA to DFA Conversion (Subset Construction) → DFA Properties and Minimization Algorithms → Regular Languages: Definition and Characterization → Context-Free Grammars (CFGs) → Pushdown Automata (PDA) → Equivalence of CFGs and Pushdown Automata → Closure Properties of Context-Free Languages → Limitations of Context-Free Languages → Pumping Lemma for Context-Free Languages → Turing Machines → Variants of Turing Machines and Equivalence → Nondeterministic Time Complexity and NP → The P vs. NP Problem → Complexity Class P: Polynomial Time → Randomized Algorithms → Universal and Perfect Hashing

Longest path: 98 steps · 561 total prerequisite topics

Prerequisites (4)

Hash Tables (hard) · Hash Function Design: Properties and Requirements (hard) · Randomized Algorithms (hard) · Hash Tables: Collision Resolution by Chaining (soft)

Leads To (4)

Bloom Filters (hard) · Derandomization Techniques (soft) · Sketching Data Structures (soft) · Streaming Algorithms (hard)