
A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Bloom Filters: Space-Efficient Probabilistic Set Membership

Graduate · Depth 88 in the knowledge graph

3 topics build on this

471 prerequisites beneath it



Core Idea

A Bloom filter uses a bit array of m bits and k independent hash functions. To insert an element, set its k bits; to test membership, check whether all k bits are set. False positives are possible (all k bits may have been set by other elements), but false negatives are not. For a fixed target false positive rate, space is O(n) bits regardless of element size.

How It's Best Learned

Implement a Bloom filter, measure false positive rates with different k and table sizes, and use it for a practical problem (e.g., checking if a URL has been visited). Compare space to a hash set.

Common Misconceptions

Thinking Bloom filters have no false positives; they only guarantee false negatives are impossible.
Not choosing k optimally; k ≈ (m/n) ln 2 minimizes false positives for m bits and n elements.
Assuming Bloom filters are slower than hash tables; they are often competitive or faster because the compact bit array is far more likely to fit in cache, even though each query probes k positions.

Explainer

From your work with hash tables, you know that hashing maps elements to array positions for fast lookup. A hash set can tell you whether an element is a member of a collection in O(1) expected time, but it stores the actual elements — which can be expensive when elements are large (URLs, file paths, cryptographic keys) or when the set contains millions of entries. A Bloom filter trades perfect accuracy for dramatic space savings: instead of storing the elements themselves, it stores only a compact bit array that encodes a probabilistic summary of the set's contents.

Here is how it works. Start with a bit array of m bits, all initialized to 0, and choose k independent hash functions, each of which maps an element to a position in the array. To insert an element, compute all k hash values and set the corresponding k bits to 1. To query whether an element is in the set, compute the same k hashes and check whether all k bits are 1. If any bit is 0, the element was definitely never inserted — you get a guaranteed no. But if all k bits are 1, it might be a true member, or it might be that other insertions coincidentally set those same k bits. This is a false positive. Crucially, false negatives are impossible: if you inserted an element, its k bits are set forever (bits are never cleared back to 0 in a standard Bloom filter).
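The insert and query steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the class name BloomFilter and its methods are hypothetical, and instead of k truly independent hash functions it assumes a common shortcut, deriving k positions from a single SHA-256 digest by double hashing.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: m bits, k bit positions per element."""

    def __init__(self, m: int, k: int) -> None:
        if m <= 0 or k <= 0:
            raise ValueError("m and k must be positive")
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)  # m bits, all initially 0

    def _positions(self, item: str):
        # Assumption: k positions come from one SHA-256 digest via double
        # hashing, (h1 + i * h2) mod m, rather than k separate hash functions.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # forced odd, so never zero
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        # Insert: set each of the k bits to 1. Bits are never cleared.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # Query: "maybe present" only if every one of the k bits is 1.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


bf = BloomFilter(m=1000, k=7)
for url in ("https://example.com/a", "https://example.com/b"):
    bf.add(url)

print("https://example.com/a" in bf)    # True: inserted items are always found
print("https://example.com/zzz" in bf)  # very likely False; True would be a false positive
```

Note that the filter stores no URLs at all, only 1000 bits, which is why a "yes" answer can never be more than "possibly present".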

The false positive rate depends on three parameters: the bit array size m, the number of elements inserted n, and the number of hash functions k. As the array fills up with 1-bits, the probability that a random query finds all k bits set by coincidence increases. The optimal number of hash functions — the one that minimizes false positives — is k = (m/n) · ln 2, which comes from balancing two competing effects: more hash functions means more bits are checked (reducing coincidence), but also more bits are set per insertion (filling the array faster). In practice, with 10 bits per element and 7 hash functions, the false positive rate is about 1%.
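The numbers in this paragraph can be checked with the standard approximation for the false positive rate, (1 - e^(-kn/m))^k, which assumes the hash functions behave as uniform and independent. The helper names optimal_k and false_positive_rate below are hypothetical, chosen for this sketch.

```python
import math


def optimal_k(m: int, n: int) -> int:
    """Nearest positive integer to (m / n) * ln 2."""
    return max(1, round((m / n) * math.log(2)))


def false_positive_rate(m: int, n: int, k: int) -> float:
    """Standard approximation (1 - e^(-k*n/m))^k for m bits, n elements, k hashes."""
    return (1.0 - math.exp(-k * n / m)) ** k


n = 1_000_000
m = 10 * n  # 10 bits per element
k = optimal_k(m, n)

print(k)                                      # 7
print(f"{false_positive_rate(m, n, k):.4f}")  # 0.0082

# Too few or too many hash functions both raise the rate.
for other_k in (3, 14):
    print(other_k, false_positive_rate(m, n, other_k))
```

Because k must be a whole number, (m/n) · ln 2 ≈ 6.93 is rounded to 7, and the resulting rate of roughly 0.8% matches the "about 1%" rule of thumb.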

Bloom filters appear in many production systems. Web browsers have used them to check URLs against a list of known malicious sites, since the filter fits in memory where the full list might not. Databases use them to avoid expensive disk reads: before searching a file on disk for a key, check the Bloom filter in memory; if it says "no," skip the disk I/O entirely. Distributed systems use them to synchronize data between nodes without transferring entire sets. The pattern is always the same: you have a large set, you need fast membership checks, and you can tolerate a small fraction of false positives because the cost of a false positive (an extra disk read, a redundant network request) is low compared to the cost of storing the full set in memory.
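The disk-read pattern might look roughly like the sketch below. Everything here is illustrative: GuardedStore is a hypothetical class, a plain dict stands in for the slow on-disk file, and the compact BloomFilter again assumes double hashing over one SHA-256 digest rather than k independent hash functions.

```python
import hashlib


class BloomFilter:
    """Compact Bloom filter; positions come from double hashing (an assumption)."""

    def __init__(self, m: int, k: int) -> None:
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, key: str):
        d = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class GuardedStore:
    """Hypothetical key-value store; a dict simulates the expensive disk file."""

    def __init__(self, m: int = 10_000, k: int = 7) -> None:
        self._disk = {}                  # stand-in for on-disk data
        self._filter = BloomFilter(m, k)
        self.disk_reads = 0              # counts simulated disk accesses

    def put(self, key: str, value) -> None:
        self._disk[key] = value
        self._filter.add(key)

    def get(self, key: str):
        if not self._filter.might_contain(key):
            return None                  # definite miss: no disk read needed
        self.disk_reads += 1             # "maybe": pay for the disk read
        return self._disk.get(key)       # None here means a false positive


store = GuardedStore()
for i in range(100):
    store.put(f"user:{i}", i)

hits = [store.get(f"user:{i}") for i in range(100)]      # each one reads "disk"
misses = [store.get(f"ghost:{i}") for i in range(1000)]  # mostly answered from memory
print(store.disk_reads)  # 100 plus however many false positives occurred
```

A false positive costs only one wasted read and never a wrong answer, because the store still consults the real data before returning a value.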


Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Introduction to Exponents → Order of Operations → Integer Order of Operations → Variable Expressions → The Distributive Property → Variables and Expressions Review → Introduction to Polynomials → Adding and Subtracting Polynomials → Multiplying Polynomials → Factorial → Permutations → Combinations → Counting Principles: Addition and Multiplication Rules → Introduction to Graph Theory → Propositional Logic Foundations → Logical Equivalences → 
Boolean Algebra → Boolean Type and Truth Values → Comparison Operators and Boolean Tests → Logical Operators and Boolean Algebra → Conditional Statements → Defining and Calling Functions → Functions: Decomposing Problems → Function Parameters and Argument Passing → Return Values → Variable Scope → Introduction to Classes → Objects and Instances → Methods and Attributes → Algorithm Design Basics → Tree Structure and Node Properties → Binary Trees → Binary Tree Properties: Height, Balance, Completeness → Amortized Analysis → Hash Tables → Hash Function Design: Properties and Requirements → Bloom Filters: Space-Efficient Probabilistic Set Membership

Longest path: 89 steps · 471 total prerequisite topics

Prerequisites (3)

Hash Tables (hard) · Hash Function Design: Properties and Requirements (soft) · Probability Axioms (soft)

Leads To (1)

Bloom Filters (hard)