A count-min sketch uses d = log(1/delta) hash functions and w = ceil(e/epsilon) counters per row. When querying frequency of item x, it returns the MINIMUM of d counter values. Why the minimum, not the average?
AThe minimum is faster to compute than the average
BAll d counters are unbiased estimators that can only OVERCOUNT (due to hash collisions adding to the count), so the minimum is closest to the true frequency — averaging would preserve the positive bias from collisions
CThe minimum provides a lower bound while the average provides an upper bound
DThe minimum uses less memory than storing all d values for averaging
Each counter tracks the true frequency of x PLUS the frequencies of all other items that hash to the same position — it can only overestimate, never underestimate. Each counter independently has expected overcount at most epsilon * ||f||_1. Taking the minimum of d independent overcounts reduces the probability that ALL d counters have large overcount. By the union bound, the probability that the minimum exceeds the true frequency by more than epsilon * ||f||_1 is at most (1 - 1/e)^d ≈ delta for d = log(1/delta). The minimum exploits the one-sided error structure; averaging would retain the bias.
Question 2 Short Answer
HyperLogLog estimates the number of distinct elements in a stream. It works by hashing each element and tracking the maximum number of leading zeros observed. Explain why this works and why multiple registers are needed.
Think about your answer, then reveal below.
Model answer: If elements are hashed to uniformly random binary strings, the probability of seeing k leading zeros is 2^(-k-1). Among n distinct elements, the expected maximum number of leading zeros is approximately log_2(n). So the maximum leading-zero count is a rough estimator of log_2(n), giving a cardinality estimate of 2^(max_zeros). However, a single register has high variance — the estimate can be off by a constant factor. HyperLogLog uses m = 2^p registers, partitioning elements by their first p hash bits and tracking the max leading zeros in each partition. The harmonic mean of the per-register estimates reduces variance by a factor of sqrt(m), achieving standard error approximately 1.04/sqrt(m). With m = 1024 registers (~5 bits each ≈ 640 bytes), the standard error is about 3.25%.
The key insight is that a single max-leading-zeros register is an exponential-scale estimator with multiplicative noise. Averaging many independent such estimators (via stochastic averaging / partitioning) reduces the noise. The harmonic mean is used instead of the arithmetic mean because it handles the heavy-tailed distribution of 2^(max_zeros) better.
Question 3 True / False
The Alon-Matias-Szegedy result showed that computing the second frequency moment F_2 exactly requires Omega(n) space in the streaming model, but it can be (1+epsilon)-approximated in O(1/epsilon^2 * log n) space.
TTrue
FFalse
Answer: True
F_2 = sum(f_i^2) measures the 'skewness' of the frequency distribution. The AMS sketch maintains a counter that, for each stream element, adds or subtracts 1 based on a random 4-wise independent hash function. The square of this counter is an unbiased estimator of F_2. The variance is controlled by the 4-wise independence, and median-of-means reduces the failure probability. The O(1/epsilon^2 * log n) space bound is essentially optimal (up to log factors) by communication complexity lower bounds. This result launched the streaming algorithms field and won the Gödel Prize.
Question 4 True / False
Count-min sketches are mergeable: the sketch of a combined stream equals the entry-wise sum of the individual sketches. This makes them suitable for distributed and parallel settings.
TTrue
FFalse
Answer: True
If two streams S1 and S2 are summarized by count-min sketches CMS1 and CMS2 using the same hash functions, then CMS1 + CMS2 (entry-wise addition) is exactly the count-min sketch of the concatenated stream S1 || S2. This mergeability property means you can sketch data at distributed nodes, transmit the compact sketches to a central coordinator, sum them, and get the same result as if you had streamed all data through a single sketch. This property holds because the sketch is a linear function of the frequency vector — and linearity is what makes merging exact.