Questions — Streaming Algorithms — Open Knowledge Graph

Question 1 Multiple Choice

A count-min sketch uses d = log(1/delta) hash functions and w = ceil(e/epsilon) counters per row. When querying frequency of item x, it returns the MINIMUM of d counter values. Why the minimum, not the average?

AThe minimum is faster to compute than the average

BAll d counters are unbiased estimators that can only OVERCOUNT (due to hash collisions adding to the count), so the minimum is closest to the true frequency — averaging would preserve the positive bias from collisions

CThe minimum provides a lower bound while the average provides an upper bound

DThe minimum uses less memory than storing all d values for averaging

Question 2 Short Answer

HyperLogLog estimates the number of distinct elements in a stream. It works by hashing each element and tracking the maximum number of leading zeros observed. Explain why this works and why multiple registers are needed.

Think about your answer, then reveal below.

Question 3 True / False

The Alon-Matias-Szegedy result showed that computing the second frequency moment F_2 exactly requires Omega(n) space in the streaming model, but it can be (1+epsilon)-approximated in O(1/epsilon^2 * log n) space.

TTrue

FFalse

Question 4 True / False

Count-min sketches are mergeable: the sketch of a combined stream equals the entry-wise sum of the individual sketches. This makes them suitable for distributed and parallel settings.

TTrue

FFalse

Questions: Streaming Algorithms