Questions — Sketching Data Structures

Question 1 Multiple Choice

The count-min sketch maintains a d-by-w matrix of counters. When an item x arrives with frequency update, each counter in row i is incremented at position h_i(x). Why is sketch merging (element-wise addition of two sketches) exact, and how does this enable distributed processing?

AMerging is approximate; it introduces additional errors from the two sketches

BThe count-min sketch is a linear function of the frequency vector — sketch(f1 + f2) = sketch(f1) + sketch(f2) exactly because each counter independently sums the frequencies of items hashing to its position

CMerging is exact only if the two sketches use the same random hash functions, but this is impractical for distributed systems

DMerging requires re-hashing all items, which defeats the purpose of sketching

Question 2 True / False

The Johnson-Lindenstrauss lemma states that random projections preserve all pairwise distances in a set of n points to within a (1 ± epsilon) multiplicative factor, using only O(log(n) / epsilon^2) dimensions. This enables approximate nearest-neighbor search in high-dimensional data.

TTrue

FFalse

Question 3 Short Answer

Min-hashing estimates the Jaccard similarity between two sets A and B. The method: choose k hash functions, for each set compute the minimum hash value across all elements, then estimate Jaccard similarity as the fraction of hash functions on which the two sets agree. Why does this work?

Think about your answer, then reveal below.

Question 4 True / False

Sketches are mergeable (can be added/combined) because they are linear functions of the input. This property enables one-shot distributed algorithms and streaming updates that would be impossible with non-linear summaries.

TTrue

FFalse

Questions: Sketching Data Structures