Reservoir sampling maintains a sample of size k from a stream. When the i-th element arrives (i > k), it replaces a random element in the reservoir with probability k/i. After n elements, each element is in the reservoir with probability exactly k/n. What makes this non-obvious?
AThe algorithm requires knowing n in advance to set the replacement probability
BThe probability that an element stays involves a telescoping product: it must survive all subsequent replacement attempts, and this product must equal k/n despite the algorithm never knowing n
CThe algorithm requires O(n) random bits to achieve uniform sampling
DEach element's inclusion probability depends on the other elements in the stream
Element j (j <= k) starts in the reservoir and survives round i > k with probability 1 - (k/i)(1/k) = (i-1)/i. The probability it remains through all rounds is (k/(k+1)) * ((k+1)/(k+2)) * ... * ((n-1)/n) = k/n — a telescoping product. Element j (j > k) enters with probability k/j and then survives with the same telescoping product (j-1)/j * ... * (n-1)/n = (j-1)/(n-1) — wait, more carefully: enters with k/j, then survives each subsequent round i with (i-1)/i, giving k/j * j/(j+1) * ... * (n-1)/n = k/n. The algorithm achieves perfect uniformity without knowing n, which is the key insight.
Question 2 True / False
In importance sampling, you draw samples from a proposal distribution q(x) instead of the target distribution p(x), and reweight by p(x)/q(x). This reweighting always produces an unbiased estimator of E_p[f(x)].
TTrue
FFalse
Answer: True
E_q[f(x) * p(x)/q(x)] = integral of f(x) * p(x)/q(x) * q(x) dx = integral of f(x) * p(x) dx = E_p[f(x)]. The reweighting exactly cancels the proposal distribution and recovers the target expectation. Unbiasedness holds for any proposal q that has support wherever p * f is nonzero. However, unbiasedness says nothing about variance — a poorly chosen q can produce astronomical variance. The optimal proposal distribution (minimizing variance) is q*(x) proportional to |f(x)| * p(x), which concentrates samples where the integrand is large.
Question 3 Short Answer
Explain why random sampling enables sublinear-time algorithms and what fundamental tradeoff is involved.
Think about your answer, then reveal below.
Model answer: If a property holds for a large fraction of the input, a small random sample will contain evidence of that property with high probability. Specifically, to distinguish 'property holds everywhere' from 'property fails on epsilon-fraction of elements' with confidence 1-delta, you need only O(1/epsilon * log(1/delta)) samples — independent of input size n. This enables algorithms that run in time sublinear in n. The fundamental tradeoff is precision vs. speed: you can only make approximate statements (within epsilon of the truth) because you haven't seen most of the input. The sample size depends on the desired accuracy and confidence, not on n, which is why these algorithms achieve sublinear time.
This tradeoff is formalized in property testing: you are guaranteed either that the input has a property or is epsilon-far from having it, and must distinguish these cases using few samples. The connection between sampling and approximation is the theoretical foundation for all sublinear algorithms.
Question 4 True / False
Reservoir sampling requires knowing the total stream length n in advance to set correct replacement probabilities.
TTrue
FFalse
Answer: False
This is precisely the problem reservoir sampling solves. The replacement probability at step i is k/i — it depends only on the current position i, not the total length n. The algorithm processes elements one at a time and maintains uniform sampling at every prefix of the stream. When the stream ends at any point n, the reservoir contains a uniform random sample of size k from all n elements seen. This is what makes reservoir sampling suitable for streaming settings where n is unknown or effectively infinite.