Random sampling is a foundational technique in algorithm design where selecting elements randomly from a dataset enables efficient estimation, selection, and optimization. Reservoir sampling solves the problem of uniformly sampling k items from a stream of unknown length in O(k) space. Importance sampling reweights samples to reduce variance when estimating expectations, enabling efficient simulation of rare events. Random sampling underpins randomized selection (expected O(n) median finding), random projections (Johnson-Lindenstrauss dimensionality reduction), and the design of sublinear-time algorithms that make decisions by examining only a small fraction of the input.
Random sampling is one of the most versatile tools in the algorithm designer's toolkit. At its simplest, drawing a random subset of an input lets you estimate global properties without examining every element. But the techniques range from the elegant (reservoir sampling for streams) to the sophisticated (importance sampling for variance reduction), and the theoretical foundations connect to concentration inequalities, approximation theory, and information-theoretic limits.
Reservoir sampling addresses a clean problem: maintain a uniform random sample of k elements from a data stream whose length n is not known in advance. The algorithm initializes the reservoir with the first k elements, then includes the i-th element (i > k, counting from 1) with probability k/i, in which case it replaces a uniformly random element of the reservoir. The proof of correctness is a telescoping argument: an element in the reservoir after step j-1 survives step j with probability 1 - (k/j)(1/k) = (j-1)/j, so an element that arrives at position i > k is in the final sample with probability (k/i) * (i/(i+1)) * ... * ((n-1)/n) = k/n, and the same product starting at j = k+1 gives k/n for each of the first k elements. The algorithm uses O(k) memory regardless of stream length, making it practical for massive data streams that you cannot store or revisit.
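The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not a tuned implementation; the function name `reservoir_sample` and its `rng` parameter are assumptions made for this example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return a uniform random sample of up to k items from a one-pass stream."""
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream, start=1):  # i is the 1-based position
        if i <= k:
            # The first k items fill the reservoir directly.
            reservoir.append(item)
        else:
            # j is uniform on 0..i-1, so j < k happens with probability k/i,
            # and in that case j is also a uniform slot index to overwrite.
            j = rng.randrange(i)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    print(reservoir_sample(range(1_000_000), 5, random.Random(0)))
```

Drawing a single index j serves both purposes at once: it decides whether the new item is kept and, if so, which slot it replaces. If the stream has fewer than k items, the function simply returns all of them.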
Importance sampling solves a different problem: efficiently estimating E_p[f(x)] when sampling from p is difficult or when naive sampling has high variance. Instead of drawing from p, you draw samples from a proposal distribution q and weight each value f(x) by p(x)/q(x). The estimator is unbiased for any q that is positive wherever f(x) * p(x) is nonzero, but its variance depends critically on how well q matches the shape of |f(x)| * p(x). The variance-minimizing proposal is proportional to |f(x)| * p(x); it concentrates samples where the integrand is large, which can dramatically reduce the number of samples needed. This is essential in computational physics (rare event simulation), Bayesian inference (expectations under complex posteriors), and Monte Carlo integration.
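As a concrete rare-event illustration, the sketch below estimates P(X > t) for a standard normal X by sampling from a normal proposal centered at t. The choice of proposal and the function name `tail_prob_importance` are assumptions made for this example, not a prescribed method. For p = N(0, 1) and q = N(t, 1), the weight p(x)/q(x) simplifies to exp(-t*x + t^2/2).

```python
import math
import random


def tail_prob_importance(threshold: float, n_samples: int, rng: random.Random) -> float:
    """Estimate P(X > threshold) for X ~ N(0, 1) using proposal q = N(threshold, 1)."""
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(threshold, 1.0)  # draw from the proposal q
        if x > threshold:  # f(x) is the indicator of the rare event
            # Weight p(x)/q(x) for p = N(0, 1), q = N(threshold, 1).
            total += math.exp(-threshold * x + 0.5 * threshold * threshold)
    return total / n_samples


if __name__ == "__main__":
    t = 4.0
    exact = 0.5 * math.erfc(t / math.sqrt(2.0))  # closed-form normal tail
    estimate = tail_prob_importance(t, 100_000, random.Random(0))
    print(f"exact={exact:.3e} estimate={estimate:.3e}")
```

With t = 4 the event has probability of roughly 3e-5, so naive sampling from p would see only a handful of hits in 100,000 draws, whereas about half of the proposal's draws land in the region of interest.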
The deeper significance of random sampling is that it enables sublinear-time computation. If you want to determine whether a property holds for every element of a massive dataset or fails on a noticeable fraction of it, you do not need to examine every element: a random sample of size O(1/epsilon) suffices to distinguish, with constant probability, "property holds everywhere" from "property fails on at least an epsilon-fraction of elements," independent of the dataset size. The reason is that when an epsilon-fraction of elements violate the property, each uniform sample hits a violation with probability at least epsilon, so c/epsilon samples all miss with probability at most e^(-c). This insight underlies property testing, streaming algorithms, and the entire field of sublinear algorithms. The price is approximation: you sacrifice exact answers for massive speed gains. But in an era of terabyte-scale data, an approximate answer in seconds often dominates an exact answer in hours.
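The sample-size claim can be made concrete with a small tester. This is a sketch under stated assumptions: the data supports random access, the names `sample_size` and `probably_holds_everywhere` are hypothetical, and the failure probability delta is a parameter introduced here to make the constant in O(1/epsilon) explicit as ln(1/delta)/epsilon.

```python
import math
import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def sample_size(epsilon: float, delta: float) -> int:
    """Samples needed so that (1 - epsilon)^m <= delta; note: no dependence on n."""
    return math.ceil(math.log(1.0 / delta) / epsilon)


def probably_holds_everywhere(
    data: Sequence[T],
    predicate: Callable[[T], bool],
    epsilon: float,
    delta: float,
    rng: random.Random,
) -> bool:
    """One-sided tester.

    Always returns True if predicate holds for every element. Returns False
    with probability at least 1 - delta if predicate fails on at least an
    epsilon-fraction of the elements.
    """
    for _ in range(sample_size(epsilon, delta)):
        # Sample uniformly with replacement; one violation is a proof of failure.
        if not predicate(data[rng.randrange(len(data))]):
            return False
    return True


if __name__ == "__main__":
    rng = random.Random(0)
    data = [0] * 900_000 + [1] * 100_000  # 10% of elements violate "x == 0"
    print(sample_size(0.1, 0.01))
    print(probably_holds_everywhere(data, lambda x: x == 0, 0.1, 0.01, rng))
```

The tester inspects a fixed number of elements whether the dataset has a thousand entries or a billion. Inputs that violate the property on fewer than an epsilon-fraction of elements may be accepted, which is exactly the approximation being traded for speed.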