You want to estimate the mean of a distribution in [0,1] to within epsilon = 0.05 with probability at least 0.95 (delta = 0.05). Using Hoeffding's inequality, how many samples do you need?
Hoeffding's inequality states P(|mean - mu| >= epsilon) <= 2*exp(-2n*epsilon^2). Setting 2*exp(-2n*0.0025) <= 0.05, we get exp(-0.005n) <= 0.025, so 0.005n >= ln(40) ≈ 3.69, giving n >= 738. This is a non-asymptotic, distribution-free bound — it holds for ANY distribution on [0,1], not just Gaussians. The O(1/epsilon^2) dependence is the fundamental rate for estimating means from bounded data. Bernstein's inequality can improve this when the variance is much smaller than the range, but Hoeffding is the universal starting point.
Question 2 Multiple Choice
McDiarmid's inequality applies to a function f(X_1, ..., X_n) where changing any single X_i can change f by at most c_i. Why is this more general than Hoeffding's inequality?
AMcDiarmid handles continuous random variables while Hoeffding only works for discrete ones
BMcDiarmid applies to any function of independent variables with bounded sensitivity, not just sums — it covers statistics like the median, maximum, or the empirical risk of a hypothesis, as long as each data point's influence is bounded
CMcDiarmid provides tighter bounds for all the same settings where Hoeffding applies
DMcDiarmid does not require independence, while Hoeffding does
Hoeffding's inequality bounds deviations of sums (or means) of independent bounded variables. McDiarmid's extends this to any function f(X_1, ..., X_n) satisfying the bounded differences property: changing one input changes the output by at most c_i. This covers a much wider class of statistics. For example, the empirical risk of a hypothesis is a function of n training examples, and changing one example changes the risk by at most 1/n (bounded difference c_i = 1/n). McDiarmid's gives P(|f - E[f]| >= t) <= 2*exp(-2t^2 / sum c_i^2), which reduces to Hoeffding when f is a sum. Both require independence.
Question 3 True / False
Bernstein's inequality is always tighter than Hoeffding's inequality because it incorporates the variance.
TTrue
FFalse
Answer: False
Bernstein's inequality is tighter when the variance sigma^2 is much smaller than the range would suggest. For a variable in [0,1], Hoeffding treats the worst case as sigma^2 = 1/4 (the maximum variance for bounded variables). If the actual variance is 0.01, Bernstein gives a much tighter bound by using sigma^2 = 0.01 instead. However, Bernstein requires knowing or bounding the variance, while Hoeffding only requires knowing the range. When the variance is close to the maximum (sigma^2 ≈ 1/4 for [0,1] variables), Bernstein offers little improvement. So Bernstein dominates Hoeffding in the small-variance regime but offers no advantage when variance is large.
Question 4 True / False
Concentration inequalities are unnecessary for learning theory because the law of large numbers already guarantees that sample means converge to true means.
TTrue
FFalse
Answer: False
The law of large numbers guarantees convergence in the limit (as n -> infinity) but says nothing about the rate or about finite samples. It tells you the sample mean WILL converge, not how many samples you need for a given accuracy. Learning theory requires finite-sample, quantitative bounds: 'with n = 1000 samples, the error is within 0.05 with probability 0.99.' Only concentration inequalities provide this. Furthermore, learning theory needs UNIFORM convergence (over all hypotheses), which requires even stronger tools than pointwise convergence — the Sauer-Shelah lemma combined with concentration inequalities, not just the law of large numbers.
Question 5 Short Answer
Explain why Hoeffding's inequality, rather than the central limit theorem, is the appropriate tool for deriving sample complexity bounds in learning theory.
Think about your answer, then reveal below.
Model answer: The central limit theorem (CLT) says the sample mean is approximately normal for large n, but 'large n' is not quantified — the approximation quality depends on the distribution and has no universal finite-sample guarantee. Hoeffding's inequality gives an exact, non-asymptotic bound that holds for ANY sample size n and ANY distribution on bounded variables: P(|mean - mu| >= t) <= 2*exp(-2nt^2). This is essential for learning theory because: (1) we need a bound that holds for a specific, finite n, not an asymptotic regime; (2) the bound must be distribution-free (PAC learning makes no distributional assumptions); (3) we need an explicit probability bound (the delta in PAC). The CLT provides none of these. Concentration inequalities are the finite-sample, distribution-free replacement for the CLT in learning theory.
The CLT is an asymptotic approximation; concentration inequalities are exact finite-sample bounds. For deriving sample complexity (how many samples suffice for a given epsilon-delta guarantee), only finite-sample bounds are logically valid.