A hypothesis class has VC dimension 20. You want to achieve epsilon = 0.01 error with delta = 0.05 confidence in the realizable setting. Approximately how many samples do you need?
AAbout 2,000 samples — roughly d/epsilon
BAbout 20,000 samples — roughly (d/epsilon) * log(1/epsilon) = (20/0.01) * log(100) ≈ 2000 * 4.6 ≈ 9,200, plus the (1/epsilon)*log(1/delta) ≈ 100 * 3 term
CAbout 200,000 samples — roughly d/epsilon^2
DAbout 200 samples — roughly d * log(1/epsilon)
In the realizable setting, sample complexity is O((d/epsilon) * log(1/epsilon) + (1/epsilon) * log(1/delta)). For d=20, epsilon=0.01, delta=0.05: the first term is (20/0.01) * log(100) ≈ 2000 * 4.6 = 9,200. The second term is (1/0.01) * log(20) ≈ 100 * 3 = 300. Total: approximately 9,500 samples. The d/epsilon * log(1/epsilon) term dominates for small epsilon. In the agnostic setting, this would be d/epsilon^2 = 20/0.0001 = 200,000 — an order of magnitude more, reflecting the price of agnosticism.
Question 2 True / False
The sample complexity bound Theta(d/epsilon^2) for agnostic learning means that to halve the error, you need four times as many samples.
TTrue
FFalse
Answer: True
If the sample complexity is proportional to d/epsilon^2, then reducing epsilon by half (from epsilon to epsilon/2) requires d/(epsilon/2)^2 = 4d/epsilon^2 samples — four times as many. This quadratic relationship means that achieving high accuracy is expensive: going from 10% error to 1% error requires 100 times more data. This is a fundamental property of estimation from noisy data — the 1/epsilon^2 scaling comes from the variance of statistical estimation and cannot be improved in the worst case. (In the realizable setting, the scaling is 1/epsilon, which is more forgiving — halving epsilon only doubles the sample requirement.)
Question 3 True / False
Sample complexity bounds depend only on the VC dimension and the accuracy parameters (epsilon, delta). They are independent of the data distribution.
TTrue
FFalse
Answer: True
This is the distribution-free property inherited from the PAC framework. The sample complexity bound using VC dimension holds for any distribution D over the input space. The same number of samples suffices whether the data is uniformly distributed, concentrated on a few points, or arranged in any other way. This makes the bounds robust but potentially conservative — for specific 'easy' distributions, fewer samples might actually suffice. Distribution-dependent bounds (using Rademacher complexity or local Rademacher complexity) can be tighter for specific distributions but require knowledge of or assumptions about the distribution.
Question 4 Short Answer
Explain the role of matching upper and lower bounds in establishing tight sample complexity characterizations, and why both are needed.
Think about your answer, then reveal below.
Model answer: An upper bound shows that some algorithm succeeds with at most m(epsilon, delta) samples — it is a sufficiency result. A lower bound shows that no algorithm can succeed with fewer than m'(epsilon, delta) samples — it is a necessity result. When the upper and lower bounds match (up to constant factors), the sample complexity is 'tight' or 'characterized': we know exactly how many samples are needed, not just an order of magnitude. Without a matching lower bound, an upper bound might be loose — a better algorithm could need fewer samples. Without a matching upper bound, a lower bound might not be achievable — the stated difficulty might be harder than what any algorithm can handle. For agnostic PAC learning with VC dimension d, the matching bounds Theta(d/epsilon^2) mean that the sample complexity is known exactly (up to constants): O(d/epsilon^2) samples suffice (proved via ERM + uniform convergence), and Omega(d/epsilon^2) are necessary (proved via information-theoretic arguments). No improvement is possible.
Matching bounds are the strongest form of theoretical result in learning theory. They close a problem completely — no further research on the sample complexity of that setting can improve the bounds. This is why establishing tight bounds, rather than just upper bounds, is highly valued.