A hypothesis class has VC dimension 4. For a dataset of 100 points, approximately how does the growth function compare to the total number of possible labelings?
AThe growth function is approximately 2^100, essentially all labelings
BThe growth function is at most O(100^4) ≈ 10^8, a tiny fraction of 2^100 ≈ 10^30
CThe growth function is exactly 2^4 = 16, since the class can only shatter 4 points
DThe growth function equals C(100, 4) = 3,921,225, the number of ways to choose 4 points from 100
The Sauer-Shelah lemma states that for VC dimension d, the growth function Pi_H(n) <= sum_{i=0}^{d} C(n,i), which is O(n^d). For d=4 and n=100, this is roughly 100^4 = 10^8. The total number of possible labelings is 2^100 ≈ 10^30. So the hypothesis class can realize at most about 10^8 out of 10^30 possible labelings — an astronomically small fraction. This massive gap between the class's expressive power and the total possibilities is exactly what enables generalization: the class is too constrained to memorize arbitrary patterns.
Question 2 Multiple Choice
If a hypothesis class has growth function Pi_H(n) = 2^n for all n, what can you conclude about its VC dimension?
AThe VC dimension is exactly n
BThe VC dimension is infinite — the class can shatter sets of every size
CThe VC dimension is undefined because growth functions cannot equal 2^n
DThe VC dimension is 1, since 2^n grows exponentially from base 2
If Pi_H(n) = 2^n for all n, then for every n, there exists a set of n points on which H induces all 2^n labelings — meaning H shatters sets of every size. By definition, the VC dimension is the largest n for which shattering is possible, and since there is no upper bound here, the VC dimension is infinite. The Sauer-Shelah lemma tells us the converse: once the VC dimension is finite (say d), the growth function must drop from 2^n to O(n^d) for n > d. This is the phase transition that separates learnable from unlearnable classes.
Question 3 True / False
The growth function of a hypothesis class can take any value between 1 and 2^n for a sample of size n.
TTrue
FFalse
Answer: False
The growth function cannot take arbitrary values. The Sauer-Shelah lemma constrains it: if Pi_H(n) < 2^n for any n, then Pi_H(m) <= sum_{i=0}^{d} C(m,i) for all m, where d is the VC dimension. This means the growth function either equals 2^n (full shattering) or drops to a polynomial bound — there is no gradual intermediate behavior. This is sometimes called the 'phase transition' or 'dichotomy' of the growth function: exponential up to the VC dimension, polynomial after it. Values between the polynomial bound and 2^n are not achievable.
Question 4 True / False
Shattering a set of points means that the hypothesis class can correctly classify those points with any labeling, but it says nothing about points outside the set.
TTrue
FFalse
Answer: True
Shattering is a purely combinatorial property about the restrictions of H to a specific finite set. When H shatters S = {x_1, ..., x_n}, for each of the 2^n binary labelings of these n points, at least one hypothesis h in H matches that labeling on S. The behavior of h on any other point is irrelevant to the definition. Two hypotheses that agree on S but disagree everywhere else both 'count' for different labelings of S. This is why shattering is a measure of local expressiveness — it characterizes what the class can do on a specific set, not globally.
Question 5 Short Answer
Explain why the transition of the growth function from exponential to polynomial at the VC dimension is the key insight that enables PAC learning guarantees.
Think about your answer, then reveal below.
Model answer: PAC learning requires uniform convergence: the guarantee that training error approximates true error simultaneously for all hypotheses in the class. The number of hypotheses to control is effectively measured by the growth function — it counts the distinct behaviors of the class on a sample. If the growth function is 2^n (exponential), there are too many effective hypotheses to control with a polynomial number of samples: some hypothesis will fit noise by chance. The Sauer-Shelah lemma shows that finite VC dimension forces the growth function to be O(n^d) — polynomial. With only polynomially many effective hypotheses, a union bound combined with concentration inequalities gives uniform convergence with polynomially many samples. The exponential-to-polynomial phase transition is exactly the boundary between classes where uniform convergence is achievable (finite VC dimension, learnable) and classes where it is not (infinite VC dimension, not PAC-learnable).
This is the deep mathematical reason that VC dimension characterizes learnability. The growth function is the combinatorial object that the uniform convergence argument actually needs to bound, and the Sauer-Shelah lemma translates the VC dimension (a shattering property) into the growth function bound (a counting property) that makes the probabilistic argument work.