Explain why VC dimension, rather than the number of parameters, is the correct measure of hypothesis class complexity for learning theory.
Think about your answer, then reveal below.
Model answer: The number of parameters describes how a hypothesis class is parameterized, which is an artifact of representation rather than a fundamental property of the function class. Different parameterizations of the same set of functions can have different parameter counts. VC dimension instead measures intrinsic expressive capacity: the size of the largest set of points that the class can shatter, that is, label in all possible ways. This directly governs generalization: in the realizable PAC setting, a class with VC dimension d can be learned to error epsilon from on the order of d/epsilon samples (up to logarithmic factors and a confidence term), and on the order of d/epsilon^2 in the agnostic setting, regardless of how many parameters the representation uses. The sine example, sign(sin(omega x)), has one parameter but infinite VC dimension, while constrained neural networks have many parameters but finite VC dimension, so parameter count and VC dimension can diverge dramatically in both directions. Since sample complexity bounds depend on VC dimension and not on parameter count, VC dimension is the theoretically correct measure.
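The sine example can be checked numerically. The sketch below assumes the standard textbook construction, which is not spelled out in the answer above: the points are x_i = 10^-i, and for a desired labeling the frequency is omega = pi * (1 + sum of 10^i over the negatively labeled points). The function names `sine_frequency` and `shatters` are illustrative only.

```python
import itertools
import math


def sine_frequency(labels):
    """Pick omega so that sign(sin(omega * 10**-i)) matches labels[i-1].

    Assumed construction: each point with label -1 contributes 10**i
    (i counted from 1) to an integer k, and omega = pi * (1 + k).
    """
    k = sum(10 ** i for i, y in enumerate(labels, start=1) if y == -1)
    return math.pi * (1 + k)


def shatters(n):
    """Check that the one-parameter class x -> sign(sin(omega * x))
    realizes every one of the 2**n labelings of the points 10**-1 .. 10**-n."""
    points = [10.0 ** -i for i in range(1, n + 1)]
    for labels in itertools.product([-1, 1], repeat=n):
        omega = sine_frequency(labels)
        predicted = tuple(1 if math.sin(omega * x) > 0 else -1 for x in points)
        if predicted != labels:
            return False
    return True


print(shatters(5))
```

The same construction works for any n in exact arithmetic, which is why the class has infinite VC dimension despite its single parameter. In floating point the check is only reliable for small n, since omega grows like 10^n.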
This distinction matters practically too. Modern deep networks have millions of parameters but generalize well: their effective complexity (related to but not equal to VC dimension) is controlled by optimization dynamics, initialization, and implicit regularization, not by raw parameter count.