The growth function Pi_H(n) counts the maximum number of distinct labelings a hypothesis class H can induce on any set of n points. Since n points have 2^n possible labelings, the growth function satisfies Pi_H(n) <= 2^n. When Pi_H(n) = 2^n, the class shatters some set of n points. The Sauer-Shelah lemma provides the critical bridge: if the VC dimension is d, then Pi_H(n) <= sum_{i=0}^{d} C(n,i), which is O(n^d) — meaning the growth function transitions from exponential to polynomial at the VC dimension. This polynomial growth is what makes uniform convergence and PAC learning possible.
The growth function and the concept of shattering provide the combinatorial foundation that connects VC dimension to generalization bounds. While VC dimension gives a single number characterizing a hypothesis class, the growth function reveals the full picture of how the class's effective complexity scales with the number of data points.
For a hypothesis class H and a set of n points, consider all the ways H can label those points. Each hypothesis h in H produces a binary labeling (a restriction of h to those n points), and different hypotheses might produce the same labeling on this particular set. The growth function Pi_H(n) is the maximum, over all possible sets of n points, of the number of distinct labelings H can produce. When Pi_H(n) = 2^n — every possible labeling is achievable — we say H shatters some set of n points. The VC dimension is the largest n for which this happens.
The Sauer-Shelah lemma (proved independently by Sauer, Shelah, and Vapnik-Chervonenkis) reveals a remarkable phase transition. If H fails to shatter any set of d+1 points (VC dimension is d), then the growth function cannot simply decrease gradually — it collapses from exponential to polynomial: Pi_H(n) <= sum_{i=0}^{d} C(n,i), which is at most (en/d)^d = O(n^d). There is no middle ground. Either the class shatters sets of every size (infinite VC dimension, exponential growth function) or it eventually stops shattering and the growth function becomes polynomial. This dichotomy is sometimes called the "shattering lemma" or "Vapnik-Chervonenkis lemma."
This polynomial bound is what makes learning theory work. The key proof technique for generalization bounds — uniform convergence — requires controlling the probability that any hypothesis in the class has a large gap between training and true error. The effective number of hypotheses to control is not the total number (which may be infinite, as with all linear classifiers) but the growth function — the number of distinct behaviors on the sample. If this number is polynomial in n, a union bound argument succeeds: each individual hypothesis is unlikely to have a large gap, and with only polynomially many to check, the probability that any one of them has a large gap remains small. If the growth function were exponential, the union bound would fail, and no finite sample could provide uniform convergence. The exponential-to-polynomial transition at the VC dimension is therefore the exact boundary of learnability.