The bias-complexity tradeoff formalizes the bias-variance tradeoff in the language of learning theory. The true risk of an ERM hypothesis decomposes into approximation error (how close the best hypothesis in the class is to the true target — the "bias") and estimation error (how much the learned hypothesis deviates from the class's best due to finite samples — the "complexity" penalty). Approximation error decreases with richer hypothesis classes; estimation error increases because richer classes require more data for uniform convergence. The optimal class minimizes their sum, and this decomposition drives structural risk minimization.
You have already encountered the bias-variance tradeoff as an intuitive principle: simple models underfit (high bias), complex models overfit (high variance), and the sweet spot balances both. The bias-complexity tradeoff recasts this principle in the rigorous language of PAC learning theory, making it quantitative and actionable.
The formal decomposition splits the true risk of the ERM hypothesis into two terms. The approximation error is the risk of the best possible hypothesis in the class: epsilon_app = min_{h in H} R(h) - R(h*), where h* is the Bayes-optimal classifier. This measures how well the hypothesis class can approximate the target, independent of the amount of data. A richer class has lower approximation error. The estimation error is the gap between the ERM hypothesis and the class's best: epsilon_est = R(h_ERM) - min_{h in H} R(h). This measures how much performance is lost because we must learn from a finite sample rather than having infinite data. A richer class has higher estimation error because the ERM procedure has more hypotheses to distinguish between with limited evidence.
The estimation error is where VC dimension and sample complexity enter. For a class with VC dimension d and n training examples, the estimation error is bounded by O(sqrt(d/n)). This gives a precise prescription: if you double the VC dimension, you need roughly four times as many samples to maintain the same estimation error. The approximation error, by contrast, is a property of the class and the target function — it decreases as the class grows but has no closed-form expression without knowing the target. The total risk is their sum, and the optimal hypothesis class minimizes this sum for the given sample size.
This decomposition motivates structural risk minimization (covered in a subsequent topic): given a nested sequence of hypothesis classes H_1 subset H_2 subset ... with increasing VC dimensions, choose the class that minimizes the bound on approximation plus estimation error. With small n, simpler classes win because estimation error dominates; with large n, richer classes win because estimation error shrinks and approximation error can be reduced without paying too high a price. The bias-complexity tradeoff is not just a restatement of bias-variance in different notation — it provides computable bounds that connect hypothesis class choice to sample size through VC dimension, giving a principled basis for model selection.