Why does increasing model complexity reduce bias but increase variance? Explain the mechanism in terms of what the model is learning.
Think about your answer, then reveal below.
Model answer: A simpler model (e.g., a linear function) makes strong assumptions about the form of the relationship between inputs and outputs. These assumptions are usually wrong to some degree, so they produce systematic error (bias) that persists no matter how much data you train on. A more complex model (e.g., a high-degree polynomial) makes weaker assumptions and can represent a much wider range of shapes, so it can fit the true underlying pattern more accurately, which reduces bias. However, this flexibility also means the model can fit the noise in the specific training sample, not just the signal. Different training samples have different noise patterns, so the complex model's predictions vary more from sample to sample, which is high variance. Complexity trades the rigidity of fixed assumptions (bias) for the instability of fitting noise (variance).
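The mechanism can be checked with a small simulation. The sketch below is illustrative only: the target function sin(2πx), the noise level, the sample size, the polynomial degrees 1 and 9, and the helper name bias_variance are all assumptions chosen for the example, not part of the question. It refits each model on many training samples that differ only in their noise, then measures how far the average prediction is from the truth (bias²) and how much the predictions scatter around that average (variance), both evaluated at the training inputs.

```python
import numpy as np
from numpy.polynomial import Polynomial


def true_fn(x):
    # Assumed "true" relationship for this illustration.
    return np.sin(2 * np.pi * x)


def bias_variance(degree, n_trials=500, n_points=30, noise_sd=0.3, seed=0):
    """Estimate bias^2 and variance of a polynomial fit of the given degree.

    Each trial keeps the same inputs x and draws fresh noise, so the only
    thing that changes between training samples is the noise pattern.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n_points)
    preds = np.empty((n_trials, n_points))
    for t in range(n_trials):
        y = true_fn(x) + rng.normal(0.0, noise_sd, size=n_points)
        # Least-squares polynomial fit, evaluated back at the inputs.
        preds[t] = Polynomial.fit(x, y, degree)(x)
    mean_pred = preds.mean(axis=0)
    # Bias^2: squared gap between the average prediction and the truth.
    bias_sq = np.mean((mean_pred - true_fn(x)) ** 2)
    # Variance: spread of predictions around their own average.
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance


for degree in (1, 9):
    b, v = bias_variance(degree)
    print(f"degree {degree}: bias^2 = {b:.4f}, variance = {v:.4f}")
```

Under these assumptions, the degree-1 fit should show the larger bias² (a line cannot follow a sine wave, however many samples are averaged), and the degree-9 fit should show the larger variance (its extra coefficients track each sample's noise).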
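The decomposition Expected squared error = Bias² + Variance + Irreducible noise makes this precise. Bias reflects how wrong the average prediction is, where the average is taken over models trained on different samples; variance reflects how much individual predictions fluctuate around that average. Adding complexity typically reduces the error of the average prediction (bias) while inflating the fluctuation around it (variance). The noise term does not depend on the model, so the optimal complexity is the one that minimizes the sum of Bias² and Variance, not either one alone.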
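Written out for a single input x, the decomposition takes the following form. The notation is an assumption introduced here for illustration: y = f(x) + ε with E[ε] = 0 and Var(ε) = σ², and f̂_D is the model fitted on a random training sample D drawn independently of ε.

```latex
\mathbb{E}_{D,\varepsilon}\!\left[\bigl(y - \hat{f}_D(x)\bigr)^2\right]
  = \underbrace{\bigl(f(x) - \mathbb{E}_D[\hat{f}_D(x)]\bigr)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\bigl(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\bigr)^2\right]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{Noise}}
```

The cross terms vanish because ε has mean zero and is independent of D, and because the deviation of f̂_D(x) from its own mean averages to zero over D.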