The classical bias-variance tradeoff predicts that test error increases once model complexity exceeds a critical point. How does double descent reconcile this with the good generalization of heavily overparameterized models?
Think about your answer, then reveal below.
Model answer: Double descent shows that the classical tradeoff holds only up to the interpolation threshold, the point at which the model has just enough capacity to fit the training data exactly (roughly, as many parameters as training examples). Test error follows the classical U-shaped curve as capacity approaches this threshold and peaks near it; beyond the threshold, test error falls again as the model becomes substantially overparameterized. The classical regime and the modern interpolation regime are thus separated by an overfitting peak. Past the peak, large models have enough capacity to memorize the training set yet still learn generalizable structure, and they can outperform mid-sized models. This tends to happen when implicit or explicit regularization (e.g., early stopping, weight decay, SGD noise) favors simple solutions even in the overparameterized regime.
Double descent is not a violation of the bias-variance tradeoff but a more nuanced picture: both regimes exist, separated by the interpolation threshold. In the classical regime (models too small to fit the training data exactly), test error first decreases with capacity while bias dominates, then increases as variance grows toward the threshold. In the interpolation regime (large models), memorization is possible, but implicit regularization limits overfitting, so test error decreases again. The classical tradeoff describes the first regime; double descent extends the curve beyond it.
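The shape of the curve can be reproduced on a toy problem. The sketch below is an illustration under stated assumptions, not the setup of any particular paper: it swaps neural networks for minimum-norm least squares on synthetic Gaussian data, where the minimum-norm solution stands in for implicit regularization. The function names (`min_norm_test_error`, `double_descent_curve`) and all parameter values (40 training points, 400 available features, noise level 0.5) are hypothetical choices made for the example.

```python
import numpy as np


def min_norm_test_error(p, rng, n_train=40, n_test=500, d_total=400, noise_std=0.5):
    """Test MSE of a least-squares fit that sees only the first p of d_total features."""
    # Assumed ground truth: every feature carries an equal share of the signal.
    beta = np.full(d_total, 1.0 / np.sqrt(d_total))
    X_train = rng.standard_normal((n_train, d_total))
    X_test = rng.standard_normal((n_test, d_total))
    y_train = X_train @ beta + noise_std * rng.standard_normal(n_train)
    y_test = X_test @ beta + noise_std * rng.standard_normal(n_test)
    # For p < n_train this is the ordinary least-squares fit; for p > n_train
    # lstsq returns the minimum-norm solution that interpolates the training data.
    coef, *_ = np.linalg.lstsq(X_train[:, :p], y_train, rcond=None)
    return float(np.mean((X_test[:, :p] @ coef - y_test) ** 2))


def double_descent_curve(widths=(5, 20, 40, 80, 400), n_trials=20, seed=0):
    """Average test MSE per model width over n_trials fresh datasets."""
    rng = np.random.default_rng(seed)
    return {
        p: float(np.mean([min_norm_test_error(p, rng) for _ in range(n_trials)]))
        for p in widths
    }


if __name__ == "__main__":
    for width, err in double_descent_curve().items():
        print(f"features={width:4d}  mean test MSE={err:.2f}")
```

With these assumed settings the interpolation threshold sits at 40 features (equal to the number of training points), so the printed error should rise toward that width, spike there, and fall again for the widest model. Exact values depend on the seed and are not reproduced here.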