Questions: Overfitting, Underfitting, and Model Capacity
5 questions to test your understanding
Question 1 Multiple Choice
A neural network achieves 99% accuracy on training data but only 61% accuracy on the held-out validation set. Which condition does this describe, and what is the most appropriate remedy?
A. Underfitting; increase model depth or add more input features
B. Overfitting; the model has memorized training noise, so apply regularization or dropout, or gather more training data
C. Underfitting; the training set is too small to represent the problem
D. Overfitting; reduce training time by stopping after fewer gradient steps regardless of validation trend
Answer: B
The large train-validation gap (99% vs 61%) is the classic signature of overfitting: the model has learned the specific noise and idiosyncrasies of the training set rather than the generalizable pattern. The fix targets capacity relative to data: add more training data (so there is less room for noise to dominate), apply regularization (constrain the weights), or use dropout (prevent co-adaptation of neurons). Early stopping can help when it is guided by the validation curve, but option D stops training regardless of the validation trend, so it ignores the one signal that shows when to stop and leaves the core issue, model capacity relative to data, unaddressed.
Question 2 Multiple Choice
While training a model, you plot training loss and validation loss over epochs. Training loss decreases steadily throughout; validation loss decreases for the first 30 epochs, then starts rising. What does this pattern indicate?
A. Underfitting: the model cannot learn the training data and is struggling
B. Ideal convergence: both losses will eventually meet at a low value if training continues
C. The onset of overfitting: after epoch 30 the model begins memorizing noise, harming generalization
D. A bug in the validation loss calculation: validation loss cannot rise if training loss is still falling
Answer: C
This divergence pattern is the characteristic signature of overfitting in progress. Up to epoch 30, the model is extracting real patterns that generalize, so both losses improve. After that, the model is fitting noise unique to the training set: training loss keeps dropping (better memorization) while validation loss rises (worse generalization). Epoch 30 is the sweet spot, the point of best generalization before memorization dominates. Techniques like early stopping use exactly this signal.
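The early-stopping idea mentioned above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `early_stopping_epoch`, the `patience` parameter, and the synthetic validation curve are all assumptions made for the example, chosen so that the minimum falls at epoch 30 as in the question.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return (best_epoch, stop_epoch) for a list of per-epoch validation losses.

    Epochs are 1-indexed. Training is considered stopped once `patience`
    epochs have passed without a new best validation loss.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # New best validation loss: remember this epoch.
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop here.
            return best_epoch, epoch
    # Never triggered: training ran to the end.
    return best_epoch, len(val_losses)


# Hypothetical validation curve: falls until epoch 30, then rises.
val_losses = [0.5 + 0.01 * abs(epoch - 30) for epoch in range(1, 61)]

best_epoch, stop_epoch = early_stopping_epoch(val_losses, patience=5)
print(best_epoch, stop_epoch)
```

In practice one would also keep a copy of the model weights from `best_epoch`, since the weights at `stop_epoch` have already drifted past the point of best generalization.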
Question 3 True / False
A model that achieves low training error and low validation error, with a small gap between them, has achieved well-matched capacity for the problem.
T. True
F. False
Answer: True
Low and similar errors on both training and validation data indicate the model has learned patterns that generalize: it neither memorizes noise (which would inflate the train-validation gap) nor oversimplifies (which would leave both errors high). This is the diagnostic target: the absolute error levels, read together with the training-validation gap, are the key signals for diagnosing capacity problems.
Question 4 True / False
A model with very high training error is almost certainly overfitting the training data.
T. True
F. False
Answer: False
High training error indicates underfitting: the model cannot capture patterns even in the data it was trained on. Overfitting requires the opposite: the model fits the training data very well (low training error) but fails to generalize (high validation error). The confusion arises because both are "failure modes," but they point in opposite directions: overfitting reflects too much capacity for the available data, underfitting too little. The distinction is critical because they call for opposite remedies.
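The three regimes described in Questions 1, 3, and 4 can be summarized as a small decision rule. The sketch below is illustrative only: the function name `diagnose_fit` and both thresholds (`min_train_acc`, `max_gap`) are hypothetical values picked for the example, and sensible cutoffs depend on the task.

```python
def diagnose_fit(train_acc, val_acc, min_train_acc=0.80, max_gap=0.10):
    """Classify a (training accuracy, validation accuracy) pair.

    Thresholds are illustrative assumptions, not standard values.
    """
    if train_acc < min_train_acc:
        # The model cannot even fit the data it was trained on.
        return "underfitting"
    if train_acc - val_acc > max_gap:
        # Fits the training data well but fails on held-out data.
        return "overfitting"
    # Good training accuracy and a small train-validation gap.
    return "well-matched"


print(diagnose_fit(0.99, 0.61))  # the network from Question 1
print(diagnose_fit(0.62, 0.60))  # poor fit even on training data
print(diagnose_fit(0.93, 0.91))  # both good, small gap
```

The training accuracy is checked first because a small gap is uninformative when the model has not fit the training data at all.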
Question 5 Short Answer
Why does achieving low training error fail as a sufficient criterion for evaluating a machine learning model?
Model answer: Low training error can be achieved by memorizing the training data rather than learning the underlying pattern. A sufficiently complex model can fit essentially any finite dataset perfectly, including its noise and measurement artifacts, while making nonsensical predictions on new data. Validation error on held-out data is required because it exposes whether the model generalizes: a model that truly learned the pattern will perform well on examples it has never seen, while a model that merely memorized its training set will not.
This is the fundamental distinction between memorization and generalization. The entire discipline of model evaluation exists because good training performance is necessary but far from sufficient as an indicator of real-world performance. Production ML workflows rely on held-out data precisely because training performance tells you how well the model knows the training set, not how well it understands the problem.
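The memorization-versus-generalization contrast can be made concrete with a toy experiment. Everything in the sketch below is an assumption made for illustration: the one-dimensional data, the threshold of 200, the choice to flip every fifth training label as "noise", and the two models (a 1-nearest-neighbor memorizer and a simple threshold rule that stands in for a model that captured the true pattern).

```python
THRESHOLD = 200  # hypothetical true rule: label is True when x > 200


def true_label(x):
    return x > THRESHOLD


# Training set: 100 points; every 5th label is flipped to simulate noise.
train = [(4 * i, true_label(4 * i) != (i % 5 == 0)) for i in range(100)]
# Validation set: 100 unseen points with clean labels.
val = [(4 * i + 1, true_label(4 * i + 1)) for i in range(100)]


def memorizer(x):
    """1-nearest-neighbor: return the label of the closest training point."""
    _, nearest_label = min(train, key=lambda point: abs(point[0] - x))
    return nearest_label


def simple_rule(x):
    """A low-capacity model that happens to match the true pattern."""
    return x > THRESHOLD


def count_errors(model, data):
    """Number of examples in `data` that `model` labels incorrectly."""
    return sum(model(x) != y for x, y in data)


for name, model in [("memorizer", memorizer), ("simple rule", simple_rule)]:
    print(name, count_errors(model, train), count_errors(model, val))
```

On this toy data the memorizer makes 0 errors on the training set but 19 on the validation set, because it reproduces the flipped labels. The simple rule makes 20 training errors (exactly the flipped labels) and none on validation. Ranking the two models by training error alone would pick the worse one.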