An engineer builds an ensemble by training 100 decision trees on the exact same training data with no randomization, then averages their predictions. She expects significant performance gains over a single tree. What is the flaw in her reasoning?
A. More trees always improve performance regardless of how they are trained; the approach is valid
B. Without diversity, all trees make the same errors, so averaging them produces the same wrong answer more confidently rather than canceling errors
C. Ensembles only work with fewer than 10 base models; 100 trees creates too much variance
D. The ensemble will improve bias but is guaranteed to increase variance, worsening overall performance
Answer: B
Averaging helps only when the trees' errors differ. For squared-error regression with uniform averaging, ensemble error equals the average individual error minus the average diversity (the spread of the members' predictions around the ensemble prediction). Deterministic training on identical data yields 100 identical trees, so diversity is zero and the ensemble error equals the single-model error: no improvement. Averaging identical predictions gives the same prediction. Diversity is not an implementation detail; it is the mechanism by which ensembles work. Without it, you have multiple copies of the same model, not an ensemble.
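The decomposition in the explanation above can be checked numerically. The sketch below is illustrative only: the data are synthetic, the "models" are just noisy copies of the targets rather than real decision trees, and the function name `ambiguity_decomposition` is hypothetical. It assumes squared error and a uniform average.

```python
import numpy as np

def ambiguity_decomposition(preds, y):
    """Return (ensemble error, average member error, average diversity).

    preds: array of shape (n_models, n_samples); y: array of shape (n_samples,).
    All three quantities are mean squared values under uniform averaging.
    """
    ensemble = preds.mean(axis=0)                     # averaged prediction per sample
    ensemble_err = np.mean((ensemble - y) ** 2)       # error of the ensemble
    avg_member_err = np.mean((preds - y) ** 2)        # error averaged over members
    avg_diversity = np.mean((preds - ensemble) ** 2)  # spread of members around the ensemble
    return ensemble_err, avg_member_err, avg_diversity

rng = np.random.default_rng(0)
y = rng.normal(size=200)  # synthetic targets (assumption, for illustration)

# Case 1: 100 identical "trees", i.e. the same noisy prediction repeated.
one_model = y + rng.normal(scale=0.5, size=200)
identical = np.tile(one_model, (100, 1))

# Case 2: 100 models whose errors are independent of one another.
diverse = y + rng.normal(scale=0.5, size=(100, 200))

for name, preds in [("identical", identical), ("diverse", diverse)]:
    ens, avg, div = ambiguity_decomposition(preds, y)
    print(f"{name}: ensemble={ens:.4f} avg_member={avg:.4f} diversity={div:.4f}")
```

In the identical case the diversity term is zero, so the ensemble error matches the member error. In the diverse case the ensemble error falls below the average member error by exactly the diversity term.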
Question 2 Multiple Choice
A boosted model achieves near-perfect training accuracy after 500 boosting rounds but performs much worse on the test set. The most likely explanation is:
A. Boosting sequentially corrects errors, and after enough rounds it can fit noise in the training data, leading to overfitting
B. Bagging was inadvertently applied instead of boosting, causing the base learners to underfit
C. The base learners were too diverse, causing their corrections to cancel each other out
D. Boosting only reduces variance, not bias, so it cannot explain training accuracy improvements
Answer: A
Boosting reduces bias by targeting residual errors, but this very mechanism makes it prone to overfitting on noisy data: after enough iterations, it begins fitting the noise as though it were signal. Learning rate shrinkage and early stopping are the standard safeguards. This is in contrast to bagging, which is relatively resistant to overfitting because averaging models trained on different bootstrap samples smooths out noise rather than amplifying it. The gap between training and test performance here is the classic overfitting signature.
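A minimal sketch of the two safeguards named above, shrinkage and early stopping, using a hand-rolled gradient booster with decision stumps on synthetic 1-D regression data. Everything here is an assumption made for illustration (the data, the helper names `fit_stump` and `predict_stump`, the learning rate, and the round count); it is not a production implementation or any particular library's API.

```python
import numpy as np

def fit_stump(x, r):
    """Least-squares decision stump on 1-D inputs x for targets r."""
    best_sse, best = np.inf, (x.min(), r.mean(), r.mean())
    for t in np.unique(x)[:-1]:  # thresholds that leave both sides non-empty
        left, right = r[x <= t], r[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best = sse, (t, left.mean(), right.mean())
    return best  # (threshold, left value, right value)

def predict_stump(stump, x):
    t, left_value, right_value = stump
    return np.where(x <= t, left_value, right_value)

rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(-3, 3, size=n)
    return x, np.sin(x) + rng.normal(scale=0.5, size=n)  # signal plus label noise

x_train, y_train = make_data(60)
x_val, y_val = make_data(200)

learning_rate, n_rounds = 0.5, 300  # illustrative values
pred_train = np.zeros_like(y_train)
pred_val = np.zeros_like(y_val)
train_mse, val_mse = [], []
for _ in range(n_rounds):
    stump = fit_stump(x_train, y_train - pred_train)  # fit the current residuals
    pred_train += learning_rate * predict_stump(stump, x_train)
    pred_val += learning_rate * predict_stump(stump, x_val)
    train_mse.append(np.mean((y_train - pred_train) ** 2))
    val_mse.append(np.mean((y_val - pred_val) ** 2))

best_round = int(np.argmin(val_mse)) + 1  # early-stopping point chosen on validation data
print(f"train MSE after {n_rounds} rounds: {train_mse[-1]:.3f}")
print(f"validation MSE after {n_rounds} rounds: {val_mse[-1]:.3f}")
print(f"best validation MSE {val_mse[best_round - 1]:.3f} at round {best_round}")
```

Training error never increases from round to round in this setup, while validation error typically bottoms out early and then drifts upward as later stumps chase the label noise. Stopping at `best_round` keeps the model from before that drift.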
Question 3 True / False
Bagging primarily reduces variance by training multiple models on different bootstrap samples of the training data (random resamples drawn with replacement) and averaging their predictions, which cancels out uncorrelated errors.
T. True
F. False
Answer: True
This is the central mechanism of bagging. Each bootstrap sample produces a model with idiosyncratic errors tied to that particular sample. Because these errors are only weakly correlated across models, they tend to cancel when averaged. The systematic signal (the true pattern in the data) reinforces across models while the random noise averages out. Bagging does not substantially reduce bias — it does not make models more correct on average — but it reduces the variance of the ensemble prediction.
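The role of error correlation can be made concrete with a standard identity that the text above does not state explicitly: if each of B models has error variance σ² and every pair of models has error correlation ρ, the variance of their average is ρσ² + (1 − ρ)σ²/B. The sketch below compares that formula with a simulation. The function names and the way correlated errors are constructed (a shared component plus a per-model component) are illustrative assumptions.

```python
import numpy as np

def variance_of_average(rho, sigma2, n_models):
    """Variance of the mean of n_models predictors, each with variance sigma2
    and a common pairwise correlation rho."""
    return rho * sigma2 + (1.0 - rho) * sigma2 / n_models

def simulate(rho, n_models, n_trials=20000, seed=0):
    """Empirical variance of the averaged error, using unit-variance errors."""
    rng = np.random.default_rng(seed)
    shared = rng.normal(size=(n_trials, 1))      # component common to all models
    own = rng.normal(size=(n_trials, n_models))  # component unique to each model
    errors = np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * own
    return errors.mean(axis=1).var()

for rho in (0.0, 0.1, 1.0):
    print(rho, variance_of_average(rho, 1.0, 25), simulate(rho, 25))
```

With ρ = 1 the average is no better than one model. As ρ falls, the second term shrinks with the number of models, but the ρσ² floor remains, which is why weakly correlated errors matter more than sheer model count.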
Question 4 True / False
Because boosting trains models sequentially, each one explicitly correcting the previous ensemble's errors, it is inherently more resistant to overfitting than bagging.
T. True
F. False
Answer: False
This is a common misconception. Boosting is actually *more* prone to overfitting than bagging, especially with noisy data, because it can learn to fit the noise if run for too many iterations. The sequential correction mechanism that reduces bias also means that with enough rounds, the ensemble increasingly accommodates every training example, including mislabeled or noisy ones. Bagging, by averaging models trained on different bootstrap samples, tends to smooth noise away. Learning rate shrinkage and early stopping are the standard safeguards when boosting.
Question 5 Short Answer
Why is diversity among base learners the fundamental requirement for ensemble methods to work? What happens when diversity is absent?
Model answer: The theoretical result formalizes this directly: for regression with averaging, ensemble error equals the average individual error minus the average diversity among models (how much their predictions disagree with the ensemble). If all models make identical errors, meaning diversity is zero, the ensemble error equals the single-model error: combining identical predictions gives the same wrong answer. Only when model errors are weakly correlated, uncorrelated, or negatively correlated do they partly cancel in the average, so that the ensemble beats the average individual model. This is why bagging uses resampling, random forests add feature randomization, boosting reweights examples, and stacking uses different model families: each mechanism exists to produce diverse, weakly correlated error patterns.
Students often think combining more models always helps. The key insight is that 'more' is irrelevant without 'different.' Even mediocre models, if they err independently, can combine into a strong ensemble. Conversely, highly accurate models that err in the same direction on the same examples provide no benefit when combined.
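The claim that mediocre but independent models can combine into a strong ensemble can be illustrated for majority voting with a binomial calculation. This is a sketch under an idealized assumption that real models rarely satisfy: every voter is correct with the same probability `p`, and the voters err fully independently. The function name `majority_vote_accuracy` is hypothetical.

```python
from math import comb

def majority_vote_accuracy(p, n_models):
    """Probability that a strict majority of n_models independent voters,
    each correct with probability p, is correct (n_models assumed odd)."""
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(n_models // 2 + 1, n_models + 1)
    )

# Accuracy of the vote as the number of independent 60%-accurate voters grows.
for n in (1, 11, 101):
    print(n, round(majority_vote_accuracy(0.6, n), 3))
```

If the voters instead erred on exactly the same examples, the vote would be right only when a single voter is right, so accuracy would stay at `p` no matter how many voters are added.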