Questions: Decision Trees and Random Forests

5 questions to test your understanding

Question 1 Multiple Choice

A single deep decision tree achieves 100% accuracy on training data but only 70% on a held-out test set. A random forest of 500 trees achieves 93% training accuracy and 89% test accuracy. What best explains why the forest outperforms the single tree on test data?

A. The forest uses more training data because each tree sees a bootstrap sample larger than the original dataset
B. Each tree in the forest is shallower and therefore has higher bias, which generalizes better
C. The forest averages many high-variance, decorrelated trees, reducing overall variance while preserving low bias
D. The forest eliminates all irrelevant features, leaving only the most predictive ones
Question 2 Multiple Choice

What is the primary purpose of selecting only a random subset of features at each split in a random forest, rather than considering all features?

A. It speeds up training by reducing computation at each node
B. It forces each tree to use every feature at least once, ensuring full coverage
C. It decorrelates the trees, so their errors overlap less and partly cancel when averaged
D. It prevents any single tree from overfitting by limiting its information access
Question 3 True / False

Adding more trees to a random forest will eventually cause it to overfit the training data, just as a single deep tree does.

T. True
F. False
Question 4 True / False

Random forests preserve interpretability because you can inspect the individual trees and trace the decision path for any prediction.

T. True
F. False
Question 5 Short Answer

Explain why averaging many decision trees reduces prediction error. What role does the 'random feature subset' step play, and what would happen if it were removed?

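After attempting Question 5, a small simulation can help check the reasoning about averaging. The sketch below is not part of the quiz and does not train any trees. It assumes a toy model in which each tree's prediction error is Gaussian with variance sigma^2 and every pair of trees has correlation rho. Under that assumption, the variance of the average of B trees is rho * sigma^2 + (1 - rho) * sigma^2 / B. The function names and parameters (`simulated_variance`, `theoretical_variance`, `rho`, `sigma`) are illustrative choices, not taken from any library.

```python
import random
import statistics


def theoretical_variance(n_trees, rho, sigma=1.0):
    """Variance of the mean of n_trees equally correlated errors.

    Assumes each error has variance sigma**2 and every pair of
    errors has correlation rho (a toy model, not a trained forest).
    """
    return rho * sigma**2 + (1.0 - rho) * sigma**2 / n_trees


def simulated_variance(n_trees, rho, sigma=1.0, n_trials=5000, seed=0):
    """Estimate the same quantity by Monte Carlo simulation."""
    rng = random.Random(seed)
    averages = []
    for _ in range(n_trials):
        # A component shared by all trees creates the pairwise correlation.
        shared = rng.gauss(0.0, sigma)
        errors = [
            (rho ** 0.5) * shared + ((1.0 - rho) ** 0.5) * rng.gauss(0.0, sigma)
            for _ in range(n_trees)
        ]
        averages.append(sum(errors) / n_trees)
    return statistics.pvariance(averages)


if __name__ == "__main__":
    # Compare strongly correlated "trees" with weakly correlated ones.
    for rho in (0.8, 0.3, 0.0):
        for n_trees in (1, 10, 100):
            sim = simulated_variance(n_trees, rho)
            theo = theoretical_variance(n_trees, rho)
            print(f"rho={rho:.1f} trees={n_trees:3d} "
                  f"simulated={sim:.3f} theoretical={theo:.3f}")
```

In this toy model, adding trees shrinks only the (1 - rho) * sigma^2 / B term. The rho * sigma^2 term stays no matter how many trees are averaged, which is one way to think about what the random feature subset step is for.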