Questions: Gradient Boosting Machines

5 questions to test your understanding

Question 1 Multiple Choice

Both gradient boosting and random forests use ensembles of decision trees. What is the most fundamental architectural difference between the two methods?

A. Random forests use deeper trees; gradient boosting always uses shallow stumps
B. Gradient boosting trains trees sequentially, each correcting the errors of the previous ensemble; random forests train trees independently in parallel and average their predictions
C. Gradient boosting reduces variance; random forests reduce bias
D. Random forests are restricted to squared-error loss; gradient boosting can use any loss function

Question 2 Multiple Choice

When gradient boosting uses absolute error loss instead of squared error, each new tree is fitted to which target values?

A. The original target values, to ensure the tree sees the full signal
B. The negative gradient of the absolute error loss evaluated at each data point's current predicted value (the pseudo-residuals)
C. A bootstrap-reweighted sample with misclassified examples upweighted, as in AdaBoost
D. The Hessian of the loss function, enabling second-order optimization at each step

Question 3 True / False

Reducing the learning rate in gradient boosting usually decreases final model accuracy because each tree contributes less to the ensemble.

T. True
F. False

Question 4 True / False

In gradient boosting, each tree is trained to predict the original target values, and the residuals from each tree are used primarily to select subsequent tree split points.

T. True
F. False

Question 5 Short Answer

Explain why gradient boosting is called "gradient" boosting: what gradient is being computed, and in what space is gradient descent being performed?
