Questions — Gradient Boosting Machines

Question 1 Multiple Choice

Both gradient boosting and random forests use ensembles of decision trees. What is the most fundamental architectural difference between the two methods?

A. Random forests use deeper trees; gradient boosting always uses shallow stumps

B. Gradient boosting trains trees sequentially, each correcting the errors of the previous ensemble; random forests train trees independently in parallel and average their predictions

C. Gradient boosting reduces variance; random forests reduce bias

D. Random forests are restricted to squared-error loss; gradient boosting can use any loss function

Question 2 Multiple Choice

When gradient boosting uses absolute error loss instead of squared error, each new tree is fitted to which target values?

A. The original target values, to ensure the tree sees the full signal

B. The negative gradient of the absolute error loss evaluated at each data point's current predicted value (the pseudo-residuals)

C. A bootstrap-reweighted sample with misclassified examples upweighted, as in AdaBoost

D. The Hessian of the loss function, enabling second-order optimization at each step

Question 3 True / False

Reducing the learning rate in gradient boosting usually decreases final model accuracy because each tree contributes less to the ensemble.

T. True

F. False

Question 4 True / False

In gradient boosting, each tree is trained to predict the original target values, and the residuals from each tree are used primarily to select subsequent tree split points.

T. True

F. False

Question 5 Short Answer

Explain why gradient boosting is called 'gradient' boosting — what gradient is being computed, and in what space is gradient descent being performed?

