Questions: Implicit Regularization

4 questions to test your understanding

Question 1 Multiple Choice

A neural network is trained with gradient descent on a non-convex loss with no explicit regularization term. The network fits all training data perfectly. Why might it still generalize well?

A. Gradient descent avoids local minima that overfit; it always finds the global optimum

B. Implicit regularization from gradient descent's optimization trajectory biases solutions toward those with good generalization properties (e.g., small norm, large margin)

C. Perfect fitting to training data is impossible; the network must be leaving some training errors

D. Neural networks have built-in safeguards that prevent memorization regardless of capacity

Question 2 Multiple Choice

Implicit regularization depends on which of the following factors?

A. Only the loss function; the optimization algorithm does not matter

B. The optimization algorithm (GD vs SGD vs Adam), learning rate, initialization, and parameterization structure

C. Only the model's parameter count; the algorithm is irrelevant

D. The batch size and nothing else

Question 3 Short Answer

Early stopping is a form of explicit regularization. How does it relate to implicit regularization?

Question 4 True / False

For underdetermined linear regression with squared loss, gradient descent initialized at zero converges to the minimum-norm solution, min_w ||w||^2 subject to fitting the training data. Is this implicit regularization?

T. True

F. False
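
The convergence claim in Question 4 can be checked numerically. The following is a minimal sketch, not a definitive implementation. It assumes an underdetermined problem (more features than samples), squared loss, a zero initialization, and a fixed step size below the stability limit. The problem sizes, random seed, and iteration count are arbitrary choices made for illustration only.

```python
import numpy as np

# Hypothetical toy problem: 5 samples, 20 features, so infinitely many
# weight vectors fit the training data exactly.
rng = np.random.default_rng(0)
n_samples, n_features = 5, 20
X = rng.standard_normal((n_samples, n_features))
y = rng.standard_normal(n_samples)

# Plain gradient descent on 0.5 * ||X w - y||^2, started from w = 0.
# The loss contains no penalty on w.
w = np.zeros(n_features)
lr = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / (largest singular value)^2
for _ in range(20000):
    grad = X.T @ (X @ w - y)
    w -= lr * grad

# Minimum-norm interpolating solution, computed via the pseudoinverse.
w_min_norm = np.linalg.pinv(X) @ y

gap = np.linalg.norm(w - w_min_norm)   # distance between the two solutions
residual = np.linalg.norm(X @ w - y)   # training error of the GD solution
print(f"gap to min-norm solution: {gap:.2e}, training residual: {residual:.2e}")
```

Starting from zero, every gradient step is a linear combination of the rows of X, so the iterate never leaves the row space of X. The minimum-norm interpolant is the only interpolating solution in that subspace, which is why the two agree under these assumptions. A nonzero initialization would generally lead to a different interpolating solution.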