Questions: Implicit Regularization

4 questions to test your understanding

Question 1 (Multiple Choice)

A neural network is trained with gradient descent on a non-convex loss with no explicit regularization term. The network fits all training data perfectly. Why might it still generalize well?

A. Gradient descent avoids local minima that overfit; it always finds the global optimum
B. Implicit regularization from gradient descent's optimization trajectory biases solutions toward those with good generalization properties (e.g., small norm, large margin)
C. Perfect fitting to training data is impossible; the network must be leaving some training errors
D. Neural networks have built-in safeguards that prevent memorization regardless of capacity

Question 2 (Multiple Choice)

Implicit regularization depends on which of the following factors?

A. Only the loss function; the optimization algorithm does not matter
B. The optimization algorithm (GD vs SGD vs Adam), learning rate, initialization, and parameterization structure
C. Only the model's parameter count; the algorithm is irrelevant
D. The batch size and nothing else
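As a minimal sketch of one of these factors (initialization), the NumPy snippet below runs plain gradient descent on an underdetermined least-squares problem twice: once from w = 0 and once from a random start. The data, the sizes, and the helper gd_least_squares are hypothetical choices for illustration. Both runs fit the training data, but they end at different solutions: every gradient step lies in the row space of X, so whatever the random start placed in the null space of X is never changed.

```python
import numpy as np

def gd_least_squares(X, y, w0, lr, steps=20000):
    """Hypothetical helper: plain gradient descent on 0.5 * ||Xw - y||^2."""
    w = w0.copy()
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y)  # gradient X^T (Xw - y) lies in the row space of X
    return w

rng = np.random.default_rng(1)
n, d = 5, 20  # fewer samples than features: many weight vectors fit the data
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
lr = 1.0 / np.linalg.norm(X, 2) ** 2  # step size 1/L, L = largest eigenvalue of X^T X

w_zero_init = gd_least_squares(X, y, np.zeros(d), lr)
w0 = rng.standard_normal(d)
w_rand_init = gd_least_squares(X, y, w0, lr)

# Predicted limit from a nonzero start: min-norm solution plus the null-space part of w0.
P_row = np.linalg.pinv(X) @ X  # orthogonal projector onto the row space of X
predicted = np.linalg.pinv(X) @ y + (np.eye(d) - P_row) @ w0

print(np.linalg.norm(w_zero_init), np.linalg.norm(w_rand_init))
```

Same loss, same optimizer, same data: only the starting point differs, and that alone changes which interpolating solution is selected.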

Question 3 (Short Answer)

Early stopping is a form of explicit regularization. How does it relate to implicit regularization?
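One way to see the connection, sketched here on assumed toy data: for least squares started at w = 0 with step size 1/L, the norm ||w|| never decreases along the gradient descent trajectory while the training loss falls. The number of iterations therefore behaves like a regularization knob, and stopping early returns a smaller-norm solution than running to convergence. The seed, sizes, and step count below are arbitrary illustration choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 5, 20  # underdetermined toy problem (assumed sizes)
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
lr = 1.0 / np.linalg.norm(X, 2) ** 2  # keeps lr * s_i^2 <= 1 for every singular value s_i of X

w = np.zeros(d)
norms, losses = [], []
for t in range(20000):
    w -= lr * X.T @ (X @ w - y)                   # one full-batch gradient step
    norms.append(np.linalg.norm(w))               # ||w_t|| along the trajectory
    losses.append(0.5 * np.sum((X @ w - y) ** 2))  # training loss at step t

w_min_norm = np.linalg.pinv(X) @ y  # where the full trajectory ends up
print(norms[9], norms[-1], np.linalg.norm(w_min_norm))
```

In the SVD basis of X, each coordinate of w_t equals (1 - (1 - lr * s_i^2)^t) times the matching coordinate of the minimum-norm solution, so with lr * s_i^2 <= 1 every coordinate grows monotonically toward its limit; truncating at step t shrinks the small-singular-value directions most, much as a ridge penalty would.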

Question 4 (True / False)

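For overparameterized linear regression (more features than training examples), gradient descent on the squared loss, initialized at w = 0, converges to the minimum-norm solution min_w ||w||^2 subject to Xw = y, where X is the training data matrix and y the targets; that is, among all weight vectors that fit the training data exactly, it picks the one with the smallest norm. Is this implicit regularization?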

T. True
F. False
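A quick numerical check of the claim, assuming a random toy problem with fewer samples (n = 5) than features (d = 20): gradient descent from w = 0 is compared with the minimum-norm solution computed by np.linalg.pinv. The helper gd_least_squares is a hypothetical name for illustration, not a library function.

```python
import numpy as np

def gd_least_squares(X, y, w0, lr, steps):
    """Hypothetical helper: plain gradient descent on 0.5 * ||Xw - y||^2."""
    w = w0.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y)  # gradient of 0.5 * ||Xw - y||^2
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
n, d = 5, 20  # fewer samples than features: infinitely many solutions of Xw = y
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

lr = 1.0 / np.linalg.norm(X, 2) ** 2  # step size 1/L, L = largest eigenvalue of X^T X
w_gd = gd_least_squares(X, y, np.zeros(d), lr, steps=20000)
w_min_norm = np.linalg.pinv(X) @ y  # minimum-norm solution of Xw = y

print(np.max(np.abs(X @ w_gd - y)))        # training residual, should be near 0
print(np.linalg.norm(w_gd - w_min_norm))   # distance to the min-norm solution, should be near 0
```

No norm penalty appears anywhere in the loss; the preference for the smallest-norm interpolant comes entirely from the starting point and the fact that gradient steps stay in the row space of X.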