Questions: Deep Learning Theory

4 questions to test your understanding

Question 1 Multiple Choice

A neural network with 10 million parameters is trained on 50,000 examples and achieves 95% test accuracy. Classical learning theory (bounds based on VC dimension or parameter counting) suggests that a model this over-parameterized should overfit catastrophically. Which theoretical frameworks help explain why it generalizes?

A. The network is not actually over-parameterized because dropout removes 90% of parameters during training
B. Implicit regularization of SGD (which biases toward low-complexity solutions), norm-based generalization bounds (which depend on weight magnitudes, not parameter count), and the observation that the effective complexity of the learned function is much lower than the parameter count
C. The network memorizes all 50,000 training examples but the test set happens to be similar enough to training data
D. Over-parameterization does not affect generalization; parameter count is irrelevant to overfitting
Question 2 True / False

Depth-separation results prove that there exist functions computable by a network of depth k with polynomial size that require exponential size at depth k-1.

T. True
F. False
Question 3 True / False

The neural tangent kernel (NTK) theory fully explains the success of deep learning in practice.

T. True
F. False
Question 4 Short Answer

Explain what 'implicit regularization' means in the context of deep learning and why it is considered a key part of the generalization puzzle.
