Questions — Deep Learning Theory — Open Knowledge Graph

Question 1 Multiple Choice

A neural network with 10 million parameters is trained on 50,000 examples and achieves 95% test accuracy. Classical learning theory (VC dimension or parameter counting) predicts this should overfit catastrophically. What theoretical frameworks help explain why it generalizes?

A. The network is not actually over-parameterized because dropout removes 90% of parameters during training

B. Implicit regularization of SGD (which biases toward low-complexity solutions), norm-based generalization bounds (which depend on weight magnitudes, not parameter count), and the observation that the effective complexity of the learned function is much lower than the parameter count

C. The network memorizes all 50,000 training examples but the test set happens to be similar enough to training data

D. Over-parameterization does not affect generalization; parameter count is irrelevant to overfitting

Question 2 True / False

Depth-separation results prove that there exist functions computable by a network of depth k with polynomial size that require exponential size at depth k-1.

T. True

F. False

Question 3 True / False

The neural tangent kernel (NTK) theory fully explains the success of deep learning in practice.

T. True

F. False

Question 4 Short Answer

Explain what 'implicit regularization' means in the context of deep learning and why it is considered a key part of the generalization puzzle.

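One concrete illustration of implicit regularization, offered as a minimal sketch rather than a full answer: in an over-parameterized linear least-squares problem, infinitely many weight vectors fit the training data exactly, yet full-batch gradient descent started from zero converges to the one with the smallest Euclidean norm, even though the loss contains no penalty term. Everything in the setup below is an assumption made for illustration: the use of NumPy, the random Gaussian data, the sizes 20 and 200, and the function name `gradient_descent_least_squares`. It uses a linear model and full-batch gradient descent as a simplified stand-in for a deep network trained with SGD.

```python
import numpy as np


def gradient_descent_least_squares(X, y, steps=2000):
    """Full-batch gradient descent on 0.5 * ||X w - y||^2, starting from w = 0.

    Hypothetical helper for illustration. The step size is set to
    1 / (largest singular value of X)^2, which keeps the iteration stable.
    """
    lr = 1.0 / np.linalg.norm(X, 2) ** 2
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y)  # gradient of the unregularized loss
    return w


rng = np.random.default_rng(0)
n_examples, n_params = 20, 200  # assumed sizes: far more parameters than examples
X = rng.normal(size=(n_examples, n_params))
y = rng.normal(size=n_examples)

# Solution found by gradient descent, with no explicit regularizer.
w_gd = gradient_descent_least_squares(X, y)

# Minimum-norm interpolating solution, computed via the pseudoinverse.
w_min_norm = np.linalg.pinv(X) @ y

# A different interpolating solution: add a vector from the null space of X.
v = rng.normal(size=n_params)
v_null = v - np.linalg.pinv(X) @ (X @ v)  # remove the row-space component
w_other = w_min_norm + v_null

print("GD fits the data:       ", np.allclose(X @ w_gd, y, atol=1e-6))
print("other solution fits too:", np.allclose(X @ w_other, y, atol=1e-6))
print("GD equals min-norm:     ", np.allclose(w_gd, w_min_norm, atol=1e-6))
print("norm of GD solution:    ", np.linalg.norm(w_gd))
print("norm of other solution: ", np.linalg.norm(w_other))
```

Both `w_gd` and `w_other` interpolate the training data, but only the optimizer's starting point and update rule determine which one is reached. The analogous bias in deep nonlinear networks is much less well understood, which is part of why implicit regularization is treated as an open piece of the generalization puzzle.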
