Questions — Generalization Bounds for Deep Networks

Question 1 (Multiple Choice)

A network with 10^7 parameters achieves 1% training error and 5% test error. A VC-dimension-based bound on its test error comes out above 100%. The bound is therefore vacuous: it says nothing beyond the trivial fact that the error is at most 100%. Why do VC-based bounds fail here?

A. VC dimension does not apply to neural networks because they are non-linear

B. The VC dimension scales roughly with the number of parameters (~10^7, up to depth and log factors), so the bound on the generalization gap behaves like O(sqrt(10^7/n)). This exceeds 1 for any training set smaller than about 10^7 examples, so the bound is vacuous (weaker than the trivial 100% bound)

C. VC-based bounds require convex hypothesis classes, and neural networks are non-convex

D. The training error of 1% is too high for VC bounds to be informative
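
The arithmetic behind the vacuity can be checked directly. The sketch below is a rough illustration, not a formal VC bound. It assumes the VC dimension d is about the parameter count and drops all constants and log factors, keeping only the sqrt(d/n) scaling. The training-set sizes are hypothetical.

```python
import math

def vc_gap_order(d, n):
    """Order-of-magnitude VC generalization gap, sqrt(d / n); constants and log factors dropped."""
    return math.sqrt(d / n)

d = 10**7         # assumption: VC dimension taken to be roughly the parameter count
train_err = 0.01  # training error from the question
for n in (50_000, 1_000_000, 10_000_000):  # hypothetical training-set sizes
    bound = train_err + vc_gap_order(d, n)
    print(f"n = {n:>10,}: bound on test error ~ {bound:.2f} (vacuous: {bound >= 1})")
```

All three bounds come out above 1. The sqrt(d/n) term falls below 1 only once n exceeds d, which means more than 10^7 labelled examples.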

Question 2 (Multiple Choice)

PAC-Bayes bounds depend on the KL divergence between the learned weight distribution and a prior distribution chosen before seeing data. Why is the choice of prior critical?

A. A poorly chosen prior, far from the learned weights, makes the KL term large and the bound vacuous, while a good prior that places mass near the learned weights gives a tight bound

B. The prior must be a Gaussian distribution; other distributions are not valid

C. The prior controls the learning rate of the optimization algorithm

D. The prior is irrelevant: PAC-Bayes bounds are prior-free in practice
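
The sketch below shows how the prior enters the bound. It makes these assumptions:

- The prior and posterior are isotropic Gaussians.
- The bound has the commonly used McAllester-style form L(Q) <= L_hat(Q) + sqrt((KL(Q||P) + ln(2 sqrt(n)/delta)) / (2n)).
- The weights are synthetic, and training moves them only slightly from their initialization.
- The empirical risk of the stochastic network is simply set to 1%.

Centering the prior at the random initialization, which is chosen before seeing data, is one common way to place prior mass near the learned weights. Centering it at the origin ignores where training ends up.

```python
import math
import random

def kl_isotropic_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL(Q || P) for Q = N(mu_q, sigma_q^2 I) and P = N(mu_p, sigma_p^2 I), summed over coordinates."""
    total = 0.0
    for mq, mp in zip(mu_q, mu_p):
        total += (math.log(sigma_p / sigma_q)
                  + (sigma_q ** 2 + (mq - mp) ** 2) / (2 * sigma_p ** 2)
                  - 0.5)
    return total

def pac_bayes_bound(emp_risk, kl, n, delta=0.05):
    """McAllester-style bound: emp_risk + sqrt((KL + ln(2 sqrt(n) / delta)) / (2 n))."""
    return emp_risk + math.sqrt((kl + math.log(2 * math.sqrt(n) / delta)) / (2 * n))

random.seed(0)
d, n, sigma = 1_000, 50_000, 0.05  # hypothetical: weight count, sample size, posterior noise scale
w_init = [random.gauss(0.0, 1.0) for _ in range(d)]        # data-independent random initialization
w_learned = [w + random.gauss(0.0, 0.05) for w in w_init]  # assumption: training moves weights only slightly

kl_origin = kl_isotropic_gaussians(w_learned, sigma, [0.0] * d, sigma)  # prior centered at 0
kl_init = kl_isotropic_gaussians(w_learned, sigma, w_init, sigma)       # prior centered at the init

for name, kl in (("prior at origin", kl_origin), ("prior at init", kl_init)):
    print(f"{name:>15}: KL = {kl:10.1f}, bound = {pac_bayes_bound(0.01, kl, n):.3f}")
```

With these made-up numbers, the origin-centered prior gives a bound above 1, while the initialization-centered prior gives one below 0.1. The number of weights is identical in both cases; only the KL term changes.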

Question 3 (True / False)

Spectrally normalized margin bounds scale with the product of the layers' spectral norms rather than with the number of parameters. This means a 1000-layer network with small per-layer spectral norms can generalize better than a 2-layer network with large spectral norms.

True

False
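
The product term can be illustrated numerically. This sketch covers only the product-of-spectral-norms factor, not a full margin bound, which also depends on the margin, the sample size, and further per-layer norm terms. Layer sizes and norms are hypothetical. The sketch does three things:

- It estimates a spectral norm by power iteration.
- It rescales a random layer to a target norm, which is the idea behind spectral normalization.
- It compares the products in log space so that 1000 factors do not underflow.

```python
import math
import random

def spectral_norm(A, iters=500):
    """Estimate the largest singular value of A (a list of rows) by power iteration on A^T A."""
    rows, cols = len(A), len(A[0])
    v = [1.0] * cols  # fixed start vector; fine unless it is orthogonal to the top singular vector
    for _ in range(iters):
        Av = [sum(a * x for a, x in zip(row, v)) for row in A]                   # A v
        AtAv = [sum(A[i][j] * Av[i] for i in range(rows)) for j in range(cols)]  # A^T (A v)
        norm = math.sqrt(sum(x * x for x in AtAv))
        v = [x / norm for x in AtAv]
    Av = [sum(a * x for a, x in zip(row, v)) for row in A]
    return math.sqrt(sum(x * x for x in Av))  # ||A v|| with ||v|| = 1

def rescale_to_norm(A, target):
    """Scale A so that its estimated spectral norm equals target."""
    s = spectral_norm(A)
    return [[a * target / s for a in row] for row in A]

def log_norm_product(norms):
    """Sum of log spectral norms, i.e. the log of their product (avoids under/overflow)."""
    return sum(math.log(s) for s in norms)

random.seed(0)
W = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(8)]  # hypothetical 8x8 layer
W_small = rescale_to_norm(W, 0.99)
print(f"raw norm {spectral_norm(W):.3f}, rescaled norm {spectral_norm(W_small):.3f}")

width = 512  # hypothetical layer width, used only to count weights
for name, layers, norm in (("deep", 1000, 0.99), ("shallow", 2, 10.0)):
    prod = math.exp(log_norm_product([norm] * layers))
    params = layers * width * width
    print(f"{name:>7}: {params:,} weights, product of spectral norms = {prod:.3g}")
```

In this toy comparison, the deep network has 500 times more weights than the shallow one. Its product term, however, is roughly six orders of magnitude smaller: about 4.3e-05 versus 100.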

Question 4 (Short Answer)

Explain why compression-based generalization bounds are conceptually natural for deep networks and how they avoid the parameter-counting problem.
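
Here is a minimal sketch of the counting argument, assuming a fixed, data-independent compression scheme. If the compressed network g can be written in k bits, there are at most 2^k such networks. A union bound plus Hoeffding's inequality then gives L(g) <= L_hat(g) + sqrt((k ln 2 + ln(1/delta)) / (2n)). This is the basic finite-class (Occam) form, not the exact bound of any particular compression paper, and it certifies the compressed network rather than the original one. The bit counts below are hypothetical.

```python
import math

def occam_bound(train_err, bits, n, delta=0.05):
    """Finite-class bound: at most 2**bits compressed networks, union bound + Hoeffding."""
    return train_err + math.sqrt((bits * math.log(2) + math.log(1 / delta)) / (2 * n))

n = 50_000       # hypothetical training-set size
params = 10**7   # parameter count from Question 1
candidates = {
    "raw 32-bit weights": 32 * params,  # no compression: bits grow with the parameter count
    "compressed to 10^6 bits": 10**6,   # hypothetical description length
    "compressed to 10^4 bits": 10**4,   # hypothetical description length
}
for label, bits in candidates.items():
    print(f"{label:>24}: bound = {occam_bound(0.01, bits, n):.3f}")
```

Only the last description length gives a non-vacuous bound, around 0.27. The bound depends on the description length k, not on the 10^7 parameters the network nominally has.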
