Questions: Batch Normalization

5 questions to test your understanding

Question 1 (Multiple Choice)

A model with batch normalization performs well during training but gives poor results at deployment. Training used batch size 64; deployment processes one image at a time. What is the most likely cause?

A. Batch normalization degrades all models at deployment regardless of batch size
B. The model was not switched to evaluation mode, so it uses noisy single-sample batch statistics instead of the stored running averages
C. Batch size 64 was too large, causing overfitting in the normalization layers
D. The learnable parameters γ and β are discarded when a model is deployed

Question 2 (Multiple Choice)

A researcher removes the learnable scale (γ) and shift (β) parameters from all batch normalization layers, leaving only the normalization step. What is the likely consequence?

A. No effect; γ and β are redundant because weights in the next layer can compensate
B. The network loses the ability to represent identity transforms or unnormalized distributions, constraining what functions it can learn
C. Training accelerates because there are fewer parameters to optimize
D. Regularization increases because normalization is applied more strictly

Question 3 (True / False)

Batch normalization cannot reduce a network's representational capacity because the learnable parameters γ and β allow the network to recover any unnormalized distribution if gradient descent finds it useful.

T. True
F. False

Question 4 (True / False)

During training, batch normalization uses population statistics computed over the entire training dataset to normalize each layer's inputs.

T. True
F. False

Question 5 (Short Answer)

Why does batch normalization behave differently at training time versus inference time, and what bug does this difference commonly cause?

Think about your answer, then reveal below.
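
For readers who want to experiment after answering, the following is a minimal NumPy sketch of a batch normalization layer with a training flag. It is an illustration under stated assumptions, not any particular library's implementation: the class name BatchNorm1d, the momentum of 0.1, the epsilon of 1e-5, and the synthetic data are all choices made here for the example.

```python
import numpy as np


class BatchNorm1d:
    """Illustrative batch norm over the batch axis (hypothetical, NumPy only)."""

    def __init__(self, num_features, momentum=0.1, eps=1e-5):
        # Learnable scale and shift (kept at their initial values in this sketch).
        self.gamma = np.ones(num_features)
        self.beta = np.zeros(num_features)
        # Running estimates used when self.training is False.
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)
        self.momentum = momentum  # assumed value
        self.eps = eps            # assumed value
        self.training = True

    def __call__(self, x):
        if self.training:
            # Training mode: statistics come from the current mini-batch only.
            mean = x.mean(axis=0)
            var = x.var(axis=0)
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var
        else:
            # Evaluation mode: use the stored running averages.
            mean, var = self.running_mean, self.running_var
        x_hat = (x - mean) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta


rng = np.random.default_rng(0)
bn = BatchNorm1d(num_features=3)

# Simulate training: 200 mini-batches of 64 samples with mean 5 and std 2.
for _ in range(200):
    bn(rng.normal(loc=5.0, scale=2.0, size=(64, 3)))

single = np.array([[7.0, 5.0, 3.0]])  # one "deployment" input

# Evaluation mode: the output reflects where the input sits
# relative to the running statistics.
bn.training = False
eval_out = bn(single)

# Training mode with batch size 1: the sample is its own batch mean,
# so the normalized value is exactly zero and the input is erased.
bn.training = True
train_out = bn(single)

print("eval mode: ", eval_out)
print("train mode:", train_out)
```

Changing the values in single and rerunning shows that the training-mode output stays the same for a batch of one, while the evaluation-mode output follows the input.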