Questions — Batch Normalization — Open Knowledge Graph

Question 1 Multiple Choice

A model with batch normalization performs well during training but gives poor results at deployment. Training used batch size 64; deployment processes one image at a time. What is the most likely cause?

A. Batch normalization degrades all models at deployment regardless of batch size

B. The model was not switched to evaluation mode, so it uses noisy single-sample batch statistics instead of the stored running averages

C. Batch size 64 was too large, causing overfitting in the normalization layers

D. The learnable parameters γ and β are discarded when a model is deployed

Question 2 Multiple Choice

A researcher removes the learnable scale (γ) and shift (β) parameters from all batch normalization layers, leaving only the normalization step. What is the likely consequence?

A. No effect; γ and β are redundant because weights in the next layer can compensate

B. The network loses the ability to represent identity transforms or unnormalized distributions, constraining what functions it can learn

C. Training accelerates because there are fewer parameters to optimize

D. Regularization increases because normalization is applied more strictly

Question 3 True / False

Batch normalization cannot reduce a network's representational capacity because the learnable parameters γ and β allow the network to recover any unnormalized distribution if gradient descent finds it useful.

T. True

F. False

Question 4 True / False

During training, batch normalization uses population statistics computed over the entire training dataset to normalize each layer's inputs.

T. True

F. False

Question 5 Short Answer

Why does batch normalization behave differently at training time versus inference time, and what bug does this difference commonly cause?

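Once you have attempted the questions, the following sketch may help you check your reasoning. It is a toy, single-feature batch normalization layer written in plain Python, not any library's implementation. The class name `ToyBatchNorm`, the `momentum` and `eps` defaults, and the use of the biased batch variance for the running estimate are all assumptions made for illustration.

```python
import math


class ToyBatchNorm:
    """Hypothetical single-feature batch norm, for illustration only."""

    def __init__(self, momentum=0.1, eps=1e-5):
        self.gamma = 1.0          # learnable scale (left at its initial value here)
        self.beta = 0.0           # learnable shift (left at its initial value here)
        self.running_mean = 0.0   # running estimates, updated only in training mode
        self.running_var = 1.0
        self.momentum = momentum  # assumed update rate for the running estimates
        self.eps = eps
        self.training = True

    def forward(self, batch):
        if self.training:
            # Training mode: normalize with the statistics of the current batch.
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var
        else:
            # Evaluation mode: normalize with the stored running estimates.
            mean, var = self.running_mean, self.running_var
        scale = self.gamma / math.sqrt(var + self.eps)
        return [scale * (x - mean) + self.beta for x in batch]


bn = ToyBatchNorm()
for _ in range(200):
    bn.forward([4.0, 5.0, 6.0])   # "training" on the same small batch

# Evaluation mode: different single inputs map to different outputs.
bn.training = False
eval_low = bn.forward([4.0])[0]
eval_high = bn.forward([6.0])[0]

# Training mode left on: a batch of one has mean == x and variance 0,
# so every input collapses to beta.
bn.training = True
train_low = bn.forward([4.0])[0]
train_high = bn.forward([6.0])[0]

print("eval mode: ", eval_low, eval_high)
print("train mode:", train_low, train_high)
```

In this sketch, leaving `training` set to `True` with a batch of one makes both inputs produce the same output, while evaluation mode keeps them distinct. Compare that behavior with your answers to Questions 1, 4, and 5.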
