5 questions to test your understanding
Why does dropout specifically prevent co-adaptation between neurons?
A model trained with 50% dropout (p = 0.5) is deployed for inference. What is the correct procedure for using the model's weights at test time?
True or false: Dropout reduces overfitting by permanently removing redundant neurons from the network, resulting in a smaller, more regularized model after training.
True or false: For a network with n droppable units, dropout can be interpreted as simultaneously training an ensemble of 2ⁿ different thinned subnetworks, all sharing the same underlying weights.
Explain why dropout is less effective (or even harmful) in small networks than in large, overparameterized networks.