Questions: Fine-Tuning Pretrained Models

5 questions to test your understanding

Question 1 Multiple Choice

You have a small dataset of 500 medical X-ray images and want to fine-tune an ImageNet-pretrained ResNet. Which strategy is most appropriate?

A. Full fine-tuning with a standard learning rate, since the large model capacity is needed for medical images
B. Feature extraction (freeze all pretrained layers, train only the new head) to avoid overfitting with limited data
C. Train from scratch with random initialization to ensure the model learns medical-specific features
D. Use discriminative learning rates with the highest rate on early layers, since medical features differ most there
Question 2 Multiple Choice

In discriminative (layer-wise) fine-tuning, early layers receive smaller learning rates than later layers. Why?

A. Early layers have more parameters and need smaller updates for numerical stability
B. Early layers learn universal features (edges, textures) that transfer well and need minimal adjustment
C. Early layers are closer to the output and thus more sensitive to gradient updates
D. Early layers converge faster, so they need smaller learning rates to prevent overshooting
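
For reference, the layer-wise schedule described in this question can be sketched in a few lines. This is only an illustration of how such rates might be assigned, not a prescribed recipe: the function name `layerwise_learning_rates`, the top rate of 1e-3, and the decay factor of 0.5 are all assumptions chosen for the example.

```python
def layerwise_learning_rates(num_layers, top_lr=1e-3, decay=0.5):
    """Return one learning rate per layer, index 0 being the earliest layer.

    The last layer gets `top_lr`; each earlier layer gets the rate of the
    layer after it multiplied by `decay`. Both defaults are illustrative
    assumptions, not recommended values.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    if not 0 < decay <= 1:
        raise ValueError("decay must be in (0, 1]")
    # Layer i is (num_layers - 1 - i) steps away from the last layer.
    return [top_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]


rates = layerwise_learning_rates(4)
for index, lr in enumerate(rates):
    print(f"layer {index}: lr = {lr:.6f}")
```

In a deep learning framework, each of these rates would typically be attached to its own group of parameters in the optimizer, so that the earliest layers take the smallest steps.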
Question 3 True / False

Fine-tuning a pretrained model with a learning rate close to the original pretraining rate is a safe starting point because the pretrained weights provide a good initialization.

True
False
Question 4 True / False

Feature extraction (freezing all pretrained layers and training only the new head) performs worse than full fine-tuning when the target task is very different from the pretraining domain.

True
False
Question 5 Short Answer

Why is a much lower learning rate used when fine-tuning a pretrained model compared to training from scratch, and what specific failure mode does it prevent?
