Questions — Fine-Tuning Pretrained Models

Question 1 Multiple Choice

You have a small dataset of 500 medical X-ray images and want to fine-tune an ImageNet-pretrained ResNet. Which strategy is most appropriate?

A. Full fine-tuning with a standard learning rate, since the large model capacity is needed for medical images

B. Feature extraction (freeze all pretrained layers, train only the new head) to avoid overfitting with limited data

C. Train from scratch with random initialization to ensure the model learns medical-specific features

D. Use discriminative learning rates with the highest rate on early layers, since medical features differ most there

Question 2 Multiple Choice

In discriminative (layer-wise) fine-tuning, early layers receive smaller learning rates than later layers. Why?

A. Early layers have more parameters and need smaller updates for numerical stability

B. Early layers learn universal features (edges, textures) that transfer well and need minimal adjustment

C. Early layers are closer to the output and thus more sensitive to gradient updates

D. Earlier layers converge faster, so they need smaller learning rates to prevent overshooting
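For reference, the layer-wise schedule described in the question can be sketched in plain Python. This is a minimal sketch under stated assumptions: the model is split into ordered layer groups, the group nearest the output gets a hypothetical `base_lr`, and each earlier group gets the next group's rate divided by a constant `decay` factor (2.6 here is an illustrative choice, not a prescribed value). In a framework such as PyTorch, rates like these would typically be passed to the optimizer as separate parameter groups.

```python
# Sketch of discriminative (layer-wise) learning rates.
# Assumptions (hypothetical, for illustration only): layer groups are ordered
# input -> output, the last group gets base_lr, and each step toward the
# input divides the rate by a constant decay factor.

def discriminative_lrs(num_groups: int, base_lr: float, decay: float = 2.6) -> list:
    """Return one learning rate per layer group, ordered input -> output.

    The last group (closest to the output) receives base_lr; every step toward
    the input divides the rate by `decay`, so early layers change the least.
    """
    if num_groups < 1:
        raise ValueError("num_groups must be at least 1")
    if decay <= 1.0:
        raise ValueError("decay must be > 1 so earlier layers get smaller rates")
    return [base_lr / decay ** (num_groups - 1 - i) for i in range(num_groups)]


if __name__ == "__main__":
    # Hypothetical group names for a ResNet-style backbone plus a new head.
    groups = ["stem", "block1", "block2", "block3", "head"]
    for name, lr in zip(groups, discriminative_lrs(len(groups), base_lr=1e-3)):
        print(f"{name:>7}: {lr:.2e}")
```

A geometric decay keeps the ratio between neighbouring groups constant, so a single `decay` value controls how strongly the early layers are protected.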

Question 3 True / False

Fine-tuning a pretrained model with a learning rate close to the original pretraining rate is a safe starting point because the pretrained weights provide a good initialization.

True

False

Question 4 True / False

Feature extraction (freezing all pretrained layers and training only the new head) performs worse than full fine-tuning when the target task is very different from the pretraining domain.

True

False
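The difference between feature extraction and full fine-tuning comes down to which weights the optimizer is allowed to update. The toy sketch below is entirely hypothetical: a scalar "backbone" weight `w_b` stands in for the pretrained layers and a scalar "head" (`w_h`, `b_h`) stands in for the new task layer, trained with plain SGD on half squared error. With `freeze_backbone=True` only the head moves (feature extraction); with `False` every weight is updated (full fine-tuning). In PyTorch the same effect is usually achieved by setting `requires_grad = False` on the backbone's parameters.

```python
# Toy sketch: feature extraction (frozen backbone) vs. full fine-tuning.
# Hypothetical two-stage scalar model, for illustration only:
#   backbone: feature = w_b * x              (stands in for pretrained layers)
#   head:     y_hat   = w_h * feature + b_h  (new task-specific layer)

def train(data, w_b, w_h=0.0, b_h=0.0, lr=0.05, epochs=500, freeze_backbone=True):
    """Per-sample SGD on half squared error; returns (w_b, w_h, b_h, mean_loss)."""
    for _ in range(epochs):
        for x, y in data:
            feature = w_b * x
            y_hat = w_h * feature + b_h
            err = y_hat - y  # derivative of 0.5 * err**2 w.r.t. y_hat
            # Compute all gradients (chain rule) before changing any weight.
            grad_w_h = err * feature
            grad_b_h = err
            grad_w_b = err * w_h * x
            # The new head is always trained.
            w_h -= lr * grad_w_h
            b_h -= lr * grad_b_h
            # Feature extraction: the pretrained weight is never updated.
            if not freeze_backbone:
                w_b -= lr * grad_w_b
    loss = sum(0.5 * (w_h * w_b * x + b_h - y) ** 2 for x, y in data) / len(data)
    return w_b, w_h, b_h, loss


if __name__ == "__main__":
    # Hypothetical target task: y = 3x + 1, with a "pretrained" backbone weight of 2.
    data = [(x, 3.0 * x + 1.0) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
    print("frozen backbone:", train(data, w_b=2.0, freeze_backbone=True))
    print("full fine-tune: ", train(data, w_b=2.0, freeze_backbone=False))
```

In this toy case the frozen feature `2x` is already expressive enough for the head to fit the target exactly; the question above concerns the case where the pretrained features are a poor match for the target task.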

Question 5 Short Answer

Why is a much lower learning rate used when fine-tuning a pretrained model compared to training from scratch, and what specific failure mode does it prevent?
