A large model trained on 1000 classes is given 5 examples of a new class and fine-tuned for 100 epochs. Why does this approach typically fail in practice?
AThe model's architecture cannot be extended to new classes without retraining from scratch
BThe model catastrophically overfits to the 5 examples, memorizing them without learning a generalizable representation
CStandard fine-tuning requires at least 100 examples per class to update weights meaningfully
DThe learning rate must be specially tuned for each new class, requiring a separate validation set
With only 5 examples, fine-tuning a large model has far more free parameters than data points — gradient descent will overfit by memorizing the specific examples rather than learning to generalize. Few-shot learning addresses this not by adjusting hyperparameters, but by changing what the model learns during training: a generalizable similarity metric or parameter initialization rather than class-specific boundaries.
Question 2 Multiple Choice
What is the fundamental difference between how prototypical networks and MAML handle a new task at test time?
APrototypical networks require fine-tuning via gradient descent; MAML classifies by nearest prototype without gradient updates
BPrototypical networks classify by distance to learned class prototypes without gradient updates; MAML performs a few gradient steps to adapt to the new task
CPrototypical networks use second-order optimization; MAML uses only first-order distance metrics
DBoth require the same number of gradient steps at test time; they differ only in training procedure
Prototypical networks are a metric learning approach: at test time, embed the support examples, compute class prototypes (mean embeddings), and classify by nearest prototype — no gradient updates needed. MAML meta-learns an initialization: at test time, take a few gradient steps on the support examples before classifying. This makes prototypical networks faster at inference, while MAML is more flexible but computationally expensive.
Question 3 True / False
In episodic training for few-shot learning, the model is trained on the same fixed set of classes it will be tested on, just with fewer labeled examples per class.
TTrue
FFalse
Answer: False
Episodic training explicitly uses different class subsets each episode so the model never sees test classes during training. If it trained on the same classes, it would memorize those classes rather than learning the general ability to classify new ones from few examples. By repeatedly solving N-way K-shot problems across different class subsets, the model learns transferable skills — how to quickly discriminate any new classes from limited evidence.
Question 4 True / False
Prototypical networks require no gradient updates at test-time inference because the embedding network has been trained to create a space where computing the mean of K support examples captures a class's identity sufficiently for classification.
TTrue
FFalse
Answer: True
Once the embedding function is trained, inference is just a forward pass plus nearest-prototype lookup. The training objective pushed same-class examples to cluster together and different-class examples to separate in embedding space. The prototype (mean embedding over K support examples) gives the 'center of mass' for a new class in that well-structured space, immediately enabling classification without any adaptation.
Question 5 Short Answer
What does it mean to say few-shot learning models 'learn how to learn,' and how does episodic training implement this goal?
Think about your answer, then reveal below.
Model answer: Rather than training to classify specific fixed classes, few-shot learning trains a model to solve the general problem of classifying previously unseen classes from minimal examples. Episodic training implements this by repeatedly sampling N-way K-shot problems from different class subsets. Across thousands of episodes, the model develops meta-level skills — either a well-structured embedding space (metric learning) or a parameter initialization that rapidly adapts to any new task (MAML) — that transfer to novel classification problems.
The training objective is reframed: standard supervised learning asks 'which fixed class does this belong to?'; few-shot learning asks 'given K examples of each of N new classes, which class does this query belong to?' Solving this across many different class subsets forces the model to develop generalizable representations rather than class-specific ones.