During MAML meta-training, what does the outer loop optimize?
A. The model's accuracy on the support set of each training task
B. The initialization such that a few inner-loop gradient steps yield strong query-set performance on new tasks
C. The learning rate used in the inner-loop adaptation steps
D. The average loss across all support sets without any inner-loop adaptation
Answer: B
MAML's outer loop optimizes the *initialization*, not task-specific accuracy, by evaluating how well the model performs on each task's query set *after* inner-loop adaptation. In full (second-order) MAML this requires differentiating through the inner-loop gradient steps, that is, computing gradients of gradients. Option A scores the wrong data (the support set rather than the query set), and option D skips the adaptation step entirely; option C conflates MAML with meta-learning approaches that learn a learning rate rather than an initialization.
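As a rough illustration of what "differentiating through the inner loop" means, the sketch below computes the meta-gradient by hand for a toy 1-D linear model `y = w * x` with a mean-squared-error loss and a single inner step. The function names, the step size `alpha`, and the data are all assumptions made for this example; real implementations rely on automatic differentiation rather than a hand-derived Hessian.

```python
# Toy 1-D linear model y = w * x with mean-squared-error loss.
# All names (mse_grad, mse_hess, maml_meta_grad, alpha) are illustrative.

def mse_grad(w, xs, ys):
    """First derivative of mean((w*x - y)^2) with respect to w."""
    return sum(2.0 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


def mse_hess(xs):
    """Second derivative of the same loss with respect to w (constant in w)."""
    return sum(2.0 * x * x for x in xs) / len(xs)


def maml_meta_grad(w, support, query, alpha):
    """Gradient of the post-adaptation query loss with respect to the init w."""
    xs_s, ys_s = support
    xs_q, ys_q = query
    # Inner loop: one gradient step on the support set.
    w_adapted = w - alpha * mse_grad(w, xs_s, ys_s)
    # Chain rule: d(w_adapted)/dw = 1 - alpha * (Hessian of the support loss).
    # This factor is the "gradient of a gradient" term.
    d_adapted_d_w = 1.0 - alpha * mse_hess(xs_s)
    # Outer loop: query-set gradient at the adapted weight, pulled back to w.
    return d_adapted_d_w * mse_grad(w_adapted, xs_q, ys_q)


# Hypothetical task with true slope 3.
support = ([1.0, 2.0], [3.0, 6.0])
query = ([3.0], [9.0])
g = maml_meta_grad(w=0.0, support=support, query=query, alpha=0.1)
```

Dropping the `d_adapted_d_w` factor (treating it as 1) gives the first-order approximation, which avoids second derivatives at the cost of an inexact meta-gradient.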
Question 2 Multiple Choice
A team pre-trains a ResNet on ImageNet and then fine-tunes it on a medical imaging dataset. A colleague claims this is equivalent to MAML. What is the key difference?
A. There is no meaningful difference: both use a pre-trained initialization that is then adapted
B. Fine-tuning adapts to one fixed target domain; MAML explicitly optimizes the initialization so that adaptation to *any* new task is fast and effective
C. Fine-tuning uses support and query sets, while MAML uses a conventional train/test split
D. MAML requires far less data than fine-tuning because it only needs a support set of a few examples per task
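Answer: B
Standard fine-tuning adapts a model to one specific target domain; the ImageNet pre-training was optimized for ImageNet classification, not for the ease of subsequent fine-tuning. MAML explicitly meta-trains across many tasks to find an initialization that is optimized for *fast adaptation to any new task* drawn from the same task distribution. The meta-training objective is 'how quickly and effectively can you adapt?', which is precisely what ordinary pre-training does not optimize. Option C reverses the terminology (support and query sets belong to MAML's episodic setup), and option D overlooks that MAML needs many tasks during meta-training even though each task supplies only a few examples.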
Question 3 True / False
A MAML-trained model should already achieve high accuracy on a brand-new task before any inner-loop adaptation steps are taken.
T. True
F. False
Answer: False
False. MAML finds an initialization that is positioned in weight space to adapt quickly, not one that already solves new tasks. Before adaptation, a MAML model's accuracy on an unseen task is often close to chance, so it may be little better than a randomly initialized network's on that task. The value of the MAML initialization is revealed only after a small number of gradient steps on the support set, which rapidly close the gap to strong performance.
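The gap between pre- and post-adaptation performance can be seen in a small sketch. It assumes a toy task family of 1-D linear functions whose slopes are symmetric around zero, and it simply takes `w_init = 0.0` as a stand-in for a meta-learned initialization; nothing here is actually meta-trained, and the slope, step size, and data are hypothetical.

```python
def mse(w, xs, ys):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_grad(w, xs, ys):
    """Derivative of the mean squared error with respect to w."""
    return sum(2.0 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


def adapt(w, xs, ys, alpha, steps):
    """Inner loop: a few plain gradient steps on the support set."""
    for _ in range(steps):
        w = w - alpha * mse_grad(w, xs, ys)
    return w


w_init = 0.0   # assumed stand-in for a meta-learned initialization
slope = -2.0   # hypothetical unseen task: y = -2 * x

support_x = [1.0, -1.0]
support_y = [slope * x for x in support_x]
query_x = [0.5, 2.0]
query_y = [slope * x for x in query_x]

# Query loss before any adaptation: the initialization does not solve the task.
loss_before = mse(w_init, query_x, query_y)

# Query loss after two inner-loop steps on the support set.
w_adapted = adapt(w_init, support_x, support_y, alpha=0.4, steps=2)
loss_after = mse(w_adapted, query_x, query_y)
```

In this toy setting the initialization sits far from the task's solution, yet two support-set steps remove most of the query loss, which is the behavior the answer above describes.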
Question 4 True / False
In meta-learning, both the inner loop and the outer loop are evaluated on data that the model has never seen during meta-training — this is what makes generalization possible.
T. True
F. False
Answer: True
True. The outer loop evaluates performance on each task's *query set* — data held out from the inner-loop adaptation — so the meta-learner is penalized if it only memorizes the support set rather than genuinely adapting. At test time, the meta-learner encounters entirely new tasks from the same distribution, relying on the learned adaptation strategy rather than any memorized patterns. This two-level held-out evaluation is what makes the generalization claim meaningful.
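The support/query separation described here can be sketched as a small episode sampler. The helper name `sample_episode`, the set sizes, and the toy data are assumptions for illustration; the only property the sketch is meant to show is that the query examples are disjoint from the support examples used for adaptation.

```python
import random


def sample_episode(task_examples, k_support, k_query, rng):
    """Split one task's examples into disjoint support and query sets."""
    # rng.sample draws without replacement, so no example appears twice.
    picked = rng.sample(task_examples, k_support + k_query)
    return picked[:k_support], picked[k_support:]


rng = random.Random(0)
# Hypothetical task: ten (x, y) pairs from y = 2 * x.
examples = [(x, 2.0 * x) for x in range(10)]
support, query = sample_episode(examples, k_support=3, k_query=4, rng=rng)

# The inner loop would adapt on `support`; the outer loop would score `query`.
```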
Question 5 Short Answer
What is MAML optimizing for, and how does this differ from what standard gradient descent optimizes when training a classifier?
Model answer: Standard gradient descent minimizes loss on a fixed dataset for one task — it optimizes task-specific performance. MAML optimizes the neural network's initial parameters so that after a small number of gradient steps on any new task's support set, performance on that task's query set is maximized. MAML is optimizing for adaptability — the quality of the starting position in weight space — not accuracy on any particular task.
The distinction is the level at which optimization operates. Conventional training asks 'how accurate are you on this task?' MAML asks 'how quickly and well do you adapt to new tasks?' This shifts the objective from task performance to learning efficiency. It also requires the outer loop to backpropagate through the inner loop's gradient steps, which makes the objective both computationally heavier and qualitatively different.
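The two objectives can be written side by side. The notation below is an assumption for this sketch: $\theta$ is the parameter vector, $\alpha$ the inner-loop step size, $D$ a single fixed dataset, $\mathcal{T}_i$ a task drawn from a task distribution $p(\mathcal{T})$, and one inner gradient step is used for simplicity.

```latex
\begin{align*}
% Standard training: minimize the loss on one fixed dataset D
\theta^{*} &= \arg\min_{\theta} \; \mathcal{L}_{D}(\theta) \\[4pt]
% MAML inner loop: one gradient step on the support set of task T_i
\theta_i' &= \theta - \alpha \, \nabla_{\theta}
  \mathcal{L}^{\mathrm{support}}_{\mathcal{T}_i}(\theta) \\[4pt]
% MAML outer loop: minimize the query loss at the adapted parameters
\theta^{*} &= \arg\min_{\theta} \sum_{\mathcal{T}_i \sim p(\mathcal{T})}
  \mathcal{L}^{\mathrm{query}}_{\mathcal{T}_i}(\theta_i') \\[4pt]
% Meta-gradient for one task: the Hessian term comes from
% differentiating through the inner step
\nabla_{\theta} \, \mathcal{L}^{\mathrm{query}}_{\mathcal{T}_i}(\theta_i')
  &= \bigl( I - \alpha \, \nabla^{2}_{\theta}
  \mathcal{L}^{\mathrm{support}}_{\mathcal{T}_i}(\theta) \bigr) \,
  \nabla_{\theta_i'} \mathcal{L}^{\mathrm{query}}_{\mathcal{T}_i}(\theta_i')
\end{align*}
```

The first line depends on $\theta$ only through the loss itself, while the MAML objective depends on $\theta$ through the adapted parameters $\theta_i'$, which is where the extra Hessian term and the extra cost come from.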