Few-Shot Learning

Research Depth 71 in the knowledge graph
Unlocks 1 downstream topic
few-shot low-data rapid-adaptation

Core Idea

Few-shot learning enables models to learn new classes from very few labeled examples per class (for example, 1-shot or 5-shot) by leveraging prior knowledge. Metric learning approaches learn similarity functions; Model-Agnostic Meta-Learning (MAML) learns a parameter initialization that adapts quickly. Prototypical networks classify a query by its distance to class prototypes, the mean embeddings of each class's examples in a learned embedding space.

Explainer

From your study of transfer learning, you know that a model trained on one task can be adapted to a new task by reusing learned representations, typically by fine-tuning a pretrained network on new labeled data. But what if you have only one or five examples of each new class? Standard fine-tuning on so little data tends to overfit severely: the network can memorize the handful of examples without learning anything that generalizes. Few-shot learning addresses this extreme low-data regime by training models explicitly designed to generalize from minimal examples. Problems are usually framed as N-way K-shot tasks: classify among N new classes given only K labeled examples per class. The N × K labeled examples form the support set, and the new examples to be classified form the query set.
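
In symbols, one common way to write a single N-way K-shot task is shown below (the notation is assumed here for illustration, not taken from a specific source): S is the support set and Q the query set.

```latex
\begin{aligned}
\mathcal{S} &= \{(x_i, y_i)\}_{i=1}^{NK}, \qquad y_i \in \{1, \dots, N\}, \qquad \bigl|\{\, i : y_i = c \,\}\bigr| = K \ \text{for every class } c, \\
\mathcal{Q} &= \{(x_j^{*}, y_j^{*})\}_{j=1}^{M}, \qquad \text{predict } y_j^{*} \text{ from } x_j^{*} \text{ and } \mathcal{S} \text{ alone.}
\end{aligned}
```

For a 5-way 1-shot task, the support set holds 5 × 1 = 5 labeled examples, and guessing uniformly at random would be correct with probability 1/N = 1/5 = 20%, which is the baseline any few-shot method has to beat.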

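The training paradigm is fundamentally different from standard supervised learning. Instead of training on a fixed set of classes, few-shot learning uses episodic training: each training episode samples N classes and a handful of examples per class, split into a support set and a query set, mimicking the few-shot scenario the model will face at test time. The loss is computed on the query examples using only the support examples as labeled data. The model learns not to classify specific classes but to *learn how to classify*, a form of meta-learning (learning to learn). The classes used at test time are typically disjoint from the training classes, and because the sampled classes change from episode to episode, the model cannot succeed by memorizing any particular class. Over thousands of episodes, it must instead acquire representations or adaptation procedures that transfer to classes it has never seen.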
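
The episode construction described above can be sketched as follows. This is a minimal illustration, not any particular library's sampler: `sample_episode` is a hypothetical helper that assumes the dataset is just a list of integer class labels, and it returns index/label pairs rather than actual images. Note how each episode relabels its classes 0..N-1, which is what prevents the model from memorizing class identities.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way, k_shot, n_query, rng=random):
    """Sample one N-way K-shot episode from a labeled dataset.

    labels[i] is the class of example i. Returns (support, query), two lists of
    (example_index, episode_label) pairs with episode labels 0..n_way-1.
    """
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    # Only classes with enough examples for both the support and query sets qualify.
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= k_shot + n_query]
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + n_query examples")
    support, query = [], []
    # Episode labels are reassigned per episode, so "class 0" means something
    # different every time and cannot be memorized.
    for episode_label, c in enumerate(rng.sample(eligible, n_way)):
        picked = rng.sample(by_class[c], k_shot + n_query)
        support += [(i, episode_label) for i in picked[:k_shot]]
        query += [(i, episode_label) for i in picked[k_shot:]]
    return support, query


# Toy dataset: 400 examples spread evenly over 20 classes.
labels = [i % 20 for i in range(400)]
support, query = sample_episode(labels, n_way=5, k_shot=1, n_query=15, rng=random.Random(0))
print(len(support), len(query))  # 5 75
```

A training loop would call a sampler like this thousands of times, each call yielding a fresh small classification problem over a different subset of classes.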

The two dominant approaches differ in what they meta-learn. Metric learning methods learn an embedding function that maps examples into a space where same-class examples cluster together and different-class examples lie far apart. Prototypical networks are the clearest example: embed the K support examples of each class, average them to get that class's prototype, and classify a query by finding the nearest prototype (squared Euclidean distance in the original formulation). During training, a softmax over the negative distances to the prototypes turns them into class probabilities, and minimizing the cross-entropy on the query examples pushes the embedding network to form clusters that are tight within each class and well separated between classes. Siamese networks take a pairwise approach, learning to predict whether two examples belong to the same class. These methods are elegant because at test time they require no gradient updates, only a forward pass and a distance computation.
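
The prototypical-network classification rule can be sketched in NumPy as below. This covers only the forward pass and episode loss; it assumes embeddings are already computed, and the `embed` function in the usage part is a fixed random linear map standing in for a trained network (a hypothetical placeholder, since real training would backpropagate `proto_loss` through the embedding with an autodiff framework).

```python
import numpy as np


def prototypes(support_emb, support_labels, n_way):
    # Prototype of class c = mean embedding of its K support examples; shape (n_way, d).
    return np.stack([support_emb[support_labels == c].mean(axis=0) for c in range(n_way)])


def proto_log_probs(query_emb, protos):
    # Squared Euclidean distance from each query to each prototype: (n_query, n_way).
    d2 = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    # Softmax over negative distances, computed as a numerically stable log-softmax.
    logits = -d2
    logits = logits - logits.max(axis=1, keepdims=True)
    return logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))


def proto_loss(query_emb, query_labels, protos):
    # Episode loss: mean negative log-probability of each query's true class.
    logp = proto_log_probs(query_emb, protos)
    return -logp[np.arange(len(query_labels)), query_labels].mean()


# Toy 3-way 5-shot episode. The embedding is a fixed random linear map, a
# hypothetical stand-in for a trained embedding network.
rng = np.random.default_rng(0)
n_way, k_shot, n_query, dim_in, dim_emb = 3, 5, 4, 8, 4
W = rng.normal(size=(dim_in, dim_emb))


def embed(x):
    return x @ W


centers = rng.normal(scale=5.0, size=(n_way, dim_in))
support_labels = np.repeat(np.arange(n_way), k_shot)
query_labels = np.repeat(np.arange(n_way), n_query)
support_x = centers[support_labels] + rng.normal(size=(support_labels.size, dim_in))
query_x = centers[query_labels] + rng.normal(size=(query_labels.size, dim_in))

protos = prototypes(embed(support_x), support_labels, n_way)
pred = proto_log_probs(embed(query_x), protos).argmax(axis=1)
print("query accuracy:", (pred == query_labels).mean())
print("episode loss:", proto_loss(embed(query_x), query_labels, protos))
```

Nothing here is fitted to the query set: adding a new class at test time only means averaging its support embeddings into one more prototype.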

Model-Agnostic Meta-Learning (MAML) takes the alternative approach of meta-learning an initialization. The goal is a set of network parameters that, when fine-tuned with just a few gradient steps on the K examples per new class, quickly reaches good performance. MAML trains by simulating this inner fine-tuning loop on each episode's support set, measuring the loss of the fine-tuned parameters on the query set, and updating the initial parameters to reduce that post-fine-tuning loss. Differentiating through the inner gradient steps requires second-order derivatives (gradients through gradients), which is computationally expensive; first-order approximations that drop these terms are a common, cheaper alternative. The method is remarkably flexible: it applies to any model trained by gradient descent and any differentiable loss. The intuition is that MAML finds a point in parameter space that is close to good solutions for many tasks simultaneously, so a few steps of gradient descent on any specific task land in the right neighborhood.
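
To make "gradients through gradients" concrete, here is a minimal MAML sketch on a deliberately tiny problem, assuming a linear regression model so that the Hessian can be written in closed form instead of relying on an autodiff library. The task family (lines y = a·x + b with random slope and intercept), the learning rates, and all function names are hypothetical choices for illustration. The meta-gradient applies the chain rule through one inner step, w' = w − α∇L_support(w), giving (I − αH_support)∇L_query(w').

```python
import numpy as np


def features(x):
    # Linear model on [x, 1]: prediction = w[0] * x + w[1].
    return np.stack([x, np.ones_like(x)], axis=1)


def mse_grad_hess(w, X, y):
    # Mean squared error, its gradient, and its Hessian (constant for a linear model).
    n = len(y)
    r = X @ w - y
    return (r @ r) / n, (2.0 / n) * X.T @ r, (2.0 / n) * X.T @ X


def sample_task(rng):
    # Hypothetical task family: y = a * x + b with a per-task slope and intercept.
    a, b = rng.normal(2.0, 0.5), rng.normal(-1.0, 0.5)

    def draw(n):
        x = rng.uniform(-2.0, 2.0, size=n)
        return features(x), a * x + b

    return draw


def adapted_loss_and_meta_grad(w, Xs, ys, Xq, yq, alpha):
    # Inner loop: one gradient step on the support set.
    _, g_s, H_s = mse_grad_hess(w, Xs, ys)
    w_adapted = w - alpha * g_s
    # Outer objective: query loss at the adapted parameters.
    loss_q, g_q, _ = mse_grad_hess(w_adapted, Xq, yq)
    # Chain rule through the inner step: d(w_adapted)/dw = I - alpha * H_s.
    # The H_s term is the second-order part that first-order MAML drops.
    return loss_q, (np.eye(w.size) - alpha * H_s) @ g_q


def maml_step(w, rng, alpha=0.1, beta=0.05, n_tasks=8, k_shot=5, n_query=10):
    # Average the meta-gradient over a batch of sampled tasks, then update the initialization.
    meta_grad = np.zeros_like(w)
    for _ in range(n_tasks):
        draw = sample_task(rng)
        Xs, ys = draw(k_shot)
        Xq, yq = draw(n_query)
        meta_grad += adapted_loss_and_meta_grad(w, Xs, ys, Xq, yq, alpha)[1]
    return w - beta * meta_grad / n_tasks


rng = np.random.default_rng(0)
w = np.zeros(2)
for _ in range(2000):
    w = maml_step(w, rng)
print("meta-learned initialization (slope, intercept):", w)
```

For this symmetric toy task family the learned initialization should settle near the average task (slope about 2, intercept about −1); with a real network, the Hessian-vector product would come from an autodiff framework rather than the explicit `H_s` matrix used here.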

Prerequisite Chain

Longest path: 72 steps · 479 total prerequisite topics
