A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Few-Shot Learning

Research · Depth 99 in the knowledge graph

1 topic builds on this

690 prerequisites beneath it

Transfer Learning in Neural Networks, Zero-Shot Learning → Few-Shot Learning → Meta-Learning (Learning to Learn)

Core Idea

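Few-shot learning enables models to learn new classes from very few labeled examples per class (for example, 1-shot or 5-shot) by leveraging prior knowledge. Metric learning approaches learn an embedding in which a simple similarity or distance separates classes; Model-Agnostic Meta-Learning (MAML) learns an initialization that adapts well after a few gradient steps. Prototypical networks classify a query by its distance to class prototypes, each the mean of that class's support examples in a learned embedding space.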

Explainer

From your study of transfer learning, you know that a model trained on one task can be adapted to a new task by reusing learned representations, typically by fine-tuning a pretrained network on new labeled data. But what if you have only one or five examples of each new class? Standard fine-tuning on so little data is prone to severe overfitting: the network can memorize the handful of examples without learning features that generalize. Few-shot learning addresses this extreme low-data regime by training models that are explicitly designed to generalize from minimal examples. Problems are typically framed as N-way K-shot tasks: classify among N new classes given only K labeled examples per class (for example, 5-way 1-shot).

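The training paradigm is fundamentally different from standard supervised learning. Instead of training on a fixed set of classes, few-shot learning uses episodic training: each training episode samples N classes from a pool of base (training) classes and a handful of examples per class, mimicking the few-shot scenario the model will face at test time. The K labeled examples per class form the support set, and further examples of the same classes form the query set on which the episode's loss is computed. The classes used at test time are held out from training, so the model is always evaluated on classes it has never seen. The model learns not to classify specific classes but to *learn how to classify*, a form of meta-learning (learning to learn). Over thousands of episodes with different class subsets, the model develops general-purpose abilities for rapid adaptation.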
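Episode sampling can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the dataset is a list of `(example, label)` pairs, the helper names `index_by_class` and `sample_episode` are hypothetical, the "examples" are placeholder strings, and the 15/5 split into base and novel classes is arbitrary. Episode labels are re-indexed to `0..N-1`, so the same class can receive a different label in each episode and the model cannot rely on memorized class identities.

```python
import random
from collections import defaultdict

def index_by_class(dataset):
    """Group (example, label) pairs into {label: [examples]}."""
    by_class = defaultdict(list)
    for x, y in dataset:
        by_class[y].append(x)
    return by_class

def sample_episode(by_class, class_pool, n_way, k_shot, n_query, rng):
    """Sample one N-way K-shot episode from the classes in class_pool.

    Returns (support, query), each a list of (example, episode_label) pairs.
    Episode labels are re-indexed to 0..n_way-1 for this episode only.
    """
    chosen = rng.sample(sorted(class_pool), n_way)       # N distinct classes
    support, query = [], []
    for episode_label, cls in enumerate(chosen):
        # Draw K + n_query distinct examples so support and query never overlap.
        examples = rng.sample(by_class[cls], k_shot + n_query)
        support += [(x, episode_label) for x in examples[:k_shot]]
        query += [(x, episode_label) for x in examples[k_shot:]]
    return support, query

# Toy data: 20 classes x 30 examples; an "example" is a placeholder string "c<class>_<i>".
dataset = [(f"c{c}_{i}", c) for c in range(20) for i in range(30)]
by_class = index_by_class(dataset)
base_classes = set(range(15))        # meta-training episodes draw only from these
novel_classes = set(range(15, 20))   # held out for meta-testing

rng = random.Random(0)
support, query = sample_episode(by_class, base_classes, n_way=5, k_shot=1,
                                n_query=15, rng=rng)
print(len(support), len(query))  # 5 75
```

During meta-training every episode would be drawn from `base_classes`; evaluation would call the same function with `novel_classes`, so test episodes contain only unseen classes.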

The two dominant approaches differ in what they meta-learn. Metric learning methods learn an embedding function that maps examples into a space where same-class examples cluster together and different-class examples are far apart. Prototypical networks are the clearest example: embed all K support examples for each class, compute the mean embedding (the prototype) for each class, and classify a new query by finding the nearest prototype. Training applies a softmax over the negative distances from each query to the prototypes and minimizes the cross-entropy on the query labels, which pushes the embedding network to create clusters that are tight within each class and well separated between classes. Siamese networks take a pairwise approach, learning to predict whether two examples belong to the same class; a query can then be assigned to the class of the support example it is judged most similar to. These methods are elegant because at test time they require no gradient updates, just a forward pass and a distance computation.
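A minimal NumPy sketch of the prototypical-network computation for a single episode follows. It assumes squared Euclidean distance (a common choice for prototypical networks) and replaces the learned embedding network with a hypothetical identity function `embed`, so it shows the classification rule and the episode loss but not the backpropagation step that would train the embedding. The function names and the toy 2-D points are illustrative.

```python
import numpy as np

def compute_prototypes(support_emb, support_labels, n_way):
    """Prototype of class k = mean embedding of its K support examples -> (n_way, d)."""
    return np.stack([support_emb[support_labels == k].mean(axis=0) for k in range(n_way)])

def proto_logits(query_emb, prototypes):
    """Logits = negative squared Euclidean distance from each query to each prototype."""
    diff = query_emb[:, None, :] - prototypes[None, :, :]   # (n_query, n_way, d)
    return -np.sum(diff ** 2, axis=-1)                      # (n_query, n_way)

def proto_loss(logits, query_labels):
    """Episode loss: cross-entropy of a softmax over the negative distances."""
    shifted = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(query_labels)), query_labels].mean()

def embed(x):
    """Hypothetical stand-in for the learned embedding network (identity here)."""
    return x

# A toy 3-way 2-shot episode with 2-D inputs.
support_x = np.array([[0, 0], [0, 2], [4, 0], [4, 2], [0, 8], [2, 8]], dtype=float)
support_y = np.array([0, 0, 1, 1, 2, 2])
query_x = np.array([[0.5, 1.0], [3.5, 1.5], [1.0, 7.0]])
query_y = np.array([0, 1, 2])

protos = compute_prototypes(embed(support_x), support_y, n_way=3)
logits = proto_logits(embed(query_x), protos)
preds = logits.argmax(axis=1)        # nearest prototype: no gradient steps needed
loss = proto_loss(logits, query_y)   # what training would minimize w.r.t. the embedding
print(protos.tolist())  # [[0.0, 1.0], [4.0, 1.0], [1.0, 8.0]]
print(preds)            # [0 1 2]
```

Training would minimize `loss` with respect to the embedding's parameters over many sampled episodes; at test time only `compute_prototypes`, `proto_logits`, and an `argmax` are needed.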

Model-Agnostic Meta-Learning (MAML) takes the alternative approach of meta-learning an initialization. The idea is to find a set of network parameters that, when fine-tuned with just a few gradient steps on K examples of each new class, rapidly achieves good performance. MAML trains by simulating this inner fine-tuning loop on the support set of each episode, evaluating the adapted parameters on the query set, and optimizing the initial parameters so that this post-fine-tuning performance is maximized. Because the outer update differentiates through the inner gradient steps, it requires second derivatives (gradients through gradients), which is computationally expensive. In exchange, MAML is remarkably flexible: it works with any model trained by gradient descent and any differentiable loss. The intuition is that MAML finds a point in parameter space that is close to good solutions for many tasks simultaneously, so a few steps of gradient descent on any specific task land in the right neighborhood.
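The inner and outer loops can be made concrete on a toy problem where the second-order term is cheap to write down exactly. The sketch below assumes a hypothetical family of linear-regression tasks (`y = X @ w_task`, with `w_task` scattered around a fixed `center`), a single inner gradient step, and plain SGD in the outer loop; the function names and hyperparameters (`alpha`, `beta`, batch sizes) are illustrative, not taken from the MAML paper. Because the mean squared error is quadratic in the weights, its Hessian is exact, so `maml_meta_gradient` computes the full gradient through the inner update rather than an approximation.

```python
import numpy as np

def mse_loss_grad_hess(w, X, y):
    """MSE loss, gradient and Hessian for a linear model y_hat = X @ w."""
    n = len(y)
    residual = X @ w - y
    loss = np.mean(residual ** 2)
    grad = (2.0 / n) * X.T @ residual
    hess = (2.0 / n) * X.T @ X          # constant: the loss is quadratic in w
    return loss, grad, hess

def sample_task(rng, center, k_shot, n_query):
    """Hypothetical task family: y = X @ w_task, with w_task scattered around `center`."""
    w_task = center + 0.5 * rng.standard_normal(center.size)
    X = rng.standard_normal((k_shot + n_query, center.size))
    y = X @ w_task
    return (X[:k_shot], y[:k_shot]), (X[k_shot:], y[k_shot:])

def maml_meta_gradient(w, support, query, alpha):
    """Second-order MAML gradient for one task with a single inner step.

    Inner step: w_adapted = w - alpha * grad L_support(w)
    Outer loss: L_query(w_adapted). By the chain rule its gradient w.r.t. w is
    (I - alpha * H_support)^T @ grad L_query(w_adapted); the Hessian term is the
    "gradient through a gradient".
    """
    _, g_support, h_support = mse_loss_grad_hess(w, *support)
    w_adapted = w - alpha * g_support
    loss_query, g_query, _ = mse_loss_grad_hess(w_adapted, *query)
    jacobian = np.eye(w.size) - alpha * h_support   # d w_adapted / d w
    return loss_query, jacobian.T @ g_query

def meta_train(rng, center, steps=1000, meta_batch=8, alpha=0.05, beta=0.01,
               k_shot=5, n_query=15):
    """Outer loop: SGD on the initialization using averaged per-task meta-gradients."""
    w = np.zeros_like(center)
    for _ in range(steps):
        grads = []
        for _ in range(meta_batch):
            support, query = sample_task(rng, center, k_shot, n_query)
            grads.append(maml_meta_gradient(w, support, query, alpha)[1])
        w -= beta * np.mean(grads, axis=0)
    return w

center = np.array([2.0, -1.0, 0.5])   # hypothetical center of the task distribution
w_meta = meta_train(np.random.default_rng(0), center)
print(np.round(w_meta, 2))            # expected to land close to `center`
```

Since `w_task` is centered on `center` and drawn independently of the inputs, the best initialization for this toy family is `center` itself, so `w_meta` should end up close to it. Replacing `jacobian.T @ g_query` with `g_query` gives the first-order approximation (often called FOMAML), which drops the second derivatives. For deep networks the Hessian is never formed explicitly; autodiff frameworks obtain the needed Hessian-vector products by differentiating through the inner update.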

Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Dividing Integers → Unit Rates → Proportions → Percent Concept → Converting Between Fractions, Decimals, and Percents → Operations with Rational Numbers → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → 
Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Sine, Cosine, and Tangent Ratios → Trigonometric Ratios Review → Radian Measure → Converting Between Degrees and Radians → The Unit Circle → Graphing Sine and Cosine → Graphing Tangent and Reciprocal Trigonometric Functions → Derivatives of Trigonometric Functions → Antiderivatives → Indefinite Integrals → Basic Integration Rules → Riemann Sums → Definite Integral Definition → Probability Density Functions and Continuous Distributions → Cumulative Distribution Functions → Continuous Random Variables → Probability Density Functions → Expected Value → Linear Regression in Machine Learning → Neural Network Fundamentals → Backpropagation Algorithm → Multilayer Perceptrons (MLPs) → Activation Functions in Neural Networks → Vanishing Gradient Problem → Gradient Descent and Optimization → Transfer Learning in Neural Networks → Zero-Shot Learning → Few-Shot Learning

Longest path: 100 steps · 690 total prerequisite topics

Prerequisites (2)

Transfer Learning in Neural Networks (hard) · Zero-Shot Learning (soft)

Leads To (1)

Meta-Learning (Learning to Learn) (hard)