A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Meta-Learning (Learning to Learn)

Research · Depth 100 in the knowledge graph

691 prerequisites beneath it

Core Idea

Meta-learning trains a model across many tasks so that it can adapt to a new task from only a few examples. Algorithms such as MAML (Model-Agnostic Meta-Learning) learn an initialization that can be fine-tuned on a new task in a small number of gradient steps. The approach is loosely modeled on how humans reuse prior knowledge to pick up new skills quickly.

Explainer

Standard neural network training optimizes a model for one specific task: classify these images, predict these labels, generate these outputs. But consider how humans learn. After learning to identify dogs, cats, and birds, you can recognize a new animal species from just a few examples — you have learned *how to learn* visual categories, not just the categories themselves. Meta-learning formalizes this idea: instead of training a model to solve one task, you train it across many tasks so that it becomes good at adapting to new ones quickly.

The setup requires rethinking what "training data" means. In conventional supervised learning, your dataset is a collection of labeled examples for a single task. In meta-learning, your dataset is a collection of *tasks*, each containing its own small training set (the support set) and test set (the query set). During meta-training, the model repeatedly receives a new task, adapts to its support set, and is evaluated on its query set. The meta-learner's parameters are updated based on how well it performed *after* adaptation — optimizing not for any single task's accuracy but for the ability to adapt rapidly.
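The support/query split described above can be sketched as a small sampling routine. This is a minimal illustration, not a reference implementation: the function name `sample_episode`, the toy dataset, and the parameters `n_way`, `k_shot`, and `n_query` (number of classes per task, support examples per class, and query examples per class) are assumptions introduced here, following common few-shot naming.

```python
import random

def sample_episode(examples_by_class, n_way, k_shot, n_query, rng):
    """Draw one task: a support set and a disjoint query set."""
    # Pick which classes this task is about.
    classes = rng.sample(sorted(examples_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        # Sample without replacement so support and query never overlap.
        picked = rng.sample(examples_by_class[cls], k_shot + n_query)
        support += [(x, label) for x in picked[:k_shot]]
        query += [(x, label) for x in picked[k_shot:]]
    return support, query

# Toy data (hypothetical): 5 classes with 10 integer "examples" each.
toy = {f"class_{c}": [c * 100 + i for i in range(10)] for c in range(5)}
support, query = sample_episode(toy, n_way=3, k_shot=2, n_query=4,
                                rng=random.Random(0))
print(len(support), len(query))  # 6 12
```

During meta-training, a routine like this would be called once per step, so the model sees a fresh task each time rather than a fixed set of labels.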

MAML (Model-Agnostic Meta-Learning) is one of the most influential approaches and illustrates the core idea cleanly. MAML finds an initialization of the neural network weights such that a few gradient descent steps on a new task's support set produce strong performance on its query set. Think of it as finding a point in weight space that sits close to the optimal solutions of many different tasks: a "good starting position" from which any specific task is only a few gradient steps away. The outer loop optimizes this initialization by computing gradients *through* the inner adaptation steps, which in the full algorithm requires second-order derivatives (gradients of gradients); first-order variants skip these terms to reduce cost.
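To make the inner and outer loops concrete, here is a minimal sketch on a deliberately tiny problem. Everything about the setup is an assumption chosen for illustration: each task is a 1-D regression `y = slope * x` with a slope drawn from [1, 3], the model is a single weight `w`, and the learning rates and step counts are arbitrary. Because the loss is quadratic, the second-order term can be written out by hand instead of relying on an autodiff library.

```python
import random

def mse_grad(w, xs, slope):
    """d/dw of mean((w*x - slope*x)**2) over the inputs xs."""
    return sum(2 * x * x * (w - slope) for x in xs) / len(xs)

def mse_hess(xs):
    """Second derivative of the same loss; constant in w for this model."""
    return sum(2 * x * x for x in xs) / len(xs)

def maml_meta_grad(w, support_xs, query_xs, slope, inner_lr):
    """Gradient of the post-adaptation query loss w.r.t. the initialization w."""
    # Inner loop: one gradient step on the support set.
    w_adapted = w - inner_lr * mse_grad(w, support_xs, slope)
    # Differentiating the inner step itself gives the second-order factor.
    d_adapted_d_w = 1.0 - inner_lr * mse_hess(support_xs)
    # Chain rule: d L_query(w_adapted) / d w.
    return mse_grad(w_adapted, query_xs, slope) * d_adapted_d_w

def meta_train(steps=2000, tasks_per_step=8, inner_lr=0.1, outer_lr=0.05, seed=0):
    rng = random.Random(seed)
    w = 0.0  # the initialization being meta-learned
    for _ in range(steps):
        total = 0.0
        for _ in range(tasks_per_step):
            slope = rng.uniform(1.0, 3.0)  # each task is y = slope * x
            support_xs = [rng.uniform(-1, 1) for _ in range(5)]
            query_xs = [rng.uniform(-1, 1) for _ in range(5)]
            total += maml_meta_grad(w, support_xs, query_xs, slope, inner_lr)
        # Outer loop: move the initialization along the averaged meta-gradient.
        w -= outer_lr * total / tasks_per_step
    return w

w_init = meta_train()
print(f"meta-learned initialization: {w_init:.3f}")
```

In this toy setting the learned initialization should settle near the middle of the slope range, since that is the point from which one gradient step gets closest to any sampled task. Replacing `d_adapted_d_w` with `1.0` would give a first-order approximation that ignores the gradient-of-gradient term.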

Beyond MAML, other meta-learning paradigms take different approaches. Metric-based methods like Prototypical Networks learn an embedding space where examples from the same class cluster together, so a query can be classified by finding the nearest class prototype (the mean of that class's support embeddings). Black-box methods use a recurrent or attention-based network that takes the support set as input and directly outputs predictions, treating the entire adaptation process as a forward pass rather than explicit gradient steps. Each paradigm makes different tradeoffs between flexibility, computational cost, and the assumptions imposed on what "adaptation" means. What unifies them is the two-level structure: an inner loop that adapts to specific tasks and an outer loop that improves the adaptation process itself.
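The metric-based idea can be sketched in a few lines once embeddings are available. The example below assumes the embedding network has already been applied, so the inputs are plain 2-D vectors; the function names `prototypes` and `classify` and the choice of squared Euclidean distance are illustrative assumptions, and the learned embedding (the part meta-training actually optimizes) is left out.

```python
def prototypes(support):
    """Average the support embeddings of each class into one prototype."""
    sums, counts = {}, {}
    for emb, label in support:
        acc = sums.setdefault(label, [0.0] * len(emb))
        for i, value in enumerate(emb):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def classify(query_emb, protos):
    """Return the label whose prototype is closest to the query embedding."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(protos, key=lambda label: sq_dist(query_emb, protos[label]))

# Hypothetical pre-computed embeddings for a 2-way, 2-shot task.
support = [([0, 0], "a"), ([0, 2], "a"), ([4, 4], "b"), ([6, 4], "b")]
protos = prototypes(support)
print(classify([1, 1], protos), classify([4, 3], protos))  # a b
```

Adaptation here involves no gradient steps at all: the "inner loop" is just computing class means, which is why this family is cheap at test time.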

Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Dividing Integers → Unit Rates → Proportions → Percent Concept → Converting Between Fractions, Decimals, and Percents → Operations with Rational Numbers → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → 
Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Sine, Cosine, and Tangent Ratios → Trigonometric Ratios Review → Radian Measure → Converting Between Degrees and Radians → The Unit Circle → Graphing Sine and Cosine → Graphing Tangent and Reciprocal Trigonometric Functions → Derivatives of Trigonometric Functions → Antiderivatives → Indefinite Integrals → Basic Integration Rules → Riemann Sums → Definite Integral Definition → Probability Density Functions and Continuous Distributions → Cumulative Distribution Functions → Continuous Random Variables → Probability Density Functions → Expected Value → Linear Regression in Machine Learning → Neural Network Fundamentals → Backpropagation Algorithm → Multilayer Perceptrons (MLPs) → Activation Functions in Neural Networks → Vanishing Gradient Problem → Gradient Descent and Optimization → Transfer Learning in Neural Networks → Zero-Shot Learning → Few-Shot Learning → Meta-Learning (Learning to Learn)

Longest path: 101 steps · 691 total prerequisite topics

Prerequisites (2)

Few-Shot Learning (hard) · Neural Network Fundamentals (hard)

Leads To (0)

No topics depend on this one yet.