A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Zero-Shot Learning

Research Depth 98 in the knowledge graph

2 topics build on this

689 prerequisites beneath it

Word Embeddings and Representations, Transfer Learning in Neural Networks → Zero-Shot Learning → Few-Shot Learning

Core Idea

Zero-shot learning classifies inputs from unseen classes by leveraging semantic embeddings or attribute descriptions shared across seen and unseen classes. A model trained on seen classes transfers knowledge to unseen classes through this shared semantic space. This enables generalization beyond the training classes without task-specific fine-tuning.

Explainer

Standard classification assumes that every class the model will encounter at test time was present during training. But consider an image classifier trained on 1,000 animal species that encounters a photograph of an okapi — a species it has never seen. A conventional classifier has no output node for "okapi" and must fail. Zero-shot learning solves this by never classifying into fixed output slots. Instead, it learns to map inputs into a shared semantic space where both seen and unseen classes have representations, then classifies by finding the nearest class representation in that space.
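The nearest-representation step can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the 3-dimensional class vectors and the `projected_image` point are made-up values, and cosine similarity is only one plausible choice of distance.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_class(point, class_embeddings):
    """Return the class whose embedding is most similar to `point`."""
    return max(class_embeddings,
               key=lambda name: cosine(point, class_embeddings[name]))

# Hypothetical semantic vectors (illustrative values only).
class_embeddings = {
    "zebra":   [1.0, 0.0, 0.0],  # seen during training
    "giraffe": [0.0, 1.0, 0.0],  # seen during training
    "horse":   [0.0, 0.0, 1.0],  # seen during training
    "okapi":   [0.7, 0.6, 0.0],  # never seen during training
}

# Assumed output of the learned projection for an okapi photograph.
projected_image = [0.6, 0.5, 0.1]

prediction = nearest_class(projected_image, class_embeddings)
print(prediction)
```

Because the classifier compares against a dictionary of class vectors rather than a fixed output layer, adding a new class only requires adding its vector.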

The key ingredient is the semantic embedding of classes, which you know from your study of word embeddings. Each class is represented not by an arbitrary integer label but by a rich vector: typically a word embedding of the class name, or a vector of human-defined attributes (has stripes, is tall, is herbivorous). During training, the model learns to project input features (extracted from image pixels or text tokens) into this same semantic space so that images of zebras land near the "zebra" embedding. At test time, the model projects the okapi image into semantic space and, if the projection generalizes, finds that it lands closest to the "okapi" class embedding, even though no okapi image was ever used in training. This can work because "okapi" has a meaningful position in semantic space (near "giraffe" and "deer") that reflects properties it shares with classes the model has already seen.
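As a rough sketch of how such a projection might be learned, the example below fits a linear map from features to semantic space with plain stochastic gradient descent on squared error. Everything here is an assumption made for illustration: the 3-d features, the 2-d class embeddings, the linear form of the map, and the helper names `train_projection`, `project`, and `nearest_class`.

```python
def project(W, x):
    """Apply the linear map W (sem_dim x feat_dim) to feature vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def train_projection(samples, feat_dim, sem_dim, lr=0.1, epochs=200):
    """Fit W by SGD so that project(W, x) approximates the class embedding."""
    W = [[0.0] * feat_dim for _ in range(sem_dim)]
    for _ in range(epochs):
        for x, target in samples:
            pred = project(W, x)
            for i in range(sem_dim):
                err = pred[i] - target[i]
                for j in range(feat_dim):
                    W[i][j] -= lr * err * x[j]
    return W

def nearest_class(point, candidates):
    """Return the candidate class with the smallest squared distance."""
    def sq_dist(name):
        return sum((p - c) ** 2 for p, c in zip(point, candidates[name]))
    return min(candidates, key=sq_dist)

# Hypothetical 2-d class embeddings (illustrative values only).
seen_embeddings = {"zebra": [1.0, 0.0], "giraffe": [0.0, 1.0]}
unseen_embeddings = {"okapi": [0.7, 0.6], "tiger": [1.0, 0.1]}

# Hypothetical 3-d image features, labeled with seen classes only.
training_images = [
    ([2.0, 0.0, 1.0], "zebra"),
    ([0.0, 2.0, 1.0], "giraffe"),
]
samples = [(x, seen_embeddings[label]) for x, label in training_images]
W = train_projection(samples, feat_dim=3, sem_dim=2)

# Features of an okapi image; no okapi sample was used to fit W.
okapi_features = [1.4, 1.2, 1.3]
projected = project(W, okapi_features)
prediction = nearest_class(projected, unseen_embeddings)
print(projected, prediction)
```

In this toy setup the okapi features are an exact linear mix of the zebra and giraffe features, so the projection lands almost exactly on the okapi embedding. Real features are rarely this well behaved, so the same procedure can land noticeably off target.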

Two main approaches dominate. Attribute-based methods define each class by a binary or continuous attribute vector — for animals, attributes might include "has fur," "has hooves," "is domesticated." The model learns to predict attributes from inputs, then matches predicted attributes to class attribute vectors. Embedding-based methods use pre-trained word vectors or sentence embeddings as class representations and learn a compatibility function between input features and class embeddings. The embedding approach is more scalable since it requires no manual attribute annotation, and it benefits directly from the structure that word embeddings capture — semantically similar classes have similar embeddings, so knowledge about horses transfers naturally to zebras.
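The attribute-based route can be illustrated with a small matching rule. The sketch assumes an attribute predictor already exists and has produced per-attribute probabilities for one image; the probabilities, the class signatures, and the independence assumption behind `class_log_likelihood` are all illustrative choices rather than a prescribed method.

```python
import math

def class_log_likelihood(attr_probs, class_attrs):
    """Log-probability that the predicted attributes match a binary class
    signature, treating attributes as independent (a simplifying assumption)."""
    total = 0.0
    for p, a in zip(attr_probs, class_attrs):
        total += math.log(p if a == 1 else 1.0 - p)
    return total

# Attribute order: has_stripes, has_hooves, is_domesticated.
# Hypothetical signatures for classes with no training images.
unseen_classes = {
    "okapi":  [1, 1, 0],
    "tiger":  [1, 0, 0],
    "donkey": [0, 1, 1],
}

# Assumed output of an attribute predictor trained on seen classes:
# P(has_stripes), P(has_hooves), P(is_domesticated) for one test image.
attr_probs = [0.9, 0.8, 0.1]

scores = {name: class_log_likelihood(attr_probs, attrs)
          for name, attrs in unseen_classes.items()}
prediction = max(scores, key=scores.get)
print(prediction)
```

The attribute predictor never needs to see an okapi. It only needs to recognize stripes and hooves, which it can learn from zebras and horses.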

Two critical challenges are the hubness problem and domain shift. In high-dimensional spaces, some points (hubs) tend to be nearest neighbors of many other points, causing certain classes to be predicted far too often. Domain shift occurs because the model's projection function was optimized on seen classes and may not generalize well to unseen ones. Generalized zero-shot learning addresses an even harder setting where test examples may come from either seen or unseen classes, so the model must counter its bias toward predicting familiar seen classes. Solutions include calibration techniques and transductive methods that use unlabeled test data to adapt the projection. Zero-shot learning connects naturally to the broader transfer learning paradigm: instead of transferring learned features across tasks, it transfers semantic structure across classes.
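One simple calibration idea for the generalized setting is to subtract a constant penalty from the scores of seen classes before taking the maximum, an approach sometimes called calibrated stacking. The sketch below assumes compatibility scores have already been computed; the score values and the penalty `gamma` are made up, and in practice `gamma` would need tuning on held-out data.

```python
def gzsl_predict(scores, seen_classes, gamma=0.0):
    """Pick the best class after penalizing seen-class scores by gamma."""
    adjusted = {
        name: (score - gamma if name in seen_classes else score)
        for name, score in scores.items()
    }
    return max(adjusted, key=adjusted.get)

# Hypothetical compatibility scores for one okapi test image.
scores = {"zebra": 0.62, "giraffe": 0.40, "okapi": 0.55}
seen = {"zebra", "giraffe"}

uncalibrated = gzsl_predict(scores, seen)             # seen class wins
calibrated = gzsl_predict(scores, seen, gamma=0.1)    # penalty applied
print(uncalibrated, calibrated)
```

A larger `gamma` shifts predictions toward unseen classes at the cost of accuracy on seen ones, so the value trades off the two error types.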

Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Dividing Integers → Unit Rates → Proportions → Percent Concept → Converting Between Fractions, Decimals, and Percents → Operations with Rational Numbers → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → 
Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Sine, Cosine, and Tangent Ratios → Trigonometric Ratios Review → Radian Measure → Converting Between Degrees and Radians → The Unit Circle → Graphing Sine and Cosine → Graphing Tangent and Reciprocal Trigonometric Functions → Derivatives of Trigonometric Functions → Antiderivatives → Indefinite Integrals → Basic Integration Rules → Riemann Sums → Definite Integral Definition → Probability Density Functions and Continuous Distributions → Cumulative Distribution Functions → Continuous Random Variables → Probability Density Functions → Expected Value → Linear Regression in Machine Learning → Neural Network Fundamentals → Backpropagation Algorithm → Multilayer Perceptrons (MLPs) → Activation Functions in Neural Networks → Vanishing Gradient Problem → Gradient Descent and Optimization → Transfer Learning in Neural Networks → Zero-Shot Learning

Longest path: 99 steps · 689 total prerequisite topics

Prerequisites (2)

Word Embeddings and Representations (hard)
Transfer Learning in Neural Networks (soft)

Leads To (1)

Few-Shot Learning (soft)