A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Semi-Supervised Learning

Graduate · Depth 83 in the knowledge graph

2 topics build on this

399 prerequisites beneath it

Core Idea

Semi-supervised learning trains on a small set of labeled data together with a large pool of unlabeled data. Techniques include self-training (pseudo-labeling unlabeled data with the model's own predictions), consistency regularization (enforcing prediction invariance under perturbations), and co-training (multiple models label data for each other). It suits scenarios where labeling is expensive but unlabeled data is plentiful.

Explainer

In supervised learning, every training example comes with a label, and the model learns the mapping from inputs to outputs. But labeling data is often expensive — a radiologist must examine each X-ray, a linguist must annotate each sentence, a human must categorize each support ticket. Meanwhile, *unlabeled* data is cheap and abundant: the internet is full of images, text, and recordings that nobody has annotated. Semi-supervised learning bridges this gap by using a small set of labeled examples together with a large pool of unlabeled examples, extracting structural information from the unlabeled data to improve predictions.

The simplest semi-supervised technique is self-training (also called pseudo-labeling). You train a supervised model on your labeled data, use it to predict labels for the unlabeled data, then add the most confident predictions to your training set and retrain. This bootstrapping process iteratively expands the labeled pool. The risk is obvious: if the initial model makes confident but wrong predictions, those errors propagate and compound. Self-training works best when the initial model is reasonably accurate and the confidence threshold for accepting pseudo-labels is set high enough to filter out mistakes.
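The loop can be sketched in a few lines of Python. This is a minimal, hypothetical sketch, not a reference implementation. It assumes a toy nearest-centroid classifier on 2-D points and takes confidence to be a softmax over negative squared distances to the class centroids. The names `fit_centroids`, `predict_proba`, and `self_train`, and the default `threshold` of 0.99, are illustrative choices.

```python
import math

def fit_centroids(points, labels):
    """Toy classifier: the mean 2-D point of each class."""
    sums, counts = {}, {}
    for (x, y), c in zip(points, labels):
        sx, sy = sums.get(c, (0.0, 0.0))
        sums[c] = (sx + x, sy + y)
        counts[c] = counts.get(c, 0) + 1
    return {c: (sums[c][0] / counts[c], sums[c][1] / counts[c]) for c in sums}

def predict_proba(centroids, point):
    """Class probabilities: softmax over negative squared distances to centroids."""
    scores = {c: -((point[0] - cx) ** 2 + (point[1] - cy) ** 2)
              for c, (cx, cy) in centroids.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}

def self_train(labeled, labels, unlabeled, threshold=0.99, max_rounds=10):
    """Pseudo-labeling loop (hypothetical helper).
    Returns the final model and the points that were never accepted."""
    labeled, labels, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(max_rounds):
        centroids = fit_centroids(labeled, labels)  # retrain on the current labeled set
        accepted, remaining = [], []
        for p in pool:
            probs = predict_proba(centroids, p)
            best = max(probs, key=probs.get)
            if probs[best] >= threshold:  # keep only confident pseudo-labels
                accepted.append((p, best))
            else:
                remaining.append(p)
        if not accepted:  # nothing confident left: stop early
            break
        for p, c in accepted:  # grow the labeled pool
            labeled.append(p)
            labels.append(c)
        pool = remaining
    return fit_centroids(labeled, labels), pool

# Two well-separated clusters, one labeled seed each, plus an ambiguous midpoint.
model, leftover = self_train(
    labeled=[(0.0, 0.0), (5.0, 5.0)],
    labels=["a", "b"],
    unlabeled=[(0.5, 0.2), (1.0, 0.8), (4.2, 4.9), (5.5, 4.6), (2.5, 2.5)],
)
print(model, leftover)
```

With the default threshold, the four points near the seeds are pseudo-labeled in the first round. The midpoint (2.5, 2.5) sits equidistant from the two seeds and never clears the bar, so it is returned unlabeled. Lowering `threshold` to 0.9 lets that midpoint slip into class `"a"` in the second round, where its confidence is about 0.92. The threshold is the only thing deciding whether an ambiguous point gets absorbed.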

Consistency regularization takes a more principled approach based on a smoothness assumption: if two inputs are similar, their predictions should also be similar. The model is shown an unlabeled example and a perturbed version of that same example (with noise, data augmentation, or dropout), and the loss penalizes any difference between the two predictions. This forces the decision boundary away from dense regions of the input space, pushing it into low-density gaps between clusters — which is where you want it. FixMatch, a widely used method, combines pseudo-labeling with consistency regularization: it generates a pseudo-label from a weakly augmented view of an unlabeled example, then trains the model to predict that label from a strongly augmented view, only keeping examples where the weak-augmentation prediction exceeds a confidence threshold.
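Both losses can be written out for individual examples. The sketch below is hypothetical and framework-free, so it makes several substitutions:

- `toy_model` stands in for a network that returns logits.
- Gaussian input noise stands in for weak and strong augmentation.
- The consistency term uses squared error between softmax outputs, as in Pi-model-style methods (other methods use a divergence such as KL instead).
- `tau=0.95` is an assumed confidence threshold.

In a real training loop the pseudo-label is treated as a constant and excluded from backpropagation. Plain Python has no gradients, so that detail appears only as a comment.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def consistency_loss(model, x, perturb, rng):
    """Squared difference between the class probabilities predicted for
    two independently perturbed copies of the same unlabeled input x."""
    p1 = softmax(model(perturb(x, rng)))
    p2 = softmax(model(perturb(x, rng)))
    return sum((a - b) ** 2 for a, b in zip(p1, p2))

def fixmatch_unlabeled_loss(model, batch, weak, strong, rng, tau=0.95):
    """FixMatch-style unlabeled loss, averaged over the whole batch."""
    total = 0.0
    for x in batch:
        q = softmax(model(weak(x, rng)))    # prediction on the weakly perturbed view
        confidence = max(q)
        if confidence < tau:                # below threshold: example is masked out
            continue
        pseudo = q.index(confidence)        # hard pseudo-label (no gradient in practice)
        p = softmax(model(strong(x, rng)))  # prediction on the strongly perturbed view
        total += -math.log(p[pseudo])       # cross-entropy against the pseudo-label
    return total / len(batch)

# Hypothetical stand-ins for a network and for weak/strong augmentation.
def toy_model(x):
    """Fixed two-class scorer on scalar inputs (returns logits)."""
    return [2.0 * x, -2.0 * x]

def weak_noise(x, rng):
    return x + rng.gauss(0.0, 0.1)

def strong_noise(x, rng):
    return x + rng.gauss(0.0, 1.0)

rng = random.Random(0)
print(consistency_loss(toy_model, 1.5, strong_noise, rng))
print(fixmatch_unlabeled_loss(toy_model, [-3.0, 0.0, 3.0], weak_noise, strong_noise, rng))
```

The input at 0.0 sits on the toy model's decision boundary, so its weak-view confidence stays well below `tau` and it contributes nothing to the sum. The inputs at -3.0 and 3.0 get confident pseudo-labels from the weak view. The loss then asks the model to keep those labels under the larger noise of the strong view.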

Co-training uses a different strategy: train two models on different "views" of the data (different feature subsets or different architectures) and have each model label unlabeled examples for the other. Because the models have different inductive biases, they tend to make different mistakes. One model's confident predictions on examples the other finds ambiguous therefore provide genuinely informative training signal.

A key assumption behind most semi-supervised methods is the cluster assumption: data points in the same cluster in feature space tend to share a label. When this assumption holds, unlabeled data reveals the cluster structure, and even a few labeled points per cluster are enough to assign labels to the rest. When it fails, for instance when class boundaries run through the middle of dense clusters, semi-supervised methods can actually hurt performance compared to supervised learning on the labeled data alone.
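A toy version of co-training fits in a short Python sketch. It is hypothetical and rests on several assumptions:

- Each example is a pair of scalar features, one per view, and each view on its own is enough to separate the classes.
- Each view's model is a one-dimensional nearest-centroid classifier.
- In every round, each view hands its single most confident pseudo-label to the other view.

The helper names (`fit_view`, `predict_view`, `train_view`, `co_train`) are illustrative.

```python
import math

def fit_view(values, labels):
    """Toy per-view classifier: the mean feature value of each class."""
    sums, counts = {}, {}
    for v, c in zip(values, labels):
        sums[c] = sums.get(c, 0.0) + v
        counts[c] = counts.get(c, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}

def predict_view(centroids, v):
    """Return (label, confidence) using a softmax over negative squared distances."""
    scores = {c: -((v - m) ** 2) for c, m in centroids.items()}
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    best = max(exps, key=exps.get)
    return best, exps[best] / sum(exps.values())

def train_view(examples, view):
    """Fit one view's classifier on (feature_pair, label) examples."""
    return fit_view([x[view] for x, _ in examples], [c for _, c in examples])

def co_train(labeled, labels, unlabeled, rounds=5):
    """Each example is a (view0, view1) feature pair."""
    # Separate training sets per view; both start from the true labels.
    train = [list(zip(labeled, labels)), list(zip(labeled, labels))]
    pool = list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                break
            model = train_view(train[view], view)
            preds = [predict_view(model, x[view]) for x in pool]
            # The example this view is most confident about...
            i = max(range(len(pool)), key=lambda j: preds[j][1])
            label = preds[i][0]
            # ...goes, with its pseudo-label, to the OTHER view's training set.
            train[1 - view].append((pool.pop(i), label))
    return [train_view(train[v], v) for v in (0, 1)], pool

# View 0 puts class "a" near 0 and "b" near 5; view 1 puts "a" near 10 and "b" near 0.
models, leftover = co_train(
    labeled=[(0.0, 10.0), (5.0, 0.0)],
    labels=["a", "b"],
    unlabeled=[(0.2, 9.0), (4.8, 1.0), (0.1, 8.5), (4.9, 0.5)],
)
print(models, leftover)
```

This variant keeps a separate training set per view, so each model's pseudo-labels reach only the other model, matching the description above. Some formulations instead add all pseudo-labeled examples to a single shared labeled pool that both models train on.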

Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Introduction to Exponents → Order of Operations → Integer Order of Operations → Variable Expressions → The Distributive Property → Variables and Expressions Review → Introduction to Polynomials → Adding and Subtracting Polynomials → Multiplying Polynomials → Factorial → Permutations → Combinations → Counting Principles: Addition and Multiplication Rules → Introduction to Graph Theory → Propositional Logic Foundations → Logical Equivalences → Boolean Algebra → Boolean Type and Truth Values → Comparison Operators and Boolean Tests → Logical Operators and Boolean Algebra → Conditional Statements → Defining and Calling Functions → Functions: Decomposing Problems → Function Parameters and Argument Passing → Return Values → Variable Scope → Introduction to Classes → Objects and Instances → Methods and Attributes → Algorithm Design Basics → Supervised Learning Fundamentals → Semi-Supervised Learning

Longest path: 84 steps · 399 total prerequisite topics

Prerequisites (1)

Supervised Learning Fundamentals (hard)

Leads To (1)

Self-Supervised Learning (soft)