A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Instance-Based Learning (k-NN)

Graduate · Depth 83 in the knowledge graph

404 prerequisites beneath it

Core Idea

k-nearest neighbors (k-NN) makes a prediction for a new instance by finding the k most similar examples in the training data and combining their labels: majority vote for classification, average for regression. It is a lazy learner with no real training phase, so all the work happens at prediction time, which makes prediction slow on large training sets. Because it relies directly on distances, it is also sensitive to feature scaling. In return, it captures complex local patterns well and makes no assumptions about the data distribution.

How It's Best Learned

Implement k-NN and experiment with k values and distance metrics (Euclidean, Manhattan, cosine) on datasets with different geometry.

Explainer

Most supervised learning algorithms you have encountered so far follow a two-phase pattern: first learn a model from training data, then use that model to make predictions. k-nearest neighbors (k-NN) skips the first phase entirely. Instead of compressing training data into parameters or decision boundaries, it simply stores every training example and defers all computation to prediction time. When a new instance arrives, k-NN finds the k training examples closest to it, polls their labels, and returns the majority vote for classification or the average for regression. This makes k-NN a lazy learner — it does no work until someone asks a question.
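The procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names (`k_nearest`, `knn_classify`, `knn_regress`) and the toy data are invented for this example, Euclidean distance is assumed, and ties in the vote are broken arbitrarily in favor of whichever label was counted first.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest(train_X, train_y, query, k):
    """Return the labels of the k training points closest to the query."""
    # No model is built: every call scans the full stored training set.
    order = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))
    return [train_y[i] for i in order[:k]]

def knn_classify(train_X, train_y, query, k=3):
    """Majority vote among the k nearest labels."""
    votes = Counter(k_nearest(train_X, train_y, query, k))
    return votes.most_common(1)[0][0]

def knn_regress(train_X, train_y, query, k=3):
    """Average of the k nearest target values."""
    neighbors = k_nearest(train_X, train_y, query, k)
    return sum(neighbors) / len(neighbors)

# Hypothetical toy data: two well-separated clusters in 2-D.
X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
y_class = ["a", "a", "a", "b", "b", "b"]
y_value = [1.0, 2.0, 3.0, 5.0, 6.0, 4.0]

print(knn_classify(X, y_class, (1.1, 1.0), k=3))  # label of the nearby cluster
print(knn_regress(X, y_value, (5.0, 5.0), k=3))   # mean target of the nearby cluster
```

Note that "training" here is nothing more than keeping `X` and the labels around; the sort inside `k_nearest` is where all the computation happens.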

The "nearest" in k-nearest neighbors depends on a distance metric, and the choice of metric shapes everything about the algorithm's behavior. Euclidean distance treats feature space like physical space and works well when features are on similar scales. Manhattan distance sums absolute differences along each axis; because it does not square those differences, a single large gap in one dimension influences it less than it influences Euclidean distance. Cosine similarity measures the angle between feature vectors rather than their magnitude (used as a distance, it is typically taken as one minus the similarity), which is useful when you care about proportions rather than absolute values, as in text data. Because k-NN relies directly on distances, feature scaling is critical: a feature measured in thousands will dominate one measured in decimals unless you normalize first. This is a direct consequence of working in vector spaces: the geometry of your feature space determines what "similar" means.
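To make the effect of metric choice and scaling concrete, here is a small sketch in plain Python. The helper names (`min_max_scaler`, `nearest_index`) and the two-feature data are hypothetical, and min-max scaling is just one of several reasonable normalizations. The first feature is measured in tens of thousands and the second lies between 0 and 1.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """One minus cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def min_max_scaler(rows):
    """Build a function that rescales each feature to [0, 1] based on rows."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # guard constant columns
    def scale(row):
        return tuple((v - lo) / span for v, lo, span in zip(row, lows, spans))
    return scale

def nearest_index(rows, query, dist):
    """Index of the row closest to the query under the given distance."""
    return min(range(len(rows)), key=lambda i: dist(rows[i], query))

# Hypothetical data: (large-scale feature, small-scale feature).
train = [(50500.0, 0.1), (52000.0, 0.9), (48000.0, 0.5), (60000.0, 0.2)]
query = (50000.0, 0.9)

raw_nearest = nearest_index(train, query, euclidean)
scale = min_max_scaler(train)
scaled_nearest = nearest_index([scale(r) for r in train], scale(query), euclidean)
print("nearest without scaling:", raw_nearest)
print("nearest after scaling:  ", scaled_nearest)

# Two vectors pointing the same way but with different magnitudes.
u, v = (1.0, 2.0), (2.0, 4.0)
print(euclidean(u, v), manhattan(u, v), cosine_distance(u, v))
```

Without scaling, the second feature barely affects the result, so the neighbor with the closest first feature wins even though its second feature is far off. After scaling, both features contribute and a different neighbor is chosen. The last line shows cosine distance treating `u` and `v` as essentially identical while the other two metrics do not.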

The parameter k controls the tradeoff between sensitivity and stability. With k=1, the algorithm simply copies the label of the single nearest neighbor, which captures fine-grained local patterns but is extremely sensitive to noise: one mislabeled training point changes the prediction for every query that lands closest to it. As k increases, predictions smooth out because more neighbors vote, but the algorithm loses the ability to capture tight local structure. A useful mental model: k=1 draws a complex, jagged decision boundary that perfectly memorizes the training set, while larger k draws a smoother boundary that often generalizes better but may miss genuine local patterns. Pushed to the extreme, with k equal to the training set size, every query receives the same prediction: the overall majority class or the global mean. Cross-validation on your data tells you where the sweet spot lies.
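One simple way to run that search is leave-one-out cross-validation: predict each training point from all the others and count how often the prediction is right. The sketch below assumes 1-D features and a made-up dataset with one deliberately mislabeled point in each cluster; `loo_accuracy` is a hypothetical helper name, and the candidate values of k are arbitrary.

```python
from collections import Counter

def knn_classify(train_X, train_y, query, k):
    """Majority vote among the k training points closest to query (1-D data)."""
    order = sorted(range(len(train_X)), key=lambda i: abs(train_X[i] - query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def loo_accuracy(X, y, k):
    """Leave-one-out cross-validation: predict each point from all the others."""
    correct = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_classify(rest_X, rest_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)

# Hypothetical 1-D data: two clusters, each containing one mislabeled point
# (2.4 is labeled "b" inside the "a" cluster, 7.4 is labeled "a" inside "b").
X = [1.0, 1.5, 2.1, 2.4, 2.8, 3.6, 6.0, 6.5, 7.1, 7.4, 7.8, 8.6]
y = ["a", "a", "a", "b", "a", "a", "b", "b", "b", "a", "b", "b"]

scores = {k: loo_accuracy(X, y, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)  # first k with the highest accuracy
for k, acc in scores.items():
    print(f"k={k}: leave-one-out accuracy {acc:.2f}")
print("best k:", best_k)
```

On this toy data, k=1 is misled by the two mislabeled points, which also corrupt the predictions for their immediate neighbors, while k=3 and k=5 outvote them. Real datasets rarely give such a clean picture, so the candidate range for k is worth widening in practice.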

The major practical limitation of k-NN is computational cost at prediction time. Every prediction requires computing distances to all training examples, which scales linearly with the training set size. For small datasets this is fine, but for millions of examples it becomes prohibitive. Data structures like KD-trees and ball trees accelerate nearest-neighbor search by partitioning the feature space, reducing average lookup time from linear to logarithmic in favorable conditions. However, these structures lose their advantage in high-dimensional spaces — a phenomenon related to the curse of dimensionality, where distances between points become increasingly uniform and less informative as dimensions grow. Despite these limitations, k-NN remains a powerful baseline: it makes no assumptions about the shape of decision boundaries, adapts to arbitrarily complex local patterns, and is trivially easy to update with new data.
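The claim that distances become less informative in high dimensions can be checked with a small experiment. The sketch below is an illustration under simple assumptions (points drawn uniformly from the unit hypercube, Euclidean distance, a fixed random seed); `relative_contrast` is a hypothetical helper that measures how much farther the farthest point is than the nearest one, relative to the nearest.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from a random query to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        # Brute-force Euclidean distance: cost grows with both n_points and dim.
        dists.append(math.sqrt(sum((q - p) ** 2 for q, p in zip(query, point))))
    return (max(dists) - min(dists)) / min(dists)

for dim in (2, 10, 100, 1000):
    print(f"dim={dim:>4}: relative contrast {relative_contrast(dim):.2f}")
```

In low dimensions the nearest point is many times closer than the farthest, so "nearest" carries real information. As the dimension grows the contrast shrinks toward zero, which is also why space-partitioning structures have little left to prune.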

Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Introduction to Exponents → Order of Operations → Integer Order of Operations → Variable Expressions → The Distributive Property → Variables and Expressions Review → Introduction to Polynomials → Adding and Subtracting Polynomials → Multiplying Polynomials → Factorial → Permutations → Combinations → Counting Principles: Addition and Multiplication Rules → Introduction to Graph Theory → Propositional Logic Foundations → Logical Equivalences → 
Boolean Algebra → Boolean Type and Truth Values → Comparison Operators and Boolean Tests → Logical Operators and Boolean Algebra → Conditional Statements → Defining and Calling Functions → Functions: Decomposing Problems → Function Parameters and Argument Passing → Return Values → Variable Scope → Introduction to Classes → Objects and Instances → Methods and Attributes → Algorithm Design Basics → Supervised Learning Fundamentals → Instance-Based Learning (k-NN)

Longest path: 84 steps · 404 total prerequisite topics

Prerequisites (3)

Supervised Learning Fundamentals (hard), Algorithm Design Basics (soft), Vector Spaces (soft)

Leads To (0)

No topics depend on this one yet.