A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

K-Means Clustering

Graduate · Depth 87 in the knowledge graph

4 topics build on this

551 prerequisites beneath it

Core Idea

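K-Means partitions data into k clusters by iteratively assigning each point to its nearest centroid and recomputing each centroid as the mean of its assigned points. It is fast and scalable, but it is sensitive to initialization and assumes roughly spherical clusters. The algorithm does not choose k itself; heuristics such as the elbow method or silhouette scores are commonly used to select it.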

Explainer

Imagine you have a room full of unlabeled data points scattered across a space — customer purchase histories, sensor readings, or pixel colors in an image — and you want to discover natural groupings. K-Means clustering is an unsupervised algorithm that partitions these points into exactly *k* groups, where each group is defined by its center of mass, called a centroid. Unlike supervised learning where labels guide the model, K-Means finds structure on its own by exploiting the distance relationships you already understand from working with vectors in Rⁿ and metric spaces.

The algorithm follows a beautifully simple two-step loop. First, assign every data point to the nearest centroid using a distance metric (typically Euclidean distance). Second, update each centroid by computing the mean position of all points assigned to it. These two steps repeat until the assignments stop changing, at which point the algorithm has converged. You can think of it as an optimization problem: K-Means minimizes the total within-cluster sum of squared distances (the "inertia"), which connects directly to the optimization concepts you have studied. Neither step can increase this objective, and there are only finitely many ways to partition n points into k clusters, so the algorithm always terminates. The result is a local minimum of the objective, though, and not necessarily the global one.
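The two-step loop described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names (`assign`, `update`, `inertia`, `kmeans`), the toy `points`, the choice of random data points as initial centroids, and the rule that an empty cluster keeps its previous centroid are all assumptions made for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Assignment step: index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def update(points, labels, centroids):
    """Update step: move each centroid to the mean of its assigned points."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            # Assumption of this sketch: an empty cluster keeps its old centroid.
            new_centroids.append(old)
            continue
        new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_centroids


def inertia(points, labels, centroids):
    """Within-cluster sum of squared distances (the objective being minimized)."""
    return sum(
        squared_distance(p, centroids[label]) for p, label in zip(points, labels)
    )


def kmeans(points, k, seed=0, max_iter=100):
    """Plain assign/update loop with random data points as initial centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = assign(points, centroids)
    history = [inertia(points, labels, centroids)]
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        history.append(inertia(points, new_labels, centroids))
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
    return labels, centroids, history


# Toy data: two well-separated groups in the plane.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids, history = kmeans(points, k=2)
print("labels:   ", labels)
print("centroids:", centroids)
print("inertia per pass:", history)
```

The `history` list records the inertia after each pass, which makes it easy to check that the objective never increases from one pass to the next.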

The critical design choice is the value of *k*, the number of clusters to look for. Since K-Means does not determine this automatically, you need heuristics. The elbow method runs K-Means for several values of *k*, plots inertia against *k*, and looks for the "elbow" where adding more clusters yields diminishing returns. Silhouette scores compare each point's average distance to its own cluster with its average distance to the nearest neighboring cluster; they range from -1 to 1, and higher values indicate tighter, better-separated clusters. Neither method is foolproof, but together they provide reasonable guidance.
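Both heuristics can be tried on a toy dataset with a short script. The sketch below is illustrative only: the helper names, the nine hand-picked `points`, and especially the deterministic farthest-point seeding (used here purely so the example is reproducible; it is not K-Means++) are assumptions of this example. The silhouette follows the usual definition (b - a) / max(a, b), with singleton clusters scored as 0.

```python
import math


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k):
    """Deterministic seeding (an assumption of this sketch, not K-Means++)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Compact assign/update loop; returns (labels, inertia)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, inertia


def silhouette_score(points, labels):
    """Mean silhouette coefficient under Euclidean distance."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # convention: a singleton cluster contributes 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy data: three tight, well-separated groups.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
    (5.0, 5.0), (5.2, 4.9), (4.9, 5.3),
    (9.0, 1.0), (9.1, 1.2), (8.8, 0.9),
]

inertias = {k: kmeans(points, k)[1] for k in range(1, 6)}
silhouettes = {k: silhouette_score(points, kmeans(points, k)[0]) for k in range(2, 6)}
best_k = max(silhouettes, key=silhouettes.get)

for k in range(1, 6):
    print(k, round(inertias[k], 3), round(silhouettes.get(k, float("nan")), 3))
print("k with the highest silhouette:", best_k)
```

Inertia keeps falling as *k* grows, so it cannot be minimized directly; the elbow is where the drop flattens out. The silhouette, by contrast, has a peak you can pick with `max`.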

K-Means has important limitations worth understanding upfront. Because it uses Euclidean distance and computes means, it implicitly assumes clusters are roughly spherical and similar in size, so elongated, irregular, or overlapping clusters will be poorly captured. The algorithm is also sensitive to initialization: different random starting centroids can produce different final clusterings. The widely used K-Means++ initialization picks each new starting centroid with probability proportional to its squared distance from the nearest centroid already chosen, which tends to spread the starting centroids apart and makes results far more consistent across runs. Despite these limitations, K-Means remains one of the most widely used clustering algorithms because it scales efficiently to large datasets (each iteration is O(n·k·d) for n points in d dimensions), and its simplicity makes it an excellent first tool for exploratory data analysis before moving to more complex methods.
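The K-Means++ seeding idea can be sketched as follows. This is a hedged illustration of the commonly described rule (sample each new centroid with probability proportional to its squared distance from the nearest centroid chosen so far); the function name `kmeans_plus_plus`, the `seed` parameter, and the toy `points` are assumptions of this example, not part of any particular library.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_plus_plus(points, k, seed=0):
    """Pick k initial centroids using squared-distance weighting."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]  # first centroid: uniform at random
    while len(centroids) < k:
        # Squared distance from each point to its nearest chosen centroid.
        weights = [min(sq_dist(p, c) for c in centroids) for p in points]
        # Sample proportionally to that distance: far-away points are favoured
        # and already-chosen points have weight 0. Note that random.choices
        # raises ValueError if every weight is 0, which happens when the data
        # has fewer distinct points than k.
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


# Toy data: three tight, well-separated groups.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
    (5.0, 5.0), (5.2, 4.9), (4.9, 5.3),
    (9.0, 1.0), (9.1, 1.2), (8.8, 0.9),
]
initial_centroids = kmeans_plus_plus(points, k=3)
print(initial_centroids)
```

On data like this, the weighting makes it very likely, though not certain, that each group receives one starting centroid. The returned points would then be handed to the usual assign/update loop in place of a purely random start.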

Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Dividing Integers → Unit Rates → Proportions → Percent Concept → Converting Between Fractions, Decimals, and Percents → Operations with Rational Numbers → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Literal Equations → Slope-Intercept Form → Point-Slope Form → Writing Linear Equations → Parallel and Perpendicular Line Slopes → Graphing Linear Equations → Piecewise Functions → One-Sided Limits 
→ Continuity Definition → Limits and Continuity in Multiple Variables → Functions of Several Variables → Continuity in Multiple Variables → Partial Derivatives: Definition and Computation → Differentiability in Multiple Variables → Differentiability in Multivariable Functions → Total Differential and Linear Approximation → Chain Rule for Multivariable Functions → Implicit Differentiation → Related Rates → Optimization Problems → Critical Points of Multivariable Functions → Critical Points and Classification of Extrema → Second Partial Test for Local Extrema (Hessian) → The Hessian Matrix and Second Derivative Test → Unconstrained Optimization: Finding Extrema → Optimization in Multiple Variables → K-Means Clustering

Longest path: 88 steps · 551 total prerequisite topics

Prerequisites (5)

Algorithm Design Basics (soft) · Vectors in R^n (soft) · Optimization in Multiple Variables (soft) · Optimization Problems (soft) · Metric Spaces: Definition and Examples (soft)

Leads To (3)

DBSCAN Clustering (hard) · Hierarchical Clustering (hard) · Mixture Models and Gaussian Mixture Models (hard)