K-Means Clustering

Graduate Depth 72 in the knowledge graph I know this Set as goal
Unlocks 4 downstream topics
unsupervised-learning clustering partitioning

Core Idea

K-Means partitions data into k clusters by iteratively assigning points to nearest centroids and updating centroids. Fast and scalable but sensitive to initialization and assumes spherical clusters. Selecting k requires elbow method or silhouette scores.

Explainer

Imagine you have a room full of unlabeled data points scattered across a space — customer purchase histories, sensor readings, or pixel colors in an image — and you want to discover natural groupings. K-Means clustering is an unsupervised algorithm that partitions these points into exactly *k* groups, where each group is defined by its center of mass, called a centroid. Unlike supervised learning where labels guide the model, K-Means finds structure on its own by exploiting the distance relationships you already understand from working with vectors in R^n and metric spaces.

The algorithm follows a beautifully simple two-step loop. First, assign every data point to the nearest centroid using a distance metric (typically Euclidean distance). Second, update each centroid by computing the mean position of all points assigned to it. These two steps repeat until the assignments stop changing — that is, the algorithm has converged. You can think of it as an optimization problem: K-Means minimizes the total within-cluster sum of squared distances (the "inertia"), which connects directly to the optimization concepts you have studied. Each iteration is guaranteed to reduce or maintain this objective, so the algorithm always terminates.

The critical design choice is the value of *k* — how many clusters to look for. Since K-Means does not determine this automatically, you need heuristics. The elbow method runs K-Means for several values of *k*, plots inertia against *k*, and looks for the "elbow" where adding more clusters yields diminishing returns. Silhouette scores measure how similar each point is to its own cluster versus the nearest neighboring cluster, giving a more nuanced quality measure. Neither method is foolproof, but together they provide reasonable guidance.

K-Means has important limitations worth understanding upfront. Because it uses Euclidean distance and computes means, it implicitly assumes clusters are roughly spherical and equally sized — elongated, irregular, or overlapping clusters will be poorly captured. The algorithm is also sensitive to initialization: different random starting centroids can produce different final clusterings. The widely used K-Means++ initialization selects initial centroids that are spread apart, dramatically improving consistency. Despite these limitations, K-Means remains one of the most widely used clustering algorithms because it scales efficiently to large datasets — each iteration is O(n·k·d) for n points in d dimensions — and its simplicity makes it an excellent first tool for exploratory data analysis before moving to more complex methods.

Practice Questions 5 questions

Prerequisite Chain

Counting to 10Counting to 20Understanding ZeroThe Number ZeroCounting to FiveOne-to-One CorrespondenceCombining Small Groups Within 5Addition Within 10Addition Within 20Two-Digit Addition Without RegroupingTwo-Digit Addition with RegroupingAddition Within 100Repeated Addition as MultiplicationMultiplication Facts Within 100Division as Equal SharingDivision as Grouping (Measurement Division)Division: Grouping (Repeated Subtraction) ModelDivision: Fair Sharing ModelDivision as Equal SharingDivision as GroupingBasic Division FactsDivision Facts Within 100Two-Digit by One-Digit DivisionDivision with RemaindersRemainders and Quotients in DivisionDivision Word ProblemsIntroduction to Long DivisionFactors and MultiplesPrime and Composite NumbersEquivalent FractionsRelating Fractions and DecimalsDecimal Place ValueReading and Writing DecimalsComparing and Ordering DecimalsAdding and Subtracting DecimalsMultiplying DecimalsDividing DecimalsDividing FractionsMixed Number ArithmeticOrder of OperationsInteger Order of OperationsVariable ExpressionsCombining Like TermsOne-Step EquationsTwo-Step EquationsSolving Multi-Step EquationsEquations with Variables on Both SidesLiteral EquationsSlope-Intercept FormPoint-Slope FormWriting Linear EquationsParallel and Perpendicular Line SlopesGraphing Linear EquationsPiecewise FunctionsOne-Sided LimitsContinuity DefinitionLimit Definition of the DerivativePower RuleConstant Multiple and Sum/Difference RulesProduct RuleChain RuleHigher-Order DerivativesConcavity and Inflection PointsSecond Derivative TestCurve SketchingOptimization ProblemsCritical Points of Multivariable FunctionsCritical Points and Classification of ExtremaSecond Partial Test for Local Extrema (Hessian)The Hessian Matrix and Second Derivative TestUnconstrained Optimization: Finding ExtremaOptimization in Multiple VariablesK-Means Clustering

Longest path: 73 steps · 402 total prerequisite topics

Prerequisites (5)

Leads To (3)