
A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Hierarchical Clustering

Graduate · Depth 88 in the knowledge graph

2 topics build on this

553 prerequisites beneath it



Core Idea

Hierarchical clustering builds a tree (dendrogram) of nested clusters using agglomerative (bottom-up, starting with individual points) or divisive (top-down) methods. Linkage criteria (single, complete, average, Ward) define inter-cluster distance; dendrograms allow analysis at multiple scales without fixing the number of clusters a priori.

How It's Best Learned

Perform hierarchical clustering on a dataset and visualize the dendrogram, then experiment with different linkage criteria to understand how they produce different clustering structures.

Explainer

From K-means, you know the basic clustering setup: group data points so that similar points end up together and dissimilar points end up apart. But K-means has a hard constraint — you must specify the number of clusters K in advance, and every point gets a flat assignment to exactly one cluster. Hierarchical clustering removes this limitation by producing a complete hierarchy of nested clusters, from individual points at the bottom to a single cluster containing everything at the top. You can then choose any level of granularity after the fact.

The most common approach is agglomerative (bottom-up) clustering. The algorithm starts with each data point as its own singleton cluster. At each step, it merges the two closest clusters into one, reducing the total number of clusters by one. This continues until all points belong to a single cluster. The result is a binary tree called a dendrogram, where the height of each merge indicates the distance at which those clusters were joined. To obtain a specific number of clusters, you cut the dendrogram with a horizontal line at a chosen height: every merge above the line is undone, and points in the same subtree below the cut belong to the same cluster. A lower cut gives more, smaller clusters; a higher cut gives fewer, larger ones. This is powerful because you can explore multiple clustering solutions from a single computation by varying the cut height.
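The merge loop and the dendrogram cut described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names (`agglomerate`, `cluster_distance`, `cut`) and the sample points are made up for this example, and the cluster distance is assumed to be the closest pair of points under Euclidean distance, since the paragraph above does not yet fix one.

```python
from itertools import combinations
from math import dist


def cluster_distance(points, a, b):
    # Assumption for this sketch: distance between clusters is the
    # smallest Euclidean distance between a point in a and a point in b.
    return min(dist(points[i], points[j]) for i in a for j in b)


def agglomerate(points):
    """Naive bottom-up clustering; returns the merge history.

    Each entry is (cluster_a, cluster_b, height), where the clusters are
    frozensets of point indices and height is the distance at which they
    were joined.
    """
    # Start with every point as its own singleton cluster.
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Scan every pair of current clusters for the closest pair.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(points, *pair),
        )
        merges.append((a, b, cluster_distance(points, a, b)))
        # Replace the two clusters by their union: one cluster fewer.
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


def cut(merges, n_points, height):
    """Clusters obtained by cutting the dendrogram at the given height."""
    clusters = {frozenset([i]) for i in range(n_points)}
    for a, b, h in merges:
        if h <= height:  # keep only merges that happen below the cut
            clusters -= {a, b}
            clusters.add(a | b)
    return sorted(sorted(c) for c in clusters)


# Hypothetical sample data: two tight pairs and one distant point.
points = [(0, 0), (0, 1), (5, 0), (5, 2), (20, 0)]
merges = agglomerate(points)

print(cut(merges, len(points), 3.0))   # [[0, 1], [2, 3], [4]]
print(cut(merges, len(points), 10.0))  # [[0, 1, 2, 3], [4]]
```

The merge history is computed once; both calls to `cut` reuse it, which is the "multiple solutions from a single computation" property in action.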

What counts as the "distance between two clusters" is set by the linkage criterion. You know about distances between individual points from your metric spaces prerequisite, but once points are grouped into clusters you need a distance between sets of points. Single linkage uses the minimum distance between any pair of points, one from each cluster; it tends to produce elongated, chain-like clusters and is sensitive to noise, because a few stray points can bridge two otherwise separate groups. Complete linkage uses the maximum such distance, producing compact, roughly spherical clusters. Average linkage takes the mean of all pairwise distances between the two clusters, a compromise between the two. Ward's method merges the pair of clusters whose union gives the smallest increase in total within-cluster variance; it tends to produce compact, evenly sized clusters and is a common default choice.
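To make the four criteria concrete, the sketch below evaluates each one on the same pair of small clusters. The function names and the sample clusters are illustrative assumptions. The Ward cost uses the standard identity that merging clusters A and B raises the total within-cluster sum of squared deviations by |A||B| / (|A| + |B|) times the squared Euclidean distance between their centroids.

```python
from itertools import product
from math import dist
from statistics import mean


def pairwise(a, b):
    # All Euclidean distances between a point of a and a point of b.
    return [dist(p, q) for p, q in product(a, b)]


def single(a, b):
    return min(pairwise(a, b))   # closest pair


def complete(a, b):
    return max(pairwise(a, b))   # farthest pair


def average(a, b):
    return mean(pairwise(a, b))  # mean over all pairs


def centroid(cluster):
    # Coordinate-wise mean of the points in the cluster.
    return tuple(mean(coords) for coords in zip(*cluster))


def ward(a, b):
    # Increase in total within-cluster sum of squares if a and b merge.
    weight = len(a) * len(b) / (len(a) + len(b))
    return weight * dist(centroid(a), centroid(b)) ** 2


# Hypothetical clusters on a line, chosen so the arithmetic is easy to check.
a = [(0.0, 0.0), (2.0, 0.0)]
b = [(5.0, 0.0), (9.0, 0.0)]

print(single(a, b), complete(a, b), average(a, b), ward(a, b))
```

Here the pairwise distances are 5, 9, 3 and 7, so single linkage gives 3, complete linkage 9 and average linkage 6. For Ward, the centroids are at x = 1 and x = 7, giving (2·2/4)·6² = 36, which matches the direct calculation: the sum of squares is 2 + 8 = 10 before the merge and 46 after.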

The main tradeoff compared to K-means is computational cost. Naive agglomerative clustering computes and maintains a distance matrix between all pairs of clusters, running in O(n³) time and O(n²) space for n points. This makes it impractical for very large datasets; K-means, at O(nKt) distance computations for K clusters and t iterations, is far cheaper. However, hierarchical clustering offers things K-means cannot: a multi-scale view of cluster structure, no need to prespecify K, and the ability to capture non-spherical cluster shapes (especially with single linkage). In practice, the dendrogram itself is often the most valuable output, revealing the natural grouping structure of the data: whether there are two clear clusters, five, or a continuum with no sharp boundaries.
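The cubic running time can be checked with a short count. Assuming the naive variant that rescans every pair of remaining clusters before each merge, a step with k clusters examines k(k−1)/2 pairs, and summing over k = n, n−1, …, 2 gives (n+1)n(n−1)/6, which grows like n³/6. The function name and the K-means figures below (K = 5, t = 20) are hypothetical values chosen only for comparison.

```python
def pair_scans(n):
    """Cluster pairs examined by the naive algorithm on n points."""
    total = 0
    for k in range(n, 1, -1):        # k clusters remain before each merge
        total += k * (k - 1) // 2    # pairs scanned to find the closest two
    return total


n = 1000
print(pair_scans(n))                 # 166666500
print((n + 1) * n * (n - 1) // 6)    # same value from the closed form

# Rough K-means comparison with assumed K = 5 clusters and t = 20 iterations:
print(n * 5 * 20)                    # 100000 point-to-centroid distances
```

For 1,000 points the naive scan touches roughly 167 million cluster pairs, against about 100,000 distance evaluations for the assumed K-means run, which is why the gap matters at scale.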

Practice Questions 5 questions

Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Dividing Integers → Unit Rates → Proportions → Percent Concept → Converting Between Fractions, Decimals, and Percents → Operations with Rational Numbers → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Literal Equations → Slope-Intercept Form → Point-Slope Form → Writing Linear Equations → Parallel and Perpendicular Line Slopes → Graphing Linear Equations → Piecewise Functions → One-Sided Limits 
→ Continuity Definition → Limits and Continuity in Multiple Variables → Functions of Several Variables → Continuity in Multiple Variables → Partial Derivatives: Definition and Computation → Differentiability in Multiple Variables → Differentiability in Multivariable Functions → Total Differential and Linear Approximation → Chain Rule for Multivariable Functions → Implicit Differentiation → Related Rates → Optimization Problems → Critical Points of Multivariable Functions → Critical Points and Classification of Extrema → Second Partial Test for Local Extrema (Hessian) → The Hessian Matrix and Second Derivative Test → Unconstrained Optimization: Finding Extrema → Optimization in Multiple Variables → K-Means Clustering → Hierarchical Clustering

Longest path: 89 steps · 553 total prerequisite topics

Prerequisites (3)

K-Means Clustering (hard) · Vector Spaces (soft) · Metric Spaces: Definition and Examples (soft)

Leads To (1)

DBSCAN Clustering (soft)