A data scientist wants clusters that are compact and roughly spherical. Which linkage criterion should they prefer for agglomerative hierarchical clustering?
ASingle linkage, because it minimizes the distance between the nearest points in each cluster pair
BComplete linkage, because using the maximum pairwise distance to define inter-cluster distance tends to produce compact, bounded clusters
CAny linkage criterion produces identical cluster shapes for spherical data
DWard's method cannot be used here because it assumes clusters are normally distributed
Single linkage uses the closest pair of points between clusters, which causes 'chaining' — elongated, snake-like clusters that keep merging because a single nearby point bridges them. Complete linkage requires that the farthest points in two clusters are close before merging, which compresses clusters into compact forms. Ward's method (minimizing within-cluster variance increase) also tends to produce compact, evenly sized clusters and is often the best default. Single linkage is appropriate for detecting elongated or irregularly shaped clusters, not compact spherical ones.
Question 2 Multiple Choice
A colleague claims: 'Hierarchical clustering is strictly better than K-means because it automatically determines the correct number of clusters.' What is the most accurate assessment of this claim?
AFully correct — hierarchical clustering objectively determines K from the dendrogram structure
BPartially correct — hierarchical clustering avoids specifying K during computation, but a human must still choose where to cut the dendrogram, which is its own subjective decision
CIncorrect — both methods require you to specify K before running
DIncorrect — K-means also builds a hierarchical structure if you run it with multiple values of K
Hierarchical clustering builds a complete tree so you don't need to commit to K before running the algorithm. But reading the 'right' K from a dendrogram still requires human judgment: you look for large gaps between merge heights and decide where to draw a horizontal cut. In practice, different analysts may choose different cuts from the same dendrogram. The advantage over K-means is that you can explore many values of K from a single computation, not that K is determined objectively.
Question 3 True / False
Hierarchical clustering can reveal whether a dataset has two clear clusters, five, or a continuum — information that a flat K-means analysis cannot easily provide from a single run.
TTrue
FFalse
Answer: True
This is the key qualitative advantage of hierarchical clustering. A dendrogram visualizes the full nested structure: large vertical gaps between merges indicate natural cluster boundaries at that scale; uniformly small gaps suggest a continuum with no clear structure. A K-means analysis with a fixed K gives you K clusters but tells you nothing about what happens at other values of K. To get equivalent information from K-means, you would need to run it many times with different K values and compare.
Question 4 True / False
Single linkage hierarchical clustering produces the most compact, evenly sized clusters and is the best default choice for general-purpose clustering tasks.
TTrue
FFalse
Answer: False
Single linkage is notoriously prone to 'chaining': because it merges clusters whenever any single pair of points is close, it tends to produce elongated clusters that spread across long chains of nearby points rather than compact groups. Ward's method — which merges the pair of clusters whose union minimizes the increase in total within-cluster variance — typically produces the most compact, evenly sized clusters and is usually the recommended default. Single linkage has specific use cases (detecting elongated shapes, finding the minimum spanning tree) but is not a general-purpose default.
Question 5 Short Answer
You have a dataset of 50,000 observations and want to explore its cluster structure. Why might agglomerative hierarchical clustering be impractical, and what would you do instead?
Think about your answer, then reveal below.
Model answer: Agglomerative hierarchical clustering requires maintaining and updating a pairwise distance matrix, which runs in O(n³) time and O(n²) space. For n = 50,000, that means tracking 2.5 billion pairwise distances — likely exceeding available memory and taking prohibitively long. A practical alternative would be to first cluster with K-means (O(nKt), far cheaper) to obtain a manageable number of representative centroids, then run hierarchical clustering on those centroids to explore multi-scale structure. Alternatively, approximate methods or subsampling strategies can bring hierarchical clustering within reach.
The O(n³)/O(n²) cost is the main practical limitation of hierarchical clustering. The tradeoff with K-means is real: K-means scales to millions of points but requires pre-specifying K and finds only flat partitions. Hierarchical clustering is most valuable when n is small enough to afford it — typically thousands of points, not hundreds of thousands.