K-Means is run twice on the same dataset with different random initializations and produces two different final cluster assignments. What does this most likely indicate?
AThe data contains no meaningful cluster structure, so K-Means returns arbitrary results
BK-Means has converged to different local minima of the within-cluster sum of squared distances
CThe value of k is incorrect and should be adjusted
DK-Means always produces random output and cannot be trusted for analysis
K-Means minimizes a non-convex objective (total within-cluster inertia) that can have multiple local minima. Different initializations can lead the iterative algorithm to converge at different local optima — each is a valid solution to the algorithm, but none is guaranteed to be the global minimum. This sensitivity to initialization is why K-Means++ initializes centroids spread far apart to improve consistency, and why running K-Means multiple times and keeping the best result is standard practice.
Question 2 Multiple Choice
A dataset contains two elongated, crescent-shaped clusters that curve around each other. K-Means with k=2 consistently fails to separate them, instead splitting each crescent roughly in half. What is the fundamental reason for this failure?
AK-Means needs more iterations to discover curved cluster boundaries
BThe features require standardization before K-Means can handle non-circular shapes
CK-Means assigns points to the nearest centroid using Euclidean distance, implicitly assuming spherical clusters — curved or irregular shapes violate this assumption
Dk=2 is too small; increasing k would resolve the problem
K-Means defines clusters by proximity to centroids using Euclidean distance, which creates spherical (Voronoi) decision boundaries. Any cluster shape that is not roughly spherical and convex will be poorly captured — elongated, crescent, ring, or interleaved clusters all violate the assumption. This is a fundamental geometric limitation, not a parameter-tuning issue. Algorithms like DBSCAN or spectral clustering handle arbitrary shapes by defining clusters through local density or connectivity rather than centroid distance.
Question 3 True / False
Each iteration of the K-Means algorithm is guaranteed to reduce or maintain the total within-cluster sum of squared distances.
TTrue
FFalse
Answer: True
Both the assignment step and the update step are non-increasing with respect to inertia. The assignment step reduces inertia (or keeps it the same) by reassigning each point to its nearest centroid. The update step moves each centroid to the mean of its assigned points, which by definition minimizes the sum of squared distances from those points to any single location. Because inertia is bounded below by zero and decreases monotonically, convergence is guaranteed — though the solution may be a local minimum.
Question 4 True / False
The elbow method identifies the optimal value of k by selecting the k that produces the lowest inertia.
TTrue
FFalse
Answer: False
Inertia always decreases as k increases — at k=n (one cluster per point), inertia is exactly zero. Selecting the minimum inertia would always yield k=n, which is meaningless. The elbow method instead looks for the point of diminishing returns: the 'elbow' where each additional cluster provides a much smaller reduction in inertia than before. Beyond the true number of clusters, additional centroids merely subdivide genuine clusters, yielding only marginal improvement. In practice, the elbow is often ambiguous, which is why silhouette scores provide a complementary criterion.
Question 5 Short Answer
Why does K-Means++ improve on random initialization, and what property of the initial centroids does it aim to achieve?
Think about your answer, then reveal below.
Model answer: K-Means++ selects initial centroids that are spread far apart rather than placed randomly. It chooses the first centroid uniformly at random, then selects each subsequent centroid with probability proportional to its squared distance from the nearest already-chosen centroid. This ensures the initial centroids broadly cover the data space and are unlikely to cluster inside the same true group. The goal is to avoid degenerate starting configurations — such as multiple initial centroids inside one true cluster — that force K-Means to waste iterations recovering and risk converging to a poor local minimum.
K-Means++ provides a theoretical guarantee: in expectation, its initial cost is within O(log k) of the optimal cost, whereas random initialization offers no such guarantee. In practice it dramatically reduces both variance across runs and the number of iterations needed to converge, making it the default initialization strategy in most K-Means implementations.