A dataset consists of two interlocking crescent shapes. You run both k-means (k=2) and DBSCAN on it. What do you expect?
A. Both algorithms correctly identify the two crescents as separate clusters
B. K-means succeeds because it finds the two natural groups; DBSCAN fails because it cannot handle curved shapes
C. DBSCAN correctly identifies the two crescents; k-means fails because it assumes spherical clusters and cannot separate interlocking shapes
D. Neither algorithm can handle this dataset without feature engineering
Answer: C
K-means assigns each point to its nearest centroid, which partitions space into convex Voronoi cells. With k=2, the boundary between the two cells is a straight line (the perpendicular bisector of the two centroids), and no straight line can separate two interlocking crescents. DBSCAN instead follows density-connectivity, tracing the dense curve of each crescent regardless of its shape. Points in the sparse gap become noise or border points, and the two dense curves become two distinct clusters. This is DBSCAN's core advantage: it discovers clusters of arbitrary shape by following the data's density structure rather than imposing a geometric assumption.
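A minimal, dependency-free sketch makes this concrete. It assumes a noise-free "two moons" construction of the crescents, a hand-rolled Lloyd's k-means started from one point of each crescent, and hand-picked DBSCAN parameters (eps=0.2, min_pts=3, with a point counted in its own neighborhood). All helper names are hypothetical, not a library API.

```python
import math

def two_moons(n=50):
    """Two noise-free interlocking crescents; returns points and their true labels."""
    ts = [math.pi * i / (n - 1) for i in range(n)]
    upper = [(math.cos(t), math.sin(t)) for t in ts]
    lower = [(1 - math.cos(t), 0.5 - math.sin(t)) for t in ts]
    return upper + lower, [0] * n + [1] * n

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm from the given initial centroids; returns a cluster index per point."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid (this is what produces convex Voronoi cells).
        new = [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new == labels:
            break
        labels = new
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; -1 marks noise. A point's eps-neighborhood includes the point itself."""
    nbrs = [[j for j, q in enumerate(points) if math.dist(p, q) <= eps] for p in points]
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        if len(nbrs[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        stack = list(nbrs[i])
        while stack:
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cluster  # reachable from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(nbrs[j]) >= min_pts:  # only core points extend the cluster
                stack.extend(nbrs[j])
    return labels

def agreement(pred, truth):
    """Fraction of points matching the two true groups, allowing the cluster ids to be swapped."""
    same = sum(p == t for p, t in zip(pred, truth))
    return max(same, len(truth) - same) / len(truth)

points, truth = two_moons()
km = kmeans(points, [points[0], points[50]])
db = dbscan(points, eps=0.2, min_pts=3)
print("k-means agreement:", agreement(km, truth))
print("DBSCAN agreement:", agreement(db, truth), "noise points:", db.count(-1))
```

On this data DBSCAN should reproduce the two crescents exactly with no noise, because consecutive points along each arc are about 0.064 apart while the arcs never come closer than about 0.5. K-means cannot reach perfect agreement here: the true split is not a stable k-means solution, since the upper crescent's tip at (1, 0) lies closer to the lower crescent's centroid than to its own.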
Question 2 Multiple Choice
You run DBSCAN on a dataset and nearly every point is labeled as noise. What is the most likely cause?
A. eps is too large, causing all points to merge into one cluster
B. min_pts is set to 1, making every point its own cluster
C. eps is too small, so most points don't have enough neighbors within the radius to qualify as core points
D. The dataset has too many dimensions for DBSCAN to function
Answer: C
When eps is too small, almost no point has min_pts neighbors within that tiny radius, so almost no core points exist. Without core points there are no clusters, and nearly every point is labeled noise. The usual fix is to increase eps (lowering min_pts also creates more core points). A useful diagnostic is the k-distance plot: for each point, compute the distance to its k-th nearest neighbor (with k tied to min_pts), sort these distances, and look for the elbow where the curve bends sharply upward; an eps near the elbow typically captures meaningful cluster density. Option A describes the opposite problem (eps too large, so clusters merge), not the all-noise scenario.
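A small sketch of the k-distance diagnostic on hypothetical data: a 5x5 grid with spacing 1 plus three far-away outliers. It assumes the convention that a point counts in its own neighborhood, so with min_pts = 4 the relevant k-distance is the distance to the 3rd nearest *other* point, and a point is core exactly when that distance is at most eps. The function names are illustrative only.

```python
import math

def k_distances(points, k):
    """Distance from each point to its k-th nearest other point, sorted ascending (the k-distance plot)."""
    out = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        out.append(d[k - 1])
    return sorted(out)

def count_core(points, eps, min_pts):
    """Number of core points: at least min_pts points (self included) within eps."""
    return sum(sum(math.dist(p, q) <= eps for q in points) >= min_pts for p in points)

# Hypothetical data: one dense 5x5 grid (spacing 1) plus three isolated outliers.
points = [(x, y) for x in range(5) for y in range(5)] + [(20, 20), (-15, 3), (30, -10)]
min_pts = 4

print(k_distances(points, k=min_pts - 1))
for eps in (0.5, 1.0, 1.5):
    print(eps, count_core(points, eps, min_pts))
# 0.5 0
# 1.0 21
# 1.5 25
```

The sorted curve stays flat at 1.0 and about 1.41, then jumps to 15 or more for the outliers. With eps = 0.5 there are no core points at all, which is exactly the all-noise symptom; an eps just above the flat part, such as 1.5, makes every grid point core while the outliers remain noise.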
Question 3 True / False
DBSCAN can identify clusters of arbitrary shape because it defines clusters based on density-connectivity rather than distance to a cluster centroid.
T. True
F. False
Answer: True
This is the fundamental distinction between DBSCAN and centroid-based methods like k-means. By chaining core points together (a core point's eps-neighborhood joins its cluster, and any core point within that neighborhood extends the cluster further), DBSCAN traces the shape of dense regions regardless of their geometry, so a ring, a crescent, or an elongated blob can each be recovered as long as it is separated from other clusters by sparser gaps. A centroid-based method cannot do this: the centroid of a crescent-shaped cluster lies off the curve, in the empty concave region, and the Voronoi boundary between two centroids cuts through the crescents rather than between them.
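A quick numerical check of the centroid claim, assuming a hypothetical crescent made of 50 evenly spaced points on the upper half of the unit circle. For a continuous semicircular arc of radius 1 the centroid sits at height 2/π ≈ 0.637, well inside the empty region under the arc; the sketch compares the centroid's distance to the nearest data point with the spacing between neighboring points.

```python
import math

# Hypothetical crescent: 50 evenly spaced points on the upper half of the unit circle.
n = 50
crescent = [(math.cos(math.pi * i / (n - 1)), math.sin(math.pi * i / (n - 1))) for i in range(n)]

# The centroid that a k-means update step would compute for this cluster.
cx = sum(x for x, _ in crescent) / n
cy = sum(y for _, y in crescent) / n

# How far the centroid is from the nearest actual data point,
# compared with the spacing between neighboring points along the arc.
gap = min(math.dist((cx, cy), p) for p in crescent)
spacing = math.dist(crescent[0], crescent[1])
print(f"centroid=({cx:.3f}, {cy:.3f})  nearest point={gap:.3f}  spacing={spacing:.3f}")
```

The centroid ends up several point-spacings away from every member of its own cluster, which is why a nearest-centroid rule misplaces the crescent's tips.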
Question 4 True / False
In DBSCAN, nearly every data point is assigned to exactly one cluster — points that don't fit well are assigned to the nearest cluster as border points.
T. True
F. False
Answer: False
This is a key difference from k-means, which assigns every point to some cluster. DBSCAN explicitly designates low-density points as noise and assigns them to no cluster. A noise point is neither a core point (it has fewer than min_pts neighbors within eps) nor a border point (it is not within eps of any core point). This ability to leave points unassigned is one of DBSCAN's strengths for outlier detection. Border points do join a cluster, but only because they lie within eps of a core point: they sit on the edge of a dense region rather than being isolated outliers.
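The three roles can be sketched directly from these definitions. This is a minimal illustration on hypothetical 1-D data, assuming a point counts in its own eps-neighborhood; `point_roles` is an illustrative name, not a library function.

```python
def point_roles(xs, eps, min_pts):
    """Classify 1-D points as 'core', 'border' or 'noise' (a neighborhood includes the point itself)."""
    nbrs = [[j for j, y in enumerate(xs) if abs(x - y) <= eps] for x in xs]
    core = [len(n) >= min_pts for n in nbrs]
    roles = []
    for i in range(len(xs)):
        if core[i]:
            roles.append("core")
        elif any(core[j] for j in nbrs[i]):
            roles.append("border")  # not dense itself, but within eps of a core point
        else:
            roles.append("noise")   # neither dense nor near anything dense: no cluster at all
    return roles

print(point_roles([0, 1, 2, 3, 4.5, 10], eps=1.0, min_pts=3))
# ['border', 'core', 'core', 'border', 'noise', 'noise']
```

Point 4.5 is only 1.5 away from point 3, yet it stays noise because point 3 is itself only a border point; k-means, by contrast, would force both 4.5 and 10 into some cluster.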
Question 5 Short Answer
Why does DBSCAN require two parameters (eps and min_pts) rather than one, and what aspect of cluster structure does each control?
Model answer: eps defines the neighborhood radius, i.e. the spatial scale at which 'nearby' is measured. min_pts sets the minimum density required for a region to count as a cluster core. Together they define what counts as a dense region: a point must have at least min_pts neighbors within distance eps to be a core point. eps alone cannot distinguish signal from noise without a density threshold, and min_pts alone means nothing without a distance scale. Setting eps too small labels nearly everything noise; setting it too large merges distinct clusters. Setting min_pts too low (at the extreme, min_pts = 1) makes every point a core point, so isolated points become singleton clusters and thin chains of points can bridge separate clusters; setting it too high turns genuine low-density clusters into noise.
The two parameters are jointly necessary because density has two independent components: spatial extent (how far you look) and a count threshold (how many neighbors you require). In practice they must be calibrated together: a common heuristic is to set min_pts ≥ dimensionality + 1, then read a natural eps off the k-distance plot with k tied to min_pts. This interdependence is also why DBSCAN struggles with datasets whose clusters have widely varying densities: a single (eps, min_pts) pair cannot simultaneously capture both dense and sparse clusters.
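A small 1-D sketch of the varying-density problem, on hypothetical data: two dense runs of integers (0 to 9 and 15 to 24, spacing 1, separated by a gap of 6) and one sparse run (100 to 190, spacing 10). With min_pts = 3, the sparse run needs eps ≥ 10 before any of its points becomes core, while the dense runs stay apart only if eps < 6, so no single eps recovers all three groups. The `dbscan_1d` helper is a minimal illustrative implementation, not a library API.

```python
def dbscan_1d(xs, eps, min_pts):
    """Minimal 1-D DBSCAN; -1 marks noise. A point's neighborhood includes the point itself."""
    nbrs = [[j for j, y in enumerate(xs) if abs(x - y) <= eps] for x in xs]
    labels = [None] * len(xs)
    cluster = -1
    for i in range(len(xs)):
        if labels[i] is not None:
            continue
        if len(nbrs[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        stack = list(nbrs[i])
        while stack:
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cluster  # reachable from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(nbrs[j]) >= min_pts:  # only core points grow the cluster
                stack.extend(nbrs[j])
    return labels

def summary(labels):
    """(number of clusters, number of noise points)."""
    return len({lab for lab in labels if lab != -1}), labels.count(-1)

dense_a = list(range(0, 10))
dense_b = list(range(15, 25))
sparse_c = list(range(100, 200, 10))
xs = dense_a + dense_b + sparse_c

print(summary(dbscan_1d(xs, eps=2, min_pts=3)))   # (2, 10): dense runs found, sparse run is all noise
print(summary(dbscan_1d(xs, eps=10, min_pts=3)))  # (2, 0): sparse run found, dense runs merged
```

Both settings report two clusters, but never the intended three: a small eps discards the sparse cluster, and an eps large enough for the sparse cluster bridges the gap between the dense ones.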