A team trains a k-nearest neighbors classifier on 500 medical records with 5 features, achieving 82% test accuracy. They add 300 additional measured features to the same 500 records and retrain, achieving 99% training accuracy but 68% test accuracy. What is the most likely explanation?
AThe 300 new features are noisy and should have been normalized before training
Bk-NN requires recalibrating k when the number of features changes, and the team used the wrong k
CIn high-dimensional space the data is too sparse for meaningful distance comparisons, and the model exploits spurious patterns — a manifestation of the curse of dimensionality
D500 records is a large enough dataset for any number of features; the problem is overfitting due to wrong hyperparameters
With 500 examples and 305 features, the data is extremely sparse in the feature space. Distances between points become nearly meaningless (all points appear equidistant), so k-NN's core assumption — that nearby points are similar — breaks down. The high training accuracy (99%) with poor test accuracy (68%) is the signature of exploiting spurious high-dimensional patterns. More features require exponentially more data to maintain the same density, and 500 examples is nowhere near enough for 305 dimensions.
Question 2 Multiple Choice
As the number of dimensions in a dataset increases, what happens to the ratio of nearest-neighbor distance to farthest-neighbor distance from a reference point?
AThe ratio increases — nearest neighbors get much closer relative to farthest neighbors
BThe ratio stays constant — distances scale uniformly in all directions
CThe ratio approaches 1 — all points become approximately equidistant from the reference
DThe ratio approaches 0 — the farthest neighbors become infinitely distant
This is perhaps the most counterintuitive consequence of the curse of dimensionality. As dimensions increase, the variance in distances between all pairs of points decreases relative to the mean distance, so all points cluster around the same expected distance from any reference. When nearest and farthest distances are nearly equal, 'nearest neighbor' is essentially random — there is no meaningful neighborhood structure. This breaks any algorithm that relies on locality or proximity, including k-NN, clustering, and kernel methods.
Question 3 True / False
Adding more features to a machine learning model with a fixed training set usually improves or maintains test accuracy, since additional features provide the model with more information.
TTrue
FFalse
Answer: False
False — this is the core misconception that the curse of dimensionality refutes. As features increase, the feature space volume grows exponentially, making the fixed training set increasingly sparse. In sparse high-dimensional spaces, models can fit spurious patterns that happen by chance (overfitting), and distance-based notions of similarity break down. More features require exponentially more training data to avoid these problems; with a fixed dataset, adding features beyond a certain point actively hurts generalization.
Question 4 True / False
For distance-based algorithms like k-nearest neighbors, high dimensionality can make the 'nearest neighbor' concept meaningless because all pairwise distances between points become approximately equal.
TTrue
FFalse
Answer: True
True. This is a precise mathematical consequence of the curse of dimensionality, not just an intuition. The expected distance between random points in a unit hypercube grows with dimension, while the variance in those distances grows more slowly, causing the nearest/farthest distance ratio to approach 1. When this happens, no point is meaningfully 'closer' than any other, and the fundamental assumption of distance-based methods — that similar inputs are nearby — is violated.
Question 5 Short Answer
Why does adding features to a fixed-size dataset make it harder for a model to generalize, even if those features carry real signal?
Think about your answer, then reveal below.
Model answer: Each new feature adds a dimension, and the volume of the feature space grows exponentially with dimensions. With a fixed number of training examples, the data becomes increasingly sparse — points are further apart on average, and the local neighborhoods that distance-based methods rely on become empty. In this sparse space, it is easy for a model to find coincidental patterns that perfectly fit training data by chance but don't generalize. Even if the new features contain real signal, the exponential increase in space volume means you need exponentially more data to sample it adequately and distinguish real patterns from noise.
The curse of dimensionality is ultimately about the relationship between data volume and feature space volume. Real signal in a feature doesn't help if the training set is too sparse in that dimension to reliably estimate the pattern. The model instead latches onto coincidental correlations (noise), which appear as overfitting — perfect training accuracy, poor test accuracy. This is why dimensionality reduction, feature selection, and regularization are essential: they reduce the effective dimension of the problem to match the available data.