A k-NN classifier is trained on a dataset with two features: age (range 0–80 years) and annual income (range $0–$500,000). No feature scaling is applied. What is the most likely consequence?
A. The model will fail to converge because k-NN requires normalized inputs to compute gradients
B. Income will dominate the distance calculation, effectively making age irrelevant to the predictions
C. Age will dominate because biological age has more predictive power than income for most tasks
D. Both features contribute equally, because k-NN uses rank-order comparisons rather than raw distances
Answer: B
k-NN computes distances directly in feature space. Income spans 500,000 units while age spans only 80, so a $1 difference in income counts exactly as much as a one-year difference in age, and an income gap of just $100 outweighs the largest possible age gap of 80 years. In Euclidean distance, the feature with the larger numeric range dominates the calculation, rendering smaller-scale features nearly invisible. This is why feature scaling (standardization or min-max normalization) is effectively a prerequisite for meaningful distance computation in k-NN. k-NN has no training phase that could 'learn' to weight features correctly, and it computes no gradients, so there is nothing to converge.
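The effect of scale on Euclidean distance can be checked with a few lines of arithmetic. The sketch below is illustrative only: the three people and the helper names (`euclidean`, `min_max_scale`) are made up for this example, and the min-max ranges are the ones stated in the question.

```python
import math

# Hypothetical people as (age in years, annual income in dollars).
query = (30, 50_000)
same_age_richer = (30, 52_000)     # same age, income differs by $2,000
older_same_income = (75, 50_000)   # 45 years older, identical income

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def min_max_scale(point, ranges):
    # Map each feature onto [0, 1] using the given (low, high) ranges.
    return tuple((v - lo) / (hi - lo) for v, (lo, hi) in zip(point, ranges))

# Feature ranges taken from the question: age 0-80, income $0-$500,000.
ranges = [(0, 80), (0, 500_000)]

# Raw distances: the small income gap looks far larger than the age gap.
raw_d_income = euclidean(query, same_age_richer)    # 2000.0
raw_d_age = euclidean(query, older_same_income)     # 45.0

# After scaling, the 45-year age gap is the larger distance.
scaled = [min_max_scale(p, ranges)
          for p in (query, same_age_richer, older_same_income)]
scaled_d_income = euclidean(scaled[0], scaled[1])   # roughly 0.004
scaled_d_age = euclidean(scaled[0], scaled[2])      # roughly 0.56

print(raw_d_income, raw_d_age)
print(scaled_d_income, scaled_d_age)
```

On raw values, the person 45 years older appears to be a much closer neighbor than the person earning $2,000 more; after scaling, the ordering reverses.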
Question 2 Multiple Choice
A k-NN model with k=1 achieves 100% accuracy on training data but only 62% on held-out test data. Increasing k to 15 gives 88% training accuracy and 85% test accuracy. What best explains this pattern?
A. k=1 memorizes each training point perfectly (there is always a neighbor with distance zero) but overfits to noise; larger k smooths the decision boundary by averaging over more neighbors
B. k=1 is computationally faster, so it processes more training data before the time limit, learning more patterns
C. k=15 selects from a larger pool of training examples, effectively training on 15 times as much data
D. Increasing k introduces beneficial randomness that prevents the model from latching onto spurious correlations
Answer: A
With k=1, every training point is its own nearest neighbor (at distance zero), so training accuracy is 100% by construction, barring duplicate points with conflicting labels. This extreme overfitting means the decision boundary zigzags to accommodate every training example, including mislabeled or noisy ones. Increasing k requires a majority vote among k neighbors, which smooths out individual errors. This is the bias-variance tradeoff in k-NN: k=1 has high variance (sensitive to noise), while large k has higher bias (it misses local structure). k-NN has no explicit training step, so training speed is irrelevant, and a larger k neither adds training data nor introduces randomness.
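The pattern can be reproduced on synthetic data. The following is a minimal sketch, not a benchmark: the one-feature dataset, the 20% label-noise rate, and the helper names (`make_data`, `predict`, `accuracy`) are all assumptions chosen for illustration.

```python
import random
from collections import Counter

random.seed(0)

def make_data(n, noise=0.2):
    # One feature in [0, 1); the true label is 1 when x > 0.5.
    # A fraction `noise` of labels is flipped to simulate mislabeled points.
    data = []
    for _ in range(n):
        x = random.random()
        y = int(x > 0.5)
        if random.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data

def predict(train, x, k):
    # Majority vote among the k training points closest to x.
    neighbors = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def accuracy(train, data, k):
    return sum(predict(train, x, k) == y for x, y in data) / len(data)

train, test = make_data(200), make_data(200)
for k in (1, 15):
    print(f"k={k}: train={accuracy(train, train, k):.2f}, "
          f"test={accuracy(train, test, k):.2f}")
```

With k=1, each training point finds itself at distance zero, so training accuracy is perfect even though a fifth of the labels are noise. With k=15, training accuracy drops because the vote overrides the flipped labels, and test accuracy typically improves for the same reason.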
Question 3 True / False
One advantage of k-NN over parametric models like logistic regression is that k-NN becomes faster to make predictions as the training set grows larger.
T. True
F. False
Answer: False
The opposite is true. Because k-NN stores all training examples and, with brute-force search, computes the distance to every one at prediction time, query time scales as O(n) in the number of training examples. As the dataset grows, predictions get slower. By contrast, parametric models like logistic regression compress the training data into a fixed set of parameters, so prediction time stays constant regardless of training set size. This is a core practical limitation of lazy learning: the 'free' training phase is paid for at prediction time.
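One way to see the scaling is to count the work done per query. The sketch below assumes brute-force search and uses hypothetical helpers (`knn_predict`, `logistic_predict`, `make_train`) with made-up data; it counts distance computations rather than measuring wall-clock time.

```python
import math

def knn_predict(train, query, k=3):
    # Brute-force search: one distance computation per stored example.
    distance_calls = 0
    scored = []
    for features, label in train:
        scored.append((math.dist(features, query), label))
        distance_calls += 1
    scored.sort(key=lambda pair: pair[0])
    top = [label for _, label in scored[:k]]
    prediction = max(set(top), key=top.count)
    return prediction, distance_calls

def logistic_predict(weights, bias, query):
    # Work depends only on the number of features, not on training set size.
    z = bias + sum(w * x for w, x in zip(weights, query))
    return int(1 / (1 + math.exp(-z)) >= 0.5)

def make_train(n):
    # Arbitrary two-feature points with alternating labels (illustrative only).
    return [((i / n, (i * 7 % n) / n), i % 2) for i in range(n)]

for n in (100, 1_000, 10_000):
    _, calls = knn_predict(make_train(n), (0.5, 0.5))
    print(f"n={n}: {calls} distance computations per query")
```

The number of distance computations equals the number of stored examples, while `logistic_predict` performs the same two multiplications no matter how much data the weights were fitted on.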
Question 4 True / False
Removing irrelevant features from a dataset can significantly improve k-NN accuracy, even if those same features would have negligible effect on a logistic regression model's performance.
T. True
F. False
Answer: True
Irrelevant features add noise to every distance calculation in k-NN, distorting the notion of 'nearest neighbor': two examples that are genuinely similar may appear far apart because they differ on meaningless dimensions. As irrelevant features accumulate, distances become increasingly uniform and less informative, an effect related to the curse of dimensionality. Logistic regression can learn near-zero weights for irrelevant features, effectively ignoring them. k-NN has no equivalent mechanism: all features contribute to distance unless explicitly removed or down-weighted.
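A small hand-built example shows how a single irrelevant dimension can change which neighbor is nearest. The data below is invented so that the label depends only on the first feature; `nearest_label` is a hypothetical helper written for this sketch.

```python
import math

# Each training row: (relevant feature, irrelevant feature, label).
# In this made-up data the label depends only on the first feature.
train = [
    (1.0, 9.0, "low"),
    (1.2, 0.5, "low"),
    (3.0, 5.1, "high"),
    (3.2, 8.0, "high"),
]
query = (1.1, 5.0)  # the relevant value sits squarely in the "low" group

def nearest_label(train, query, use_features):
    # 1-NN restricted to the feature indices listed in use_features.
    def dist(row):
        return math.dist([row[i] for i in use_features],
                         [query[i] for i in use_features])
    return min(train, key=dist)[2]

with_noise = nearest_label(train, query, use_features=(0, 1))
relevant_only = nearest_label(train, query, use_features=(0,))

print(with_noise)     # high
print(relevant_only)  # low
```

With both features, the query happens to match a "high" example on the irrelevant dimension, and that coincidence outweighs the real similarity on the relevant one. Dropping the irrelevant feature restores the expected neighbor.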
Question 5 Short Answer
Explain what makes k-NN a 'lazy learner' and describe the key computational tradeoff this creates compared to 'eager' algorithms like logistic regression or decision trees.
Model answer: k-NN is lazy because it defers all computation to prediction time: it stores every training example without building any model, and only when queried does it compute distances to all training points, find the k nearest, and return a majority vote. Eager algorithms like logistic regression and decision trees do the opposite: they perform expensive computation during training to compress the data into a compact model (weights or a tree), then make predictions cheaply using that model. The tradeoff: k-NN has essentially zero training cost but linear prediction cost (O(n) per query with brute-force search); eager algorithms have significant training cost, but their prediction cost depends on the size of the fitted model rather than on the number of training examples.
This lazy/eager distinction also explains why k-NN can trivially incorporate new training data (just add it to storage) while eager algorithms must retrain. The practical consequence is that k-NN is well-suited for small, stable datasets with complex local structure, while eager algorithms are preferred for large datasets or applications requiring fast repeated prediction.
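The lazy structure is easy to see in code. The class below is a minimal sketch under the assumptions of brute-force search and majority voting; `LazyKNN` and its `add` method are hypothetical names for this example, not the interface of any particular library.

```python
import math
from collections import Counter

class LazyKNN:
    """Minimal lazy learner: fit() only stores data; predict() does the work."""

    def __init__(self, k=3):
        self.k = k
        self.examples = []

    def fit(self, X, y):
        # "Training" is just storage; no parameters are estimated.
        self.examples = list(zip(X, y))
        return self

    def add(self, x, label):
        # New data is incorporated without any retraining step.
        self.examples.append((x, label))

    def predict(self, x):
        # All distance computation is deferred to query time.
        nearest = sorted(self.examples,
                         key=lambda ex: math.dist(ex[0], x))[: self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

model = LazyKNN(k=3).fit([(0, 0), (0, 1), (1, 0), (5, 5)], ["a", "a", "a", "b"])
before = model.predict((5, 4))   # only one "b" example is stored, so "a" wins the vote
model.add((5, 6), "b")
model.add((6, 5), "b")
after = model.predict((5, 4))    # the three nearest neighbors are now all "b"

print(before, after)
```

Adding two examples changes the prediction immediately, with no refitting. The cost is that every call to `predict` sorts the entire stored dataset.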