Questions — Cross-Validation Techniques

Question 1 (Multiple Choice)

You use 10-fold cross-validation to choose between model A (CV error: 5%) and model B (CV error: 4%). You select model B and report its 4% cross-validated error as your final model's performance. What is wrong with this workflow?

A. Nothing: 10-fold CV gives the best possible performance estimate.

B. You should have used leave-one-out CV instead of 10-fold CV.

C. The final model should be retrained on all the data after model selection, and reporting the CV error used for selection as final performance conflates model selection with model evaluation.

D. Cross-validation can only be used for binary classification, not regression.
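A minimal sketch of the separation Question 1 is about is nested cross-validation. An inner CV loop chooses among candidates using only the outer training portion, and the outer held-out folds, which never influence the choice, supply the performance estimate. Everything here is illustrative and uses only the standard library: the 1-D k-nearest-neighbour regressor, the helper names (`kfold_indices`, `cv_mse`, `nested_cv`), the candidate list and the synthetic sine data are assumptions, not part of any particular library.

```python
import math
import random
import statistics

def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them round-robin into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def knn_predict(train, x, n_neighbors):
    """Hypothetical model: 1-D k-NN regression (mean target of the nearest points)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:n_neighbors]
    return statistics.fmean(y for _, y in nearest)

def fold_mse(train, test, n_neighbors):
    """Mean squared error on `test` of a k-NN model fitted on `train`."""
    return statistics.fmean((y - knn_predict(train, x, n_neighbors)) ** 2 for x, y in test)

def split(data, fold):
    """Return (train, test) lists for one fold of indices."""
    held = set(fold)
    train = [p for i, p in enumerate(data) if i not in held]
    return train, [data[i] for i in fold]

def cv_mse(data, n_neighbors, k=5, seed=0):
    """Plain k-fold CV error for one candidate setting."""
    return statistics.fmean(
        fold_mse(*split(data, f), n_neighbors) for f in kfold_indices(len(data), k, seed)
    )

def nested_cv(data, candidates, outer_k=5, inner_k=5, seed=0):
    """Inner CV picks a candidate from outer-train data only; the outer fold scores that pick."""
    outer_errors, chosen = [], []
    for fold in kfold_indices(len(data), outer_k, seed):
        outer_train, outer_test = split(data, fold)
        best = min(candidates, key=lambda c: cv_mse(outer_train, c, inner_k, seed))
        chosen.append(best)
        outer_errors.append(fold_mse(outer_train, outer_test, best))
    return statistics.fmean(outer_errors), chosen

# Synthetic data: noisy sine curve (illustrative only).
rng = random.Random(1)
data = [(x, math.sin(x) + rng.gauss(0, 0.3)) for x in (rng.uniform(0, 6) for _ in range(60))]
candidates = [1, 3, 5, 9]

estimate, picks = nested_cv(data, candidates)
# Final model: repeat the selection on all data; for k-NN, "refitting" means keeping every point.
final_k = min(candidates, key=lambda c: cv_mse(data, c))
print(f"nested CV MSE estimate: {estimate:.3f} (per-fold picks: {picks}); final n_neighbors = {final_k}")
```

The number reported is the outer-loop estimate, not the smallest inner-loop CV error; the final model is then refit on the full dataset with the setting chosen there.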

Question 2 (Multiple Choice)

For time-series data, why can't you use standard k-fold cross-validation where folds are created by random sampling?

A. Time-series data always has too few observations for k-fold to work.

B. Random folds may train on future data to predict past data, violating causal ordering and inflating performance estimates.

C. Time-series variables are too correlated across time for cross-validation to reduce variance.

D. Standard k-fold assumes independent observations; that assumption is violated here, but the violation only affects computational efficiency.
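For Question 2, one common alternative is forward-chaining (expanding-window) validation: each split trains on an initial stretch of the series and tests on the block that immediately follows, so no training point comes from after the test period. The sketch below assumes equal-sized test blocks and is similar in spirit to scikit-learn's `TimeSeriesSplit`; the function names are hypothetical, and a shuffled k-fold split is included only for comparison.

```python
import random

def expanding_window_splits(n_samples, n_splits):
    """Yield (train, test) index lists in which every training index precedes every test index."""
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("n_splits is too large for n_samples")
    first_test_start = n_samples - n_splits * test_size
    for start in range(first_test_start, n_samples, test_size):
        yield list(range(start)), list(range(start, start + test_size))

def shuffled_kfold_splits(n_samples, k, seed=0):
    """Standard k-fold with randomly assigned folds (ignores time order)."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    for i in range(k):
        test = sorted(idx[i::k])
        held = set(test)
        yield [j for j in range(n_samples) if j not in held], test

def trains_on_future(train, test):
    """True if any training index is later than the earliest test index."""
    return max(train) > min(test)

for train, test in expanding_window_splits(10, 4):
    print("train", train, "-> test", test, "| uses future:", trains_on_future(train, test))

leaky = sum(trains_on_future(tr, te) for tr, te in shuffled_kfold_splits(10, 5))
print("shuffled 5-fold splits that train on the future:", leaky, "of 5")
```

In the shuffled version, at most one fold (the one holding the two latest points) can avoid training on later data, which is exactly the leakage the question describes.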

Question 3 (True / False)

Increasing k in k-fold cross-validation generally produces better (lower-variance) performance estimates.

True

False
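Question 3 turns on how the folds relate to one another as k grows. With n samples split into k equal folds, each training set holds a fraction (k-1)/k of the data, and any two training sets share n(k-2)/k samples, that is, a fraction (k-2)/(k-1) of each. The sketch below (hypothetical helper names, equal folds assumed) computes both fractions exactly and checks the overlap against explicitly built folds. Larger k means each model sees more data, but the k fitted models also become more alike, so their fold errors are less independent of one another.

```python
from fractions import Fraction

def train_fraction(k):
    """Share of the dataset used for training in each fold (equal folds assumed)."""
    return Fraction(k - 1, k)

def pairwise_overlap(k):
    """Share of one training set that also appears in another fold's training set."""
    return Fraction(k - 2, k - 1)

def empirical_overlap(n, k):
    """Build k interleaved folds over n samples (n divisible by k) and measure
    how much of the first training set is shared with the second."""
    everything = set(range(n))
    folds = [set(range(i, n, k)) for i in range(k)]
    train_a, train_b = everything - folds[0], everything - folds[1]
    return Fraction(len(train_a & train_b), len(train_a))

for k in (2, 5, 10, 100):
    print(f"k={k:>3}: train fraction {train_fraction(k)}, pairwise overlap {pairwise_overlap(k)}")
```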

Question 4 (True / False)

Cross-validation can provide an unbiased estimate of model performance even when the same data is used for both hyperparameter tuning and error reporting.

True

False
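A toy simulation of the issue in Question 4, under deliberately simple assumptions: every candidate configuration has the same true error, and each configuration's CV estimate is that true error plus independent Gaussian noise. Each estimate is unbiased on its own, yet the smallest one found after searching over many candidates tends to land below the true error. The function name and all numbers are illustrative; real CV estimates are correlated across configurations, so treat the magnitude here as illustrative only.

```python
import random
import statistics

def mean_reported_error(n_configs, true_error=0.30, noise_sd=0.02, trials=500, seed=0):
    """Average, over repeated experiments, of the minimum noisy CV estimate among n_configs candidates."""
    rng = random.Random(seed)
    minima = []
    for _ in range(trials):
        # Each estimate is unbiased on its own: true_error plus zero-mean noise.
        estimates = [true_error + rng.gauss(0.0, noise_sd) for _ in range(n_configs)]
        # The "best" configuration's score is the one that gets reported.
        minima.append(min(estimates))
    return statistics.fmean(minima)

for n in (1, 10, 100):
    print(f"{n:>3} candidates: mean reported error {mean_reported_error(n):.3f} (true error 0.300)")
```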

Question 5 (Short Answer)

Why does k-fold cross-validation produce a more reliable generalization error estimate than a single random train/test split?

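One way to see the point of Question 5 is to hold a dataset fixed and redraw the partition many times. A single 80/20 split scores the model on only 20% of the points, so its estimate swings with which points land in the test set; 5-fold CV scores every point exactly once and averages, so reshuffling changes it much less. The sketch assumes a deliberately trivial model (predict the training mean) and synthetic Gaussian targets; the helper names are hypothetical.

```python
import random
import statistics

def mse_of_mean_predictor(train_y, test_y):
    """Hypothetical model: always predict the training mean; return the test MSE."""
    mu = statistics.fmean(train_y)
    return statistics.fmean((y - mu) ** 2 for y in test_y)

def single_split_estimate(ys, test_frac, rng):
    """One random train/test split; only the test slice contributes to the score."""
    idx = list(range(len(ys)))
    rng.shuffle(idx)
    cut = int(len(ys) * test_frac)
    test, train = idx[:cut], idx[cut:]
    return mse_of_mean_predictor([ys[i] for i in train], [ys[i] for i in test])

def kfold_estimate(ys, k, rng):
    """k-fold CV: every point is held out exactly once; fold errors are averaged."""
    idx = list(range(len(ys)))
    rng.shuffle(idx)
    errors = []
    for fold in (idx[i::k] for i in range(k)):
        held = set(fold)
        train = [ys[i] for i in idx if i not in held]
        errors.append(mse_of_mean_predictor(train, [ys[i] for i in fold]))
    return statistics.fmean(errors)

data_rng = random.Random(42)
ys = [data_rng.gauss(0.0, 1.0) for _ in range(100)]  # one fixed dataset

split_rng = random.Random(7)
single = [single_split_estimate(ys, 0.2, split_rng) for _ in range(200)]
kfold = [kfold_estimate(ys, 5, split_rng) for _ in range(200)]
print("single 80/20 split: sd of estimate =", round(statistics.stdev(single), 4))
print("5-fold CV:          sd of estimate =", round(statistics.stdev(kfold), 4))
```

Both procedures use the same data and the same model; the difference in spread comes from how many points contribute to each score.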
