Questions: Cross-Validation Techniques

5 questions to test your understanding

Question 1 Multiple Choice

You use 10-fold cross-validation to choose between model A (CV error: 5%) and model B (CV error: 4%). You select model B and report its 4% cross-validated error as your final model's performance. What is wrong with this workflow?

A. Nothing; 10-fold CV gives the best possible performance estimate
B. You should have used leave-one-out CV instead of 10-fold
C. The final model should be retrained on all data after hyperparameter selection, and reporting CV error as final performance conflates model selection with model evaluation
D. Cross-validation can only be used for binary classification, not regression
Question 2 Multiple Choice

For time-series data, why can't you use standard k-fold cross-validation where folds are created by random sampling?

A. Time-series data always has too few observations for k-fold to work
B. Random folds may train on future data to predict past data, violating causal ordering and inflating performance estimates
C. Time-series variables are too correlated across time for cross-validation to reduce variance
D. Standard k-fold assumes independent observations, which is violated, but this only affects computational efficiency
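
To make the two splitting schemes in this question concrete, here is a minimal sketch that works on index positions only (index 0 is the earliest observation). The helper names `shuffled_kfold`, `forward_chaining`, and `trains_on_later_points` are hypothetical and written for this illustration; the forward-chaining layout is one common choice, similar in spirit to scikit-learn's `TimeSeriesSplit`, not the only valid one.

```python
import random


def shuffled_kfold(n, k, seed=0):
    """Yield (train, val) index lists built from randomly shuffled folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [sorted(idx[i::k]) for i in range(k)]
    for i, val in enumerate(folds):
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, val


def forward_chaining(n, k):
    """Yield (train, val) index lists where every training index precedes validation."""
    block = n // (k + 1)  # assumed layout: k + 1 contiguous blocks
    for i in range(1, k + 1):
        train = list(range(0, i * block))
        end = n if i == k else (i + 1) * block
        val = list(range(i * block, end))
        yield train, val


def trains_on_later_points(train, val):
    """True if some training index is later than the earliest validation index."""
    return max(train) > min(val)


n, k = 12, 3
random_flags = [trains_on_later_points(tr, va) for tr, va in shuffled_kfold(n, k)]
ordered_flags = [trains_on_later_points(tr, va) for tr, va in forward_chaining(n, k)]

print("shuffled folds:  ", random_flags)
print("forward chaining:", ordered_flags)
```

Comparing the two printed lists shows which scheme lets the model see observations that come after the ones it is asked to predict.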
Question 3 True / False

Increasing k in k-fold cross-validation generally produces better (lower-variance) performance estimates.

T. True
F. False
Question 4 True / False

Cross-validation can provide an unbiased estimate of model performance even when the same data is used for both hyperparameter tuning and error reporting.

T. True
F. False
Question 5 Short Answer

Why does k-fold cross-validation produce a more reliable generalization error estimate than a single random train/test split?
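
For reference, the mechanics being compared can be sketched as follows. This is an illustrative sketch under stated assumptions: the data are synthetic, the "model" is a constant predictor (the training mean), and the helper names `kfold_indices` and `mse_of_mean_predictor` are hypothetical.

```python
import random
from statistics import mean


def kfold_indices(n, k, seed=0):
    """Yield (train, val) index lists for k shuffled, non-overlapping folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f in range(k) if f != i for j in folds[f]]
        yield train, val


def mse_of_mean_predictor(y, train, val):
    """Fit a constant (the training mean) and score it on the validation fold."""
    pred = mean(y[j] for j in train)
    return mean((y[j] - pred) ** 2 for j in val)


rng = random.Random(1)
y = [rng.gauss(0.0, 1.0) for _ in range(50)]  # synthetic targets, illustration only

splits = list(kfold_indices(len(y), k=5))
fold_scores = [mse_of_mean_predictor(y, tr, va) for tr, va in splits]

# k-fold estimate: the average of k validation scores.
cv_estimate = mean(fold_scores)

# A single train/test split corresponds to using just one of those scores.
single_split_estimate = fold_scores[0]

# Across the k folds, every observation is used for validation exactly once.
validated = sorted(j for _, va in splits for j in va)

print("per-fold MSE:", [round(s, 3) for s in fold_scores])
print("k-fold estimate:", round(cv_estimate, 3))
print("single-split estimate:", round(single_split_estimate, 3))
```

The spread of the per-fold scores gives a sense of how much a single split's estimate depends on which observations happen to land in the test set.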
