Cross-Validation and Out-of-Sample Model Evaluation

College · Depth 82 in the knowledge graph
Unlocks 1 downstream topic
cross-validation out-of-sample model-evaluation

Core Idea

K-fold and leave-one-out cross-validation assess out-of-sample predictive performance by repeatedly holding out part of the data, fitting on the remainder, and testing on the holdout. This exposes overfitting and provides a more honest estimate of generalization error than in-sample fit does.

Explainer

You already know that R² measures how well a regression fits the data in your sample. The problem is that R² is measured on the same data used to estimate the model, and in a least-squares regression adding another variable can never lower it, no matter how irrelevant that variable is. A model with as many parameters as observations will typically fit the sample perfectly (R² = 1.0) while predicting new data very poorly, sometimes worse than simply guessing the mean. This pathology is called overfitting: the model has learned the quirks of your particular sample rather than the underlying pattern.
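To make the perfect-fit claim concrete, here is a minimal Python sketch that is not part of the original explainer. It assumes a hypothetical data-generating process (a straight line plus Gaussian noise) and fits a polynomial of degree n−1, which has n coefficients, through n observations using Lagrange interpolation. All function and variable names are illustrative.

```python
import random

def lagrange_predict(xs, ys, x_new):
    """Evaluate at x_new the degree n-1 polynomial passing through all n points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x_new - xj) / (xi - xj)
        total += term
    return total

def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, with SS_tot taken around the mean of y_true."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot

rng = random.Random(0)
n = 8

# Assumed truth: y = 2 + 0.5 x + noise. The sample has n = 8 observations.
xs = [float(i) for i in range(n)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]

# In-sample: the model has as many parameters (8) as observations (8).
r2_in = r_squared(ys, [lagrange_predict(xs, ys, x) for x in xs])

# Out-of-sample: fresh draws from the same process at new x values.
x_new = [i + 0.5 for i in range(n - 1)]
y_new = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in x_new]
r2_out = r_squared(y_new, [lagrange_predict(xs, ys, x) for x in x_new])

print("in-sample R^2:    ", r2_in)
print("out-of-sample R^2:", r2_out)
```

The in-sample R² comes out at 1 (up to floating-point rounding) because the polynomial passes through every sample point. The out-of-sample value depends on the random draw, but it is usually far lower and can be negative.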

Cross-validation addresses this problem. The core idea is to simulate what happens when you use the model on new data by actually withholding some data during estimation and testing the model on observations it never saw. In k-fold cross-validation, you divide the dataset into k roughly equal groups (folds). You train the model on k−1 folds, then measure prediction error on the held-out fold. Repeat this k times, each time holding out a different fold. The average prediction error across all k held-out folds is your cross-validation error, an honest estimate of how well the model generalizes. With k = n (one observation held out each time), you get leave-one-out cross-validation (LOOCV), which uses the most training data per fit but requires n separate model fits and so can be computationally expensive.
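The k-fold procedure described above can be sketched in a few lines of plain Python. This is an illustrative sketch rather than a reference implementation: the model is assumed to be a one-regressor least-squares line, the error measure is assumed to be mean squared error, and the helper names (`fit_line`, `k_fold_indices`, `cv_mse`) and the simulated data are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_mse(xs, ys, k, seed=0):
    """Average held-out mean squared error across the k folds."""
    fold_errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit on the k-1 training folds only.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Score on the fold the model never saw.
        mse = sum((ys[i] - (a + b * xs[i])) ** 2 for i in held_out) / len(held_out)
        fold_errors.append(mse)
    return sum(fold_errors) / k

# Simulated data (an assumption for illustration): y = 1 + 2x + noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(30)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]

cv5 = cv_mse(xs, ys, k=5)          # 5-fold cross-validation
loocv = cv_mse(xs, ys, k=len(xs))  # k = n gives leave-one-out
print("5-fold CV MSE:", cv5)
print("LOOCV MSE:    ", loocv)
```

Setting `k` equal to the number of observations turns the same function into LOOCV, at the cost of one model fit per observation.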

The key insight is what this procedure reveals that in-sample R² hides. Suppose you compare a simple 3-variable model to a complex 15-variable model. In-sample, the 15-variable model almost always wins on R². But out-of-sample, the simpler model often wins — because the extra variables in the complex model were fitting noise specific to your sample. Cross-validation penalizes this complexity automatically: models that overfit perform well on training folds but poorly on the held-out fold, which drags down the average CV error.
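The 3-variable versus 15-variable comparison can be tried on simulated data. The sketch below is hypothetical throughout: it assumes that only the first 3 of 15 regressors actually affect the outcome, solves the least-squares normal equations with a small hand-written solver, and uses 5-fold cross-validation with mean squared error. None of the names or numbers come from the original text.

```python
import random

def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(X, beta):
    return [sum(b * v for b, v in zip(beta, row)) for row in X]

def mse(y, yhat):
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)

def r_squared(y, yhat):
    mean_y = sum(y) / len(y)
    return 1.0 - sum((a - b) ** 2 for a, b in zip(y, yhat)) / sum((a - mean_y) ** 2 for a in y)

def cv_mse(X, y, k=5):
    """k-fold CV error; rows are i.i.d. draws, so folds are taken without shuffling."""
    n = len(y)
    fold_errors = []
    for f in range(k):
        test = list(range(f, n, k))
        held = set(test)
        train = [i for i in range(n) if i not in held]
        beta = ols([X[i] for i in train], [y[i] for i in train])
        fold_errors.append(mse([y[i] for i in test], predict([X[i] for i in test], beta)))
    return sum(fold_errors) / k

rng = random.Random(1)
n, p = 40, 15
Z = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
# Assumed truth: only the first three regressors matter.
y = [1.0 + z[0] - 2.0 * z[1] + 0.5 * z[2] + rng.gauss(0, 1) for z in Z]

X_small = [[1.0] + z[:3] for z in Z]  # intercept + 3 variables
X_big = [[1.0] + z for z in Z]        # intercept + all 15 variables

r2_small = r_squared(y, predict(X_small, ols(X_small, y)))
r2_big = r_squared(y, predict(X_big, ols(X_big, y)))
cv_small, cv_big = cv_mse(X_small, y), cv_mse(X_big, y)

print("in-sample R^2  small:", r2_small, " big:", r2_big)
print("5-fold CV MSE  small:", cv_small, " big:", cv_big)
```

Because the small model is nested inside the large one, the large model's in-sample R² can never be lower. Which model has the lower cross-validation error is not guaranteed and depends on the draw, which is exactly why it is worth computing.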

Cross-validation also provides a disciplined method for model selection. When choosing between competing specifications — different sets of regressors, different functional forms, or different regularization strengths — pick the model with the lowest cross-validation error rather than the highest in-sample R². This connects directly to your earlier understanding of R²: a high R² tells you how well the model describes the past; a low CV error tells you how well the model predicts the future. In causal economic research, you often care more about unbiased estimates than prediction, but in forecasting applications — GDP growth, asset prices, demand — cross-validated predictive performance is the primary scorecard.
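As one illustration of choosing a regularization strength by cross-validation, the sketch below picks a ridge penalty from a small grid. Ridge regression is a stand-in chosen here for simplicity, not something the text prescribes, and the setup is deliberately minimal: a single regressor with no intercept, for which the ridge estimate has the closed form b = Σxy / (Σx² + λ). The grid, the simulated data, and the function names are all assumptions.

```python
import random

def ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate for y = b*x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def cv_mse(xs, ys, lam, k=5):
    """k-fold cross-validated mean squared error for a given penalty lam."""
    n = len(xs)
    fold_errors = []
    for f in range(k):
        test = list(range(f, n, k))
        held = set(test)
        train = [i for i in range(n) if i not in held]
        b = ridge_slope([xs[i] for i in train], [ys[i] for i in train], lam)
        fold_errors.append(sum((ys[i] - b * xs[i]) ** 2 for i in test) / len(test))
    return sum(fold_errors) / k

# Simulated data (an assumption): a weak signal buried in noise.
rng = random.Random(2)
xs = [rng.gauss(0, 1) for _ in range(20)]
ys = [0.3 * x + rng.gauss(0, 1) for x in xs]

# Candidate specifications: the same model under different penalties.
candidates = [0.0, 0.1, 1.0, 10.0, 100.0]
cv_by_lambda = {lam: cv_mse(xs, ys, lam) for lam in candidates}

# Model selection rule: lowest cross-validation error wins.
best_lambda = min(cv_by_lambda, key=cv_by_lambda.get)

for lam in candidates:
    print("lambda =", lam, " CV MSE =", cv_by_lambda[lam])
print("selected lambda:", best_lambda)
```

The same selection rule applies unchanged when the candidates are different sets of regressors or different functional forms: compute the cross-validation error for each and keep the smallest.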


Prerequisite Chain

Counting to 10 → Counting to 20 → Understanding Zero → The Number Zero → Counting to Five → One-to-One Correspondence → Combining Small Groups Within 5 → Addition Within 10 → Addition Within 20 → Two-Digit Addition Without Regrouping → Two-Digit Addition with Regrouping → Addition Within 100 → Repeated Addition as Multiplication → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Two-Digit by One-Digit Division → Division with Remainders → Remainders and Quotients in Division → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Reading and Writing Decimals → Comparing and Ordering Decimals → Adding and Subtracting Decimals → Multiplying Decimals → Dividing Decimals → Dividing Fractions → Mixed Number Arithmetic → Order of Operations → Integer Order of Operations → Variable Expressions → Combining Like Terms → One-Step Equations → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Trigonometric Ratios Review → Radian Measure → Converting Between Degrees and Radians → The Unit Circle → Graphing Sine and Cosine → Graphing Tangent and Reciprocal Trigonometric Functions → Derivatives of Trigonometric Functions → Antiderivatives → Indefinite Integrals → Basic Integration Rules → Riemann Sums → Definite Integral Definition → Probability Density Functions and Continuous Distributions → Cumulative Distribution Functions → Continuous Random Variables → Normal Distribution → Central Limit Theorem → Confidence Intervals for Means → Z-Tests and T-Tests for Means → One-Sample Z-Test for Means → One-Sample and Two-Sample T-Tests → One-Way ANOVA → F-Test and Joint Significance → R-Squared and Model Fit → Cross-Validation and Out-of-Sample Model Evaluation

Longest path: 83 steps · 421 total prerequisite topics

Prerequisites (2)

Leads To (1)