A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Cross-Validation Techniques

Graduate · Depth 93 in the knowledge graph

3 topics build on this

581 prerequisites beneath it

Core Idea

Cross-validation partitions data into train/test folds to estimate generalization error and tune hyperparameters without permanently setting aside a separate validation set. Averaging over folds reduces the variance of the error estimate compared to a single train/test split. Stratified k-fold preserves the class distribution in each fold, and time-series splits respect temporal order.

How It's Best Learned

Implement k-fold cross-validation yourself, then observe how the error estimates vary with the number of folds k and how the choice of folds affects which hyperparameters get selected.

Explainer

From your study of the bias-variance tradeoff, you know that a model's performance on training data is an optimistic estimate of how it will perform on unseen data. The naive solution is to hold out a separate test set, but this wastes precious data: in a dataset of 500 examples, reserving 100 for testing means training on only 400, which may yield a worse model, and the estimate from those 100 examples depends on which ones happened to be chosen. Cross-validation addresses this by systematically rotating which data serves as the test set, so every example is used for both training and evaluation, though never for both in the same round.

In k-fold cross-validation, you partition the data into k folds of (nearly) equal size. You train the model k times, each time holding out one fold as the test set and training on the remaining k−1 folds. The k held-out error estimates are then averaged into a single performance metric. With k = 5, for example, each model trains on 80% of the data and tests on 20%, and every data point appears in exactly one test fold. This typically gives a more reliable error estimate than a single random split, because averaging over k evaluations reduces the variance of the estimate, so you no longer depend on the luck of one particular partition. The k evaluations are not independent, since their training sets overlap, so the variance falls by less than a factor of k.
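The procedure above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the helper names (`kfold_indices`, `fit_line`, `cv_mse`), the synthetic data, and the choice of a one-variable least-squares line as the model are all assumptions made for the example.

```python
import random
from statistics import mean

def kfold_indices(n, k, seed=0):
    """Split range(n) into k shuffled folds; yield (train, test) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Fold sizes differ by at most one when k does not divide n.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (a stand-in model)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx

def cv_mse(xs, ys, k, seed=0):
    """Return the average held-out MSE over k folds and the per-fold values."""
    fold_errors = []
    for train, test in kfold_indices(len(xs), k, seed):
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        fold_errors.append(mean((ys[i] - (a * xs[i] + b)) ** 2 for i in test))
    return mean(fold_errors), fold_errors

# Synthetic data (an assumption for illustration): y = 2x + 1 plus Gaussian noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]

score, per_fold = cv_mse(xs, ys, k=5)
print(f"5-fold MSE: {score:.3f}; per-fold: {[round(e, 3) for e in per_fold]}")
```

The spread of the per-fold values gives a rough sense of how much a single train/test split could have misled you. Rerunning `cv_mse` with a different `k` or `seed` is one way to see how the estimate moves.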

The choice of k involves its own bias-variance tradeoff. Large k (approaching leave-one-out, where k = n) uses nearly all the data for training, which reduces the pessimistic bias of the error estimate, but the k training sets overlap heavily, so the fitted models are highly correlated; this can increase the variance of the averaged estimate, and it raises the computational cost, since the model is trained k times. Small k (like k = 2) gives training sets that overlap less but are smaller, so each model is weaker than one trained on the full dataset and the estimate is biased toward higher error. k = 5 or k = 10 has emerged as a practical default because it balances these concerns well.

Stratified k-fold ensures each fold approximately preserves the class distribution of the full dataset, which is important when classes are imbalanced: without stratification, a fold might contain no examples of a rare class.

For time-series data, standard k-fold violates temporal ordering, because the model ends up trained on future observations to predict past ones. Time-series splits instead use expanding or sliding windows that always train on past data and test on later data.
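A minimal sketch of the two specialized splitters follows, written in plain Python. The function names (`stratified_kfold`, `expanding_window_splits`), their parameters, and the toy label vector are hypothetical, chosen only to illustrate the idea; the time-series function shows the expanding-window variant, not the sliding one.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k, seed=0):
    """Yield (train, test) index lists whose folds mirror the class proportions."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        rng.shuffle(members)
        # Deal each class round-robin so every fold gets its share.
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test

def expanding_window_splits(n, n_splits, test_size):
    """Yield (train, test) index lists where training always precedes testing."""
    for s in range(n_splits):
        test_start = n - (n_splits - s) * test_size
        if test_start <= 0:
            raise ValueError("not enough samples for the requested splits")
        yield list(range(test_start)), list(range(test_start, test_start + test_size))

# 90 majority-class and 10 rare-class labels (illustrative data).
labels = [0] * 90 + [1] * 10
for train, test in stratified_kfold(labels, k=5):
    print("rare-class examples in test fold:", sum(labels[i] for i in test))

# 12 time-ordered observations, 3 splits, 2 test points per split.
for train, test in expanding_window_splits(n=12, n_splits=3, test_size=2):
    print("train", train[0], "to", train[-1], "-> test", test)
```

With these toy labels, each of the five test folds receives exactly 2 of the 10 rare-class examples. In the time-series splits, the training window grows from indices 0 to 5 up to indices 0 to 9, and every test index is later than every training index.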

Cross-validation's most important application is model selection and hyperparameter tuning. When choosing between, say, a decision tree with max depth 5 versus depth 10, you cannot compare their training errors, because the deeper tree can fit the training data at least as well regardless of how it generalizes. Instead, you compare their cross-validated errors, which estimate generalization performance. You select the hyperparameters that minimize cross-validated error, then retrain the final model on all available data using those hyperparameters. This workflow (cross-validate to select, then retrain on everything) extracts maximum value from limited data while guarding against overfitting to the training set. One caveat: the cross-validated error of the winning configuration is itself slightly optimistic, because it was used to make the selection. If you need an unbiased estimate of the final model's performance, evaluate it on a held-out test set or use nested cross-validation.
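The select-then-retrain workflow can be sketched as follows. To keep the example self-contained, it swaps the decision tree for a one-dimensional k-nearest-neighbors regressor, whose `n_neighbors` setting plays the role of tree depth; the helper names, candidate values, and synthetic data are assumptions for illustration.

```python
import math
import random
from statistics import mean

def knn_predict(train_x, train_y, x, n_neighbors):
    """Predict y at x as the mean target of the n_neighbors closest training points."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return mean(train_y[i] for i in order[:n_neighbors])

def mse(train_x, train_y, test_x, test_y, n_neighbors):
    """Mean squared error of k-NN predictions on (test_x, test_y)."""
    return mean((y - knn_predict(train_x, train_y, x, n_neighbors)) ** 2
                for x, y in zip(test_x, test_y))

def cv_error(xs, ys, n_neighbors, k=5, seed=0):
    """k-fold cross-validated MSE for one hyperparameter value."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errors = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        errors.append(mse([xs[i] for i in train], [ys[i] for i in train],
                          [xs[i] for i in fold], [ys[i] for i in fold],
                          n_neighbors))
    return mean(errors)

# Synthetic data (illustrative): a noisy sine curve.
rng = random.Random(1)
xs = [rng.uniform(0, 6) for _ in range(150)]
ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]

candidates = [1, 5, 15, 50]
train_err = {c: mse(xs, ys, xs, ys, c) for c in candidates}
cv_err = {c: cv_error(xs, ys, c) for c in candidates}
best = min(candidates, key=cv_err.get)
for c in candidates:
    print(f"n_neighbors={c:>2}  train MSE={train_err[c]:.3f}  CV MSE={cv_err[c]:.3f}")
print("selected:", best)

def final_model(x):
    """Final model: k-NN over ALL the data with the selected hyperparameter."""
    return knn_predict(xs, ys, x, best)
```

Training error favors `n_neighbors=1`, because each training point is its own nearest neighbor and is therefore predicted perfectly. That is the same trap as preferring the deeper tree. Cross-validated error does not reward this memorization, so it is the quantity used for selection. The same folds (same `seed`) are reused for every candidate so that the comparison is not confounded by different partitions.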

Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Dividing Integers → Unit Rates → Proportions → Percent Concept → Converting Between Fractions, Decimals, and Percents → Operations with Rational Numbers → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → 
Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Sine, Cosine, and Tangent Ratios → Trigonometric Ratios Review → Radian Measure → Converting Between Degrees and Radians → The Unit Circle → Graphing Sine and Cosine → Graphing Tangent and Reciprocal Trigonometric Functions → Derivatives of Trigonometric Functions → Antiderivatives → Indefinite Integrals → Basic Integration Rules → Riemann Sums → Definite Integral Definition → Probability Density Functions and Continuous Distributions → Cumulative Distribution Functions → Continuous Random Variables → Probability Density Functions → Expected Value → Variance and Standard Deviation of Random Variables → Bias-Variance Tradeoff → Overfitting, Underfitting, and Model Capacity → Cross-Validation Techniques

Longest path: 94 steps · 581 total prerequisite topics

Prerequisites (6)

Supervised Learning Fundamentals (hard) · Bias-Variance Tradeoff (hard) · Probability Axioms (soft) · Sampling Methods (soft) · Descriptive Statistics Synthesis (soft) · Overfitting, Underfitting, and Model Capacity (soft)

Leads To (1)

Hyperparameter Optimization (hard)