Sample Complexity Bounds

Research Depth 68 in the knowledge graph I know this Set as goal
Unlocks 5 downstream topics
learning-theory sample-complexity upper-bounds lower-bounds

Core Idea

Sample complexity bounds answer the most practical question in learning theory: how many training examples are necessary and sufficient to learn a concept class to a desired accuracy? Upper bounds show that m(epsilon, delta) samples suffice for any algorithm (typically ERM); lower bounds show that no algorithm can learn with fewer. For realizable PAC learning with VC dimension d, the sample complexity is Theta((d/epsilon) * log(1/epsilon) + (1/epsilon) * log(1/delta)). For agnostic learning, it is Theta(d/epsilon^2 + log(1/delta)/epsilon^2). These bounds bridge theory and practice by converting abstract complexity measures (VC dimension) into concrete data requirements.

Explainer

Sample complexity bounds are where learning theory meets practice most directly. They answer the question every practitioner implicitly asks: "how much data do I need?" While the bounds are worst-case and often conservative, they provide the correct scaling relationships — how data requirements grow with model complexity, desired accuracy, and confidence level.

The realizable case (where the target function is in the hypothesis class) gives the cleanest bounds. The upper bound, proved via uniform convergence, states that m = O((d/epsilon) * log(1/epsilon) + (1/epsilon) * log(1/delta)) samples suffice, where d is the VC dimension. The matching lower bound, proved via information-theoretic arguments, shows that Omega(d/epsilon + log(1/delta)/epsilon) samples are necessary. These match up to the log(1/epsilon) factor, which is known to be tight. The key insight is the linear dependence on d — each additional unit of VC dimension adds a proportional amount to the data requirement.

The agnostic case (where the best hypothesis in the class may have nonzero error) has tighter matching bounds: Theta(d/epsilon^2 + log(1/delta)/epsilon^2). The extra factor of 1/epsilon compared to the realizable case reflects the statistical cost of estimating error rates rather than detecting the absence of errors. This quadratic dependence on 1/epsilon has significant practical implications: achieving 1% error requires 100 times more data than achieving 10% error, and achieving 0.1% error requires 10,000 times more than 10%. The implication is clear — pushing accuracy to very low levels requires enormous datasets or strong inductive biases.

Beyond the basic PAC bounds, sample complexity theory has been refined for specific hypothesis classes and learning settings. For linear classifiers in d dimensions (VC dimension d+1), the bound is Theta(d/epsilon^2). For kernel methods with margin gamma, the effective dimension is R^2/gamma^2 (where R is the data radius), which can be much smaller than the ambient dimension. For deep networks, the sample complexity is less well-characterized — it depends on the complexity measure used (spectral norms, PAC-Bayes, compression) and the resulting bounds are often loose. The general principle remains: sample complexity is proportional to the effective complexity of the hypothesis class (however measured) and inversely proportional to the square of the desired accuracy. This principle guides decisions about model selection, data collection, and the feasibility of learning tasks.

Practice Questions 4 questions

Prerequisite Chain

Counting to 10Counting to 20Understanding ZeroThe Number ZeroCounting to FiveOne-to-One CorrespondenceCombining Small Groups Within 5Addition Within 10Addition Within 20Two-Digit Addition Without RegroupingTwo-Digit Addition with RegroupingAddition Within 100Repeated Addition as MultiplicationMultiplication Facts Within 100Division as Equal SharingDivision as Grouping (Measurement Division)Division: Grouping (Repeated Subtraction) ModelDivision: Fair Sharing ModelDivision as Equal SharingDivision as GroupingBasic Division FactsDivision Facts Within 100Two-Digit by One-Digit DivisionDivision with RemaindersRemainders and Quotients in DivisionDivision Word ProblemsIntroduction to Long DivisionFactors and MultiplesPrime and Composite NumbersEquivalent FractionsRelating Fractions and DecimalsDecimal Place ValueReading and Writing DecimalsComparing and Ordering DecimalsAdding and Subtracting DecimalsMultiplying DecimalsDividing DecimalsDividing FractionsMixed Number ArithmeticOrder of OperationsInteger Order of OperationsVariable ExpressionsCombining Like TermsOne-Step EquationsTwo-Step EquationsSolving Multi-Step EquationsEquations with Variables on Both SidesLiteral EquationsSlope-Intercept FormPoint-Slope FormWriting Linear EquationsParallel and Perpendicular Line SlopesGraphing Linear EquationsPiecewise FunctionsStep FunctionsComposition of FunctionsInverse FunctionsRadical Functions and GraphsRational ExponentsExponential Functions and GraphsGeometric Sequences and SeriesSigma NotationExpected ValueVariance and Standard Deviation of Random VariablesBias-Variance TradeoffPAC Learning FrameworkGrowth Function and ShatteringVC DimensionSample Complexity Bounds

Longest path: 69 steps · 358 total prerequisite topics

Prerequisites (3)

Leads To (4)