A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Variance and Higher Moments (Rigorous)

Research · Depth 91 in the knowledge graph

115 topics build on this

546 prerequisites beneath it

Core Idea

The k-th (raw) moment of X is mₖ = E[Xᵏ], which exists and is finite if E[|X|ᵏ] < ∞. Variance Var(X) = E[(X − E[X])²] measures spread; the higher central moments μₖ = E[(X − E[X])ᵏ] capture skewness (k = 3) and kurtosis (k = 4). Hölder's inequality and Jensen's inequality are the key tools relating moments of different orders.
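Expanding (X − m₁)ᵏ with the binomial theorem expresses every central moment through raw moments: μₖ = Σⱼ C(k, j) mⱼ (−m₁)^(k−j) for j = 0, …, k, with m₀ = 1. The Python sketch below checks this on a distribution with finitely many atoms, using exact fractions. It is only an illustration: the helper names `raw_moment`, `central_moment` and `central_from_raw` are hypothetical, and the fair die is an arbitrary test case.

```python
from fractions import Fraction
from math import comb

def raw_moment(dist, k):
    """k-th raw moment m_k = E[X^k] of a finite distribution {value: probability}."""
    return sum(p * x**k for x, p in dist.items())

def central_moment(dist, k):
    """k-th central moment mu_k = E[(X - m_1)^k], straight from the definition."""
    m1 = raw_moment(dist, 1)
    return sum(p * (x - m1) ** k for x, p in dist.items())

def central_from_raw(raw, k):
    """mu_k via the binomial expansion; raw[j] holds m_j, with raw[0] = 1."""
    m1 = raw[1]
    return sum(comb(k, j) * raw[j] * (-m1) ** (k - j) for j in range(k + 1))

# Fair six-sided die with exact rational probabilities.
die = {x: Fraction(1, 6) for x in range(1, 7)}
raw = [raw_moment(die, j) for j in range(5)]  # m_0, ..., m_4

for k in range(2, 5):
    print(k, central_moment(die, k), central_from_raw(raw, k))
# 2 35/12 35/12
# 3 0 0
# 4 707/48 707/48
```

The vanishing third central moment reflects the die's symmetry about its mean 7/2.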

Explainer

From measure-theoretic expectation, recall that E[X] = ∫ X dP is a Lebesgue integral with respect to the probability measure P. The k-th moment E[Xᵏ] is simply the integral of the function Xᵏ, that is, ∫ Xᵏ dP. The central question is always existence: when is this integral finite? The answer is the condition E[|X|ᵏ] < ∞, which says exactly that Xᵏ is integrable, or equivalently that X ∈ Lᵏ(Ω, ℱ, P). The Lᵏ spaces you may know from functional analysis appear here as the natural home for random variables with finite k-th moments. Existence of higher moments is genuinely restrictive: a Cauchy random variable has E[|X|] = ∞, so not even its first moment exists, and X ~ t_ν has finite moments exactly for orders k < ν (up to order ν − 1 when ν is an integer).
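The Cauchy claim can be checked with a truncated integral. Assume the standard Cauchy density f(x) = 1/(π(1 + x²)). Then ∫ from −R to R of |x| f(x) dx = ln(1 + R²)/π, which grows without bound as R → ∞, so E[|X|] = ∞. The Python sketch below approximates the truncated integral with Simpson's rule and compares it with that closed form. The helper names and the step count are illustrative choices, not part of the source.

```python
import math

def cauchy_pdf(x):
    """Standard Cauchy density f(x) = 1 / (pi * (1 + x^2))."""
    return 1.0 / (math.pi * (1.0 + x * x))

def truncated_abs_moment(R, n=200_000):
    """Approximate the integral of |x| f(x) over [-R, R].

    The integrand is even, so we integrate x f(x) over [0, R] with the
    composite Simpson rule (n must be even) and double the result.
    """
    h = R / n
    total = 0.0
    for i in range(n + 1):
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        x = i * h
        total += weight * x * cauchy_pdf(x)
    return 2 * total * h / 3

for R in (10, 100, 1_000, 10_000):
    closed_form = math.log1p(R * R) / math.pi  # ln(1 + R^2) / pi
    print(R, truncated_abs_moment(R), closed_form)
```

Each tenfold increase in R adds roughly 2 ln(10)/π ≈ 1.47 to the integral. The truncated first absolute moment therefore diverges logarithmically rather than settling down.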

For X ∈ L², the variance Var(X) = E[(X − μ)²] = E[X²] − (E[X])², where μ = E[X], is the second central moment: the mean squared deviation from the mean. One measure-theoretic proof that E[X²] − (E[X])² ≥ 0 applies Jensen's inequality: for any convex function φ with X and φ(X) integrable, φ(E[X]) ≤ E[φ(X)]. Taking φ(t) = t² gives (E[X])² ≤ E[X²], so Var(X) = E[X²] − (E[X])² ≥ 0, with equality iff X is almost surely constant. Jensen's inequality is pervasive. The same principle yields the AM-GM inequality (take φ(t) = −log t, so the geometric mean never exceeds the arithmetic mean) and Gibbs' inequality, the non-negativity of relative entropy.
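Jensen's inequality can be checked concretely on a distribution with finitely many atoms. In the Python sketch below, the "Jensen gap" E[φ(X)] − φ(E[X]) equals the variance for φ(t) = t². For φ(t) = −log t, the gap is log(AM) − log(GM), where AM and GM are the arithmetic and weighted geometric means. The distribution is arbitrary, and the helpers `expect` and `jensen_gap` are hypothetical names used only for illustration.

```python
import math

def expect(dist, g):
    """E[g(X)] for a finite distribution given as (value, probability) pairs."""
    return sum(p * g(x) for x, p in dist)

def jensen_gap(dist, phi):
    """E[phi(X)] - phi(E[X]); Jensen says this is >= 0 for convex phi."""
    return expect(dist, phi) - phi(expect(dist, lambda x: x))

# Arbitrary illustrative distribution on positive values.
dist = [(1.0, 0.2), (2.0, 0.5), (8.0, 0.3)]
mean = expect(dist, lambda x: x)

# phi(t) = t^2: the gap is E[X^2] - (E[X])^2, which matches E[(X - mean)^2].
var = jensen_gap(dist, lambda t: t * t)
var_direct = expect(dist, lambda x: (x - mean) ** 2)

# phi(t) = -log t: the gap is log(AM) - log(GM), so the weighted GM never exceeds the AM.
am = mean
gm = math.exp(expect(dist, math.log))
print(var, var_direct, gm, am)
```

For a point mass, both E[X²] and (E[X])² equal the square of the single value, so the gap for φ(t) = t² is exactly zero. This is the equality case of the inequality.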

Hölder's inequality, |E[XY]| ≤ (E[|X|^p])^(1/p) · (E[|Y|^q])^(1/q) for conjugate exponents 1 < p, q < ∞ with 1/p + 1/q = 1, is the other fundamental tool. The special case p = q = 2 is the Cauchy-Schwarz inequality |E[XY]| ≤ √(E[X²]) √(E[Y²]). Applying it to the centered variables X − E[X] and Y − E[Y] gives |Cov(X, Y)| ≤ σ_X σ_Y. Hölder also shows that a finite higher moment implies finite lower ones. Suppose E[|X|ᵏ] < ∞ and 0 < j < k. Take Y ≡ 1 (the indicator of Ω) and p = k/j; this gives E[|X|ʲ] ≤ (E[|X|ᵏ])^(j/k) < ∞, which is Lyapunov's inequality. Hence the Lᵏ spaces are nested, with Lᵏ ⊆ Lʲ whenever j ≤ k, and in particular L² ⊆ L¹. The argument uses P(Ω) = 1. It fails for infinite measures such as Lebesgue measure on ℝ, where f(x) = 1/(1 + |x|) lies in L² but not in L¹.
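Both consequences of Hölder can be sanity-checked numerically. In the Python sketch below, the norms ‖X‖ₖ = (E[|X|ᵏ])^(1/k) should be non-decreasing in k (Lyapunov's inequality). The covariance of a small joint distribution should satisfy |Cov(X, Y)| ≤ σ_X σ_Y. Both distributions are arbitrary, and `lp_norm` is a hypothetical helper name.

```python
import math

def lp_norm(dist, k):
    """(E[|X|^k])^(1/k) for a finite distribution given as (value, probability) pairs."""
    return sum(p * abs(x) ** k for x, p in dist) ** (1.0 / k)

# Arbitrary illustrative distribution with a negative atom.
dist = [(-3.0, 0.1), (0.5, 0.6), (2.0, 0.3)]

# Lyapunov: on a probability space k -> ||X||_k is non-decreasing.
norms = [lp_norm(dist, k) for k in (1, 2, 3, 4)]

# Cauchy-Schwarz on centered variables, for a joint law given as (x, y, probability).
joint = [(0.0, 1.0, 0.25), (1.0, 3.0, 0.25), (2.0, 2.0, 0.25), (3.0, 7.0, 0.25)]
ex = sum(p * x for x, _, p in joint)
ey = sum(p * y for _, y, p in joint)
cov = sum(p * (x - ex) * (y - ey) for x, y, p in joint)
sx = math.sqrt(sum(p * (x - ex) ** 2 for x, _, p in joint))
sy = math.sqrt(sum(p * (y - ey) ** 2 for _, y, p in joint))

print(norms)         # should be non-decreasing
print(cov, sx * sy)  # |cov| should not exceed sx * sy
```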

The third central moment μ₃ = E[(X − μ)³], where μ = E[X], measures skewness, the asymmetry of the distribution. Positive skewness typically signals a longer or heavier right tail, since large positive deviations dominate the cube. Negative skewness signals the same for the left tail. The standardized skewness γ₁ = μ₃/σ³, with σ = √Var(X), is the dimensionless version. The fourth central moment μ₄ = E[(X − μ)⁴] underlies the excess kurtosis γ₂ = μ₄/σ⁴ − 3, where subtracting 3 makes the normal distribution's value 0. High excess kurtosis (leptokurtic) indicates heavy tails, and low excess kurtosis (platykurtic) indicates light tails. Kurtosis is primarily a measure of tail weight and does not reliably indicate a sharp peak. These higher moments appear throughout statistics. They enter the moment conditions of the central limit theorem and the method of moments estimator. They also characterize the normal distribution as the only distribution whose cumulants of order three and higher all vanish.
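For distributions with finitely many atoms, γ₁ and γ₂ follow directly from the central moments. The Python sketch below checks them against the standard Bernoulli(p) closed forms, γ₁ = (1 − 2p)/√(p(1 − p)) and γ₂ = (1 − 6p(1 − p))/(p(1 − p)). The function name `standardized_moments` is hypothetical, and p = 0.2 is an arbitrary test value.

```python
import math

def standardized_moments(dist):
    """Return (skewness gamma_1, excess kurtosis gamma_2) of a finite distribution.

    dist is a list of (value, probability) pairs with probabilities summing to 1.
    """
    mean = sum(p * x for x, p in dist)
    mu = {k: sum(p * (x - mean) ** k for x, p in dist) for k in (2, 3, 4)}
    sigma = math.sqrt(mu[2])
    return mu[3] / sigma**3, mu[4] / sigma**4 - 3.0

# Bernoulli(p) compared with its closed forms, where q = 1 - p.
p = 0.2
q = 1 - p
skew, exkurt = standardized_moments([(0.0, q), (1.0, p)])
print(skew, (1 - 2 * p) / math.sqrt(p * q))  # both about 1.5
print(exkurt, (1 - 6 * p * q) / (p * q))     # both about 0.25
```

A symmetric two-point distribution on ±1 gives γ₁ = 0 and γ₂ = −2, the smallest excess kurtosis any distribution can have. This matches the picture of kurtosis as a measure of tail weight.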

The rigorous treatment matters because moments can fail to characterize a distribution. There exist distinct distributions with identical moments of all orders; the log-normal distribution and certain perturbations of its density have this property. The moment problem asks when a moment sequence uniquely determines a distribution. Carleman's condition gives a sufficient criterion: if ∑ₖ m₂ₖ^(−1/(2k)) = ∞, where m₂ₖ = E[X²ᵏ], then the distribution is uniquely determined by its moments. This subtlety, invisible in informal treatments, is exactly the kind of issue that measure-theoretic probability is designed to surface and resolve.
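A classical instance of the log-normal claim can be sketched as follows. Let f be the standard log-normal density (ln X ~ N(0, 1)), whose moments are E[Xⁿ] = e^(n²/2). Set f_ε(x) = f(x)(1 + ε sin(2π ln x)) for |ε| ≤ 1. Substituting y = ln x, the extra term contributes e^(n²/2) E[sin(2π(Z + n))] with Z ~ N(0, 1). For integer n, sin(2π(z + n)) = sin(2πz), and since the sine is odd this expectation is 0 for every n ≥ 0. So each f_ε has exactly the moments of f. The Python sketch below confirms this numerically; the function names and quadrature settings are illustrative choices.

```python
import math

def std_normal_pdf(y):
    """Density of Z ~ N(0, 1)."""
    return math.exp(-0.5 * y * y) / math.sqrt(2 * math.pi)

def perturbed_moment(n, eps, half_width=15.0, steps=30_000):
    """n-th moment of f_eps(x) = f(x) * (1 + eps * sin(2*pi*ln x)), f standard log-normal.

    With x = e^y this is the integral of e^(n*y) * phi(y) * (1 + eps*sin(2*pi*y)) dy.
    e^(n*y) * phi(y) is a Gaussian bump centered at y = n, so the trapezoid rule on
    [n - half_width, n + half_width] captures essentially all of the mass.
    """
    a = n - half_width
    h = 2 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        y = a + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        bump = math.exp(n * y) * std_normal_pdf(y)
        total += weight * bump * (1 + eps * math.sin(2 * math.pi * y))
    return total * h

eps = 0.5
for n in range(5):
    exact = math.exp(n * n / 2)  # E[X^n] for the unperturbed log-normal
    print(n, perturbed_moment(n, eps), exact)
```

The densities themselves differ. At x = e^(1/4), sin(2π ln x) = 1, so f_ε exceeds f there by a factor of 1 + ε. The condition |ε| ≤ 1 keeps f_ε ≥ 0, and the n = 0 case shows it integrates to 1.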

Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Dividing Integers → Unit Rates → Proportions → Percent Concept → Converting Between Fractions, Decimals, and Percents → Operations with Rational Numbers → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → 
Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Sine, Cosine, and Tangent Ratios → Trigonometric Ratios Review → Radian Measure → Converting Between Degrees and Radians → The Unit Circle → Graphing Sine and Cosine → Graphing Tangent and Reciprocal Trigonometric Functions → Derivatives of Trigonometric Functions → Antiderivatives → Indefinite Integrals → Basic Integration Rules → Riemann Sums → Definite Integral Definition → Probability Density Functions and Continuous Distributions → Cumulative Distribution Functions → Continuous Random Variables → Probability Density Functions → Expected Value → Variance and Standard Deviation of Random Variables → Variance and Higher Moments (Rigorous)

Longest path: 92 steps · 546 total prerequisite topics

Prerequisites (2)

Expectation (Measure-Theoretic) · hard
Variance and Standard Deviation of Random Variables · soft

Leads To (5)

Characteristic Functions · hard
Convergence in L^p · hard
Cramer-Rao Lower Bound · hard
Method of Moments · hard
Moment Generating Functions · hard