A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Asymptotic Normality of MLEs

Research · Depth 111 in the knowledge graph

1 topic builds on this

726 prerequisites beneath it


Core Idea

Under regularity conditions, √n(θ̂_n − θ) converges in distribution to N(0, 1/I(θ)), where I(θ) is the Fisher information of a single observation. This shows that MLEs are asymptotically normal and asymptotically efficient: their asymptotic variance attains the Cramér-Rao bound. Asymptotic normality enables hypothesis tests and confidence intervals for MLEs.

Explainer

Three prerequisite ideas each contribute something essential here. From consistency of estimators, you know θ̂_n → θ in probability as n → ∞: the MLE converges to the true parameter. From the central limit theorem (rigorous), you know that properly normalized sums of i.i.d. random variables with finite variance converge in distribution to a normal. From Fisher information I(θ), you know it quantifies how much information each observation carries about θ, and that the Cramér-Rao bound says no unbiased estimator based on n i.i.d. observations can have variance less than 1/(n I(θ)). Asymptotic normality of MLEs ties all three together: not only does the MLE converge, but the normalized deviation √n(θ̂_n − θ) has a specific, computable limiting distribution, N(0, 1/I(θ)).
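
As a quick numerical check of this claim, here is a minimal simulation sketch. The Exponential(rate) model, the parameter values, and the helper name simulate_scaled_errors are illustrative assumptions, not part of the explainer itself. For this model the MLE of the rate is 1/(sample mean) and I(rate) = 1/rate², so the variance of √n(θ̂_n − θ) should be close to rate².

```python
import math
import random
import statistics


def simulate_scaled_errors(rate, n, reps, seed=0):
    """Return `reps` draws of sqrt(n) * (mle - rate) for Exponential(rate) samples."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    draws = []
    for _ in range(reps):
        total = sum(rng.expovariate(rate) for _ in range(n))
        mle = n / total  # MLE of the rate is 1 / sample mean
        draws.append(math.sqrt(n) * (mle - rate))
    return draws


# Illustrative values (assumptions): true rate 2, samples of size 400.
rate, n, reps = 2.0, 400, 2000
z = simulate_scaled_errors(rate, n, reps)

empirical_var = statistics.variance(z)
theoretical_var = rate ** 2  # 1 / I(rate), because I(rate) = 1 / rate**2

print(f"empirical variance of sqrt(n)(mle - rate): {empirical_var:.3f}")
print(f"asymptotic variance 1 / I(rate):           {theoretical_var:.3f}")
```

The two numbers should agree up to simulation noise and a small finite-sample bias; a histogram of z would look roughly bell-shaped.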

The proof sketch is a Taylor expansion of the score function S(θ) = ∂/∂θ log L(θ; X₁,…,X_n) = Σᵢ ℓ'(θ; Xᵢ), where ℓ(θ; x) is the log-density of a single observation. At the MLE θ̂_n, the score is zero because θ̂_n is an interior maximizer of the likelihood. Taylor-expanding around the true θ: Σᵢ ℓ'(θ; Xᵢ) + (θ̂_n − θ) Σᵢ ℓ''(θ; Xᵢ) ≈ 0, where consistency of θ̂_n is what makes the remainder negligible. Solving for (θ̂_n − θ): it equals −(Σᵢ ℓ'(θ; Xᵢ)) / (Σᵢ ℓ''(θ; Xᵢ)). The numerator, multiplied by 1/√n, converges in distribution to N(0, I(θ)) by the CLT (since E[ℓ'(θ;X)] = 0 and Var[ℓ'(θ;X)] = I(θ)). The denominator divided by n converges in probability to −I(θ) by the WLLN and the identity E[ℓ''(θ;X)] = −I(θ). By Slutsky's theorem, √n(θ̂_n − θ) therefore converges in distribution to a N(0, I(θ)) variable divided by I(θ), which is N(0, 1/I(θ)).
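
The same steps can be written out as a short derivation. This is a sketch under the usual regularity conditions, not a full proof: the remainder term of the expansion is dropped, and the last line uses Slutsky's theorem to combine the two limits.

```latex
\begin{align*}
0 = \sum_{i=1}^{n} \ell'(\hat\theta_n; X_i)
  &\approx \sum_{i=1}^{n} \ell'(\theta; X_i)
     + (\hat\theta_n - \theta) \sum_{i=1}^{n} \ell''(\theta; X_i), \\
\sqrt{n}\,(\hat\theta_n - \theta)
  &\approx \frac{\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \ell'(\theta; X_i)}
                {-\frac{1}{n} \sum_{i=1}^{n} \ell''(\theta; X_i)}, \\
\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \ell'(\theta; X_i)
  &\xrightarrow{d} N\bigl(0, I(\theta)\bigr),
  \qquad
  -\frac{1}{n} \sum_{i=1}^{n} \ell''(\theta; X_i) \xrightarrow{p} I(\theta), \\
\sqrt{n}\,(\hat\theta_n - \theta)
  &\xrightarrow{d} \frac{1}{I(\theta)}\, N\bigl(0, I(\theta)\bigr)
   = N\!\left(0, \frac{1}{I(\theta)}\right).
\end{align*}
```

The variance in the last line follows from scaling: dividing a N(0, I(θ)) variable by the constant I(θ) multiplies its variance by 1/I(θ)², giving I(θ)/I(θ)² = 1/I(θ).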

The result says the MLE is asymptotically efficient: among regular, consistent, asymptotically normal estimators, it attains the smallest possible asymptotic variance, 1/I(θ), which matches the Cramér-Rao bound. This is not a finite-sample claim; small samples can behave poorly. But for large n, no regular estimator has a smaller asymptotic variance than the MLE. The practical payoff is immediate: since √n(θ̂_n − θ) is approximately N(0, 1/I(θ)) for large n, an approximate 95% confidence interval for θ is θ̂_n ± 1.96/√(n·I(θ̂_n)), where I(θ̂_n) is the per-observation Fisher information evaluated at the MLE.
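
To make the interval formula concrete, the sketch below applies it to a Bernoulli(p) model, which is an assumption chosen for illustration. There the MLE is the sample proportion and I(p) = 1/(p(1 − p)). The function names wald_interval and coverage and all parameter values are hypothetical. The simulation estimates how often the interval contains the true p.

```python
import math
import random


def wald_interval(p_hat, n, z=1.96):
    """Approximate 95% interval p_hat ± z / sqrt(n * I(p_hat)) for Bernoulli data."""
    if not 0.0 < p_hat < 1.0:
        # I(p_hat) is undefined at the boundary, so the formula does not apply.
        raise ValueError("p_hat must lie strictly between 0 and 1")
    fisher_info = 1.0 / (p_hat * (1.0 - p_hat))  # per-observation I(p) at the MLE
    half_width = z / math.sqrt(n * fisher_info)
    return p_hat - half_width, p_hat + half_width


def coverage(p, n, reps, seed=0):
    """Estimate the fraction of simulated intervals that contain the true p."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(reps):
        successes = sum(rng.random() < p for _ in range(n))
        low, high = wald_interval(successes / n, n)
        hits += low <= p <= high
    return hits / reps


# Illustrative values (assumptions): true p = 0.3, samples of size 500.
estimated_coverage = coverage(p=0.3, n=500, reps=2000)
print(f"estimated coverage of the nominal 95% interval: {estimated_coverage:.3f}")
```

With a sample this large the estimated coverage should sit near 0.95. Rerunning with a small n or with p close to 0 or 1 is a simple way to see the finite-sample caveat in action.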

Understanding the regularity conditions that support this result is as important as the result itself. The conditions include differentiability of the log-likelihood, identifiability of θ, a support that does not depend on θ, finite and positive Fisher information, and the interchange of differentiation and integration. Any of them can fail. In the Uniform(0, θ) model, for example, the support depends on θ and the MLE is the maximum order statistic. There n(θ − θ̂_n) converges in distribution to an exponential distribution with mean θ, so the rate is n rather than √n and the limit is not normal. Asymptotic normality is the generic case, but its exceptions teach you what makes estimation problems genuinely hard.
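
The Uniform(0, θ) exception can also be checked by simulation. The sketch below is illustrative only: the helper name scaled_gaps and the parameter values are assumptions. It draws the MLE (the sample maximum) repeatedly and looks at n(θ − θ̂_n), which for large n should behave like an exponential variable with mean θ: always nonnegative, with average close to θ.

```python
import random
import statistics


def scaled_gaps(theta, n, reps, seed=0):
    """Return `reps` draws of n * (theta - max(sample)) for Uniform(0, theta) samples."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    gaps = []
    for _ in range(reps):
        mle = max(rng.uniform(0.0, theta) for _ in range(n))  # MLE is the sample maximum
        gaps.append(n * (theta - mle))
    return gaps


# Illustrative values (assumptions): true theta = 3, samples of size 200.
theta, n, reps = 3.0, 200, 3000
gaps = scaled_gaps(theta, n, reps)
mean_gap = statistics.mean(gaps)

print(f"smallest scaled gap: {min(gaps):.4f}")
print(f"mean scaled gap:     {mean_gap:.3f}")
print(f"limit mean (theta):  {theta:.3f}")
```

Because the MLE can never exceed θ, every scaled gap is nonnegative, so the limit is one-sided. That alone rules out a normal limit centered at zero.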


Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Dividing Integers → Unit Rates → Proportions → Percent Concept → Converting Between Fractions, Decimals, and Percents → Operations with Rational Numbers → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → 
Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Sine, Cosine, and Tangent Ratios → Trigonometric Ratios Review → Radian Measure → Converting Between Degrees and Radians → The Unit Circle → Graphing Sine and Cosine → Graphing Tangent and Reciprocal Trigonometric Functions → Derivatives of Trigonometric Functions → Antiderivatives → Indefinite Integrals → Basic Integration Rules → Riemann Sums → Definite Integral Definition → Fundamental Theorem of Calculus Part 1 → Fundamental Theorem of Calculus Part 2 → U-Substitution → Partial Fraction Decomposition for Integration → Improper Integrals - Convergence → Integral Test → P-Series → Comparison Test → Limit Comparison Test → Series Convergence Test Strategy → Power Series → Radius and Interval of Convergence → Taylor Series → Moment Generating Functions → Characteristic Functions → Convergence in Distribution → Stationary Distributions → Convergence of Markov Chains → Convergence in Probability → Almost Sure Convergence → Relationships Between Modes of Convergence → Weak Law of Large Numbers → Strong Law of Large Numbers → Central Limit Theorem (Rigorous via Characteristic Functions) → Maximum Likelihood Estimation (Theory) → Consistency of Estimators → Asymptotic Normality of MLEs

Longest path: 112 steps · 726 total prerequisite topics

Prerequisites (4)

Consistency of Estimators (hard) · Central Limit Theorem (Rigorous via Characteristic Functions) (hard) · Fisher Information (hard) · Cramer-Rao Lower Bound (soft)

Leads To (1)

Confidence Intervals (Rigorous Theory) (hard)