A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Asymptotic Normality of the MLE

Research · Depth 110 in the knowledge graph

1 topic builds on this

724 prerequisites beneath it

Central Limit Theorem (Rigorous via Characteristic Functions) · Maximum Likelihood Estimation (Theory) · +1 more → Confidence Intervals (Rigorous Theory)

Core Idea

Under regularity conditions, √n(θ̂_n − θ) converges in distribution to N(0, I(θ)⁻¹), so θ̂_n is approximately N(θ, I(θ)⁻¹/n) for large n, where I(θ) is the Fisher information of a single observation. The convergence rate is √n and the asymptotic variance achieves the Cramér-Rao lower bound (asymptotic efficiency). This enables construction of confidence intervals and hypothesis tests.

Explainer

From the Central Limit Theorem (rigorous version), you know that the sample mean of i.i.d. random variables, properly scaled, converges to a normal distribution. The asymptotic normality of the MLE is the same phenomenon applied not to a simple average but to the maximizer of the log-likelihood. The result says: as the sample size n grows, the MLE θ̂_n behaves approximately like a normal random variable centered at the true parameter θ, with variance shrinking at rate 1/n. More precisely, the scaled deviation √n(θ̂_n − θ) converges in distribution to N(0, I(θ)⁻¹), where I(θ) is the Fisher information at the true parameter.
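
The statement can be checked by simulation. The sketch below is only an illustration under assumed choices that are not part of the text above: an Exponential(rate) model, for which the MLE is 1 / (sample mean) and I(rate) = 1/rate², and a hypothetical helper name simulate_scaled_mle_errors. It draws many samples, computes √n(θ̂_n − θ) for each, and compares the empirical variance of those values with I(θ)⁻¹.

```python
import math
import random
import statistics


def simulate_scaled_mle_errors(rate=2.0, n=400, reps=2000, seed=0):
    """Return `reps` draws of sqrt(n) * (mle - rate) for Exponential(rate) data."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        sample = [rng.expovariate(rate) for _ in range(n)]
        mle = 1.0 / statistics.fmean(sample)  # MLE of the rate is 1 / sample mean
        draws.append(math.sqrt(n) * (mle - rate))
    return draws


rate = 2.0
draws = simulate_scaled_mle_errors(rate=rate)
empirical_var = statistics.variance(draws)
theoretical_var = rate ** 2  # I(rate)^-1 = rate^2, because I(rate) = 1 / rate^2

print(f"empirical variance of sqrt(n)(mle - rate): {empirical_var:.3f}")
print(f"asymptotic variance I(rate)^-1:            {theoretical_var:.3f}")
```

With n = 400 the two numbers should be close but not identical: the normal limit is an approximation, and the exponential MLE carries a small finite-sample bias of order 1/n.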

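The proof sketch connects your prerequisites. The score function ∂log L/∂θ equals zero at the MLE (it is the first-order condition, assuming the maximizer lies in the interior of the parameter space). Taylor-expanding the score around the true θ and rearranging gives: √n(θ̂_n − θ) ≈ [−(1/n)∂²log L/∂θ²]⁻¹ · [(1/√n)∂log L/∂θ], with both derivatives evaluated at the true θ. The numerator, the scaled score, is a sum of i.i.d. terms with mean zero and variance I(θ), so by the CLT it converges in distribution to N(0, I(θ)). The denominator, the averaged observed information, converges in probability to I(θ) by the law of large numbers. By Slutsky's theorem the ratio converges in distribution to I(θ)⁻¹ · N(0, I(θ)) = N(0, I(θ)⁻¹), because dividing by the constant I(θ) multiplies the variance by I(θ)⁻². This argument is heuristic; the rigorous version requires consistency of θ̂_n plus regularity conditions: a log-likelihood smooth enough in θ to differentiate under the integral sign and to control the Taylor remainder, a true parameter in the interior of the parameter space (compactness is a common assumption for proving consistency), identifiability, and finite, positive Fisher information.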
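
The expansion in the sketch can be written out step by step. The notation here is an assumption introduced for compactness: ℓ_n(θ) stands for log L with n observations, primes denote derivatives in θ, θ is scalar, and θ̃_n is the intermediate point supplied by the mean value theorem. This is a sketch under the regularity conditions mentioned above, not a complete proof.

```latex
\begin{align*}
0 &= \ell_n'(\hat\theta_n)
   = \ell_n'(\theta) + \ell_n''(\tilde\theta_n)\,(\hat\theta_n - \theta),
   \qquad \tilde\theta_n \text{ between } \theta \text{ and } \hat\theta_n, \\
\sqrt{n}\,(\hat\theta_n - \theta)
  &= \Bigl[-\tfrac{1}{n}\,\ell_n''(\tilde\theta_n)\Bigr]^{-1}
     \cdot \tfrac{1}{\sqrt{n}}\,\ell_n'(\theta), \\
\tfrac{1}{\sqrt{n}}\,\ell_n'(\theta) &\xrightarrow{d} N\bigl(0, I(\theta)\bigr),
\qquad
-\tfrac{1}{n}\,\ell_n''(\tilde\theta_n) \xrightarrow{p} I(\theta), \\
\sqrt{n}\,(\hat\theta_n - \theta)
  &\xrightarrow{d} I(\theta)^{-1}\, N\bigl(0, I(\theta)\bigr)
   = N\bigl(0, I(\theta)^{-1}\bigr).
\end{align*}
```

The second convergence on the third line is where consistency of θ̂_n enters: θ̃_n is squeezed between θ and θ̂_n, so it also converges to θ, and a uniform law of large numbers (or a bound on the third derivative) is what lets the averaged second derivative be evaluated there.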

The Fisher information I(θ) = E[(∂log f(X;θ)/∂θ)²] is the variance of the score (the score has mean zero under the regularity conditions, so its second moment is its variance). It measures how sensitively the log-likelihood changes as θ moves. High Fisher information means the data is very informative about θ, so the MLE can pin down θ precisely: the asymptotic variance I(θ)⁻¹ is small. Low Fisher information means the data carries little signal about θ, and the MLE's variance is large. This is not accidental: the Cramér-Rao lower bound says no unbiased estimator based on n i.i.d. observations can have variance below I(θ)⁻¹/n. The MLE achieves this lower bound asymptotically, making it asymptotically efficient: in the large-sample limit, no regular estimator can have smaller asymptotic variance.
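
The definition of I(θ) as the expected squared score can be checked numerically. The sketch below assumes a Bernoulli(p) model chosen purely for illustration, where the closed form is I(p) = 1/(p(1 − p)); the helper names bernoulli_score and monte_carlo_fisher_information are hypothetical.

```python
import random
import statistics


def bernoulli_score(x, p):
    """Derivative in p of log f(x; p) = x*log(p) + (1 - x)*log(1 - p)."""
    return x / p - (1 - x) / (1 - p)


def monte_carlo_fisher_information(p, draws=200_000, seed=1):
    """Estimate I(p) = E[score^2] by averaging squared scores over simulated data."""
    rng = random.Random(seed)
    scores = [bernoulli_score(1 if rng.random() < p else 0, p) for _ in range(draws)]
    return statistics.fmean(s * s for s in scores)


for p in (0.5, 0.1):
    estimate = monte_carlo_fisher_information(p)
    exact = 1.0 / (p * (1.0 - p))
    print(f"p={p}: Monte Carlo I(p) = {estimate:.3f}, closed form = {exact:.3f}")
```

In this model the information is larger at p = 0.1 than at p = 0.5, which matches the fact that the asymptotic variance p(1 − p) of the sample proportion is smaller there.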

This result is the workhorse of frequentist inference. Because θ̂_n is approximately N(θ, I(θ̂_n)⁻¹/n) for large n, you can construct approximate level 1 − α confidence intervals: θ̂_n ± z_{α/2} / √(n · I(θ̂_n)), where z_{α/2} is the upper α/2 quantile of the standard normal distribution. You can test hypotheses using Wald statistics: n(θ̂_n − θ₀)² · I(θ̂_n) is approximately χ²(1)-distributed under H₀: θ = θ₀. The entire architecture of large-sample likelihood inference rests on this one asymptotic distribution result: it converts the MLE from a point estimate into a gateway to intervals and tests.
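
The interval and the Wald statistic above can be computed in a few lines. The sketch below assumes, for illustration only, an Exponential(rate) model with plug-in information I(θ̂_n) = 1/θ̂_n²; the function name wald_inference_exponential and the simulated data are hypothetical. The χ²(1) upper-tail probability is obtained from the identity P(Z² > w) = erfc(√(w/2)) for a standard normal Z.

```python
import math
import random
from statistics import NormalDist, fmean


def wald_inference_exponential(sample, rate_null, alpha=0.05):
    """Return (mle, confidence interval, Wald statistic, p-value) for the rate."""
    n = len(sample)
    mle = 1.0 / fmean(sample)                # MLE of the exponential rate
    fisher_at_mle = 1.0 / mle ** 2           # plug-in I(mle), per observation
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z / math.sqrt(n * fisher_at_mle)
    wald_stat = n * (mle - rate_null) ** 2 * fisher_at_mle
    p_value = math.erfc(math.sqrt(wald_stat / 2.0))  # chi-square(1) upper tail
    return mle, (mle - half_width, mle + half_width), wald_stat, p_value


rng = random.Random(42)
sample = [rng.expovariate(2.0) for _ in range(500)]  # simulated data, true rate 2.0
mle, ci, wald_stat, p_value = wald_inference_exponential(sample, rate_null=2.0)

print(f"MLE: {mle:.3f}")
print(f"approximate 95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"Wald statistic: {wald_stat:.3f}, p-value: {p_value:.3f}")
```

Because the interval and the test use the same plug-in information, they are dual: the test rejects H₀ at level α exactly when θ₀ falls outside the 1 − α interval.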

Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Dividing Integers → Unit Rates → Proportions → Percent Concept → Converting Between Fractions, Decimals, and Percents → Operations with Rational Numbers → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → 
Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Sine, Cosine, and Tangent Ratios → Trigonometric Ratios Review → Radian Measure → Converting Between Degrees and Radians → The Unit Circle → Graphing Sine and Cosine → Graphing Tangent and Reciprocal Trigonometric Functions → Derivatives of Trigonometric Functions → Antiderivatives → Indefinite Integrals → Basic Integration Rules → Riemann Sums → Definite Integral Definition → Fundamental Theorem of Calculus Part 1 → Fundamental Theorem of Calculus Part 2 → U-Substitution → Partial Fraction Decomposition for Integration → Improper Integrals - Convergence → Integral Test → P-Series → Comparison Test → Limit Comparison Test → Series Convergence Test Strategy → Power Series → Radius and Interval of Convergence → Taylor Series → Moment Generating Functions → Characteristic Functions → Convergence in Distribution → Stationary Distributions → Convergence of Markov Chains → Convergence in Probability → Almost Sure Convergence → Relationships Between Modes of Convergence → Weak Law of Large Numbers → Strong Law of Large Numbers → Central Limit Theorem (Rigorous via Characteristic Functions) → Maximum Likelihood Estimation (Theory) → Asymptotic Normality of the MLE

Longest path: 111 steps · 724 total prerequisite topics

Prerequisites (3)

Maximum Likelihood Estimation (Theory) [hard]
Central Limit Theorem (Rigorous via Characteristic Functions) [hard]
Fisher Information [soft]

Leads To (1)

Confidence Intervals (Rigorous Theory) [hard]