Asymptotic Normality of the MLE

Research · Depth 92 in the knowledge graph
Unlocks 1 downstream topic
mle asymptotics normal-approximation

Core Idea

Under regularity conditions, √n(θ̂_n − θ) converges in distribution to N(0, I(θ)⁻¹), where I(θ) is the Fisher information of a single observation, so θ̂_n is approximately N(θ, I(θ)⁻¹/n) for large n. The estimation error shrinks at rate 1/√n, and the asymptotic variance attains the Cramér-Rao lower bound (asymptotic efficiency). This enables construction of approximate confidence intervals and hypothesis tests.

Explainer

From the Central Limit Theorem (rigorous version), you know that the sample mean of i.i.d. random variables, properly scaled, converges to a normal distribution. The asymptotic normality of the MLE is the same phenomenon applied not to a simple average but to the maximizer of the log-likelihood. The result says: as the sample size n grows, the MLE θ̂_n behaves approximately like a normal random variable centered at the true parameter θ, with variance shrinking at rate 1/n. More precisely, the scaled deviation √n(θ̂_n − θ) converges in distribution to N(0, I(θ)⁻¹), where I(θ) is the Fisher information at the true parameter.
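A small simulation can make the statement concrete. The sketch below assumes an exponential model with rate λ, which the text above does not specify; it is chosen only because its MLE is 1/x̄ and its Fisher information is I(λ) = 1/λ², so √n(λ̂_n − λ) should have variance close to λ². The function name and the values of `rate`, `n`, `reps`, and `seed` are illustrative choices.

```python
import math
import random
import statistics


def simulate_scaled_errors(rate=2.0, n=400, reps=2000, seed=0):
    """Return sqrt(n) * (MLE - rate) over many simulated exponential samples.

    Assumed model (not from the text): X_i ~ Exponential(rate), for which
    the MLE is 1 / mean(sample) and the Fisher information is 1 / rate**2.
    """
    rng = random.Random(seed)
    scaled_errors = []
    for _ in range(reps):
        sample = [rng.expovariate(rate) for _ in range(n)]
        mle = 1.0 / statistics.fmean(sample)
        scaled_errors.append(math.sqrt(n) * (mle - rate))
    return scaled_errors


scaled = simulate_scaled_errors()
sim_mean = statistics.fmean(scaled)
sim_var = statistics.variance(scaled)
target_var = 2.0 ** 2  # I(rate)^{-1} = rate^2 for the exponential model

print(f"mean of scaled errors:     {sim_mean:.3f} (theory: near 0)")
print(f"variance of scaled errors: {sim_var:.3f} (theory: near {target_var})")
```

With a moderate n the simulated mean sits slightly above zero because 1/x̄ has a small finite-sample bias; that bias vanishes after scaling as n grows, which is consistent with the asymptotic statement.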

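The proof sketch connects your prerequisites. Write log L(θ) = Σᵢ log f(Xᵢ; θ) for the log-likelihood of n i.i.d. observations. The score function ∂log L/∂θ equals zero at the MLE (it is the first-order condition). Taylor-expanding the score at θ̂_n around the true θ gives 0 ≈ ∂log L/∂θ + (∂²log L/∂θ²)(θ̂_n − θ), with both derivatives evaluated at the true θ, and rearranging yields: √n(θ̂_n − θ) ≈ [−(1/n)∂²log L/∂θ²]⁻¹ · [(1/√n)∂log L/∂θ]. The numerator, the scaled score, is 1/√n times a sum of i.i.d. terms with mean zero and variance I(θ), so by the CLT it converges in distribution to N(0, I(θ)). The denominator, the averaged observed Fisher information, converges in probability to I(θ) by the law of large numbers together with the information identity E[−∂²log f(X;θ)/∂θ²] = I(θ). By Slutsky's theorem, the ratio converges in distribution to I(θ)⁻¹ · N(0, I(θ)) = N(0, I(θ)⁻¹). This argument is heuristic; the rigorous version requires regularity conditions (twice-differentiable log-likelihood, compact parameter space with the true θ in its interior, identifiability, a support that does not depend on θ) to establish consistency of θ̂_n, control the Taylor remainder, and justify the interchange of differentiation and integration behind the mean-zero score and the information identity.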
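The steps of the sketch can be laid out line by line. The shorthand ℓ_n(θ) = log L(θ) with primes for derivatives in θ is notation assumed here for compactness; it does not appear in the text above. As in the sketch, this is a heuristic derivation and the remainder term of the expansion is ignored.

```latex
\begin{align*}
0 = \ell_n'(\hat\theta_n)
  &\approx \ell_n'(\theta) + \ell_n''(\theta)\,(\hat\theta_n - \theta)
  && \text{(first-order condition, Taylor expansion)} \\
\sqrt{n}\,(\hat\theta_n - \theta)
  &\approx \Bigl[-\tfrac{1}{n}\,\ell_n''(\theta)\Bigr]^{-1}
           \cdot \tfrac{1}{\sqrt{n}}\,\ell_n'(\theta)
  && \text{(rearrange, multiply by } \sqrt{n}\text{)} \\
\tfrac{1}{\sqrt{n}}\,\ell_n'(\theta)
  &\xrightarrow{\;d\;} N\bigl(0,\, I(\theta)\bigr)
  && \text{(CLT for the score)} \\
-\tfrac{1}{n}\,\ell_n''(\theta)
  &\xrightarrow{\;p\;} I(\theta)
  && \text{(law of large numbers)} \\
\sqrt{n}\,(\hat\theta_n - \theta)
  &\xrightarrow{\;d\;} I(\theta)^{-1}\, N\bigl(0,\, I(\theta)\bigr)
   = N\bigl(0,\, I(\theta)^{-1}\bigr)
  && \text{(Slutsky)}
\end{align*}
```

The last line uses the fact that multiplying a N(0, I(θ)) variable by the constant I(θ)⁻¹ scales its variance by I(θ)⁻², giving I(θ)⁻² · I(θ) = I(θ)⁻¹.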

The Fisher information I(θ) = E[(∂log f(X;θ)/∂θ)²] is the variance of the score of a single observation: it measures how sensitively the log-likelihood changes as θ moves. High Fisher information means the data is very informative about θ, so the MLE can pin down θ precisely: the asymptotic variance I(θ)⁻¹ is small. Low Fisher information means the data carries little signal about θ, and the MLE's variance is large. This is not accidental: the Cramér-Rao lower bound says no unbiased estimator based on n i.i.d. observations can have variance below I(θ)⁻¹/n. The MLE achieves this lower bound asymptotically, making it asymptotically efficient: in the large-sample limit, no regular competing estimator can have smaller asymptotic variance.
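The definition can be checked by direct computation in a model small enough to enumerate. The sketch below assumes a Bernoulli(p) observation, an example not taken from the text, and computes the Fisher information two ways: as the variance of the score and as the negative expected second derivative of the log-density. Both should equal 1/(p(1 − p)). All function names are illustrative.

```python
def bernoulli_score(x, p):
    """d/dp of log f(x; p) for f(x; p) = p**x * (1 - p)**(1 - x)."""
    return x / p - (1 - x) / (1 - p)


def bernoulli_score_derivative(x, p):
    """d^2/dp^2 of log f(x; p) for the same Bernoulli density."""
    return -x / p**2 - (1 - x) / (1 - p) ** 2


def fisher_information_two_ways(p):
    """Return (variance of score, negative expected second derivative)."""
    weights = {0: 1 - p, 1: p}  # P(X = 0) and P(X = 1)
    mean_score = sum(w * bernoulli_score(x, p) for x, w in weights.items())
    var_score = sum(
        w * (bernoulli_score(x, p) - mean_score) ** 2 for x, w in weights.items()
    )
    neg_expected_hessian = -sum(
        w * bernoulli_score_derivative(x, p) for x, w in weights.items()
    )
    return var_score, neg_expected_hessian


for p in (0.1, 0.3, 0.5):
    var_score, neg_hess = fisher_information_two_ways(p)
    closed_form = 1.0 / (p * (1 - p))
    print(f"p={p}: var(score)={var_score:.4f}, "
          f"-E[second derivative]={neg_hess:.4f}, 1/(p(1-p))={closed_form:.4f}")
```

The information is smallest at p = 0.5 and grows as p approaches 0 or 1, so in this model the asymptotic variance p(1 − p) of the MLE is largest for a fair coin.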

This result is the workhorse of frequentist inference. Because I(θ̂_n) consistently estimates I(θ), θ̂_n ≈ N(θ, I(θ̂_n)⁻¹/n) for large n, and you can construct approximate 1 − α confidence intervals: θ̂_n ± z_{α/2} / √(n · I(θ̂_n)), where z_{α/2} is the upper α/2 quantile of the standard normal. You can test hypotheses using Wald statistics: n(θ̂_n − θ₀)² · I(θ̂_n) is approximately χ²(1) under H₀: θ = θ₀. The entire architecture of large-sample likelihood inference rests on this one asymptotic distribution result: it converts the MLE from a point estimate into a gateway to intervals and tests.
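The interval and the Wald statistic translate directly into code. The sketch below again assumes an exponential model with rate λ (an illustrative choice, not from the text), where λ̂ = 1/x̄ and I(λ̂) = 1/λ̂². It builds the interval from the formula above and estimates by simulation how often it covers the true rate. Function names and simulation settings are hypothetical.

```python
import math
import random
import statistics


def exponential_mle(sample):
    """MLE of the rate in the assumed Exponential(rate) model."""
    return 1.0 / statistics.fmean(sample)


def wald_interval(sample, alpha=0.05):
    """Approximate 1 - alpha interval: mle +/- z / sqrt(n * I(mle))."""
    n = len(sample)
    mle = exponential_mle(sample)
    fisher_at_mle = 1.0 / mle**2  # I(rate) = 1 / rate^2, evaluated at the MLE
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z / math.sqrt(n * fisher_at_mle)
    return mle - half_width, mle + half_width


def wald_statistic(sample, rate0):
    """n * (mle - rate0)^2 * I(mle); approximately chi-square(1) under H0."""
    n = len(sample)
    mle = exponential_mle(sample)
    return n * (mle - rate0) ** 2 / mle**2


def estimate_coverage(rate=2.0, n=200, reps=2000, seed=1):
    """Fraction of simulated samples whose Wald interval contains the true rate."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.expovariate(rate) for _ in range(n)]
        lower, upper = wald_interval(sample)
        if lower <= rate <= upper:
            hits += 1
    return hits / reps


coverage = estimate_coverage()
print(f"estimated coverage of the nominal 95% interval: {coverage:.3f}")
```

The estimated coverage should land near the nominal 95% but not exactly on it, since the interval is only asymptotically valid and the simulation itself has sampling noise.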


Prerequisite Chain

Counting to 10 → Counting to 20 → Understanding Zero → The Number Zero → Counting to Five → One-to-One Correspondence → Combining Small Groups Within 5 → Addition Within 10 → Addition Within 20 → Two-Digit Addition Without Regrouping → Two-Digit Addition with Regrouping → Addition Within 100 → Repeated Addition as Multiplication → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Two-Digit by One-Digit Division → Division with Remainders → Remainders and Quotients in Division → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Reading and Writing Decimals → Comparing and Ordering Decimals → Adding and Subtracting Decimals → Multiplying Decimals → Dividing Decimals → Dividing Fractions → Mixed Number Arithmetic → Order of Operations → Integer Order of Operations → Variable Expressions → Combining Like Terms → One-Step Equations → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Trigonometric Ratios Review → Radian Measure → Converting Between Degrees and Radians → The Unit Circle → Graphing Sine and Cosine → Graphing Tangent and Reciprocal Trigonometric Functions → Derivatives of Trigonometric Functions → Antiderivatives → Indefinite Integrals → Basic Integration Rules → Riemann Sums → Definite Integral Definition → Fundamental Theorem of Calculus Part 1 → Fundamental Theorem of Calculus Part 2 → U-Substitution → Partial Fraction Decomposition for Integration → Improper Integrals - Convergence → Integral Test → P-Series → Comparison Test → Limit Comparison Test → Absolute vs. Conditional Convergence → Power Series → Taylor Polynomials → Taylor Series → Moment Generating Functions → Characteristic Functions → Convergence in Distribution → Stationary Distributions → Convergence of Markov Chains → Convergence in Probability → Almost Sure Convergence → Strong Law of Large Numbers → Central Limit Theorem (Rigorous via Characteristic Functions) → Asymptotic Normality of the MLE

Longest path: 93 steps · 519 total prerequisite topics
