Asymptotic Normality of MLEs


Core Idea

Under regularity conditions, √n(θ̂_n − θ) converges in distribution to N(0, 1/I(θ)), where I(θ) is the Fisher information of a single observation. This shows that MLEs are asymptotically normal and asymptotically efficient: their limiting variance attains the Cramér-Rao bound. Asymptotic normality is what justifies the standard hypothesis tests and confidence intervals built on MLEs.

Explainer

Your three prerequisites each contribute something essential here. From consistency of estimators, you know that θ̂_n → θ in probability as n → ∞: the MLE converges to the true parameter. From the central limit theorem, you know that properly normalized sums of i.i.d. random variables with finite variance converge in distribution to a normal. From Fisher information, you know that I(θ) quantifies how much information each observation carries about θ, and that the Cramér-Rao bound says no unbiased estimator can have variance less than 1/(n I(θ)). Asymptotic normality of MLEs ties all three together: not only does the MLE converge, but the normalized deviation √n(θ̂_n − θ) has a specific, computable limiting distribution, N(0, 1/I(θ)).

The proof sketch is a Taylor expansion of the score function S(θ) = ∂/∂θ log L(θ; X₁,…,X_n) = Σᵢ ℓ'(θ; Xᵢ), where ℓ(θ; x) is the log-density of a single observation. At the MLE θ̂_n the score is zero, because θ̂_n is an interior maximizer of the log-likelihood. Taylor-expanding S(θ̂_n) around the true θ gives Σᵢ ℓ'(θ; Xᵢ) + (θ̂_n − θ) Σᵢ ℓ''(θ; Xᵢ) ≈ 0, so θ̂_n − θ ≈ −(Σᵢ ℓ'(θ; Xᵢ)) / (Σᵢ ℓ''(θ; Xᵢ)). Multiplying by √n and splitting the normalization between numerator and denominator: √n(θ̂_n − θ) ≈ [n^(−1/2) Σᵢ ℓ'(θ; Xᵢ)] / [−n^(−1) Σᵢ ℓ''(θ; Xᵢ)]. The numerator converges in distribution to N(0, I(θ)) by the CLT, since E[ℓ'(θ; X)] = 0 and Var[ℓ'(θ; X)] = I(θ). The denominator converges in probability to I(θ) by the WLLN and the identity E[ℓ''(θ; X)] = −I(θ). By Slutsky's theorem, the ratio converges in distribution to N(0, I(θ)/I(θ)²) = N(0, 1/I(θ)).
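The limit above can be checked numerically. The following is a minimal simulation sketch, not part of the original explainer: it assumes an Exponential model with rate 2.0 (an arbitrary illustrative choice), for which the MLE is 1 / (sample mean) and the per-observation Fisher information is 1/rate². The function name, sample size, and repetition count are all hypothetical choices.

```python
import math
import random
import statistics

def simulate_scaled_errors(rate, n, reps, seed=0):
    """Return `reps` draws of sqrt(n) * (mle - rate) for Exponential(rate) samples."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    draws = []
    for _ in range(reps):
        sample_mean = sum(rng.expovariate(rate) for _ in range(n)) / n
        mle = 1.0 / sample_mean  # MLE of the rate parameter
        draws.append(math.sqrt(n) * (mle - rate))
    return draws

rate = 2.0                           # assumed true parameter (illustrative)
fisher_info = 1.0 / rate**2          # I(rate) for one Exponential(rate) observation
draws = simulate_scaled_errors(rate, n=400, reps=2000)

empirical_var = statistics.variance(draws)
theoretical_var = 1.0 / fisher_info  # limiting variance 1/I(rate) = rate**2

print(f"empirical variance of sqrt(n)(mle - rate): {empirical_var:.3f}")
print(f"theoretical limit 1/I(rate):               {theoretical_var:.3f}")
```

With these settings the two printed values should be close but not identical, since both n and the number of repetitions are finite.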

The result says the MLE is asymptotically efficient: among regular, consistent, asymptotically normal estimators, it achieves the smallest possible asymptotic variance, which matches the Cramér-Rao bound 1/(n I(θ)). This is not a finite-sample claim; small samples can behave poorly. But for large n, no such estimator can systematically beat the MLE in variance. The practical payoff is immediate: since √n(θ̂_n − θ) is approximately N(0, 1/I(θ)), an approximate 95% confidence interval for θ is θ̂_n ± 1.96/√(n·Î), where Î = I(θ̂_n) is the Fisher information evaluated at the MLE.
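As a worked instance of the interval formula, here is a small sketch for a Poisson model, where the MLE is the sample mean and the per-observation Fisher information is 1/λ. The helper name `wald_interval` and the count data are made up for illustration; with only ten observations the normal approximation is rough, so treat the numbers as a demonstration of the arithmetic rather than a reliable interval.

```python
import math

def wald_interval(mle, fisher_info_at_mle, n, z=1.96):
    """Approximate confidence interval: mle +/- z / sqrt(n * I(mle))."""
    half_width = z / math.sqrt(n * fisher_info_at_mle)
    return mle - half_width, mle + half_width

# Hypothetical Poisson counts, invented for this example.
counts = [3, 5, 4, 6, 2, 4, 5, 3, 4, 4]
n = len(counts)

lam_hat = sum(counts) / n           # Poisson MLE is the sample mean
fisher_info_at_mle = 1.0 / lam_hat  # I(lambda) = 1/lambda, evaluated at the MLE

lower, upper = wald_interval(lam_hat, fisher_info_at_mle, n)
print(f"MLE: {lam_hat:.3f}")
print(f"approximate 95% interval: ({lower:.3f}, {upper:.3f})")
```

The z argument defaults to 1.96 for a 95% interval; another normal quantile can be passed for a different confidence level.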

Understanding the regularity conditions that support this result is as important as the result itself. The conditions (differentiability of the log-likelihood, identifiability of θ, finite and positive Fisher information, a support that does not depend on θ, and interchange of differentiation and integration) can fail. Take the Uniform(0, θ) model, where the support depends on θ and the MLE is the maximum order statistic: the MLE converges at rate n rather than √n, and n(θ − θ̂_n) converges to an exponential distribution with mean θ rather than to a normal. Asymptotic normality is the generic case, but its exceptions teach you what makes estimation problems genuinely hard.
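The Uniform(0, θ) exception can also be seen in a simulation. The sketch below assumes θ = 3.0 (an arbitrary choice) and uses the standard result that n(θ − max Xᵢ) is approximately exponential with mean θ for large n. The function name and simulation sizes are hypothetical.

```python
import random
import statistics

def simulate_uniform_mle_errors(theta, n, reps, seed=0):
    """Return `reps` draws of n * (theta - max(sample)) for Uniform(0, theta) samples."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    draws = []
    for _ in range(reps):
        mle = max(rng.uniform(0.0, theta) for _ in range(n))  # MLE is the sample maximum
        draws.append(n * (theta - mle))  # note the scaling by n, not sqrt(n)
    return draws

theta = 3.0  # assumed true parameter (illustrative)
draws = simulate_uniform_mle_errors(theta, n=500, reps=2000)

# The MLE never exceeds theta, so every scaled error is nonnegative:
# the limit is one-sided and cannot be normal.
print(f"smallest scaled error: {min(draws):.4f}")
print(f"mean of scaled errors: {statistics.mean(draws):.3f} (exponential limit has mean {theta})")
```

Scaling by √n instead of n in this model would make the errors shrink to zero, which is one way to see that the √n rate is the wrong one here.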


