Fisher Information

Core Idea

The Fisher information is I(θ) = E[(∂log f(X|θ)/∂θ)²] = −E[∂²log f(X|θ)/∂θ²], where the second equality holds under standard regularity conditions. It quantifies how much information the data carries about θ: a larger I(θ) means θ can be estimated more precisely. For n i.i.d. observations, Iₙ(θ) = nI(θ). Fisher information appears in the Cramér-Rao bound and characterizes the asymptotic variance of the MLE, which is approximately 1/(nI(θ)).
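
As a worked instance of both formulas, take a single observation X with P(X = 1) = p. The Bernoulli model is our choice of illustration, not from the original. The score has mean zero, so its second moment is Var(X)/(p²(1 − p)²), and the two definitions agree:

```latex
\begin{aligned}
\log f(x \mid p) &= x \log p + (1 - x)\log(1 - p), \qquad x \in \{0, 1\} \\
\frac{\partial}{\partial p}\log f(x \mid p) &= \frac{x}{p} - \frac{1 - x}{1 - p} = \frac{x - p}{p(1 - p)} \\
I(p) = \mathbb{E}\!\left[\left(\frac{X - p}{p(1 - p)}\right)^{2}\right] &= \frac{\operatorname{Var}(X)}{p^{2}(1 - p)^{2}} = \frac{1}{p(1 - p)} \\
-\mathbb{E}\!\left[\frac{\partial^{2}}{\partial p^{2}}\log f(X \mid p)\right] &= \mathbb{E}\!\left[\frac{X}{p^{2}} + \frac{1 - X}{(1 - p)^{2}}\right] = \frac{1}{p} + \frac{1}{1 - p} = \frac{1}{p(1 - p)} \\
I_n(p) &= \frac{n}{p(1 - p)}
\end{aligned}
```

In this model the information is smallest at p = 1/2, where I(p) = 4, and grows without bound as p approaches 0 or 1.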

Explainer

Recall from your work with expectations and densities that the log-likelihood ℓ(θ; x) = log f(x|θ) measures how well the parameter value θ explains the observed data x. Its derivative ∂ℓ/∂θ, called the score function, tells you which direction to move θ to increase the likelihood. When X is drawn from f(·|θ), the expected score is zero: E[∂log f(X|θ)/∂θ] = 0. This is not itself a regularity condition; it follows from differentiating ∫ f(x|θ) dx = 1 with respect to θ, which is valid under the regularity conditions that permit differentiating under the integral sign. Because the score has mean zero, its variance equals its second moment, and that quantity is the Fisher information: I(θ) = Var[∂log f(X|θ)/∂θ] = E[(∂log f(X|θ)/∂θ)²].
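
A minimal check of both facts, assuming a single Bernoulli(p) observation as the example (our choice, not the original's). Its score is x/p − (1 − x)/(1 − p), and because X takes only two values the expectations can be computed exactly by enumeration rather than by simulation. The function names are hypothetical.

```python
def bernoulli_score(x: int, p: float) -> float:
    """Score d/dp log f(x|p) for one Bernoulli(p) observation."""
    return x / p - (1 - x) / (1 - p)


def score_moments(p: float) -> tuple[float, float]:
    """Exact mean and variance of the score under X ~ Bernoulli(p)."""
    probs = {1: p, 0: 1 - p}  # P(X = x) for each possible outcome
    mean = sum(q * bernoulli_score(x, p) for x, q in probs.items())
    var = sum(q * (bernoulli_score(x, p) - mean) ** 2 for x, q in probs.items())
    return mean, var


for p in (0.1, 0.5, 0.9):
    mean, var = score_moments(p)
    print(f"p={p}: E[score]={mean:.2e}, Var[score]={var:.4f}, "
          f"1/(p(1-p))={1 / (p * (1 - p)):.4f}")
```

The mean comes out as zero up to floating-point rounding, and the variance matches 1/(p(1 − p)) for every p tried.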

The intuition is curvature. Think of the log-likelihood as a landscape: the score is its slope, and the Fisher information measures how sharply, on average, the log-likelihood peaks around the true parameter. If I(θ) is large, the log-likelihood drops off sharply as θ moves away from the truth: nearby parameter values produce noticeably different distributions, so the data discriminates well between them. If I(θ) is small, the log-likelihood is flat near the true value: many θ values produce similar distributions, so the data carries only a weak signal about which θ is correct. Large Fisher information means the parameter is easy to estimate precisely; small Fisher information means it is hard.
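
To make the "drops off sharply" picture quantitative, the sketch below computes the expected fall in log-likelihood, E[log f(X|p) − log f(X|p + δ)], when the parameter moves a small step δ away from the true value p, and compares it with the second-order approximation ½ I(p) δ². The Bernoulli model and the specific p and δ values are illustrative assumptions; the function names are hypothetical.

```python
import math


def expected_loglik_drop(p: float, q: float) -> float:
    """E[log f(X|p) - log f(X|q)] for X ~ Bernoulli(p).

    This is how far, on average, the log-likelihood falls when the parameter
    moves from the true value p to q (it equals the Kullback-Leibler
    divergence from Bernoulli(p) to Bernoulli(q)).
    """
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def fisher_bernoulli(p: float) -> float:
    """Fisher information of one Bernoulli(p) observation: 1 / (p (1 - p))."""
    return 1.0 / (p * (1.0 - p))


for p in (0.5, 0.05):
    for delta in (0.01, 0.001):
        drop = expected_loglik_drop(p, p + delta)
        quadratic = 0.5 * fisher_bernoulli(p) * delta**2  # curvature approximation
        print(f"p={p}, delta={delta}: drop={drop:.3e}, "
              f"0.5*I*delta^2={quadratic:.3e}, ratio={drop / quadratic:.3f}")
```

At p = 0.05, where I(p) ≈ 21, the same step δ costs roughly five times as much expected log-likelihood as at p = 0.5, where I(p) = 4, and the ratio column approaches 1 as δ shrinks.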

The equivalent form I(θ) = −E[∂²log f(X|θ)/∂θ²], valid when the log-likelihood is twice differentiable and differentiation under the integral sign is permitted, ties information directly to the curvature (second derivative) of the log-likelihood. The expected negative second derivative measures how sharply the log-likelihood bends downward at its peak: a sharper peak, not a higher one, corresponds to more information. This form is often easier to compute in practice, because for many common models the second derivative is linear in the data or does not depend on it at all, so its expectation is immediate, whereas squaring the score produces cross terms. For exponential family distributions (Gaussian, Poisson, Bernoulli, etc.), I(θ) has clean closed-form expressions.
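
A minimal numerical check that the two forms agree, assuming a Poisson(λ) model as the example (our choice; it is one of the exponential-family cases mentioned). Here log f(x|λ) = x log λ − λ − log x!, so the score is x/λ − 1 and the second derivative is −x/λ². The infinite expectation is truncated at a hypothetical cutoff `x_max`, chosen large enough that the neglected tail is negligible for the λ values used.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    # Evaluated in log space via lgamma so x! never overflows.
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))


def fisher_two_ways(lam: float, x_max: int = 400) -> tuple[float, float]:
    """Fisher information of Poisson(lam) from both definitions (truncated sums)."""
    score_sq = 0.0  # E[(d/dlam log f)^2], with score x/lam - 1
    neg_curv = 0.0  # -E[d^2/dlam^2 log f], with second derivative -x/lam^2
    for x in range(x_max + 1):
        prob = poisson_pmf(x, lam)
        score_sq += prob * (x / lam - 1.0) ** 2
        neg_curv += prob * (x / lam**2)
    return score_sq, neg_curv


for lam in (0.5, 4.0, 20.0):
    a, b = fisher_two_ways(lam)
    print(f"lam={lam}: E[score^2]={a:.6f}, -E[d2 log f]={b:.6f}, 1/lam={1 / lam:.6f}")
```

Both columns come out at 1/λ. The same closed form follows from the exponential-family structure: in the natural parameter η = log λ the information is Var(X) = λ, and changing variables back gives I(λ) = λ · (dη/dλ)² = 1/λ.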

The key property Iₙ(θ) = nI(θ) for i.i.d. observations reflects the additivity of information: each independent observation contributes the same amount of information I(θ) about θ, and information from independent observations adds. This linearity is what makes Fisher information so useful for sample size calculations. Because the standard error of an efficient estimator scales like 1/√Iₙ(θ), halving the standard error requires four times as much information, and hence four times as many observations.

The payoff of all this machinery comes in the Cramér-Rao lower bound and in the asymptotic theory of the MLE. The Cramér-Rao bound states that, under regularity conditions, any unbiased estimator of θ based on n i.i.d. observations has variance at least 1/(nI(θ)). The maximum likelihood estimator attains this bound asymptotically: for large n its variance is approximately 1/(nI(θ)), and more precisely √n(θ̂ − θ) converges in distribution to N(0, 1/I(θ)). The MLE is therefore asymptotically efficient (it need not be unbiased in finite samples), and Fisher information is the fundamental currency in which that efficiency is measured.
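
A Monte Carlo sketch of the sample-size and efficiency claims, using the Bernoulli model as an illustrative assumption. For Bernoulli(p) the MLE is the sample proportion, which is unbiased with variance p(1 − p)/n = 1/(nI(p)), so in this special case the bound is attained exactly rather than only asymptotically. The values p = 0.3, n ∈ {100, 400}, 2000 replications, and the seed are arbitrary choices, and `simulate_mle_variance` is a hypothetical helper.

```python
import math
import random
import statistics


def simulate_mle_variance(p: float, n: int, reps: int, rng: random.Random) -> float:
    """Empirical variance of the Bernoulli MLE across reps samples of size n."""
    estimates = []
    for _ in range(reps):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        estimates.append(successes / n)  # MLE of p is the sample proportion
    return statistics.variance(estimates)


p = 0.3
fisher = 1.0 / (p * (1.0 - p))  # I(p) for a single observation
rng = random.Random(0)          # fixed seed so the run is reproducible

results = {}
for n in (100, 400):
    sim_var = simulate_mle_variance(p, n, reps=2000, rng=rng)
    bound = 1.0 / (n * fisher)  # 1 / I_n(p) = 1 / (n I(p))
    results[n] = (sim_var, bound)
    print(f"n={n}: simulated Var={sim_var:.6f}, 1/(n I(p))={bound:.6f}")

# Quadrupling n quadruples I_n(p), so the standard error should roughly halve.
se_ratio = math.sqrt(results[400][0]) / math.sqrt(results[100][0])
print(f"SE(n=400) / SE(n=100) = {se_ratio:.3f}")
```

For a model whose MLE is biased or nonlinear in the data, the same simulation would show the simulated variance approaching 1/(nI(θ)) only as n grows.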
