A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Fisher Information

Research Depth 91 in the knowledge graph

10 topics build on this

508 prerequisites beneath it


Core Idea

The Fisher information is I(θ) = E[(∂log f(X|θ)/∂θ)²], and under standard regularity conditions it also equals −E[∂²log f(X|θ)/∂θ²]. It quantifies how much information the data carries about θ: the larger I(θ) is, the more precisely θ can be estimated. For n i.i.d. observations, Iₙ(θ) = nI(θ). Fisher information appears in the Cramér-Rao bound and characterizes the asymptotic variance of MLEs.

Explainer

From your work with expectations and densities, you know that the log-likelihood ℓ(θ; x) = log f(x|θ) measures how well the parameter value θ explains the observed data x. The derivative ∂ℓ/∂θ, called the score function, tells you which direction to move θ to increase the likelihood. At the true parameter value, the expected score is zero: E[∂log f(X|θ)/∂θ] = 0. This is a consequence of the standard regularity conditions rather than a condition in itself: differentiate the identity ∫ f(x|θ) dx = 1 with respect to θ under the integral sign. Because the score has mean zero, its variance equals its second moment, so the Fisher information is the variance of the score: I(θ) = Var[∂log f(X|θ)/∂θ] = E[(∂log f(X|θ)/∂θ)²].
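The two facts above (mean-zero score, Fisher information as the score's variance) can be checked exactly for a simple model. The sketch below assumes a single Bernoulli(p) observation, for which the closed form I(p) = 1/(p(1 − p)) is standard. The function names and the choice p = 0.3 are illustrative and do not come from the text.

```python
import math

def bernoulli_score(x, p):
    """Derivative in p of log f(x|p) = x*log(p) + (1 - x)*log(1 - p)."""
    return x / p - (1 - x) / (1 - p)

def expectation(g, p):
    """Exact expectation of g(X) for X ~ Bernoulli(p): sum over x in {0, 1}."""
    return (1 - p) * g(0) + p * g(1)

p = 0.3  # illustrative "true" parameter value

# Expected score at the true parameter (should be zero up to rounding).
mean_score = expectation(lambda x: bernoulli_score(x, p), p)

# Variance of the score = second moment minus squared mean.
second_moment = expectation(lambda x: bernoulli_score(x, p) ** 2, p)
score_variance = second_moment - mean_score ** 2

# Standard closed form for the Bernoulli Fisher information.
closed_form = 1 / (p * (1 - p))

print(mean_score, score_variance, closed_form)
```

Because the expectation is an exact two-term sum, no simulation noise is involved; the printed variance and closed form should agree up to floating-point rounding.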

The intuition is curvature. Think of the log-likelihood as a landscape: the score is the slope, and the Fisher information measures how steeply the likelihood peaks around the true parameter. If I(θ) is large, the log-likelihood drops off sharply when you move θ away from the truth — different parameter values lead to noticeably different distributions, so the data "discriminates" well between them. If I(θ) is small, the log-likelihood is flat near the true value — many θ values produce similar distributions, so the data carries weak signal about which θ is correct. Large Fisher information means the parameter is easy to estimate precisely; small Fisher information means it is hard.
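The curvature picture can be made concrete by computing how far the expected log-likelihood drops when θ moves a small step δ away from the truth. The sketch below assumes a Bernoulli(p) model and compares the exact drop with the quadratic approximation ½·I(p)·δ². The helper names and the values of p and δ are illustrative choices, not part of the original text.

```python
import math

def fisher_info_bernoulli(p):
    """Standard closed form for one Bernoulli(p) observation."""
    return 1.0 / (p * (1.0 - p))

def expected_loglik_drop(p_true, delta):
    """E[log f(X|p_true) - log f(X|p_true + delta)] for X ~ Bernoulli(p_true)."""
    q = p_true + delta
    return (p_true * math.log(p_true / q)
            + (1 - p_true) * math.log((1 - p_true) / (1 - q)))

delta = 0.001  # small step away from the true parameter

for p_true in (0.5, 0.05):
    drop = expected_loglik_drop(p_true, delta)
    quadratic = 0.5 * fisher_info_bernoulli(p_true) * delta ** 2
    print(f"p={p_true}: I(p)={fisher_info_bernoulli(p_true):.2f}, "
          f"drop={drop:.3e}, quadratic approx={quadratic:.3e}")
```

In this sketch the same step δ costs more expected log-likelihood at p = 0.05, where I(p) is larger, than at p = 0.5, which is the "sharper peak" described above.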

The identity I(θ) = −E[∂²log f(X|θ)/∂θ²], which holds under the same regularity conditions, connects information to the curvature (second derivative) of the log-likelihood. The expected negative second derivative measures how sharply the log-likelihood bends downward at its peak: a sharply curved peak corresponds to high information, regardless of how tall the peak is. This form is often easier to compute in practice, because second derivatives of log-likelihoods are frequently simpler than squares of first derivatives. For exponential family distributions (Gaussian, Poisson, Bernoulli, etc.), there are clean closed-form expressions for I(θ).
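As a worked illustration of both forms, consider a single Poisson observation with mean λ. This example is an assumption made for illustration and is not taken from the text. The second-derivative route needs only E[X] = λ, while the squared-score route needs Var(X) = λ:

```latex
\begin{align*}
\log f(x \mid \lambda) &= x \log \lambda - \lambda - \log x! \\
\frac{\partial}{\partial \lambda} \log f(x \mid \lambda) &= \frac{x}{\lambda} - 1 \\
\frac{\partial^2}{\partial \lambda^2} \log f(x \mid \lambda) &= -\frac{x}{\lambda^2} \\
I(\lambda) = -E\!\left[-\frac{X}{\lambda^2}\right] &= \frac{E[X]}{\lambda^2} = \frac{1}{\lambda} \\
I(\lambda) = E\!\left[\left(\frac{X}{\lambda} - 1\right)^2\right] &= \frac{\operatorname{Var}(X)}{\lambda^2} = \frac{1}{\lambda}
\end{align*}
```

Both routes give I(λ) = 1/λ, so a Poisson mean is harder to pin down in absolute terms as λ grows.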

The key property Iₙ(θ) = nI(θ) for i.i.d. observations reflects the additivity of information: each independent observation contributes the same amount of information I(θ) about θ, and independent contributions add. This linearity is what makes Fisher information so useful for sample size calculations: the standard error scales like 1/√(nI(θ)), so halving it requires 4 times the information and therefore 4 times as many observations. The payoff of all this machinery comes in the Cramér-Rao lower bound and in the asymptotic theory of the MLE. The Cramér-Rao bound says that no unbiased estimator can have variance below 1/(nI(θ)). Under regularity conditions, √n(θ̂ₙ − θ) is asymptotically normal with variance 1/I(θ), so in large samples the variance of the maximum likelihood estimator is approximately 1/(nI(θ)). The MLE need not be unbiased in finite samples, but it attains the bound in the limit, which is what is meant by calling it asymptotically efficient. Fisher information is the fundamental currency in which that efficiency is measured.
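The scaling in n can be checked with a small simulation. The sketch below assumes a Bernoulli(p) model, whose MLE is the sample proportion, and compares the spread of simulated MLEs with the large-sample standard error 1/√(nI(p)). The function names, seed, and sample sizes are illustrative assumptions, and only the Python standard library is used.

```python
import math
import random
import statistics

def fisher_info_bernoulli(p):
    """Standard closed form for one Bernoulli(p) observation."""
    return 1.0 / (p * (1.0 - p))

def approx_standard_error(p, n):
    """Large-sample standard error of the MLE: 1 / sqrt(n * I(p))."""
    return 1.0 / math.sqrt(n * fisher_info_bernoulli(p))

def simulate_mle_sd(p, n, replications, seed=0):
    """Standard deviation of the MLE (sample proportion) over repeated samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replications):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        estimates.append(successes / n)
    return statistics.pstdev(estimates)

p = 0.3
se_100 = approx_standard_error(p, 100)
se_400 = approx_standard_error(p, 400)   # 4x the observations
sim_sd_400 = simulate_mle_sd(p, n=400, replications=2000)

print(f"SE ratio n=100 vs n=400: {se_100 / se_400:.3f}")
print(f"theory SE at n=400: {se_400:.5f}, simulated SD: {sim_sd_400:.5f}")
```

The ratio of the two theoretical standard errors is exactly 2, and the simulated spread at n = 400 should land close to 1/√(nI(p)), up to Monte Carlo noise.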


Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Dividing Integers → Unit Rates → Proportions → Percent Concept → Converting Between Fractions, Decimals, and Percents → Operations with Rational Numbers → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → 
Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Sine, Cosine, and Tangent Ratios → Trigonometric Ratios Review → Radian Measure → Converting Between Degrees and Radians → The Unit Circle → Graphing Sine and Cosine → Graphing Tangent and Reciprocal Trigonometric Functions → Derivatives of Trigonometric Functions → Antiderivatives → Indefinite Integrals → Basic Integration Rules → Riemann Sums → Definite Integral Definition → Probability Density Functions and Continuous Distributions → Cumulative Distribution Functions → Continuous Random Variables → Probability Density Functions → Expected Value → Expectation (Measure-Theoretic) → Fisher Information

Longest path: 92 steps · 508 total prerequisite topics

Prerequisites (2)

Expectation (Measure-Theoretic) (hard)
Distribution Functions and Densities (Rigorous) (hard)

Leads To (4)

Asymptotic Normality of MLEs (hard)
Asymptotic Normality of the MLE (soft)
Cramer-Rao Lower Bound (hard)
Information Geometry Basics (hard)