A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Maximum Likelihood Estimation

College Depth 92 in the knowledge graph

17 topics build on this

436 prerequisites beneath it



Core Idea

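The MLE θ̂ maximizes the likelihood of independent observations: L(θ)=∏p(x_i|θ) for discrete data (p a PMF) or L(θ)=∏f(x_i|θ) for continuous data (f a PDF). Under regularity conditions, MLEs are consistent, asymptotically normal, and asymptotically efficient. The MLE is often found via the log-likelihood ℓ(θ)=Σlog p(x_i|θ) by solving dℓ/dθ=0.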

Explainer

You already know that a probability mass function p(x|θ) gives the probability of observing outcome x when the true parameter is θ. Maximum likelihood estimation flips this question: given data that you have already observed, which value of θ makes that data most probable? The likelihood function L(θ) is exactly p(x|θ) re-read as a function of θ with the data held fixed. It is not a probability over θ — it is a measure of how "compatible" each candidate parameter value is with your observations.

For independent observations x₁, x₂, …, xₙ, the joint probability of the entire dataset is the product of individual probabilities: L(θ) = ∏ p(xᵢ|θ). The maximum likelihood estimate θ̂ is the value that makes this product as large as possible. Intuitively, you are asking: if I had to pick one θ and then "generate" the observed data from that distribution, which θ would make the data I actually saw the least surprising? The answer is θ̂.
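The idea of scanning candidate values of θ and keeping the one with the largest product can be sketched directly. This is a minimal illustration, assuming Bernoulli (0/1) data; the sample, the grid of candidates, and the helper name `likelihood` are all hypothetical choices made for the example.

```python
import math

def likelihood(theta, data):
    """L(theta) = product of p(x_i | theta) for Bernoulli data (1 = success, 0 = failure)."""
    return math.prod(theta if x == 1 else 1.0 - theta for x in data)

# Hypothetical sample: 7 successes in 10 independent trials.
data = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]

# Candidate parameter values 0.01, 0.02, ..., 0.99 (an assumed grid resolution).
grid = [i / 100 for i in range(1, 100)]

# The grid point with the largest likelihood approximates the MLE.
theta_hat = max(grid, key=lambda t: likelihood(t, data))
print(theta_hat)
```

A grid search only approximates θ̂ to the resolution of the grid, so it is useful for building intuition rather than as a general-purpose method.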

In practice, products of many small numbers are prone to floating-point underflow and are analytically awkward. Taking the logarithm converts the product into a sum: ℓ(θ) = Σ log p(xᵢ|θ). Because log is strictly increasing, maximizing ℓ(θ) gives the same θ̂ as maximizing L(θ). This log-likelihood is almost always what you differentiate in practice. Setting dℓ/dθ = 0 and solving yields a candidate for the MLE; confirm that it is a maximum (for example, by checking that the second derivative is negative there) and not a minimum or a point outside the parameter's allowed range. For multiparameter models you set all partial derivatives to zero simultaneously.
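The numerical problem with raw products is easy to reproduce. The sketch below assumes a hypothetical dataset of 2000 observations, each assigned probability 0.5 by the model, and compares the direct product with the sum of logs in ordinary double-precision arithmetic.

```python
import math

# Hypothetical setup: 2000 observations, each with probability 0.5 under the model.
probs = [0.5] * 2000

# Direct product: 0.5**2000 is far below the smallest positive double,
# so the running product underflows to exactly 0.0.
likelihood = math.prod(probs)

# Sum of logs: the same quantity on the log scale stays finite.
log_likelihood = sum(math.log(p) for p in probs)

print(likelihood)
print(log_likelihood)
```

Once the product has collapsed to zero, every candidate θ looks equally bad, so no comparison between parameter values is possible. The log-likelihood keeps the full ordering.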

A worked example cements the idea. Suppose you flip a coin n times and observe k heads. The PMF is p(k|θ) = C(n,k) θᵏ(1−θ)ⁿ⁻ᵏ. The log-likelihood is ℓ(θ) = k log θ + (n−k) log(1−θ) plus a constant. Differentiating and solving gives θ̂ = k/n — the sample proportion. This is unsurprising, but it is exactly what MLE says: the proportion you observed is the value of θ that would have made what you saw most probable.
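For readers who want the intermediate steps of the coin example, the derivation can be written out as follows. It uses the same symbols as above (n flips, k heads, parameter θ) and assumes 0 < k < n so that the solution lies strictly inside (0, 1).

```latex
\begin{align*}
\ell(\theta) &= \log\binom{n}{k} + k\log\theta + (n-k)\log(1-\theta) \\
\frac{d\ell}{d\theta} &= \frac{k}{\theta} - \frac{n-k}{1-\theta} = 0
  \;\Longrightarrow\; k(1-\theta) = (n-k)\,\theta
  \;\Longrightarrow\; \hat{\theta} = \frac{k}{n} \\
\frac{d^{2}\ell}{d\theta^{2}} &= -\frac{k}{\theta^{2}} - \frac{n-k}{(1-\theta)^{2}} < 0
\end{align*}
```

The second derivative is negative for every θ in (0, 1), so the stationary point is the unique maximum.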

Three asymptotic properties make MLE powerful beyond any single example. MLEs are consistent: as n → ∞, θ̂ converges in probability to the true θ. They are asymptotically normal: the sampling distribution of θ̂, suitably centered and scaled, approaches a normal distribution, making inference tractable. And they are asymptotically efficient: in the limit, the variance of θ̂ attains the Cramér–Rao lower bound, the smallest achievable by any regular (well-behaved) estimator. These guarantees hold under "regularity conditions" (smoothness and identifiability constraints on the model), and they are the reason MLE is the workhorse of parametric estimation across statistics, machine learning, and econometrics.
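These properties can be observed in a small simulation. The sketch below reuses the coin model, with an assumed true parameter of 0.3, an arbitrary seed, and 2000 simulated datasets per sample size. It compares the spread of the estimates k/n with θ(1−θ)/n, which for this Bernoulli model is both the exact variance of k/n and the Cramér–Rao bound.

```python
import random
import statistics

def simulate_mle(theta_true, n, rng):
    """Draw n Bernoulli(theta_true) flips and return the MLE k/n."""
    k = sum(1 for _ in range(n) if rng.random() < theta_true)
    return k / n

rng = random.Random(0)   # fixed seed so the run is repeatable
theta_true = 0.3         # hypothetical true parameter
reps = 2000              # simulated datasets per sample size

results = {}
for n in (10, 100, 1000):
    estimates = [simulate_mle(theta_true, n, rng) for _ in range(reps)]
    mean = statistics.fmean(estimates)
    sd = statistics.stdev(estimates)
    # Standard deviation implied by the Cramér-Rao bound theta(1 - theta)/n.
    predicted_sd = (theta_true * (1 - theta_true) / n) ** 0.5
    results[n] = (mean, sd, predicted_sd)
    print(n, round(mean, 4), round(sd, 4), round(predicted_sd, 4))
```

As n grows, the estimates should cluster more tightly around the true value (consistency) with a spread close to the predicted one (efficiency). A histogram of `estimates` for large n would look approximately bell-shaped (asymptotic normality).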

Practice Questions 5 questions

Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Dividing Integers → Unit Rates → Proportions → Percent Concept → Converting Between Fractions, Decimals, and Percents → Operations with Rational Numbers → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → 
Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Sine, Cosine, and Tangent Ratios → Trigonometric Ratios Review → Radian Measure → Converting Between Degrees and Radians → The Unit Circle → Graphing Sine and Cosine → Graphing Tangent and Reciprocal Trigonometric Functions → Derivatives of Trigonometric Functions → Antiderivatives → Indefinite Integrals → Basic Integration Rules → Riemann Sums → Definite Integral Definition → Probability Density Functions and Continuous Distributions → Cumulative Distribution Functions → Continuous Random Variables → Probability Density Functions → Expected Value → Properties of Point Estimators → Unbiased and Consistent Estimators → Maximum Likelihood Estimation

Longest path: 93 steps · 436 total prerequisite topics

Prerequisites (3)

Probability Mass Functions and Discrete Distributions (hard) · Least Squares Estimation (soft) · Unbiased and Consistent Estimators (soft)

Leads To (1)

Introduction to Bayesian Inference (soft)