A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Conjugate Priors

Research · Depth 111 in the knowledge graph

3 topics build on this

744 prerequisites beneath it


Core Idea

A prior π is conjugate for a likelihood if the posterior π(θ|X) is in the same family as the prior. For exponential family likelihoods, conjugate priors exist and have closed-form posteriors. Examples: Beta prior for Binomial likelihood, Normal prior for Normal likelihood with known variance. Conjugate priors simplify Bayesian computation.
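The exponential-family claim above can be written out in general form. The block below is a sketch using the standard natural-parameter notation; the symbols η, T, A, h, χ and ν are introduced here for illustration and do not appear elsewhere on this page. It assumes i.i.d. observations x_1, …, x_n and hyperparameters for which the prior is normalizable.

```latex
% Likelihood of one observation in natural (canonical) form
p(x \mid \eta) = h(x)\,\exp\!\bigl(\eta^{\top} T(x) - A(\eta)\bigr)

% Conjugate prior with hyperparameters \chi (pseudo sufficient statistic)
% and \nu (pseudo sample size)
\pi(\eta \mid \chi, \nu) \propto \exp\!\bigl(\eta^{\top}\chi - \nu\,A(\eta)\bigr)

% Posterior after n i.i.d. observations: same family, updated hyperparameters
\pi(\eta \mid x_1,\dots,x_n) \propto
  \exp\!\Bigl(\eta^{\top}\bigl(\chi + \textstyle\sum_{i=1}^{n} T(x_i)\bigr)
              - (\nu + n)\,A(\eta)\Bigr)
```

Read this way, the update is χ → χ + Σ T(x_i) and ν → ν + n, which is the same "just add the data to the hyperparameters" pattern as the Beta-Binomial and Normal-Normal examples.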

Explainer

From your study of Bayesian inference, you know that updating beliefs works through Bayes' theorem: posterior ∝ likelihood × prior. In principle, this always works. In practice, turning that product into a proper distribution requires a normalizing constant, the integral of likelihood × prior over all parameter values, and for most likelihood-prior combinations that integral has no closed form. Conjugate priors are the elegant exception: they are the priors for which the posterior lands in the same distributional family, making the update analytic.

The Beta-Binomial pairing is the canonical example. Suppose you're estimating the probability p of success in a coin flip, and you observe X successes in n trials. The Binomial likelihood is L(p | X) ∝ p^X (1−p)^{n−X}. Now choose a Beta(α, β) prior for p, which has density π(p) ∝ p^{α−1} (1−p)^{β−1}. Multiplying them together: posterior ∝ p^{X+α−1} (1−p)^{n−X+β−1}. This is exactly the kernel of a Beta(α + X, β + n − X) distribution, so the posterior is in the same Beta family as the prior. The hyperparameters α and β act like "pseudo-counts": relative to a flat Beta(1, 1) prior, α is like having seen α−1 prior successes and β like having seen β−1 prior failures. Observing X real successes and n−X real failures simply adds X to α and n−X to β.
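As a rough check on the derivation above, the following sketch compares the closed-form update with a brute-force posterior computed on a grid. The function names, the Beta(2, 3) prior and the grid size are illustrative assumptions, not part of the original text; only the Python standard library is used.

```python
from math import exp, lgamma, log


def beta_binomial_update(alpha, beta, successes, trials):
    """Conjugate update: add successes to alpha and failures to beta."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    return alpha + successes, beta + (trials - successes)


def beta_pdf(p, alpha, beta):
    """Beta(alpha, beta) density at p, for 0 < p < 1."""
    log_norm = lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta)
    return exp(log_norm + (alpha - 1) * log(p) + (beta - 1) * log(1 - p))


def grid_posterior(alpha, beta, successes, trials, steps=2000):
    """Brute force: likelihood x prior on a midpoint grid, normalized numerically."""
    width = 1.0 / steps
    grid = [(i + 0.5) * width for i in range(steps)]
    unnormalized = [
        p**successes * (1 - p) ** (trials - successes) * beta_pdf(p, alpha, beta)
        for p in grid
    ]
    total = sum(unnormalized) * width  # numerical stand-in for the normalizing integral
    return grid, [u / total for u in unnormalized]


# Hypothetical example: Beta(2, 3) prior, 7 successes in 10 trials.
a_post, b_post = beta_binomial_update(2.0, 3.0, successes=7, trials=10)
grid, numeric = grid_posterior(2.0, 3.0, successes=7, trials=10)

# Largest pointwise gap between the numerical and the closed-form posterior.
max_gap = max(abs(d - beta_pdf(p, a_post, b_post)) for p, d in zip(grid, numeric))
print(a_post, b_post, max_gap)
```

The grid version needs thousands of likelihood evaluations plus a numerical integral, while the conjugate version needs two additions. The two should agree up to discretization error.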

This hyperparameter update rule is the payoff of conjugacy. You don't need integration; you just need addition. And the interpretation is intuitive: if you start with a flat Beta(1, 1) prior, which is just Uniform(0, 1), and observe 7 heads in 10 flips, your posterior is Beta(1 + 7, 1 + 3) = Beta(8, 4). The posterior mean is 8/(8+4) = 2/3, which sits between the prior mean of 1/2 and the maximum likelihood estimate of 7/10. The prior pulls the estimate toward the prior mean, and its pull weakens as the number of observations grows.
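The coin-flip numbers above can be reproduced exactly. This sketch writes the posterior mean as a weighted average of the prior mean and the maximum likelihood estimate, with the prior receiving weight (α + β)/(α + β + n). The helper names are hypothetical, and integer hyperparameters are assumed so that Fraction gives exact arithmetic.

```python
from fractions import Fraction


def posterior_mean(alpha, beta, successes, trials):
    """Mean of Beta(alpha + successes, beta + trials - successes)."""
    return Fraction(alpha + successes, alpha + beta + trials)


def prior_weight(alpha, beta, trials):
    """Share of the posterior mean contributed by the prior mean."""
    return Fraction(alpha + beta, alpha + beta + trials)


def blended_mean(alpha, beta, successes, trials):
    """The same posterior mean, written as prior mean blended with the MLE."""
    w = prior_weight(alpha, beta, trials)
    prior_mean = Fraction(alpha, alpha + beta)
    mle = Fraction(successes, trials)
    return w * prior_mean + (1 - w) * mle


# Flat Beta(1, 1) prior with a 70% success rate at growing sample sizes.
rows = []
for trials in (10, 100, 1000):
    successes = 7 * trials // 10
    rows.append(
        (trials, posterior_mean(1, 1, successes, trials), prior_weight(1, 1, trials))
    )

for trials, mean, weight in rows:
    print(f"n={trials:5d}  posterior mean={float(mean):.4f}  prior weight={float(weight):.4f}")
```

With 10 flips the prior carries weight 1/6; with 1000 flips it carries about 0.002, so the posterior mean moves toward 7/10 as the sample grows.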

A second important pair is Normal-Normal: a Normal prior on a Normal mean (with known variance) produces a Normal posterior. The update blends the prior mean and the sample mean in proportion to their respective precisions (inverse variances). Whichever source has the higher precision dominates: a tight prior pulls the posterior toward the prior mean, while many observations or a small sampling variance pull it toward the sample mean. This is the formal machinery behind the intuition that "more data should overwhelm prior beliefs."
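The precision-weighted blend can be sketched in a few lines. This assumes the standard Normal-Normal result with known sampling variance: the posterior precision is the prior precision plus n divided by the sampling variance, and the posterior mean is the precision-weighted average of the prior mean and the sample mean. The function name and the example numbers are illustrative.

```python
def normal_normal_update(prior_mean, prior_var, data, sampling_var):
    """Posterior (mean, variance) for a Normal mean with known sampling variance."""
    n = len(data)
    if n == 0:
        return prior_mean, prior_var  # no data: posterior equals prior
    sample_mean = sum(data) / n
    prior_precision = 1.0 / prior_var
    data_precision = n / sampling_var  # precision of the sample mean
    post_precision = prior_precision + data_precision
    post_mean = (
        prior_precision * prior_mean + data_precision * sample_mean
    ) / post_precision
    return post_mean, 1.0 / post_precision


# Hypothetical data: four observations with sample mean 2 and sampling variance 4,
# so the sample mean has precision 4 / 4 = 1.
data = [2.0, 2.0, 2.0, 2.0]

# Prior N(0, 1) has precision 1: equal weights, posterior mean halfway between 0 and 2.
post_mean, post_var = normal_normal_update(0.0, 1.0, data, sampling_var=4.0)

# A much vaguer prior N(0, 100) has precision 0.01: the data dominate.
vague_mean, vague_var = normal_normal_update(0.0, 100.0, data, sampling_var=4.0)

print(post_mean, post_var)
print(vague_mean, vague_var)
```

In the first case prior and data are equally precise, so the posterior mean lands midway between them. In the second the prior is nearly ignored and the posterior mean sits close to the sample mean.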

Conjugate priors are not always the right choice — they may not accurately represent your actual prior beliefs, and modern Markov Chain Monte Carlo methods can handle arbitrary posteriors. But conjugate priors remain important for three reasons: they give closed-form posteriors for fast computation, their hyperparameters have natural interpretations as prior pseudo-data, and they build intuition for how Bayesian updating works before you encounter more complex models. When a conjugate prior exists and is reasonable, using it is both computationally efficient and conceptually transparent.


Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Dividing Integers → Unit Rates → Proportions → Percent Concept → Converting Between Fractions, Decimals, and Percents → Operations with Rational Numbers → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → 
Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Sine, Cosine, and Tangent Ratios → Trigonometric Ratios Review → Radian Measure → Converting Between Degrees and Radians → The Unit Circle → Graphing Sine and Cosine → Graphing Tangent and Reciprocal Trigonometric Functions → Derivatives of Trigonometric Functions → Antiderivatives → Indefinite Integrals → Basic Integration Rules → Riemann Sums → Definite Integral Definition → Fundamental Theorem of Calculus Part 1 → Fundamental Theorem of Calculus Part 2 → U-Substitution → Partial Fraction Decomposition for Integration → Improper Integrals - Convergence → Integral Test → P-Series → Comparison Test → Limit Comparison Test → Series Convergence Test Strategy → Power Series → Radius and Interval of Convergence → Taylor Series → Moment Generating Functions → Characteristic Functions → Convergence in Distribution → Stationary Distributions → Convergence of Markov Chains → Convergence in Probability → Almost Sure Convergence → Relationships Between Modes of Convergence → Weak Law of Large Numbers → Strong Law of Large Numbers → Central Limit Theorem (Rigorous via Characteristic Functions) → Maximum Likelihood Estimation (Theory) → Exponential Family of Distributions → Conjugate Priors

Longest path: 112 steps · 744 total prerequisite topics

Prerequisites (3)

Bayesian Inference Foundations (hard)
Bayesian Statistics: Prior, Posterior, Credible Intervals (hard)
Exponential Family of Distributions (soft)

Leads To (1)

Bayesian Point Estimation (soft)