A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Introduction to Bayesian Inference

College · Depth 100 in the knowledge graph

16 topics build on this

598 prerequisites beneath it


Core Idea

Bayesian inference uses Bayes' rule to update prior beliefs about parameters given data: P(θ|data) ∝ P(data|θ)P(θ). The posterior distribution combines information from the prior and likelihood. Bayesian methods naturally incorporate prior knowledge and quantify uncertainty.

How It's Best Learned

Apply Bayes' rule to simple problems with discrete parameters. Compare frequentist confidence intervals with Bayesian credible intervals. Choose sensible priors for familiar distributions. Recognize how sensitive conclusions can be to the choice of prior.

Explainer

You already know Bayes' theorem: P(A|B) = P(B|A)P(A)/P(B). Bayesian inference applies this rule to statistical learning, using it to update beliefs about unknown parameters as data arrives. The key conceptual shift is that in the Bayesian framework, unknown parameters are treated as random variables with probability distributions, not as fixed but unknown constants. This makes it possible to make direct probability statements about parameters, such as "θ lies between 0.4 and 0.8 with probability 0.95." Frequentist inference treats θ as fixed and so does not assign it probabilities.

The structure of Bayesian inference has three components. The prior distribution P(θ) encodes your beliefs about the parameter θ before seeing any data. It might be broad and uninformative if you know little, or informative if domain knowledge constrains the plausible values. The likelihood P(data|θ) tells you how probable the observed data would be if the parameter were θ; this is the same likelihood function you encounter in maximum likelihood estimation. Multiplying the two gives the unnormalized posterior, P(θ|data) ∝ P(data|θ)P(θ), and dividing by the evidence P(data) = ∫ P(data|θ)P(θ) dθ turns it into a proper distribution that encodes updated beliefs about θ after observing the data. The posterior is the complete answer to a Bayesian inference problem.
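The multiply-and-normalize step can be sketched numerically on a grid. The snippet below is a minimal illustration, not a definitive implementation: it assumes a parameter restricted to [0, 1], a hypothetical dataset of 7 heads in 10 coin flips, and a flat prior. The names grid_posterior and flat_prior and the grid size are made up for this example.

```python
# Grid approximation of a posterior: prior times likelihood, then normalize.
# The data (7 heads in 10 flips), the flat prior, and the grid size are
# illustrative assumptions.

def grid_posterior(heads, flips, prior, grid_size=1001):
    """Return (thetas, posterior weights) on an evenly spaced grid over [0, 1]."""
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    tails = flips - heads
    # Unnormalized posterior: likelihood(theta) * prior(theta) at each grid point.
    unnormalized = [
        (t ** heads) * ((1 - t) ** tails) * prior(t) for t in thetas
    ]
    evidence = sum(unnormalized)  # discrete stand-in for P(data)
    posterior = [u / evidence for u in unnormalized]
    return thetas, posterior


def flat_prior(theta):
    """Hypothetical uninformative prior: every theta equally plausible."""
    return 1.0


thetas, posterior = grid_posterior(heads=7, flips=10, prior=flat_prior)
posterior_mean = sum(t * p for t, p in zip(thetas, posterior))
posterior_mode = thetas[posterior.index(max(posterior))]
print(f"posterior mean ~ {posterior_mean:.3f}, mode ~ {posterior_mode:.3f}")
```

Swapping flat_prior for any other non-negative function of θ changes the posterior without touching the rest of the code, which mirrors how the prior and likelihood enter the formula as separate factors.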

A concrete example makes this tangible. Suppose you want to estimate a coin's probability of heads, θ. Your prior might be a Beta(2, 2) distribution, which slightly favors θ near 0.5 but not strongly. You flip the coin 10 times and see 7 heads. The likelihood is Binomial: P(7 heads | θ) ∝ θ⁷(1−θ)³. The Beta(2, 2) prior density is proportional to θ(1−θ), so the product is proportional to θ⁸(1−θ)⁴, which is the kernel of a Beta distribution. The posterior is therefore Beta(2+7, 2+3) = Beta(9, 5), with mean 9/14 ≈ 0.64: pulled from the prior mean of 0.5 toward the observed proportion of 0.7, but not all the way, because 10 flips do not swamp the prior. You can read off a credible interval: the central 95% of the Beta(9, 5) distribution gives an interval within which θ falls with 95% probability, given the data and prior.
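The coin example can be checked numerically. The sketch below applies the conjugate update to the Beta(2, 2) prior and approximates the central 95% credible interval by integrating the Beta density with a simple midpoint rule. The helper names beta_pdf and beta_quantile and the step count are assumptions made for illustration; a statistics library's Beta quantile function would normally replace them.

```python
import math


def beta_pdf(theta, a, b):
    """Density of Beta(a, b) at theta, for 0 < theta < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    log_kernel = (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)
    return math.exp(log_norm + log_kernel)


def beta_quantile(q, a, b, steps=100_000):
    """Approximate the q-quantile by accumulating midpoint-rule area from 0."""
    width = 1.0 / steps
    area = 0.0
    for i in range(steps):
        mid = (i + 0.5) * width
        area += beta_pdf(mid, a, b) * width
        if area >= q:
            return mid
    return 1.0


# Conjugate update: a Beta(a, b) prior plus h heads and t tails
# gives a Beta(a + h, b + t) posterior.
prior_a, prior_b = 2, 2
heads, tails = 7, 3
post_a, post_b = prior_a + heads, prior_b + tails

posterior_mean = post_a / (post_a + post_b)
lower = beta_quantile(0.025, post_a, post_b)
upper = beta_quantile(0.975, post_a, post_b)

print(f"posterior: Beta({post_a}, {post_b}), mean = {posterior_mean:.3f}")
print(f"central 95% credible interval ~ ({lower:.3f}, {upper:.3f})")
```

The interval is wide because 10 flips carry little information; rerunning with larger counts at the same proportion narrows it.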

The contrast with frequentist inference is philosophically significant. A frequentist 95% confidence interval means: if you repeated this procedure many times, 95% of the resulting intervals would contain the true θ. It says nothing about the probability that *this* interval contains θ. A Bayesian 95% credible interval directly says: given this data and prior, P(θ ∈ interval | data) = 0.95. This is typically what practitioners intuitively want to say. The cost is that Bayesian inference depends on the prior, and different priors lead to different posteriors. When data is plentiful, the likelihood dominates and the prior matters little. When data is sparse, prior specification is critical — which is why sensitivity analysis (checking whether conclusions change under different reasonable priors) is a standard part of applied Bayesian work.
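A small sensitivity check makes the role of sample size concrete. The sketch below compares posterior means for the coin model under three Beta priors at two sample sizes with the same observed proportion of heads. The specific priors, the 700-out-of-1000 dataset, and the function name posterior_mean are hypothetical choices for illustration only.

```python
def posterior_mean(a, b, heads, flips):
    """Mean of the Beta(a + heads, b + tails) posterior for a Beta(a, b) prior."""
    return (a + heads) / (a + b + flips)


# Hypothetical priors, from uninformative to strongly centered on 0.5.
priors = {
    "flat Beta(1, 1)": (1, 1),
    "mild Beta(2, 2)": (2, 2),
    "strong Beta(20, 20)": (20, 20),
}

# Same observed proportion of heads (0.7) at two sample sizes.
datasets = {
    "sparse (7/10)": (7, 10),
    "plentiful (700/1000)": (700, 1000),
}

spread = {}
for data_name, (heads, flips) in datasets.items():
    means = {
        name: posterior_mean(a, b, heads, flips)
        for name, (a, b) in priors.items()
    }
    # Range of posterior means across priors: a crude sensitivity measure.
    spread[data_name] = max(means.values()) - min(means.values())
    for name, m in means.items():
        print(f"{data_name:22s} {name:20s} posterior mean = {m:.3f}")
    print(f"{data_name:22s} spread across priors = {spread[data_name]:.3f}")
```

With 10 flips the posterior means differ by more than 0.1 across these priors, while with 1000 flips they agree to within about 0.01, which is the likelihood dominating the prior.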


Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Dividing Integers → Unit Rates → Proportions → Percent Concept → Converting Between Fractions, Decimals, and Percents → Operations with Rational Numbers → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → 
Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Sine, Cosine, and Tangent Ratios → Trigonometric Ratios Review → Radian Measure → Converting Between Degrees and Radians → The Unit Circle → Graphing Sine and Cosine → Graphing Tangent and Reciprocal Trigonometric Functions → Derivatives of Trigonometric Functions → Antiderivatives → Indefinite Integrals → Basic Integration Rules → Riemann Sums → Definite Integral Definition → Probability Density Functions and Continuous Distributions → Cumulative Distribution Functions → Continuous Random Variables → Probability Density Functions → Expected Value → Weak Law of Large Numbers → Probability Axioms and Rules → Conditional Probability → Conditional Distributions → Bivariate Normal Distribution → Normal Distribution: Properties and Fundamentals → Central Limit Theorem: Rigor and Applications → Confidence Intervals: General Framework → Margin of Error and Sample Size → Bayesian Statistics: Prior, Posterior, Credible Intervals → Introduction to Bayesian Inference

Longest path: 101 steps · 598 total prerequisite topics

Prerequisites (4)

Bayes' Theorem (hard) · Probability Spaces (Measure-Theoretic Definition) (soft) · Bayesian Statistics: Prior, Posterior, Credible Intervals (soft) · Maximum Likelihood Estimation (soft)

Leads To (5)

Bayesian Point Estimation (hard) · Causal Inference in Machine Learning (hard) · Information-Theoretic Lower Bounds (soft) · Parameter Estimation in Biological Models (soft) · Phylogenetic Inference: Parsimony, Distance, and Maximum Likelihood (soft)