Introduction to Bayesian Inference

College · Depth 72 in the knowledge graph
Unlocks 13 downstream topics
bayesian inference probability

Core Idea

Bayesian inference uses Bayes' rule to update prior beliefs about parameters given data: P(θ|data) ∝ P(data|θ)P(θ). The posterior distribution combines information from the prior and likelihood. Bayesian methods naturally incorporate prior knowledge and quantify uncertainty.

How It's Best Learned

Apply Bayes' rule to simple problems with discrete parameters. Compare frequentist confidence intervals with Bayesian credible intervals. Choose sensible priors for familiar distributions. Recognize how sensitive conclusions can be to the choice of prior.

Explainer

You already know Bayes' theorem: P(A|B) = P(B|A)P(A)/P(B). Bayesian inference is the application of this rule to statistical learning — using it to update beliefs about unknown parameters as data arrives. The key conceptual shift is that in the Bayesian framework, unknown parameters are treated as random variables with probability distributions, not as fixed but unknown constants. This makes it possible to make direct probability statements about parameters, which frequentist inference cannot do.

The structure of Bayesian inference has three components. The prior distribution P(θ) encodes your beliefs about the parameter θ before seeing any data. It might be broad and uninformative if you know little, or informative if domain knowledge constrains the plausible values. The likelihood P(data|θ) tells you how probable the observed data would be if the parameter were θ; this is the same likelihood function you encounter in maximum likelihood estimation. Multiplying the two and normalizing by P(data) gives the posterior distribution, P(θ|data) = P(data|θ)P(θ)/P(data) ∝ P(data|θ)P(θ), which encodes updated beliefs about θ after observing the data. The posterior is the complete answer to a Bayesian inference problem.
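The prior-times-likelihood-then-normalize recipe can be sketched for a discrete parameter. The three candidate values of θ, the uniform prior, and the data (7 heads in 10 flips) are illustrative assumptions, and `discrete_posterior` is a hypothetical helper name, not part of any library.

```python
from math import comb

def discrete_posterior(thetas, prior, heads, flips):
    """Posterior over a finite set of candidate theta values (hypothetical helper)."""
    # Binomial likelihood of the observed data under each candidate theta.
    likelihood = [
        comb(flips, heads) * t**heads * (1 - t) ** (flips - heads) for t in thetas
    ]
    # Prior times likelihood gives the unnormalized posterior.
    unnormalized = [lik * p for lik, p in zip(likelihood, prior)]
    # The normalizing constant is P(data), summed over all candidates.
    evidence = sum(unnormalized)
    return [u / evidence for u in unnormalized]

# Assumed setup: three candidate coins, equally plausible before any flips.
thetas = [0.25, 0.5, 0.75]
prior = [1 / 3, 1 / 3, 1 / 3]
posterior = discrete_posterior(thetas, prior, heads=7, flips=10)

for t, p in zip(thetas, posterior):
    print(f"theta={t:.2f}  posterior={p:.3f}")
```

With these assumed inputs, most of the posterior mass should land on θ = 0.75, the candidate closest to the observed proportion of heads.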

A concrete example makes this tangible. Suppose you want to estimate a coin's probability of heads, θ. Your prior might be a Beta(2, 2) distribution, which mildly favors θ near 0.5. You flip the coin 10 times and see 7 heads. The likelihood is Binomial: P(7 heads | θ) ∝ θ⁷(1−θ)³. Because the Beta(2, 2) density is proportional to θ(1−θ), the product is proportional to θ⁸(1−θ)⁴, so the posterior is Beta(2+7, 2+3) = Beta(9, 5). Its mean is 9/14 ≈ 0.64: pulled from the prior mean of 0.5 toward the observed proportion of 0.7, but not all the way, because ten flips do not fully outweigh the prior. You can read off a credible interval: the central 95% of the Beta(9, 5) distribution is an interval that contains θ with 95% probability, given the data and prior.
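The coin example can be checked numerically. The sketch below applies the conjugate update and then approximates the central 95% credible interval of Beta(9, 5) by integrating the density on a fine grid, using only the standard library. The helper names `beta_pdf` and `central_interval` and the grid size are assumptions made for illustration; a statistics library's Beta quantile function would normally be used instead.

```python
from math import exp, lgamma, log

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in (0, 1), computed in log space."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(x) + (b - 1) * log(1 - x))

def central_interval(a, b, mass=0.95, steps=100_000):
    """Approximate central interval via a midpoint-rule CDF (hypothetical helper)."""
    tail = (1 - mass) / 2
    width = 1 / steps
    cdf, lower, upper = 0.0, None, None
    for i in range(steps):
        x = (i + 0.5) * width  # midpoint of the i-th grid cell
        cdf += beta_pdf(x, a, b) * width
        if lower is None and cdf >= tail:
            lower = x
        if cdf >= 1 - tail:
            upper = x
            break
    return lower, upper

# Conjugate update: add heads to the first parameter, tails to the second.
prior_a, prior_b = 2, 2
heads, tails = 7, 3
post_a, post_b = prior_a + heads, prior_b + tails

posterior_mean = post_a / (post_a + post_b)
lower, upper = central_interval(post_a, post_b)
print(f"posterior: Beta({post_a}, {post_b}), mean = {posterior_mean:.3f}")
print(f"approximate 95% credible interval: ({lower:.3f}, {upper:.3f})")
```

The interval is noticeably wide, which reflects how little ten flips pin down θ.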

The contrast with frequentist inference is philosophically significant. A frequentist 95% confidence interval means: if you repeated this procedure many times, 95% of the resulting intervals would contain the true θ. It says nothing about the probability that *this* interval contains θ. A Bayesian 95% credible interval directly says: given this data and prior, P(θ ∈ interval | data) = 0.95. This is typically what practitioners intuitively want to say. The cost is that Bayesian inference depends on the prior, and different priors lead to different posteriors. When data is plentiful, the likelihood dominates and the prior matters little. When data is sparse, prior specification is critical — which is why sensitivity analysis (checking whether conclusions change under different reasonable priors) is a standard part of applied Bayesian work.
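A small sensitivity analysis illustrates the point about sparse versus plentiful data. The sketch compares posterior means under three Beta priors for a small and a large dataset with the same 70% proportion of heads. The flat and strong priors and the 1000-flip dataset are illustrative assumptions; only Beta(2, 2) and 7 heads in 10 flips come from the example above.

```python
def posterior_mean(prior_a, prior_b, heads, tails):
    """Mean of the Beta posterior after a conjugate Beta-Binomial update."""
    return (prior_a + heads) / (prior_a + prior_b + heads + tails)

# Assumed priors, from flat to strongly concentrated around 0.5.
priors = {
    "Beta(1, 1) flat": (1, 1),
    "Beta(2, 2) mild": (2, 2),
    "Beta(20, 20) strong": (20, 20),
}
# Same observed proportion of heads, very different sample sizes.
datasets = {"10 flips": (7, 3), "1000 flips": (700, 300)}

spread = {}
for label, (heads, tails) in datasets.items():
    means = {
        name: posterior_mean(a, b, heads, tails) for name, (a, b) in priors.items()
    }
    # Spread of posterior means across priors: a crude sensitivity measure.
    spread[label] = max(means.values()) - min(means.values())
    print(label)
    for name, mean in means.items():
        print(f"  {name:<20} posterior mean = {mean:.3f}")
    print(f"  spread across priors = {spread[label]:.3f}")
```

With ten flips the choice of prior moves the posterior mean by more than a tenth; with a thousand flips the three priors nearly agree.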

Prerequisite Chain

Counting to 10 → Counting to 20 → Understanding Zero → The Number Zero → Counting to Five → One-to-One Correspondence → Combining Small Groups Within 5 → Addition Within 10 → Addition Within 20 → Two-Digit Addition Without Regrouping → Two-Digit Addition with Regrouping → Addition Within 100 → Repeated Addition as Multiplication → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Two-Digit by One-Digit Division → Division with Remainders → Remainders and Quotients in Division → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Reading and Writing Decimals → Comparing and Ordering Decimals → Adding and Subtracting Decimals → Multiplying Decimals → Dividing Decimals → Dividing Fractions → Mixed Number Arithmetic → Order of Operations → Integer Order of Operations → Variable Expressions → Combining Like Terms → One-Step Equations → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Trigonometric Ratios Review → Radian Measure → Converting Between Degrees and Radians → The Unit Circle → Graphing Sine and Cosine → Graphing Tangent and Reciprocal Trigonometric Functions → Derivatives of Trigonometric Functions → Antiderivatives → Indefinite Integrals → Basic Integration Rules → Riemann Sums → Definite Integral Definition → Probability Density Functions and Continuous Distributions → Bayesian Statistics: Prior, Posterior, Credible Intervals → Introduction to Bayesian Inference

Longest path: 73 steps · 329 total prerequisite topics
