A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Bayes' Theorem

College · Depth 94 in the knowledge graph

375 topics build on this

436 prerequisites beneath it

Core Idea

Bayes' theorem gives the posterior probability P(B|A) = P(A|B) × P(B) / P(A), defined whenever P(A) > 0, which lets us reverse the direction of conditioning. It describes how to update the prior belief P(B) when we observe evidence A, using the likelihood P(A|B). This is foundational for statistical inference and decision-making under uncertainty.

How It's Best Learned

Start with medical testing scenarios (positive test → disease probability). Work through multi-step examples with explicit calculation of the denominator using the law of total probability.

Common Misconceptions

Confusing P(A|B) with P(B|A), which produces the base rate fallacy when the prior P(B) is ignored. Forgetting to normalize by P(A) in the denominator.

Explainer

You have already learned conditional probability: P(A|B) is the probability of A *given* that B has occurred. Bayes' theorem answers a subtly different and enormously useful question: if I observe A, how should I update my belief about B? It lets you reverse the direction of conditioning — turning P(A|B) into P(B|A).

The formula is P(B|A) = P(A|B) · P(B) / P(A). Each piece has an intuitive name in statistical reasoning. P(B) is the prior — your belief about B before you see any evidence. P(A|B) is the likelihood — how probable is the evidence A if B were true? P(A) is the marginal likelihood — the overall probability of seeing the evidence A regardless of whether B is true. And P(B|A) is the posterior — your updated belief about B after observing A. The formula says: take what you thought before, weight it by how well B explains the evidence, and normalize.
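
The formula follows in two lines from the definition of conditional probability. The derivation below is a sketch that assumes P(A) > 0 and P(B) > 0, and it writes ¬B for "B is false", a symbol not used in the text above. The last line is the law of total probability, which is how the denominator is usually computed in practice.

```latex
\begin{align*}
P(A \cap B) &= P(A \mid B)\,P(B) = P(B \mid A)\,P(A) \\
\Rightarrow\quad P(B \mid A) &= \frac{P(A \mid B)\,P(B)}{P(A)} \\
P(A) &= P(A \mid B)\,P(B) + P(A \mid \neg B)\,P(\neg B)
\end{align*}
```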

The classic application is medical testing. Suppose a disease affects 1% of the population, so P(disease) = 0.01. A test correctly identifies 90% of sick people: P(positive | disease) = 0.90. It also correctly identifies 91% of healthy people: P(negative | no disease) = 0.91, so P(positive | no disease) = 0.09. A person tests positive. What is P(disease | positive)? Using the law of total probability: P(positive) = (0.90)(0.01) + (0.09)(0.99) = 0.009 + 0.0891 = 0.0981. Then P(disease | positive) = (0.90 × 0.01) / 0.0981 ≈ 0.092, or about 9%. Despite the test being fairly accurate, the prior is so low that roughly 91% of positive results (0.0891 / 0.0981) are false positives.
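
The same calculation can be checked in a few lines of Python. This is a minimal sketch rather than a reference implementation: the function name `posterior` and its parameter names are illustrative choices, and it assumes a binary hypothesis (disease or no disease).

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(B|A) from P(B), P(A|B), and P(A|not B) for a binary hypothesis B."""
    # Law of total probability: P(A) = P(A|B)P(B) + P(A|not B)P(not B)
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    if evidence == 0:
        raise ValueError("P(A) is zero, so the posterior is undefined")
    # Bayes' theorem: P(B|A) = P(A|B)P(B) / P(A)
    return likelihood * prior / evidence


# Numbers from the medical testing example in the text.
p = posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.09)
print(f"P(disease | positive) = {p:.3f}")  # prints 0.092
```

Keeping the denominator as an explicit intermediate variable mirrors the advice to compute P(A) by hand before dividing.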

This result shocks most people — and that shock is the whole point. The base rate fallacy is the systematic error of ignoring P(B) and treating P(A|B) as if it were P(B|A). A doctor who says "the test is 90% accurate and you tested positive, so you probably have the disease" has committed this fallacy. The denominator P(A) is the correction term: it forces you to account for how common the evidence is *in general*, not just among cases where B is true.
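
One way to make the role of the base rate concrete is to count people instead of multiplying probabilities. The sketch below assumes a hypothetical population of 10,000 (a figure chosen for illustration, not taken from the text) and reuses the test's rates; the function name `natural_frequencies` is likewise illustrative. Exact fractions are used so the counts come out as whole numbers.

```python
from fractions import Fraction


def natural_frequencies(population, prevalence, sensitivity, false_positive_rate):
    """Split a population into true-positive and false-positive counts."""
    sick = population * prevalence
    healthy = population - sick
    true_positives = sick * sensitivity
    false_positives = healthy * false_positive_rate
    return true_positives, false_positives


# Hypothetical population of 10,000 with the rates from the example.
tp, fp = natural_frequencies(
    10_000, Fraction(1, 100), Fraction(9, 10), Fraction(9, 100)
)
print(f"true positives: {tp}, false positives: {fp}")  # 90 and 891
print(f"share of positives who are sick: {float(tp / (tp + fp)):.3f}")  # 0.092
```

Of the 981 people who test positive, only 90 are sick. The 891 false positives come from the much larger healthy group, which is exactly what ignoring the prior overlooks.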

Bayes' theorem extends far beyond medical diagnosis. It is the foundation of Bayesian statistical inference, spam filters, machine learning classifiers, and scientific hypothesis updating. The key habit it instills is explicit reasoning about priors: every probability estimate you make implicitly contains assumptions about base rates. Making those priors explicit — and updating them correctly when evidence arrives — is what distinguishes probabilistic thinking from intuition.
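
Updating "when evidence arrives" can be repeated: the posterior after one observation becomes the prior for the next. The sketch below applies this to two positive results from the test in the medical example. It rests on a strong assumption that is not stated in the text, namely that the two test results are conditionally independent given the person's true disease status. The function name `update` is illustrative.

```python
def update(prior, p_evidence_if_true, p_evidence_if_false):
    """One Bayesian update for a binary hypothesis; returns the posterior."""
    numerator = p_evidence_if_true * prior
    evidence = numerator + p_evidence_if_false * (1.0 - prior)
    return numerator / evidence


belief = 0.01  # prior: 1% prevalence
for test_number in (1, 2):
    # Assumption: each positive result is independent given disease status.
    belief = update(belief, p_evidence_if_true=0.90, p_evidence_if_false=0.09)
    print(f"after positive test {test_number}: {belief:.3f}")
```

Under that independence assumption, the belief rises from about 0.092 after the first positive result to about 0.503 after the second. If repeat tests tend to fail for the same reason, the second result carries less information than this sketch suggests.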

Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Dividing Integers → Unit Rates → Proportions → Percent Concept → Converting Between Fractions, Decimals, and Percents → Operations with Rational Numbers → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → 
Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Sine, Cosine, and Tangent Ratios → Trigonometric Ratios Review → Radian Measure → Converting Between Degrees and Radians → The Unit Circle → Graphing Sine and Cosine → Graphing Tangent and Reciprocal Trigonometric Functions → Derivatives of Trigonometric Functions → Antiderivatives → Indefinite Integrals → Basic Integration Rules → Riemann Sums → Definite Integral Definition → Probability Density Functions and Continuous Distributions → Cumulative Distribution Functions → Continuous Random Variables → Probability Density Functions → Expected Value → Weak Law of Large Numbers → Probability Axioms and Rules → Conditional Probability → Law of Total Probability → Bayes' Theorem

Longest path: 95 steps · 436 total prerequisite topics

Prerequisites (4)

Conditional Probability (hard) · Law of Total Probability (hard) · Complement Rule and Addition Rule (soft) · Probability Axioms and Rules (soft)

Leads To (19)

Adverse Selection (soft) · Bandit Problems (Multi-Armed Bandits) (soft) · Base-Rate Integration and Bayesian Reasoning in Probability (soft) · Bayesian Games (Games of Incomplete Information) (soft) · Bayesian Inference Foundations (hard) · Bayesian Methods in Psychometric Modeling (hard) · Bayesian Methods in Social Science (hard) · Bayesian Networks and Inference (soft) · Bayesian Phylogenetics (hard) · Bayesian Thinking in Practice (hard) · Cognitive Biases and Judgment Under Uncertainty (soft) · Heuristics in Judgment and Decision Making (soft) · Hypothesis Construction: Directional and Nondirectional Predictions (soft) · Information Theory in Music (soft) · Introduction to Bayesian Inference (hard) · Joint and Conditional Entropy (soft) · Likelihood Ratios and Belief Updates (hard) · Naive Bayes Classifier (soft) · PAC Learning Framework (soft)