Bayes' theorem gives the posterior probability P(B|A) = P(A|B) × P(B) / P(A), allowing us to reverse the direction of conditioning. It describes how to update prior beliefs P(B) when we observe evidence A, using the likelihood P(A|B). This is foundational for statistical inference and decision-making under uncertainty.
Start with medical testing scenarios (positive test → disease probability). Work through multi-step examples with explicit calculation of the denominator using the law of total probability.
Confusing P(A|B) with P(B|A) (base rate fallacy). Forgetting to normalize by P(A) in the denominator.
You have already learned conditional probability: P(A|B) is the probability of A *given* that B has occurred. Bayes' theorem answers a subtly different and enormously useful question: if I observe A, how should I update my belief about B? It lets you reverse the direction of conditioning — turning P(A|B) into P(B|A).
The formula is P(B|A) = P(A|B) · P(B) / P(A). Each piece has an intuitive name in statistical reasoning. P(B) is the prior — your belief about B before you see any evidence. P(A|B) is the likelihood — how probable is the evidence A if B were true? P(A) is the marginal likelihood — the overall probability of seeing the evidence A regardless of whether B is true. And P(B|A) is the posterior — your updated belief about B after observing A. The formula says: take what you thought before, weight it by how well B explains the evidence, and normalize.
The classic application is medical testing. Suppose a disease affects 1% of the population — P(disease) = 0.01. A test correctly identifies 90% of sick people: P(positive | disease) = 0.90. It also correctly identifies 91% of healthy people: P(negative | no disease) = 0.91, so P(positive | no disease) = 0.09. A person tests positive. What is P(disease | positive)? Using the law of total probability: P(positive) = (0.90)(0.01) + (0.09)(0.99) = 0.009 + 0.0891 = 0.0981. Then P(disease | positive) = (0.90 × 0.01) / 0.0981 ≈ 0.092, or about 9%. Despite the test being fairly accurate, the prior is so low that most positive results are still false positives.
This result shocks most people — and that shock is the whole point. The base rate fallacy is the systematic error of ignoring P(B) and treating P(A|B) as if it were P(B|A). A doctor who says "the test is 90% accurate and you tested positive, so you probably have the disease" has committed this fallacy. The denominator P(A) is the correction term: it forces you to account for how common the evidence is *in general*, not just among cases where B is true.
Bayes' theorem extends far beyond medical diagnosis. It is the foundation of Bayesian statistical inference, spam filters, machine learning classifiers, and scientific hypothesis updating. The key habit it instills is explicit reasoning about priors: every probability estimate you make implicitly contains assumptions about base rates. Making those priors explicit — and updating them correctly when evidence arrives — is what distinguishes probabilistic thinking from intuition.