You use a Beta(2, 8) prior for a coin's bias p (reflecting a belief the coin is biased toward tails). You observe 6 heads in 10 flips. What is your posterior distribution?
A. Beta(6, 4): the likelihood replaces the prior
B. Beta(8, 12): prior and observed counts add together
C. Beta(2, 8): conjugacy means the prior is preserved unchanged
D. Beta(4, 6): the posterior equals the likelihood alone
With a Beta(α, β) prior and a Binomial likelihood with X successes in n trials, the conjugate update is posterior = Beta(α + X, β + n − X). Here α + X = 2 + 6 = 8 and β + (n − X) = 8 + 4 = 12, giving Beta(8, 12) (option B). The prior hyperparameters act as pseudo-counts: measured against a flat Beta(1, 1) baseline, Beta(2, 8) behaves like 1 prior success and 7 prior failures (α − 1, β − 1), and the 6 observed successes and 4 observed failures are added directly to the hyperparameters. The posterior mean is 8/20 = 0.4, which sits between the prior mean of 2/10 = 0.2 and the MLE of 6/10 = 0.6.
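The update rule above is short enough to check by hand, but a minimal Python sketch makes the arithmetic explicit. The function names `beta_binomial_update` and `beta_mean` are illustrative choices for this example, not part of any library.

```python
def beta_binomial_update(alpha, beta, successes, trials):
    """Return the posterior (alpha, beta) for a Beta prior and Binomial data."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    failures = trials - successes
    # Conjugate update: add successes to alpha and failures to beta.
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


prior = (2, 8)
posterior = beta_binomial_update(*prior, successes=6, trials=10)

print(posterior)              # (8, 12)
print(beta_mean(*prior))      # 0.2
print(beta_mean(*posterior))  # 0.4
```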
Question 2 Multiple Choice
What is the mathematical reason a Beta prior is conjugate to a Binomial likelihood?
A. Both distributions have support on [0, 1] and therefore multiply cleanly
B. The Beta distribution is the maximum entropy prior for a binary parameter
C. Multiplying the Beta density by the Binomial likelihood produces a kernel that is recognizable as another Beta density
D. The Beta and Binomial are both members of the exponential family with the same natural parameter
Conjugacy is algebraic: it works because the functional forms match. The Beta(α, β) density is ∝ p^{α−1}(1−p)^{β−1}, and the Binomial likelihood is L(p|X) ∝ p^X(1−p)^{n−X}. Their product is ∝ p^{α+X−1}(1−p)^{β+n−X−1}, which is the kernel of Beta(α+X, β+n−X); the exponents just add, so the answer is C. This is not about support (A) or entropy (B). It is about the specific algebraic form of the densities. Option A gestures at the fact that the prior and the likelihood are both functions of p on [0, 1], and option D at exponential-family structure; both are related ideas, but neither is the reason for conjugacy.
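One way to see the kernel argument numerically is to check that prior × likelihood divided by the Beta(α+X, β+n−X) density is the same constant at every p. The sketch below uses only the standard library and the numbers from Question 1; the helper names and the grid of test points are assumptions made for illustration.

```python
import math


def beta_pdf(p, a, b):
    """Beta(a, b) density at p, computed via log-gamma for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))


def binomial_likelihood(p, x, n):
    """Probability of x successes in n trials, viewed as a function of p."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


a, b, x, n = 2, 8, 6, 10
grid = [0.1, 0.3, 0.5, 0.7, 0.9]

# If the product is a Beta(a + x, b + n - x) kernel, this ratio cannot depend on p.
ratios = [
    beta_pdf(p, a, b) * binomial_likelihood(p, x, n) / beta_pdf(p, a + x, b + n - x)
    for p in grid
]
print(ratios)
```

The printed values should agree up to floating-point error. The shared constant is the normalizing factor that Bayes' theorem divides out.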
Question 3 True / False
Using a conjugate prior guarantees that the posterior accurately represents your true prior beliefs about the parameter.
T. True
F. False
Answer: False
Conjugate priors are chosen for computational convenience — they produce closed-form posteriors without numerical integration. They may not accurately reflect actual prior knowledge. If your true prior beliefs are, say, bimodal (you think the coin is either strongly biased heads or strongly biased tails), a unimodal Beta prior misrepresents them, conjugacy notwithstanding. The choice of prior should be driven by beliefs; conjugacy is a bonus when the conjugate prior happens to be a reasonable representation, but it is not a guarantee of accuracy.
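To make the bimodal example concrete, the sketch below counts interior modes on a grid for two densities: an equal mixture of Beta(10, 2) and Beta(2, 10), standing in for "either strongly heads or strongly tails", and a single Beta(2, 2). The mixture weights, the hyperparameters, and the helper names are all assumptions chosen for illustration.

```python
import math


def beta_pdf(p, a, b):
    """Beta(a, b) density at p, computed via log-gamma for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))


def count_modes(density, points=999):
    """Count strict local maxima of a density on an evenly spaced interior grid."""
    grid = [(i + 1) / (points + 1) for i in range(points)]
    values = [density(p) for p in grid]
    return sum(
        1
        for i in range(1, points - 1)
        if values[i] > values[i - 1] and values[i] > values[i + 1]
    )


def two_camp_prior(p):
    # Hypothetical belief: the coin is strongly biased one way or the other.
    return 0.5 * beta_pdf(p, 10, 2) + 0.5 * beta_pdf(p, 2, 10)


def single_beta_prior(p):
    # A single Beta with both hyperparameters above 1 has one interior mode.
    return beta_pdf(p, 2, 2)


print(count_modes(two_camp_prior))
print(count_modes(single_beta_prior))
```

The mixture should report two modes and the single Beta one, which is the sense in which a single unimodal Beta cannot carry the two-camp belief.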
Question 4 True / False
In the Beta-Binomial conjugate pair, the prior hyperparameters α and β can be interpreted as pseudo-counts of prior successes and failures, respectively.
T. True
F. False
Answer: True
The interpretation holds up to a choice of baseline. Measured against the flat Beta(1, 1) prior, a Beta(α, β) prior behaves as if you had already observed α − 1 successes and β − 1 failures, so Beta(1, 1) corresponds to zero pseudo-counts (no prior data). Some texts instead count α and β themselves as prior successes and failures, with α + β as the prior sample size. Under either convention, observing X successes in n trials adds X to α and n − X to β, giving Beta(α+X, β+n−X). The pseudo-count view puts the prior on the same scale as real data and shows how prior beliefs are outweighed as n grows.
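The way the prior is outweighed can be sketched by writing the posterior mean as a weighted average of the prior mean and the observed proportion, with weight (α + β)/(α + β + n) on the prior. The example below assumes the Beta(2, 8) prior from Question 1 and a hypothetical data stream that keeps showing 60% heads; the function names are illustrative.

```python
def posterior_mean(alpha, beta, successes, trials):
    """Mean of the Beta(alpha + successes, beta + trials - successes) posterior."""
    return (alpha + successes) / (alpha + beta + trials)


def prior_weight(alpha, beta, trials):
    """Share of the posterior mean contributed by the prior mean."""
    return (alpha + beta) / (alpha + beta + trials)


alpha, beta = 2, 8
for trials in (10, 100, 1000):
    successes = 6 * trials // 10  # assumed: 60% heads at every sample size
    w = prior_weight(alpha, beta, trials)
    mean = posterior_mean(alpha, beta, successes, trials)
    print(f"n={trials:5d}  prior weight={w:.3f}  posterior mean={mean:.3f}")
```

As n grows, the prior weight falls toward 0 and the posterior mean moves from the prior mean of 0.2 toward the observed proportion of 0.6.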
Question 5 Short Answer
What does it mean for a prior to be 'conjugate' to a likelihood, and why does this property matter computationally?
Model answer: A prior is conjugate to a likelihood when the posterior — computed via Bayes' theorem (posterior ∝ likelihood × prior) — falls in the same distributional family as the prior. Computationally, this matters because the Bayesian update reduces to simple arithmetic on the hyperparameters rather than numerically integrating to normalize the posterior. For Beta-Binomial, you just add counts. For Normal-Normal, you blend means weighted by precision. This gives closed-form posteriors, fast sequential updating, and interpretable hyperparameters as prior pseudo-data.
Conjugacy is not a universal feature of Bayesian inference — most likelihood-prior pairs do not have closed-form posteriors, which is why MCMC methods are needed in practice. Conjugate priors are a special algebraic coincidence where the prior's functional form absorbs the likelihood without changing shape. Their importance today is partly historical and pedagogical — they build intuition for Bayesian updating before introducing computational methods — and partly practical in simple models where fast computation matters.
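The Normal-Normal case mentioned in the model answer can be sketched the same way. The example assumes a Normal prior on an unknown mean, observations with a known noise variance, and made-up data; `normal_normal_update` is an illustrative name, not a library function.

```python
def normal_normal_update(prior_mean, prior_var, data, noise_var):
    """Posterior (mean, variance) for a Normal prior on the mean, known noise variance."""
    n = len(data)
    if n == 0:
        return prior_mean, prior_var  # no data: the posterior is the prior
    prior_precision = 1.0 / prior_var
    data_precision = n / noise_var
    post_precision = prior_precision + data_precision  # precisions add
    sample_mean = sum(data) / n
    # The posterior mean blends the prior mean and sample mean, weighted by precision.
    post_mean = (
        prior_precision * prior_mean + data_precision * sample_mean
    ) / post_precision
    return post_mean, 1.0 / post_precision


post_mean, post_var = normal_normal_update(
    prior_mean=0.0, prior_var=1.0, data=[1.5, 2.5, 2.0, 2.0], noise_var=1.0
)
print(post_mean, post_var)  # 1.6 0.2
```

With four observations averaging 2.0 and a prior centered at 0, the data carry four times the precision of the prior, so the posterior mean lands four fifths of the way toward the sample mean.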