A researcher flips a coin 10 times and observes 7 heads. Using a uniform prior on θ ∈ [0,1], what is the Bayesian posterior mean for θ?
A. 0.70, the maximum likelihood estimate
B. 0.667, the mean of the resulting Beta(8, 4) posterior
C. 0.50, the prior mean
D. 0.75, the upper bound of a 95% credible interval
Answer: B
With a uniform prior, the posterior is Beta(8, 4) (7 heads + 1, 3 tails + 1), whose mean is 8/12 ≈ 0.667. The MLE of 0.70 maximizes the likelihood but ignores the prior; the posterior mean incorporates it, pulling the estimate slightly toward the prior mean of 0.5. A uniform prior acts like one extra head and one extra tail, so this shrinkage is largest in small samples and vanishes only when the MLE is already 0.5.
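The arithmetic can be checked in a few lines. This is a minimal sketch, assuming the uniform prior is written as Beta(1, 1); the function name `beta_posterior_mean` is illustrative and not taken from any library.

```python
from fractions import Fraction

def beta_posterior_mean(heads, tails, prior_alpha=1, prior_beta=1):
    """Posterior mean of theta under a Beta(prior_alpha, prior_beta) prior.

    The uniform prior on [0, 1] is Beta(1, 1), so the defaults match the question.
    """
    post_alpha = prior_alpha + heads   # 1 + 7 = 8
    post_beta = prior_beta + tails     # 1 + 3 = 4
    return Fraction(post_alpha, post_alpha + post_beta)

posterior_mean = beta_posterior_mean(heads=7, tails=3)
mle = Fraction(7, 10)

print(float(posterior_mean))  # 8/12, about 0.667
print(float(mle))             # 0.7
```

Exact fractions are used only to keep the comparison with 8/12 free of rounding; plain floats would work equally well.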
Question 2 Multiple Choice
Which statement correctly describes a 95% Bayesian credible interval [a, b]?
A. If the experiment were repeated many times, 95% of such intervals would contain the true θ
B. Given the observed data, P(a ≤ θ ≤ b | X) = 0.95
C. The interval covers 95% of the prior distribution regardless of the data
D. The interval is centered on the maximum likelihood estimate
Answer: B
A Bayesian credible interval is a direct probability statement about θ given the observed data, obtained by integrating the posterior density over [a, b]. Option A describes the frequentist confidence interval, which makes no direct probability claim about θ; it characterizes the long-run coverage of the interval-building procedure, not the probability that any particular interval contains θ.
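To make "integrating the posterior" concrete, here is a rough sketch that builds a 95% interval for the coin example from Question 1 (posterior Beta(8, 4)) using a plain midpoint-rule grid. The equal-tailed construction, the grid resolution, and the helper names are all assumptions chosen for illustration; credible intervals are not unique, and a highest-density interval would be another valid choice.

```python
def beta_8_4_kernel(theta):
    """Unnormalized Beta(8, 4) density: theta^7 * (1 - theta)^3."""
    return theta**7 * (1 - theta)**3

# Midpoint-rule grid over [0, 1]; n_cells is an arbitrary resolution choice.
n_cells = 20_000
midpoints = [(i + 0.5) / n_cells for i in range(n_cells)]
weights = [beta_8_4_kernel(t) for t in midpoints]
total = sum(weights)

def quantile(p):
    """Smallest grid point whose cumulative posterior mass reaches p."""
    running = 0.0
    for t, w in zip(midpoints, weights):
        running += w / total
        if running >= p:
            return t
    return 1.0

# Equal-tailed interval: cut 2.5% of posterior mass from each side.
a, b = quantile(0.025), quantile(0.975)

# Posterior probability that theta lies in [a, b], given the data.
mass = sum(w for t, w in zip(midpoints, weights) if a <= t <= b) / total
print(round(a, 3), round(b, 3), round(mass, 3))
```

The printed mass is the quantity P(a ≤ θ ≤ b | X) from option B, computed directly from the posterior rather than from any repeated-sampling argument.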
Question 3 True / False
In Bayesian inference, the optimal point estimate under squared-error loss is the posterior mean E[θ|X].
T. True
F. False
Answer: True
Under squared-error loss, the posterior expected loss of an estimate a decomposes as E[(θ − a)² | X] = Var(θ | X) + (E[θ | X] − a)², which is minimized by setting a = E[θ | X], the posterior mean. This is why the posterior mean is the canonical Bayesian point estimate under squared-error loss; other losses lead to other summaries, such as the posterior median under absolute-error loss.
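The claim can also be checked numerically. The sketch below reuses the Beta(8, 4) posterior from Question 1 as an assumed example, approximates the posterior on a grid, and searches a list of candidate estimates for the one with the smallest expected squared loss. The grid size and candidate spacing are arbitrary choices.

```python
# Posterior for the coin example: Beta(8, 4), density proportional to theta^7 (1 - theta)^3.
n_cells = 2_000
midpoints = [(i + 0.5) / n_cells for i in range(n_cells)]
weights = [t**7 * (1 - t)**3 for t in midpoints]
total = sum(weights)
probs = [w / total for w in weights]

posterior_mean = sum(t * p for t, p in zip(midpoints, probs))

def expected_squared_loss(estimate):
    """E[(theta - estimate)^2 | X], approximated on the grid."""
    return sum((t - estimate) ** 2 * p for t, p in zip(midpoints, probs))

candidates = [i / 1000 for i in range(1001)]   # 0.000, 0.001, ..., 1.000
best = min(candidates, key=expected_squared_loss)

print(round(posterior_mean, 3), best)
```

The best candidate should land on the grid value nearest the posterior mean, and the MLE of 0.70 should incur a slightly larger expected loss.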
Question 4 True / False
In Bayesian inference, the posterior distribution is computed before observing data; the likelihood then updates it afterward.
T. True
F. False
Answer: False
This reverses the logic. The prior π(θ) is specified before seeing any data and encodes prior beliefs. The posterior π(θ|X) is computed after observing X, by multiplying the prior by the likelihood L(θ|X) and normalizing: π(θ|X) ∝ L(θ|X) π(θ). The likelihood is what carries the data into the update from prior to posterior; it is not something applied to an already-computed posterior.
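The order of operations (prior first, then data, then posterior) can be sketched with a simple grid approximation. The coin data from Question 1 (7 heads, 3 tails) and the uniform prior are reused here as assumptions, and `grid_posterior` is a hypothetical helper name.

```python
def grid_posterior(prior, thetas, heads, tails):
    """Multiply the prior by the binomial likelihood kernel, then normalize."""
    unnormalized = [p * t**heads * (1 - t)**tails for p, t in zip(prior, thetas)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def grid_mean(thetas, probs):
    return sum(t * p for t, p in zip(thetas, probs))

n_cells = 2_000
thetas = [(i + 0.5) / n_cells for i in range(n_cells)]

# Step 1: the prior is fixed before any data are seen (uniform here).
prior = [1.0 / n_cells] * n_cells

# Step 2: data arrive (7 heads, 3 tails) and the likelihood updates the prior.
posterior = grid_posterior(prior, thetas, heads=7, tails=3)

prior_mean = grid_mean(thetas, prior)
posterior_mean = grid_mean(thetas, posterior)
print(round(prior_mean, 3), round(posterior_mean, 3))  # 0.5 0.667
```

Nothing in `prior` depends on the data; only the call to `grid_posterior` does, which mirrors the before/after distinction the question tests.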
Question 5 Short Answer
Why does the Bayesian posterior mean typically differ from the frequentist maximum likelihood estimate, and what determines how large that difference is?
Model answer: The posterior mean incorporates the prior distribution, which pulls the estimate toward the prior's center of mass. The MLE maximizes the likelihood alone. The degree of difference depends on the relative informativeness of the prior versus the data: with few observations and an informative prior, the posterior mean is pulled substantially toward the prior; with abundant data, the likelihood dominates and the posterior mean converges toward the MLE.
The MLE treats θ as a fixed unknown and maximizes L(θ|X). The posterior mean treats θ as a random variable and computes E[θ|X] by integrating over the full posterior, which weights every value of θ by both how well it explains the data and how plausible it was a priori. Even a uniform prior shifts the posterior mean from the MLE in finite samples — for example, Beta(8,4) has mean 8/12 ≈ 0.667 while the MLE is 7/10 = 0.70.
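How quickly the gap closes as data accumulate can be illustrated with a short sketch. It assumes a uniform Beta(1, 1) prior and holds the observed proportion of heads at 0.7 while the sample size grows; the sample sizes and function names are illustrative choices.

```python
def mle(heads, n):
    """Maximum likelihood estimate: the observed proportion of heads."""
    return heads / n

def posterior_mean_uniform(heads, n):
    """Posterior mean under a uniform Beta(1, 1) prior: (heads + 1) / (n + 2)."""
    return (heads + 1) / (n + 2)

gaps = []
for n in (10, 100, 1_000, 10_000):
    heads = 7 * n // 10              # keep the observed proportion at 0.7
    gap = mle(heads, n) - posterior_mean_uniform(heads, n)
    gaps.append(gap)
    print(n, round(mle(heads, n), 4), round(posterior_mean_uniform(heads, n), 4), round(gap, 4))
```

With this prior the gap works out to (2·heads − n) / (n(n + 2)), so at a fixed proportion it shrinks roughly like 1/n. A more informative prior, such as a Beta with larger parameters, would keep the posterior mean away from the MLE for longer.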