A researcher fits a regression model with n = 10,000 observations and reports a very tight 95% confidence interval for the mean response at x = 5. A colleague says this means they can predict any individual patient's outcome with high precision. What is wrong with this claim?
A. Nothing, because a tight confidence interval implies a tight prediction interval for the same data
B. The confidence interval estimates the population mean response, not individual outcomes; a prediction interval would be much wider due to irreducible person-to-person variation
C. The colleague should use a 99% confidence level instead of 95% for medical applications
D. The model must be misspecified if individual predictions are not as precise as the confidence interval
Answer: B
This is the core confusion the topic addresses. A confidence interval for the mean response narrows with more data because estimation uncertainty shrinks. A prediction interval, however, also includes σ², the irreducible scatter of individual observations around the population mean, and that term does not shrink with more data. Even with perfect knowledge of the regression line, patients would still vary around it. Option A states exactly this misconception: a tight CI does not imply a tight PI.
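To make the contrast concrete, here is a minimal Python sketch (standard library only) that simulates n = 10,000 observations from a hypothetical linear model, fits ordinary least squares, and computes both 95% intervals at x = 5. The true coefficients, the noise level σ = 3, the helper name `fit_and_intervals`, and the use of the normal critical value in place of t* (a close approximation at this n) are illustrative assumptions, not part of the original scenario.

```python
import math
import random
import statistics


def fit_and_intervals(xs, ys, x0, z=statistics.NormalDist().inv_cdf(0.975)):
    """Simple linear regression by least squares.

    Returns (y_hat, ci_half_width, pi_half_width) at x0. The normal critical
    value z stands in for t* (assumption; nearly identical for large n).
    """
    n = len(xs)
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    s2 = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)  # residual variance
    h = 1 / n + (x0 - x_bar) ** 2 / sxx       # estimation-uncertainty term at x0
    y_hat = b0 + b1 * x0
    ci_half = z * math.sqrt(s2 * h)           # interval for the mean response
    pi_half = z * math.sqrt(s2 * (1 + h))     # interval for one new observation
    return y_hat, ci_half, pi_half


random.seed(0)
n = 10_000
sigma = 3.0  # hypothetical person-to-person noise SD
xs = [random.uniform(0, 10) for _ in range(n)]
ys = [2.0 + 1.5 * x + random.gauss(0, sigma) for x in xs]  # hypothetical true line

y_hat, ci_half, pi_half = fit_and_intervals(xs, ys, x0=5.0)
print(f"y_hat = {y_hat:.2f}")
print(f"95% CI for the mean response: +/- {ci_half:.3f}")
print(f"95% PI for a new observation: +/- {pi_half:.3f}")
```

With these settings the CI half-width should come out near 1.96 · 3 · √(1/10,000) ≈ 0.06, while the PI half-width stays near 1.96 · 3 ≈ 5.9, roughly a hundredfold difference.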
Question 2 Multiple Choice
As sample size n approaches infinity, what happens to a 95% prediction interval for a new observation?
A. It collapses to zero width, as all intervals do with sufficient data
B. It narrows to zero only if the true error variance σ² equals zero
C. It approaches a fixed non-zero width determined by the irreducible observation variance σ²
D. It becomes equivalent to the confidence interval for the mean response
Answer: C
As n → ∞, the estimation uncertainty in the mean (the h term in SE_pred²) vanishes, but the '1' term, which represents the individual-to-individual variance σ², remains. The prediction interval therefore approaches ŷ ± z* · σ (the t critical value converges to z* and s converges to σ), a fixed width set by the true noise in the data-generating process. Option A describes confidence intervals, not prediction intervals. Option D is incorrect because the two intervals converge to different limits: the CI shrinks to zero width, while the PI does not.
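The limit can be written out explicitly under the usual simple-linear-regression setup. This assumes h = 1/n + (x₀ − x̄)²/S_xx and that the x values keep spreading out so that S_xx grows with n (then h → 0), together with s → σ and t* → z*:

```latex
\begin{aligned}
\text{CI half-width} &= t^{*}_{n-2}\, s \sqrt{h}
  \;\xrightarrow{\,n\to\infty\,}\; z^{*} \sigma \sqrt{0} = 0, \\
\text{PI half-width} &= t^{*}_{n-2}\, s \sqrt{1 + h}
  \;\xrightarrow{\,n\to\infty\,}\; z^{*} \sigma \sqrt{1 + 0} = z^{*} \sigma \approx 1.96\,\sigma .
\end{aligned}
```

So the full 95% PI width settles near 2 × 1.96σ ≈ 3.92σ, which reaches zero only in the degenerate case σ = 0.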
Question 3 True / False
A confidence interval for the mean response and a prediction interval for a new observation answer the same underlying statistical question.
True
False
Answer: False
They answer fundamentally different questions. A CI asks: 'Where does the population mean μ_Y|x lie?' — a question about a fixed but unknown parameter. A PI asks: 'Where will the next individual observation at x fall?' — a question about a random variable with inherent scatter. The CI width goes to zero as n → ∞ because the parameter can be pinned down; the PI width has a lower bound because individual observations always vary around the mean.
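One way to see that the two intervals answer different questions is to check what each one actually covers over many repeated samples. The sketch below is a hypothetical simulation: the true model, σ, n = 100, and 2,000 replications are made-up settings, and the normal critical value again stands in for t*. It records how often the CI contains the true mean μ_Y|x, how often that same CI contains a fresh observation, and how often the PI contains the fresh observation.

```python
import math
import random
import statistics

# Normal critical value used as an approximation to t* (assumption; t* is about 1.98 at n = 100)
Z = statistics.NormalDist().inv_cdf(0.975)


def intervals(xs, ys, x0):
    """Fit y = b0 + b1*x by least squares; return (CI, PI) at x0 as (low, high) tuples."""
    n = len(xs)
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    s2 = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    h = 1 / n + (x0 - x_bar) ** 2 / sxx
    y_hat = b0 + b1 * x0
    ci_half = Z * math.sqrt(s2 * h)          # uncertainty about the mean only
    pi_half = Z * math.sqrt(s2 * (1 + h))    # plus scatter of one new observation
    return (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)


def covers(interval, value):
    low, high = interval
    return low <= value <= high


random.seed(1)
beta0, beta1, sigma, x0 = 2.0, 1.5, 3.0, 5.0  # hypothetical true model
mu = beta0 + beta1 * x0                        # true mean response at x0
trials, n = 2000, 100

ci_hits_mean = ci_hits_new = pi_hits_new = 0
for _ in range(trials):
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [beta0 + beta1 * x + random.gauss(0, sigma) for x in xs]
    ci, pi = intervals(xs, ys, x0)
    y_new = mu + random.gauss(0, sigma)        # one fresh observation at x0
    ci_hits_mean += covers(ci, mu)
    ci_hits_new += covers(ci, y_new)
    pi_hits_new += covers(pi, y_new)

print("CI contains the true mean:     ", ci_hits_mean / trials)
print("CI contains a new observation: ", ci_hits_new / trials)
print("PI contains a new observation: ", pi_hits_new / trials)
```

Under these assumptions the first and third rates should land close to 0.95 (slightly below, because z* is a bit smaller than t* at n = 100), while the CI captures a new individual observation only a small fraction of the time.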
Question 4 True / False
Prediction intervals are always wider than confidence intervals at the same x value and confidence level.
True
False
Answer: True
This follows directly from the formulas SE_pred² = s²(1 + h) and SE_mean² = s² · h. The '1 +' in the prediction formula adds the irreducible variance component s², which is positive whenever the observations have any scatter, so SE_pred > SE_mean at every x. Because both intervals use the same critical value, the prediction interval is always wider. The gap is largest near the center of the data, where h is small and the '1' dominates, and smallest far into extrapolation, where h is large and estimation uncertainty inflates both intervals.
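A small deterministic sketch makes both claims checkable. It assumes simple linear regression, where h is the usual leverage at x0, h = 1/n + (x0 − x̄)²/S_xx, and it plugs in made-up summary statistics (s = 3, n = 50, x̄ = 5, S_xx = 400) rather than any real dataset. The helper name `standard_errors` is hypothetical.

```python
import math


def standard_errors(s, n, x_bar, sxx, x0):
    """Return (h, SE_mean, SE_pred) for simple linear regression at x0."""
    h = 1 / n + (x0 - x_bar) ** 2 / sxx   # grows as x0 moves away from x_bar
    se_mean = s * math.sqrt(h)             # SE_mean^2 = s^2 * h
    se_pred = s * math.sqrt(1 + h)         # SE_pred^2 = s^2 * (1 + h)
    return h, se_mean, se_pred


# Hypothetical summary statistics from a fitted model
s, n, x_bar, sxx = 3.0, 50, 5.0, 400.0

rows = []
for x0 in [5.0, 10.0, 20.0, 40.0]:         # from the center out into extrapolation
    h, se_mean, se_pred = standard_errors(s, n, x_bar, sxx, x0)
    rows.append((x0, h, se_mean, se_pred, se_pred - se_mean))
    print(f"x0={x0:5.1f}  h={h:6.4f}  SE_mean={se_mean:5.3f}  "
          f"SE_pred={se_pred:5.3f}  gap={se_pred - se_mean:5.3f}")
```

With these numbers the gap falls from about 2.6 at x0 = 5 to about 0.8 at x0 = 40, while SE_pred stays above SE_mean throughout.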
Question 5 Short Answer
Explain why a prediction interval cannot shrink to zero width even with an arbitrarily large sample, while a confidence interval for the mean response can.
Model answer: A confidence interval captures estimation uncertainty — the wobble in the fitted line due to working from a finite sample. With more data, the estimated line converges to the true population line, and this uncertainty vanishes. A prediction interval also includes σ², the irreducible scatter of individual observations around any regression line, even a perfectly known one. That scatter reflects genuine person-to-person (or observation-to-observation) variation in the outcome, which is a property of the data-generating process, not of estimation. It cannot be reduced by collecting more data.
The mathematical marker of this distinction is the '1' in SE_pred² = s²(1 + h). The 'h' term captures estimation uncertainty (shrinks with n); the '1' captures irreducible observation variance (does not shrink). A PI is a statement about a random variable; a CI is a statement about a fixed parameter. Confusing them leads to false precision — acting as if the tight CI tells you where individual outcomes will fall when it only tells you where their mean is.
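The '1' can also be derived rather than just read off the formula. Assuming the standard model in which a new observation at x₀ is y_new = μ_Y|x₀ + ε_new, with ε_new independent of the fitted sample, Var(ε_new) = σ², and Var(ŷ₀) = σ²h, the prediction error splits into two independent pieces:

```latex
\begin{aligned}
y_{\text{new}} - \hat{y}_0
  &= \underbrace{\varepsilon_{\text{new}}}_{\text{irreducible}}
   \;-\; \underbrace{\left(\hat{y}_0 - \mu_{Y\mid x_0}\right)}_{\text{estimation error}}, \\
\operatorname{Var}\!\left(y_{\text{new}} - \hat{y}_0\right)
  &= \sigma^2 + \sigma^2 h = \sigma^2 (1 + h), \\
\mathrm{SE}_{\text{pred}}^2 &= s^2 (1 + h), \qquad
\mathrm{SE}_{\text{mean}}^2 = s^2 h .
\end{aligned}
```

Because the two pieces are independent, their variances add with no cross term, and collecting more data can only shrink the second one.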