A researcher finds 3 infections among 80 people surveyed (p̂ = 3/80 = 0.0375). Should they use the standard Normal-based confidence interval formula?
A. Yes: the sample size of 80 is large enough for the Normal approximation
B. No: np = 80 × 0.0375 = 3, which is less than 10, so the Clopper-Pearson exact method is preferred
C. Yes: as long as n > 30, the CLT guarantees the Normal approximation is valid
D. No: you need at least n = 1000 before any confidence interval method is valid for proportions
Answer: B
The condition for using the Normal-based formula is np ≥ 10 and n(1−p) ≥ 10, checked in practice with p̂ in place of the unknown p. Here np̂ = 3, which fails the condition. With so few expected successes, the Binomial distribution is heavily right-skewed, so the Normal approximation is poor and the resulting interval may have much less than the nominal 95% coverage. The Clopper-Pearson interval uses the Binomial distribution directly and is appropriate when the Normal approximation conditions fail. The 'n > 30' rule of thumb applies to sample means, not proportions.
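The check and the two intervals can be compared numerically. The sketch below is a minimal, standard-library-only version, assuming z = 1.96 for 95% confidence. The helper names (`wald_interval`, `binom_cdf`, `clopper_pearson`) are hypothetical, and the Clopper-Pearson bounds are found by bisecting on the Binomial CDF rather than through the equivalent Beta-distribution quantiles.

```python
import math

def wald_interval(x, n, z=1.96):
    """Normal-approximation (Wald) interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed directly from the pmf."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def clopper_pearson(x, n, alpha=0.05):
    """Exact interval obtained by inverting Binomial tail probabilities with bisection."""
    def root(g):
        # g is increasing in p on [0, 1]; bisect for the point where g(p) = 0
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if g(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: the p at which P(X >= x) = alpha/2 (0 when there are no successes)
    lower = 0.0 if x == 0 else root(lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper bound: the p at which P(X <= x) = alpha/2 (1 when every trial is a success)
    upper = 1.0 if x == n else root(lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper

x, n = 3, 80
p_hat = x / n
print(n * p_hat >= 10 and n * (1 - p_hat) >= 10)  # False: n * p_hat is only 3
print(wald_interval(x, n))    # lower end is negative (about -0.004)
print(clopper_pearson(x, n))  # roughly (0.008, 0.106)
```

The Wald interval's lower end falls below zero, which is impossible for a proportion and a visible symptom of the poor Normal approximation; the Clopper-Pearson bounds stay inside [0, 1] by construction.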
Question 2 Multiple Choice
A 95% confidence interval for a proportion is computed as (0.42, 0.58). Which interpretation is correct?
A. There is a 95% probability that the true population proportion is between 0.42 and 0.58
B. 95% of the population falls between 0.42 and 0.58
C. If this sampling procedure were repeated many times, 95% of the resulting intervals would contain the true proportion
D. The sample proportion p̂ equals 0.50 with 95% certainty
Answer: C
The correct frequentist interpretation refers to the procedure, not to this specific interval. The true proportion is fixed (not random), so it either is or isn't in (0.42, 0.58); we just don't know which. The '95%' describes the long-run performance of the method: 95% of intervals constructed this way will capture the true p. Option A is the most common misconception: it treats a fixed parameter as if it had a probability distribution relative to a single computed interval.
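The long-run reading can be checked by simulation. This sketch fixes a true proportion (assumed here to be p = 0.5, with n = 100), draws many samples, builds a Wald interval from each, and counts how often the interval covers the fixed p. The helper names, the seed, and the number of repetitions are illustrative assumptions.

```python
import math
import random

def wald_interval(x, n, z=1.96):
    """Wald interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

def simulate_coverage(p_true, n, reps=10_000, seed=1):
    """Fraction of simulated samples whose interval contains the fixed p_true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # One sample of n Bernoulli(p_true) trials; p_true itself never changes
        x = sum(rng.random() < p_true for _ in range(n))
        lo, hi = wald_interval(x, n)
        if lo <= p_true <= hi:
            hits += 1
    return hits / reps

coverage = simulate_coverage(p_true=0.5, n=100)
print(f"{coverage:.3f}")  # close to 0.95
```

Only the intervals vary from sample to sample; `p_true` stays put. The printed fraction lands near 0.95 rather than exactly on it, because the Wald interval's actual coverage only approximates its nominal level.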
Question 3 True / False
The margin of error for a 95% confidence interval for a proportion is maximized when p̂ = 0.5.
T. True
F. False
Answer: True
The margin of error is z_{α/2} √(p̂(1−p̂)/n). For a fixed n and confidence level, it depends on p̂ only through the product p̂(1−p̂), which is maximized at p̂ = 0.5, giving 0.5 × 0.5 = 0.25. Any other value of p̂ gives a smaller product, and the product is symmetric about 0.5: p̂ = 0.1 and p̂ = 0.9 both give 0.1 × 0.9 = 0.09. This is why a sample size calculated assuming p̂ = 0.5 is the conservative (largest) choice: it guarantees the target margin of error regardless of what the true proportion turns out to be.
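Both the maximization and its sample-size consequence are easy to check. The sketch assumes z = 1.96 and an illustrative target margin E = 0.03; it solves E = z√(p(1−p)/n) for n and rounds up. The function names are hypothetical.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """z * sqrt(p_hat * (1 - p_hat) / n), the half-width of the Wald interval."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

def sample_size_for_margin(E, p_guess=0.5, z=1.96):
    """Smallest n with margin_of_error(p_guess, n, z) <= E, from n = z^2 p(1-p) / E^2."""
    return math.ceil(z**2 * p_guess * (1 - p_guess) / E**2)

# p(1 - p) over a grid of candidate proportions peaks at 0.5
grid = [i / 100 for i in range(101)]
best = max(grid, key=lambda p: p * (1 - p))
print(best)  # 0.5

# Same n, different p_hat: the margin is largest at 0.5
print(round(margin_of_error(0.5, 100), 4), round(margin_of_error(0.1, 100), 4))  # 0.098 0.0588

print(sample_size_for_margin(0.03))               # 1068 (conservative guess p = 0.5)
print(sample_size_for_margin(0.03, p_guess=0.1))  # 385, too small if the true p is near 0.5
```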
Question 4 True / False
Doubling the sample size halves the margin of error in a confidence interval for a proportion.
T. True
F. False
Answer: False
The margin of error is proportional to 1/√n, not 1/n. Doubling n replaces √n with √(2n) = √2 · √n, reducing the margin by a factor of √2 ≈ 1.41, which is a reduction of about 29% (1 − 1/√2 ≈ 0.29), not 50%. To halve the margin of error, you must quadruple the sample size. This square-root relationship makes precision expensive: shrinking the margin by a factor of 10 (one more decimal place of accuracy) requires 100 times the sample size.
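The square-root scaling can be verified directly by comparing margins at n, 2n, 4n, and 100n. The base n = 100 and p̂ = 0.5 are arbitrary illustrative choices; the ratios do not depend on them.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """z * sqrt(p_hat * (1 - p_hat) / n); scales like 1 / sqrt(n)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

base = margin_of_error(0.5, 100)
doubled = margin_of_error(0.5, 200)
quadrupled = margin_of_error(0.5, 400)
hundredfold = margin_of_error(0.5, 10_000)

print(f"{base / doubled:.4f}")      # 1.4142, i.e. sqrt(2)
print(f"{1 - doubled / base:.1%}")  # 29.3% reduction, not 50%
print(f"{base / quadrupled:.4f}")   # 2.0000: 4x the sample halves the margin
print(f"{base / hundredfold:.4f}")  # 10.0000: 100x the sample for one more decimal place
```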
Question 5 Short Answer
Why do we substitute p̂ for p in the standard error formula √(p(1−p)/n) when constructing a confidence interval, and what does this introduce?
Model answer: We substitute p̂ because p — the true population proportion — is unknown. That is precisely what we are trying to estimate. Using p̂ in its place produces an estimated standard error: SE = √(p̂(1−p̂)/n). This introduces additional uncertainty, since p̂ itself is a random variable that fluctuates across samples. In large samples, p̂ is close to p and this substitution works well. In small samples or when p is near 0 or 1, the approximation degrades, which is part of why the Normal-based interval requires the conditions np ≥ 10 and n(1−p) ≥ 10.
This substitution is sometimes called the 'plug-in principle' and is a common technique in statistics. The resulting interval is called the Wald interval. Its coverage can be poor for small n or extreme p precisely because the estimated SE is unreliable there; in the extreme case p̂ = 0 (or p̂ = 1), the estimated SE is exactly zero and the interval collapses to a single point. The Clopper-Pearson interval avoids this by inverting exact Binomial tail probabilities, without needing to estimate the standard error from the data.
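The coverage claim can be made concrete with an exact calculation rather than a simulation: for a fixed true p, sum the Binomial probabilities of every outcome x whose interval contains p. The sketch reuses the setting of Question 1 (n = 80, with the true p assumed to be 0.0375). The helper names are hypothetical, and Clopper-Pearson is again computed by bisection on the Binomial CDF in place of Beta quantiles.

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

def wald_interval(x, n, z=1.96):
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # plug-in SE; exactly 0 when x == 0 or x == n
    return p_hat - z * se, p_hat + z * se

def clopper_pearson(x, n, alpha=0.05):
    def root(g):  # bisection for a function g that increases on [0, 1]
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
        return (lo + hi) / 2
    lower = 0.0 if x == 0 else root(lambda p: 1 - binom_cdf(x - 1, n, p) - alpha / 2)
    upper = 1.0 if x == n else root(lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper

def exact_coverage(method, n, p_true):
    """Sum P(X = x) over every outcome x whose interval contains p_true."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = method(x, n)
        if lo <= p_true <= hi:
            total += binom_pmf(x, n, p_true)
    return total

n, p_true = 80, 0.0375
print(f"Wald:            {exact_coverage(wald_interval, n, p_true):.3f}")    # about 0.80
print(f"Clopper-Pearson: {exact_coverage(clopper_pearson, n, p_true):.3f}")  # at least 0.95
```

In this setting the Wald interval misses whenever x is 0 or 1 (for x = 0 its plug-in SE is zero), which alone costs close to 20 percentage points of coverage. Clopper-Pearson stays at or above the nominal 95%, at the price of being somewhat conservative.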