Questions: Statistical Power and Sample Size Determination
4 questions to test your understanding
Question 1 Multiple Choice
A clinical trial is designed with 80% power to detect a 5-point difference in blood pressure between drug and placebo groups. The trial enrolls the planned sample but finds a non-significant result (p = 0.12). A colleague says: 'The study was adequately powered, so this proves the drug doesn't work.' What is wrong with this reasoning?
A. Nothing: 80% power guarantees detection of a 5-point difference if it exists
B. 80% power means there is still a 20% chance of failing to detect a true 5-point difference; absence of significance does not prove absence of effect
C. The power calculation is irrelevant because the p-value alone determines the conclusion
D. The study must have been underpowered because it failed to reach significance
Answer: B
Power of 80% means that if the true effect is exactly 5 points and the design assumptions hold, the study still has a 20% probability of failing to detect it (a Type II error). A non-significant result is therefore consistent with both 'no effect' and 'a real effect that the study missed.' Non-significance is not proof of no effect: it is an absence of evidence, not evidence of absence. To quantify the range of plausible effects, examine the confidence interval rather than relying solely on the p-value.
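The point about the confidence interval can be made concrete with a rough sketch. It assumes a two-sided test at alpha = 0.05, a normal approximation, a realized standard error equal to the one implied by the design, and an observed difference in the direction favoring the drug. None of these details are stated in the question, and the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution

# Design stated in the question: 80% power to detect a 5-point difference.
# ASSUMPTION: two-sided alpha = 0.05 and a normal approximation.
alpha, power, target_diff = 0.05, 0.80, 5.0
z_crit = z.inv_cdf(1 - alpha / 2)
z_power = z.inv_cdf(power)

# Standard error of the difference implied by that design.
implied_se = target_diff / (z_crit + z_power)

# Observed result: two-sided p = 0.12.
# ASSUMPTION: the realized standard error matches the planned one and the
# estimate points in the direction favoring the drug.
p_value = 0.12
z_obs = z.inv_cdf(1 - p_value / 2)
estimate = z_obs * implied_se

# 95% confidence interval for the true difference.
ci_low = estimate - z_crit * implied_se
ci_high = estimate + z_crit * implied_se

print(f"estimate = {estimate:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

Under these assumptions the interval runs from roughly -0.7 to 6.3 points, so it contains both zero and the 5-point target: the data rule out neither "no effect" nor the effect the trial was designed to detect.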
Question 2 True / False
Holding alpha, effect size, and variability constant, doubling the sample size will double the statistical power of a study.
T. True
F. False
Answer: False
Power does not scale linearly with sample size. The standard error shrinks in proportion to 1/sqrt(n), so doubling n increases the expected test statistic by a factor of sqrt(2) ≈ 1.41, not 2, and power is a nonlinear function of that statistic. If a study at n = 50 has 50% power, doubling to n = 100 raises power to roughly 79% under a normal approximation with two-sided alpha = 0.05, not 100%. The marginal gain from additional subjects diminishes as power approaches 100%.
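The nonlinearity can be checked numerically with a minimal sketch. It assumes a two-sample z-test with equal group sizes, two-sided alpha = 0.05, and a standardized effect size chosen so that n = 50 per group gives about 50% power. The function name and the specific effect size are illustrative, not part of the question.

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist()  # standard normal distribution


def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    d is the standardized effect size (difference / SD);
    normal approximation with equal group sizes.
    """
    z_crit = z.inv_cdf(1 - alpha / 2)
    ncp = d * sqrt(n_per_group / 2)  # expected value of the test statistic
    # Probability of landing in either rejection region.
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)


# ASSUMPTION: pick d so that n = 50 per group yields about 50% power,
# i.e. the expected test statistic equals the critical value.
d = z.inv_cdf(0.975) / sqrt(50 / 2)

powers = {n: power_two_sample(d, n) for n in (50, 100, 200, 400)}
for n, p in powers.items():
    print(f"n per group = {n:3d}  power = {p:.3f}")
```

Under this approximation the doubled sample lands near 79% power, and each further doubling buys less: the gains flatten out as power approaches 1.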
Question 3 Multiple Choice
A researcher calculates that she needs 200 subjects per group to detect a 10-point difference with 80% power. She can only recruit 100 per group. Rather than reducing the study, she decides to increase alpha from 0.05 to 0.10 to compensate. Is this a valid strategy?
A. Yes: increasing alpha directly increases power, fully compensating for the smaller sample
B. Partially valid: it increases power but at the cost of doubling the Type I error rate, which must be explicitly justified
C. No: alpha has no effect on power; only sample size matters
D. No: changing alpha after the sample size calculation invalidates the entire study
Answer: B
Increasing alpha does increase power (a less stringent threshold is easier to cross), but the cost is a higher probability of a false positive. Moving from alpha = 0.05 to 0.10 doubles the false-positive rate. It also does not fully compensate here: halving the sample costs more power than relaxing alpha to 0.10 restores. This tradeoff may be acceptable in exploratory or screening contexts where missing a true effect is more costly than a false alarm, but in confirmatory trials it is generally not acceptable. The decision to adjust alpha must be pre-specified and scientifically justified; it is not a free lunch.
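A rough calculation shows how much power the alpha change actually buys in this scenario. The sketch assumes a two-sided z-test with a normal approximation, and that the original plan (200 per group, alpha = 0.05) gives exactly 80% power. The helper name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist()  # standard normal distribution


def power_from_ncp(ncp, alpha):
    """Two-sided z-test power given the expected test statistic (ncp)."""
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)


# ASSUMPTION: the planned design (200 per group, alpha = 0.05) has 80% power,
# which fixes the expected test statistic at z_{0.975} + z_{0.80}.
ncp_200 = z.inv_cdf(0.975) + z.inv_cdf(0.80)

# Halving the sample shrinks the expected test statistic by sqrt(2).
ncp_100 = ncp_200 / sqrt(2)

power_planned = power_from_ncp(ncp_200, alpha=0.05)
power_half_n = power_from_ncp(ncp_100, alpha=0.05)
power_half_n_alpha10 = power_from_ncp(ncp_100, alpha=0.10)

print(f"n = 200, alpha = 0.05: {power_planned:.2f}")
print(f"n = 100, alpha = 0.05: {power_half_n:.2f}")
print(f"n = 100, alpha = 0.10: {power_half_n_alpha10:.2f}")
```

Under these assumptions, power with 100 per group is about 51% at alpha = 0.05 and about 63% at alpha = 0.10, still well below the planned 80%.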
Question 4 Short Answer
Explain why effect size is the most important input to a sample size calculation and why researchers should base it on clinical significance rather than statistical convenience.
Model answer: Effect size determines the minimum difference the study is designed to detect. If chosen too large, the study will be small and cheap but will miss smaller real effects. If chosen too small, the study will require an enormous sample to detect trivially small differences that have no clinical importance. The effect size should reflect the smallest difference that would change clinical practice — a 1 mmHg blood pressure reduction might be statistically detectable with 50,000 subjects but is clinically meaningless. Basing effect size on clinical significance ensures the study answers a question worth asking.
Sample size is most sensitive to effect size because it enters the formula as a squared term (n is proportional to 1/d²). Halving the target effect size quadruples the required sample. This is why inflating the expected effect size to reduce enrollment is dangerous — if the true effect is smaller than assumed, the study will be underpowered. Conversely, choosing a clinically anchored effect size protects against both underpowering and overpowering.
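The inverse-square relationship can be illustrated with the standard normal-approximation formula for comparing two means, n per group = 2 (z_{1-alpha/2} + z_{1-beta})² / d², where d is the standardized effect size. This is a sketch under that approximation; the function name and the example effect sizes are illustrative choices.

```python
from math import ceil
from statistics import NormalDist

z = NormalDist()  # standard normal distribution


def n_per_group(d, alpha=0.05, power=0.80):
    """Unrounded sample size per group for a two-sided, two-sample z-test.

    d is the standardized effect size (difference / SD);
    normal approximation with equal group sizes.
    """
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return 2 * (z_alpha + z_beta) ** 2 / d ** 2


# ASSUMPTION: illustrative effect sizes, each half the previous one.
raw = {d: n_per_group(d) for d in (0.5, 0.25, 0.125)}
for d, n in raw.items():
    print(f"d = {d:<5}  n per group = {ceil(n)}")

# Halving d multiplies the unrounded requirement by 4.
ratio = raw[0.25] / raw[0.5]
print(f"n(d = 0.25) / n(d = 0.5) = {ratio:.1f}")
```

With alpha = 0.05 and 80% power, d = 0.5 needs about 63 subjects per group and d = 0.25 needs about 252, which is why an optimistic effect size assumption shrinks the planned enrollment so sharply.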