Questions: Sample Size Determination in Research Planning
5 questions to test your understanding
Question 1 Multiple Choice
A researcher runs an underpowered study (N=25 per group) and finds a statistically significant result at p < .05. What is the most accurate interpretation?
A. The result is reliable, since statistical significance means the same thing regardless of sample size
B. The result is likely a true positive and probably represents the true effect size accurately
C. The significant result is likely real, but the effect size estimate is probably inflated compared to the true population effect
D. The result is certainly a false positive because the study was underpowered
Answer: C
This is the winner's curse. When power is below 50%, the smallest effect estimate that can reach significance is larger than the true effect, so only estimates that happen to overshoot the truth cross the threshold; at somewhat higher power, the estimates that reach significance are still inflated on average. Significant results from underpowered studies therefore systematically overestimate effect sizes. The result may or may not be a true positive, but if it is, the published estimate is likely exaggerated. Replication attempts with adequate power then fail to find effects of that magnitude, contributing to the replication crisis.
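To see why only inflated estimates cross the threshold, here is a minimal Python sketch (standard library only) that computes the smallest observed Cohen's d that can reach two-sided significance for a given per-group n. It relies on a normal approximation: the standard error of d is taken as roughly sqrt(2/n), and the critical value comes from the normal rather than the t distribution. The function name `critical_d` is hypothetical.

```python
from statistics import NormalDist

def critical_d(n_per_group: float, alpha: float = 0.05) -> float:
    """Smallest observed |d| that reaches two-sided significance (normal approximation).

    For two equal groups, the standard error of Cohen's d is roughly sqrt(2 / n),
    so significance requires |d_obs| >= z_(1 - alpha/2) * sqrt(2 / n).
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = .05
    return z_crit * (2 / n_per_group) ** 0.5

for n in (25, 100, 400):
    print(f"n = {n:>3} per group: significant only if |d_obs| >= {critical_d(n):.2f}")
```

With 25 per group, an observed d must be at least about 0.55 to reach p < .05. If the true effect were, say, d = 0.3 (an assumed value for illustration), every significant result in the expected direction would overstate it by more than 80%.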
Question 2 Multiple Choice
A researcher expects a small effect (d = 0.2) and recruits 30 participants per group. Which outcome is most likely?
A. Adequate power to detect the effect because d = 0.2 is a real effect that alpha = .05 should catch
B. The study is severely underpowered: detecting d = 0.2 at 80% power requires roughly 394 participants per group
C. The study is slightly underpowered but will probably reach significance if the true effect exists
D. Power is adequate because the researcher can always increase N after seeing a trend in the data
Answer: B
A small effect by Cohen's conventions (d = 0.2) requires about 394 participants per group to achieve 80% power in a two-sided, two-sample t-test at α = .05. With only 30 per group, power is only about 12%, so the study will miss a real effect of that size nearly nine times out of ten. Researchers often badly underestimate how large samples must be to detect small effects. Adding participants after seeing a trend (optional stopping) inflates the false positive rate and is not a valid remedy.
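These figures can be checked approximately with a short Python sketch. It uses the standard normal-approximation formula for a two-sided, two-sample comparison with equal groups, n = 2((z_(1-α/2) + z_(1-β)) / d)². That formula gives 393 per group rather than the 394 from the exact t-test calculation. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def required_n_per_group(d, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample comparison (normal approximation)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)  # critical value, 1.96 for alpha = .05
    z_beta = Z.inv_cdf(power)           # 0.84 for 80% power
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

def approx_power(d, n_per_group, alpha=0.05):
    """Two-sided power for a true effect d with n per group (normal approximation)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)  # expected z statistic when the effect is real
    # Reject in either tail: the right tail dominates, the left tail is tiny.
    return (1 - Z.cdf(z_alpha - shift)) + Z.cdf(-z_alpha - shift)

print(required_n_per_group(0.2))       # 393 (exact t-test calculation: 394)
print(f"{approx_power(0.2, 30):.2f}")  # 0.12
```

The exact t-based power at 30 per group is slightly lower than this approximation, but both put it near one in ten.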
Question 3 True / False
An underpowered study that finds a statistically significant result is more likely to accurately estimate the true effect size than an adequately powered study.
T. True
F. False
Answer: False
The opposite is true. When power is low, an estimate can reach statistical significance only if sampling noise pushes it above the population value. This is the winner's curse: the significant result 'won' the noise lottery, producing an estimate that exceeds the truth. Adequately powered studies reach significance with typical estimates near the true effect size, not just with outlying ones. This is why a literature built from underpowered studies systematically overestimates effects.
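A small Monte Carlo simulation makes the comparison concrete. This sketch assumes a true effect of d = 0.3, normally distributed outcomes with SD 1 in both groups, and a normal-approximation significance test (|d_obs| / sqrt(2/n) > z_(1-α/2)) instead of an exact t-test. The function names, the seed, and the number of simulated studies are arbitrary choices for illustration.

```python
import random
from statistics import NormalDist

def cohens_d(control, treated):
    """Standardized mean difference (treated - control) using the pooled SD."""
    n1, n2 = len(control), len(treated)
    m1, m2 = sum(control) / n1, sum(treated) / n2
    ss = sum((x - m1) ** 2 for x in control) + sum((x - m2) ** 2 for x in treated)
    return (m2 - m1) / (ss / (n1 + n2 - 2)) ** 0.5

def mean_significant_estimate(true_d, n, studies=1000, alpha=0.05, seed=1):
    """Simulate many two-group studies; average the observed d among significant ones.

    Significance uses a normal approximation: |d_obs| / sqrt(2 / n) > z_(1 - alpha/2).
    Returns (mean d among significant studies, share of studies that were significant).
    """
    rng = random.Random(seed)
    threshold = NormalDist().inv_cdf(1 - alpha / 2) * (2 / n) ** 0.5
    significant = []
    for _ in range(studies):
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        treated = [rng.gauss(true_d, 1.0) for _ in range(n)]
        d_obs = cohens_d(control, treated)
        if abs(d_obs) > threshold:
            significant.append(d_obs)
    if not significant:
        return float("nan"), 0.0
    return sum(significant) / len(significant), len(significant) / studies

for n in (25, 400):  # underpowered vs. adequately powered for an assumed true d = 0.3
    mean_d, share = mean_significant_estimate(true_d=0.3, n=n)
    print(f"n = {n:>3}: {share:.0%} significant, mean significant d = {mean_d:.2f}")
```

Under these assumptions the 25-per-group studies reach significance only about one time in five, and their significant estimates average more than twice the true 0.3. The 400-per-group studies are almost always significant, and their estimates stay close to 0.3.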
Question 4 True / False
Conducting a power analysis requires you to specify the expected effect size before collecting data.
T. True
F. False
Answer: True
True. A power analysis links four quantities: sample size (N), effect size, alpha, and power (1 − β). Fixing any three determines the fourth. A prospective power analysis solves for N given the other three, so you must commit to an expected effect size in advance. This forces researchers to engage with prior literature and meta-analyses, and, when the analysis is preregistered, it creates accountability. Researchers who skip this step typically collect whatever N is convenient, which is often too small to detect the effect sizes that are realistic for their research question.
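Because the four quantities are tied together, a single relation can be rearranged to solve for any one of them. The sketch below assumes a two-sided, two-sample comparison with equal group sizes. It uses the normal approximation d·sqrt(n/2) = z_(1-α/2) + z_(1-β), which ignores the tiny chance of rejecting in the wrong direction. `solve_power_equation` is a hypothetical helper, not a library function.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def solve_power_equation(n=None, d=None, alpha=None, power=None):
    """Solve d * sqrt(n / 2) = z_(1 - alpha/2) + z_(power) for the one missing argument.

    n is the per-group sample size. Pass exactly three arguments; the fourth is returned.
    """
    given = {"n": n, "d": d, "alpha": alpha, "power": power}
    missing = [name for name, value in given.items() if value is None]
    if len(missing) != 1:
        raise ValueError("pass exactly three of n, d, alpha, power")
    target = missing[0]
    if target == "n":
        return 2 * ((Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)) / d) ** 2
    if target == "d":
        return (Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)) * sqrt(2 / n)
    shift = d * sqrt(n / 2)  # expected z statistic when the effect is real
    if target == "power":
        return Z.cdf(shift - Z.inv_cdf(1 - alpha / 2))
    return 2 * (1 - Z.cdf(shift - Z.inv_cdf(power)))  # target == "alpha"

print(f"{solve_power_equation(d=0.2, alpha=0.05, power=0.80):.1f}")  # n per group: 392.4
print(f"{solve_power_equation(n=30, alpha=0.05, power=0.80):.2f}")   # minimum d: 0.72
```

Solving for d with 30 per group is a sensitivity analysis. Under these assumptions, such a design has 80% power only for effects of about d = 0.72 or larger. For exact t-based calculations, statsmodels provides `TTestIndPower().solve_power`, which likewise takes `effect_size`, `nobs1`, `alpha`, and `power` and solves for whichever one is left as `None`.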
Question 5 Short Answer
Why do statistically significant results from underpowered studies often fail to replicate in later, adequately powered studies?
Model answer: Underpowered studies produce significant results only when their random effect estimates happen to be inflated above the true population value — the winner's curse. The published significant finding therefore overstates the true effect. When a larger, adequately powered study looks for an effect of the published magnitude, it doesn't find one at that size, even if the underlying effect is real and smaller. This systematic inflation of published effect sizes is a major cause of replication failures.
This connects underpowering to the replication crisis. It's not just that underpowered studies miss effects (Type II error) — it's that the ones they do catch are biased upward. The entire published literature on a topic can become an overestimate of reality when it is built from underpowered studies, making it look like science is inconsistent when the real problem is systematic bias at the study design stage.
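The replication pattern can be simulated directly. This sketch rests on several assumptions: a true effect of d = 0.3; original studies with 25 per group, of which only positive, significant results are "published"; and replications with 400 per group, which are adequately powered for d = 0.3. For speed, it draws each study's observed d from its approximate sampling distribution, Normal(d, sqrt(2/n)), rather than simulating raw data. All names and settings are hypothetical.

```python
import random
from statistics import NormalDist, fmean

def replication_sim(true_d=0.3, n_orig=25, n_rep=400, studies=5000, alpha=0.05, seed=7):
    """Pair significant underpowered 'originals' with large replications.

    Returns (mean published d, mean replication d, share of replications whose
    estimate is at least as large as the corresponding published estimate).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se_orig, se_rep = (2 / n_orig) ** 0.5, (2 / n_rep) ** 0.5  # approximate SEs of d
    published, replicated = [], []
    for _ in range(studies):
        d_orig = rng.gauss(true_d, se_orig)
        if d_orig / se_orig > z_crit:  # assumption: only positive, significant originals are published
            published.append(d_orig)
            replicated.append(rng.gauss(true_d, se_rep))
    if not published:
        raise ValueError("no original study reached significance")
    matched = sum(r >= p for p, r in zip(published, replicated)) / len(published)
    return fmean(published), fmean(replicated), matched

pub, rep, matched = replication_sim()
print(f"mean published d = {pub:.2f}, mean replication d = {rep:.2f}")
print(f"replications reaching the published size: {matched:.1%}")
```

Under these assumptions the published estimates average more than twice the true effect, while the large replications cluster near 0.3. Almost none of the replications reach the published size, even though the effect is real.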