A study with n = 10,000 finds p = .03 for a difference between two groups. What additional information is most important for evaluating this finding?
A. The exact alpha level used
B. The effect size (e.g., Cohen's d), because statistical significance does not indicate practical importance
C. Whether the result was replicated, since large samples are unusual
D. The standard deviation of each group, since p-values don't account for spread
Answer: B
With very large samples, even trivially small differences become statistically significant. A p-value of .03 tells you that data at least this extreme would be unlikely if the null hypothesis were true, but it says nothing about whether the difference is large enough to matter. An effect size such as Cohen's d standardizes the difference by the spread of scores, which makes it interpretable regardless of sample size.
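To see how a tiny effect can still produce p = .03, here is a minimal sketch that computes Cohen's d from group means and standard deviations, then a large-sample (normal-approximation) two-sided p-value. The split of n = 10,000 into two groups of 5,000, the means of 100.65 and 100.0, and the SD of 15 are hypothetical numbers chosen for illustration; the helper names `cohens_d` and `approx_two_sided_p` are also illustrative, not from any library.

```python
import math

def cohens_d(mean1, mean2, sd1, sd2, n1, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)

def approx_two_sided_p(d, n1, n2):
    """Normal approximation to a two-sample t-test p-value (fine for large n)."""
    z = abs(d) * math.sqrt(n1 * n2 / (n1 + n2))  # expected test statistic for effect d
    return math.erfc(z / math.sqrt(2))           # equals 2 * (1 - Phi(z))

# Hypothetical data: 5,000 per group, means differ by 0.65 points on a scale with SD 15.
d = cohens_d(100.65, 100.0, 15.0, 15.0, 5000, 5000)
p = approx_two_sided_p(d, 5000, 5000)
print(f"d = {d:.3f}, p = {p:.3f}")
```

Under these assumptions d is only about 0.04, far below the conventional "small" benchmark of 0.2, yet p comes out near .03. The significance comes from the sample size, not from a practically meaningful difference.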
Question 2 True / False
A study fails to find a statistically significant result (p > .05). This means the researchers have demonstrated that no true effect exists.
True
False
Answer: False
A non-significant result, especially from an underpowered study, is not evidence that no effect exists. The study may simply have been too small to detect a real but modest effect: "Absence of evidence is not evidence of absence." To support a null finding, you need a well-powered study analyzed with an equivalence test (e.g., two one-sided tests, TOST) or a Bayesian analysis (e.g., a Bayes factor) that quantifies evidence for the null hypothesis.
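One way to see why p > .05 is weak evidence in a small study is to compute its power. The sketch below uses a normal approximation to the power of a two-sided, two-sample test; the true effect of d = 0.3 and the group sizes of 20 and 175 are hypothetical values for illustration, and `approx_power` is an illustrative helper rather than a library function (dedicated power software uses the exact noncentral t distribution).

```python
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test when the true effect is d."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    ncp = d * (n_per_group / 2) ** 0.5          # expected z-statistic under the true effect
    # Probability of landing beyond either critical value
    return std_normal.cdf(ncp - z_crit) + std_normal.cdf(-ncp - z_crit)

for n in (20, 175):
    print(f"n = {n:>3} per group: power = {approx_power(0.3, n):.2f}")
```

With 20 per group, this approximation gives power of roughly 0.16, so a real effect of d = 0.3 would produce p > .05 in most such studies. It takes about 175 per group to reach the conventional 80% power.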
Question 3 Short Answer
Why do underpowered studies that happen to achieve statistical significance tend to overestimate the true effect size?
Model answer: In a low-powered study, only samples that produce an unusually large observed effect will clear the significance threshold. The estimates that get published are therefore a biased, lucky subset — those that happened to overestimate the true effect. This 'winner's curse' means effect sizes from underpowered studies shrink when better-powered replications are run.
This is a statistical consequence of sampling variability combined with a significance filter. The threshold for publication (p < .05) selects for samples at the tail of the sampling distribution, which are precisely the samples that overestimate the effect. Pre-registration and pre-specified power analyses help mitigate this bias.
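The significance filter can be demonstrated with a small simulation. This sketch assumes a true effect of d = 0.2, 20 participants per group, and scores drawn from normal distributions with a known SD of 1, so a z-test stands in for the usual t-test and the raw mean difference is already in d units. The function name and all parameter values are hypothetical choices for illustration.

```python
import random
from statistics import fmean

def simulate_winners_curse(true_d=0.2, n=20, n_studies=10_000, seed=1):
    """Run many small two-group studies and compare the mean observed effect
    across all studies with the mean among only the 'significant' ones."""
    rng = random.Random(seed)
    se = (2 / n) ** 0.5                 # standard error of the mean difference when SD = 1
    all_effects, significant_effects = [], []
    for _ in range(n_studies):
        treatment = [rng.gauss(true_d, 1) for _ in range(n)]
        control = [rng.gauss(0, 1) for _ in range(n)]
        diff = fmean(treatment) - fmean(control)
        all_effects.append(diff)
        if abs(diff / se) > 1.96:       # the significance filter (two-sided, alpha = .05)
            significant_effects.append(diff)
    share = len(significant_effects) / n_studies
    return fmean(all_effects), fmean(significant_effects), share

overall, significant_only, share = simulate_winners_curse()
print(f"mean observed d, all studies:      {overall:.2f}")
print(f"mean observed d, significant only: {significant_only:.2f}")
print(f"share reaching p < .05:            {share:.2f}")
```

Averaged over every simulated study, the observed effect sits close to the true 0.2. Among the roughly 10% of studies that clear p < .05, the average observed effect is several times larger, because only samples that happened to overestimate the effect could cross the threshold.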