Questions: Effect Size Reporting and Practical Interpretation
5 questions to test your understanding
Question 1 Multiple Choice
A study with n = 50,000 participants finds a statistically significant result (p < 0.001) for a new educational intervention, with Cohen's d = 0.04. What is the most appropriate interpretation?
A. The intervention has a practically meaningful benefit and should be widely adopted
B. The p-value is impressive, so the effect size doesn't matter for policy decisions
C. With a very large sample, even a trivially small effect can reach statistical significance; the practical impact appears negligible
D. The study is flawed because a significant result should have a larger effect size
Answer: C
Statistical significance and practical importance are separate questions. With 50,000 participants, statistical power is so high that even d = 0.04, far below Cohen's 'small' benchmark of 0.2, will reliably reach significance. But d = 0.04 means the intervention shifts the average outcome by only 4% of a standard deviation. For most educational interventions, an effect this small is unlikely to justify the costs of adoption. The p-value indicates that the effect is unlikely to be exactly zero; the effect size shows that it is very small. An honest interpretation needs both.
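To see why d = 0.04 clears p < 0.001 so easily, the sketch below converts d into a two-sample test statistic. It assumes the 50,000 participants are split evenly into two groups of 25,000 and uses a normal approximation for the p-value; the function names are illustrative, not from any particular library. For a pooled-SD Cohen's d, the t statistic equals d · √(n₁n₂/(n₁ + n₂)).

```python
import math
from statistics import NormalDist


def two_sample_z_from_d(d: float, n1: int, n2: int) -> float:
    """Test statistic implied by a pooled-SD Cohen's d for two independent groups."""
    return d * math.sqrt(n1 * n2 / (n1 + n2))


def two_sided_p(z: float) -> float:
    """Two-sided p-value under the standard normal approximation (fine for large n)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Assumption: the 50,000 participants are split evenly between treatment and control.
d = 0.04
z = two_sample_z_from_d(d, 25_000, 25_000)
p = two_sided_p(z)
print(f"z = {z:.2f}, p = {p:.1e}")
```

The statistic comes out near 4.5, so p lands well below 0.001 even though the shift is only four hundredths of a standard deviation: the sample size, not the effect, is doing the work.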
Question 2 Multiple Choice
A new therapy shows Cohen's d = 0.3. Is this a clinically meaningful effect?
A. Yes: d = 0.3 exceeds Cohen's 'small' threshold of 0.2, so it is meaningful by definition
B. No: only d ≥ 0.5 ('medium') counts as a meaningful effect worth acting on
C. It depends on context: the cost, risk, available alternatives, and what the outcome means determine practical importance
D. It cannot be meaningful without also being statistically significant
Answer: C
Cohen's thresholds are rough conventions he proposed for behavioral science research, not universal standards of importance. A d of 0.3 could be highly meaningful for an inexpensive, low-risk public health intervention, such as a 30-cent screening that saves lives. The same d = 0.3 might be disappointing for an expensive, intensive clinical program. Context (cost, risk, available alternatives, and what the outcome means) determines practical importance. The common misconception is treating Cohen's labels as verdicts rather than rough guides.
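One way to make "context" concrete is to translate d into a number needed to treat (NNT) and combine it with cost. This is an illustration under stated assumptions, not part of the original material: it uses the Kraemer and Kupfer conversion NNT = 1 / (2Φ(d/√2) − 1), which assumes normally distributed outcomes with equal variances, and it treats NNT loosely as the number of people treated per additional good outcome. The function name `nnt_from_d` is hypothetical, and the $5,000 per-person cost is an invented figure chosen only to contrast with the 30-cent screening.

```python
from statistics import NormalDist


def nnt_from_d(d: float) -> float:
    """Number needed to treat implied by Cohen's d (Kraemer and Kupfer conversion).

    AUC = Phi(d / sqrt(2)) is the probability that a random treated person
    outscores a random control person; NNT = 1 / (2 * AUC - 1).
    Assumes normal outcomes with equal variances and d > 0.
    """
    auc = NormalDist().cdf(d / 2 ** 0.5)
    return 1 / (2 * auc - 1)


nnt = nnt_from_d(0.3)

# Hypothetical per-person costs for two interventions with the same d = 0.3.
costs = {"30-cent screening": 0.30, "intensive clinical program": 5_000.0}
for name, cost_per_person in costs.items():
    # Cost per additional good outcome = people treated per extra success x cost per person.
    print(f"{name}: NNT {nnt:.1f}, about ${nnt * cost_per_person:,.2f} per additional success")
```

Both interventions share the same d and therefore the same NNT of roughly 6; only the cost per additional success differs, by several orders of magnitude. Whether either one is worth adopting still depends on how valuable that success is and on the alternatives.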
Question 3 True / False
A study with a very large sample can produce a statistically significant result even if the true effect size is too small to have any practical importance.
T. True
F. False
Answer: True
Statistical significance depends on both effect size and sample size. The standard error of a mean difference shrinks in proportion to 1/√N, so with N large enough, even a d of 0.01 will reliably produce p < 0.05. This is why p-values alone cannot tell you whether an effect matters. Effect size, by contrast, measures magnitude independently of sample size: it tells you how big the difference actually is, while sample size only affects how precisely that difference is estimated.
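The sketch below shows how sample size alone moves an effect across the significance line. It assumes two equal groups of n people each and a normal approximation, under which the test statistic is about d · √(n/2). The helper names are hypothetical: `n_per_group_for_expected_z` finds the group size at which the expected statistic just reaches the two-sided critical value (which corresponds to only about 50% power), and `approx_power` estimates power while ignoring the negligible opposite tail.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def n_per_group_for_expected_z(d: float, alpha: float = 0.05) -> int:
    """Equal group size at which the expected statistic d * sqrt(n / 2) hits the critical value."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return math.ceil(2 * (z_crit / d) ** 2)


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided test, ignoring the far tail."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(d * math.sqrt(n_per_group / 2) - z_crit)


for d in (0.5, 0.2, 0.04, 0.01):
    n = n_per_group_for_expected_z(d)
    print(f"d = {d}: expected z reaches significance at about {n:,} per group")

print(f"power for d = 0.04 with 25,000 per group: {approx_power(0.04, 25_000):.3f}")
```

For d = 0.01 the threshold is roughly 77,000 per group, and the Question 1 study, with about 25,000 per group under an even split, has power above 99% to detect d = 0.04. The effect size is the same at every N; only the precision changes.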
Question 4 True / False
Cohen's conventional thresholds (small ≈ 0.2, medium ≈ 0.5, large ≈ 0.8) should serve as the primary standard for judging whether a finding has practical importance.
T. True
F. False
Answer: False
Cohen himself cautioned against applying his thresholds mechanically. He offered them as rough conventions, based on his impressions of typical effects in behavioral science research and intended for use when no better basis is available; they do not come from any analysis of what matters in practice. A d of 0.3 might be transformative in one context and trivial in another. Practical importance requires asking: given the cost, risk, and available alternatives, is this effect large enough to change decisions? The thresholds are a rough orientation, not a verdict.
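Translating the benchmarks into more intuitive quantities can help show what they do and do not say. The sketch assumes normally distributed outcomes with equal variances in both groups. Under those assumptions, Cohen's U3 is Φ(d), the share of the control group scoring below the average treated person, and the probability of superiority (the common-language effect size) is Φ(d/√2), the chance that a random treated person outscores a random control person. The function names are illustrative.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def cohens_u3(d: float) -> float:
    """Share of the control group below the average treated person (normal, equal variances)."""
    return PHI(d)


def prob_superiority(d: float) -> float:
    """Chance a random treated person outscores a random control person (same assumptions)."""
    return PHI(d / 2 ** 0.5)


for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    print(f"{label:>6} d = {d}: U3 = {cohens_u3(d):.0%}, "
          f"P(superiority) = {prob_superiority(d):.0%}")
```

Even the 'large' benchmark d = 0.8 means a treated person outscores a random control person only about 71% of the time, while d = 0.2 gives about 56%. These translations make the magnitude more concrete, but they still do not say whether the effect matters; that judgment needs cost, risk, and alternatives.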
Question 5 Short Answer
Why are effect sizes — rather than p-values alone — essential for conducting a meta-analysis that synthesizes results across multiple studies?
Model answer: Different studies use different sample sizes, measurement instruments, and raw score scales, making their p-values and raw means incomparable. You cannot meaningfully average p-values (they conflate effect size with sample size) or raw means (a '5-point improvement' on one scale cannot be compared directly with a '0.3-unit improvement' on another). Effect sizes like Cohen's d standardize results by expressing differences in standard deviation units, making them comparable regardless of the original scale. Meta-analysis then combines these standardized estimates, typically as a weighted average that gives more precise studies more weight, to estimate the underlying effect across the literature.
Effect sizes are the currency of cumulative science. Each individual study is a noisy estimate of the true population effect. Meta-analysis reduces this noise by pooling estimates. But pooling only works if the estimates are on a common scale — which standardized effect sizes provide. Missing or misreported effect sizes in individual studies degrade every future meta-analysis that would have included that study, which is why complete effect size reporting is a form of scientific infrastructure.
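The pooling step can be sketched as follows. The two studies are invented for illustration: one reports a 5-point gain on a 100-point test, the other a 0.3-unit gain on a 5-point rating scale. The sketch converts each to Cohen's d using the pooled standard deviation, approximates each d's sampling variance with the common large-sample formula (n₁ + n₂)/(n₁n₂) + d²/(2(n₁ + n₂)), and combines them with inverse-variance weights, which is a fixed-effect model. All names are hypothetical; when true effects plausibly differ across studies, a random-effects model is usually more appropriate.

```python
import math
from dataclasses import dataclass


@dataclass
class Study:
    # Hypothetical summary statistics; each study may use a different raw scale.
    mean_treat: float
    mean_ctrl: float
    sd_treat: float
    sd_ctrl: float
    n_treat: int
    n_ctrl: int


def cohens_d(s: Study) -> float:
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((s.n_treat - 1) * s.sd_treat ** 2 + (s.n_ctrl - 1) * s.sd_ctrl ** 2) / (
        s.n_treat + s.n_ctrl - 2
    )
    return (s.mean_treat - s.mean_ctrl) / math.sqrt(pooled_var)


def var_d(d: float, s: Study) -> float:
    """Large-sample approximation to the sampling variance of d."""
    n1, n2 = s.n_treat, s.n_ctrl
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))


def fixed_effect_pool(studies):
    """Inverse-variance weighted mean of d and its standard error (fixed-effect model)."""
    ds = [cohens_d(s) for s in studies]
    weights = [1 / var_d(d, s) for d, s in zip(ds, studies)]
    pooled = sum(w * d for w, d in zip(weights, ds)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return pooled, se


studies = [
    Study(75.0, 70.0, 10.0, 10.0, 40, 40),  # 100-point test: 5-point gain, d = 0.5
    Study(3.3, 3.0, 1.0, 1.0, 200, 200),    # 5-point rating scale: 0.3-unit gain, d = 0.3
]
pooled, se = fixed_effect_pool(studies)
print(f"pooled d = {pooled:.2f} (SE {se:.2f})")
```

The raw gains (5 points versus 0.3 units) cannot be averaged, but the standardized values (0.5 and 0.3) can. The pooled estimate of about 0.33 sits closer to the larger, more precise study, and its standard error is smaller than either study's alone. None of this is possible for a study that omits its effect size or the sample sizes needed to weight it.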