Questions: Type I and Type II Error Trade-offs in Decision Making
5 questions to test your understanding
Question 1 Multiple Choice
Designers of a cancer screening test want to minimize the risk of telling a sick patient they are healthy. To achieve this, they lower the detection threshold — making it easier to flag a positive result. What is the trade-off?
A. Fewer false negatives, but no change in false positives, since the threshold only affects one direction
B. Fewer false negatives (Type II errors), but more false positives (Type I errors): more healthy people will be incorrectly flagged
C. Fewer false positives and fewer false negatives simultaneously, since a lower threshold always improves both
D. Higher statistical power with no increase in Type I error rate
Answer: B
Lowering the detection threshold (effectively raising α, if "healthy" is treated as the null hypothesis) reduces the chance of missing a real case (Type II error / false negative) but increases the chance of flagging a healthy person as potentially sick (Type I error / false positive). For a given test, the two error types trade off through the threshold: moving it in either direction reduces one error while increasing the other. The only way to reduce both at once is to improve the test's ability to separate sick from healthy patients (for example, a more accurate measurement, or a larger sample in a research study), not to adjust the threshold.
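A minimal sketch of the threshold trade-off, assuming (hypothetically) that test scores are normally distributed with mean 0 for healthy patients and mean 2 for sick patients, both with standard deviation 1, and that any score at or above the threshold is flagged positive. These distributions are illustrative assumptions, not properties of any real screening test.

```python
from statistics import NormalDist

# Hypothetical score distributions (assumed for illustration only):
# healthy patients ~ N(0, 1), sick patients ~ N(2, 1).
HEALTHY = NormalDist(mu=0.0, sigma=1.0)
SICK = NormalDist(mu=2.0, sigma=1.0)

def error_rates(threshold):
    """Return (false_positive_rate, false_negative_rate) when scores >= threshold are flagged."""
    fpr = 1 - HEALTHY.cdf(threshold)  # healthy person flagged as sick (Type I)
    fnr = SICK.cdf(threshold)         # sick person scored below threshold (Type II)
    return fpr, fnr

# Lower the threshold step by step and watch the two error rates move in opposite directions.
for t in (1.5, 1.0, 0.5):
    fpr, fnr = error_rates(t)
    print(f"threshold={t:.1f}  false positives={fpr:.3f}  false negatives={fnr:.3f}")
```

Under these assumptions, lowering the threshold from 1.5 to 0.5 raises the false-positive rate from about 0.07 to about 0.31 while the false-negative rate falls from about 0.31 to about 0.07. Only pulling the two distributions further apart (a more accurate test) would shrink both rates at once.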
Question 2 Multiple Choice
A psychology study with 25 participants finds p = .11 and concludes 'no effect was found.' A replication with 250 participants on the same question finds p = .02. What is the most likely explanation?
A. The smaller study used a flawed measure that the larger study corrected
B. The larger study is probably a false positive, since more participants increases the Type I error rate
C. The smaller study was underpowered (too few participants to reliably detect a real effect), making its null result likely a Type II error
D. Effect sizes are always smaller in small samples and larger in large samples, making the comparison invalid
Answer: C
Underpowered studies miss real effects not because the effect is absent, but because small samples produce noisy estimates (high sampling variance), making it hard to distinguish a real signal from noise. A null result from a 25-participant study of a small-to-medium effect tells you almost nothing: the probability of detecting the effect even if it existed (the power) may have been only 20-30%. The replication with 250 participants had far higher power to detect the same effect. This is why "absence of evidence is not evidence of absence" applies with particular force to underpowered studies.
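A rough power calculation makes the 20-30% figure concrete. This sketch uses the normal approximation for a two-sided one-sample (or paired) test at α = .05 and assumes a hypothetical standardized effect of d = 0.3; neither the study design nor the effect size is given in the question.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def approx_power(d, n, alpha=0.05):
    """Normal-approximation power of a two-sided one-sample (or paired) test
    for standardized effect size d with n participants."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n)  # expected test statistic if the effect is real
    # Probability the statistic lands in either rejection region.
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)

# Hypothetical small-to-medium effect, d = 0.3 (an assumption, not from the studies).
for n in (25, 250):
    print(f"n={n:>3}  power ~ {approx_power(0.3, n):.3f}")
```

Under these assumptions the 25-participant study has power of about 0.32, while the 250-participant replication has power of about 0.997. A between-groups design splitting 25 people into two groups would have even lower power.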
Question 3 True / False
A null result (p > .05) from an adequately powered study — one designed with enough participants to detect a plausible effect — provides meaningful evidence that the true effect is small or absent.
T. True
F. False
Answer: True
When a study is adequately powered, it would very likely have detected a real effect of the targeted size if one existed. In that case, failing to find significance is genuinely informative: it suggests the effect is either absent or smaller than the effect size the study was designed to detect. This is "evidence of absence" (more precisely, evidence against effects of that size), a meaningful finding rather than a non-result. The problem arises with underpowered studies, where a null result is nearly uninterpretable because the study was unlikely to detect the effect even if it existed.
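One way to make "the effect size the study was designed to detect" concrete is to compute the smallest standardized effect a study can detect with 80% power, d_min ≈ (z_{1−α/2} + z_{power}) / √n. The sketch below assumes a two-sided one-sample (or paired) test at α = .05, uses the normal approximation, and ignores the negligible rejection probability in the opposite tail.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def minimum_detectable_effect(n, alpha=0.05, power=0.80):
    """Smallest standardized effect d detectable with the given power
    (two-sided one-sample/paired test, normal approximation)."""
    return (Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)) / sqrt(n)

for n in (25, 250):
    print(f"n={n:>3}  minimum detectable d ~ {minimum_detectable_effect(n):.2f}")
```

With 25 participants only effects of about d = 0.56 or larger are reliably detectable, so a null result leaves moderate effects entirely plausible. With 250 participants the figure drops to about d = 0.18, so a null result is evidence that any effect is small.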
Question 4 True / False
The conventional α = .05 threshold optimally balances Type I and Type II error risks for most research contexts.
T. True
F. False
Answer: False
α = .05 is a historical convention, not a principled optimum. The correct α depends on the relative costs of the two error types in the specific research context. A cancer screening test may warrant α = .10 or higher to minimize missed cases. A study justifying a costly new policy may warrant α = .01 to minimize false positives. A preliminary exploratory study may tolerate α = .10. The costs of false positives and false negatives vary enormously by domain, and α should reflect those costs — not default to a universal convention.
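The cost argument can be made explicit by choosing α to minimize expected cost, (1 − π)·C_FP·α + π·C_FN·β(α), where π is the prior probability that an effect exists. This is an illustrative toy model: the costs, the prior (π = 0.5), the effect size (d = 0.3), and the sample size (n = 100) are all hypothetical, and it uses a one-sided z-test so that β has a simple closed form.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def type_ii_rate(alpha, d, n):
    """Beta for a one-sided one-sample z-test with standardized effect d and n participants."""
    z_crit = Z.inv_cdf(1 - alpha)
    return Z.cdf(z_crit - d * sqrt(n))

def best_alpha(cost_fp, cost_fn, d=0.3, n=100, p_effect=0.5):
    """Grid-search the alpha that minimizes expected error cost (all inputs hypothetical)."""
    grid = [i / 1000 for i in range(1, 500)]  # alpha from 0.001 to 0.499

    def expected_cost(alpha):
        false_alarm = (1 - p_effect) * cost_fp * alpha
        missed = p_effect * cost_fn * type_ii_rate(alpha, d, n)
        return false_alarm + missed

    return min(grid, key=expected_cost)

print(best_alpha(cost_fp=1, cost_fn=10))  # missed effects costly, as in screening
print(best_alpha(cost_fp=10, cost_fn=1))  # false alarms costly, as with an expensive policy
print(best_alpha(cost_fp=1, cost_fn=1))   # equal costs
```

With these inputs, making a missed effect ten times as costly as a false alarm pushes the cost-minimizing α to roughly .23, the reverse ratio pulls it down to roughly .01, and equal costs give roughly .07. None of them is .05.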
Question 5 Short Answer
Why is 'absence of evidence is not evidence of absence' particularly important when interpreting a null result from a small study?
Model answer: A small study typically has low statistical power — it has a low probability of detecting a real effect even if one exists. When such a study fails to find significance, the null result is ambiguous: it could mean the effect is absent, or it could mean the study simply couldn't see it. Because the false-negative rate (β) is high in underpowered studies, a null result carries little information about whether the effect is real. By contrast, a null result from a large, well-powered study is informative because the study had a high probability of detecting an effect if one existed.
The practical implication is that before interpreting any null result, you should ask: what was the power of this test to detect a plausible effect size? A p > .05 from a study with 80% power is genuinely informative; a p > .05 from a study with 25% power tells you almost nothing. This skill, reading power alongside p-values, is one of the most important habits in evaluating psychological research.
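Bayes' rule shows why power matters when reading a null result. Assuming a hypothetical 50% prior probability π that the effect is real and α = .05, the probability that the effect exists given a non-significant result is β·π / (β·π + (1 − α)(1 − π)), where β = 1 − power. This is a simplified sketch: it treats "the effect exists" as a single effect size at which the stated power applies.

```python
def prob_effect_given_null(power, prior=0.5, alpha=0.05):
    """P(effect is real | non-significant result), with power fixed for the assumed effect size."""
    beta = 1 - power
    missed = beta * prior                     # effect is real, but the test missed it
    correct_null = (1 - alpha) * (1 - prior)  # no effect, and the test correctly stayed non-significant
    return missed / (missed + correct_null)

for power in (0.80, 0.25):
    print(f"power={power:.2f}  P(effect | null result) = {prob_effect_given_null(power):.2f}")
```

Starting from the assumed 50% prior, a null result from the 80%-power study lowers the probability of a real effect to about 0.17, while the same result from the 25%-power study only lowers it to about 0.44, barely changing what you should believe.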