Questions: Multiple Comparisons and Type I Error Rate Control
5 questions to test your understanding
Question 1 Multiple Choice
A researcher runs 20 independent hypothesis tests at α = .05 and finds 2 significant results. After applying Bonferroni correction, both remain significant. A reviewer still calls the original analysis problematic. What is the reviewer's most likely concern?
A. Bonferroni correction is never valid for more than 10 simultaneous tests
B. Without correction, the family-wise error rate for 20 tests was approximately 64%, meaning a very high chance of at least one false positive even if every effect were pure noise
C. Two significant results from 20 tests is exactly the 10% rate expected by chance, so both must be false positives
D. Bonferroni correction increases Type I error, making the surviving results less trustworthy
Answer: B
The reviewer's concern is that the original uncorrected analysis had a family-wise error rate of 1 − (1 − .05)^20 ≈ .64, a 64% chance of at least one false positive if all nulls are true. The Bonferroni correction applied afterward does hold the surviving results to a defensible threshold, but the reviewer may still question analytical transparency (was the correction pre-planned?) and whether the reported results were cherry-picked. The critique targets the design and reporting, not the mathematical validity of Bonferroni itself.
Question 2 Multiple Choice
A neuroimaging study tests 50,000 voxels simultaneously. The team uses Bonferroni correction to control family-wise error rate. A colleague recommends switching to FDR control. What is the main advantage of FDR in this high-dimensional setting?
A. FDR control guarantees zero false positives, while Bonferroni allows up to 5%
B. FDR control is less stringent: it tolerates a small proportion of false discoveries in exchange for substantially more statistical power to detect true effects across tens of thousands of tests
C. FDR control is more conservative than Bonferroni, providing better error control with no power cost
D. FDR adjusts each test's alpha upward when tests are correlated, making it more powerful than Bonferroni in all situations
Answer: B
Bonferroni at 50,000 tests requires each voxel to reach p < .05/50,000 = .000001, an extraordinarily stringent threshold that will miss many real effects (high Type II error rate). FDR control (e.g., Benjamini-Hochberg) instead controls the expected proportion of significant results that are false positives. Accepting that perhaps 5% of the reported significant voxels might be false positives relaxes the per-test p-value cutoff considerably, recovering power to detect real signals. This tradeoff is appropriate in exploratory neuroimaging, where some false positives are tolerable if many true signals are found.
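To make the contrast concrete, here is a minimal sketch of both procedures on a small set of made-up p-values. The function names and the ten p-values are illustrative assumptions, not data from any study; the Benjamini-Hochberg step-up rule shown is the standard one (reject the k smallest p-values, where k is the largest rank with p_(k) ≤ (k/m)·q).

```python
def bonferroni_reject(pvals, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (controls FWER)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg_reject(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure (controls FDR at level q)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Track the largest rank whose p-value sits under its own threshold.
        if pvals[idx] <= rank / m * q:
            cutoff_rank = rank
    reject = [False] * m
    for idx in order[:cutoff_rank]:
        reject[idx] = True
    return reject


# Hypothetical p-values, chosen only for illustration.
pvals = [0.001, 0.008, 0.012, 0.019, 0.03, 0.2, 0.4, 0.6, 0.8, 0.95]

n_bonferroni = sum(bonferroni_reject(pvals))
n_bh = sum(benjamini_hochberg_reject(pvals))
print("Bonferroni rejections:", n_bonferroni)
print("Benjamini-Hochberg rejections:", n_bh)
```

On this toy input Bonferroni rejects only the smallest p-value, while Benjamini-Hochberg rejects the four smallest. The gap typically widens as the number of tests grows.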
Question 3 True / False
When 20 independent statistical tests are conducted at α = .05 and all null hypotheses are true, the probability that at least one test yields a significant result is approximately 64%.
T. True
F. False
Answer: True
Using the complement rule: P(at least one significant) = 1 − P(none significant) = 1 − (1 − .05)^20 = 1 − .95^20 ≈ 1 − .358 ≈ .642. This is the family-wise error rate (FWER) without any correction. It grows rapidly: 10 tests → ~40%, 30 tests → ~79%, 50 tests → ~92%. The intuition is powerful: each independent test is a separate lottery ticket with a 5% chance of a false 'win.' More tickets mean a near-certain false win eventually — even when nothing is real.
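The complement-rule calculation can be checked with a few lines of Python. This is only a sketch of the formula above; the function name is an illustrative choice, and the result holds only under the stated assumptions (independent tests, all nulls true).

```python
def fwer_uncorrected(k, alpha=0.05):
    """P(at least one false positive) across k independent tests,
    assuming every null hypothesis is true."""
    return 1 - (1 - alpha) ** k


for k in (1, 10, 20, 30, 50):
    print(f"{k:>2} tests: FWER = {fwer_uncorrected(k):.3f}")
```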
Question 4 True / False
Applying a multiple comparisons correction to a selected subset of statistically significant findings is sufficient to make those findings valid, even if the researcher ran many more tests and reported primarily the significant ones.
T. True
F. False
Answer: False
Multiple comparisons corrections are designed to be applied to the entire family of tests conducted. If a researcher runs 100 tests, finds 5 significant results, and then applies Bonferroni correction only to those 5, the correction is meaningless — it ignores the 95 tests that 'failed,' which were equally available to produce false positives. Selective reporting of only significant findings makes the reported p-values uninterpretable regardless of any post-hoc correction. No statistical procedure can compensate for the bias introduced by non-disclosure of the full family of tests.
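A small arithmetic sketch shows how much the choice of family size matters. The p-value of .004 is a hypothetical example, and the helper name is an illustrative assumption.

```python
def bonferroni_threshold(family_size, alpha=0.05):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / family_size


tests_run = 100      # every test actually conducted
tests_reported = 5   # only the significant ones that were written up
p_value = 0.004      # hypothetical p-value for one reported finding

# Correcting only for the reported subset: threshold is .05 / 5 = .01
passes_selective = p_value <= bonferroni_threshold(tests_reported)
# Correcting for the full family: threshold is .05 / 100 = .0005
passes_full = p_value <= bonferroni_threshold(tests_run)

print("Survives correction over 5 reported tests:", passes_selective)
print("Survives correction over all 100 tests:  ", passes_full)
```

The same finding survives the selective correction but not the correction over the full family, which is the only one that reflects the tests actually run.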
Question 5 Short Answer
Explain why Bonferroni correction becomes overly conservative when the statistical tests within a study are positively correlated with each other.
Model answer: The Bonferroni correction does not assume independence. It follows from Boole's inequality (the union bound): the probability of at least one false positive is at most the sum of the k per-test error probabilities, so testing each hypothesis at α/k guarantees FWER ≤ α under any dependence structure. That bound is nearly tight when the tests are independent and exactly tight only when false positives in different tests never coincide. When tests are positively correlated (e.g., testing related hypotheses with overlapping participant data), a false positive in one test makes false positives in correlated tests more likely, so the errors cluster in the same samples and the k tests behave like fewer than k separate chances at a false positive. The actual FWER therefore falls well below α, meaning Bonferroni over-corrects. Setting the threshold at α/k demands smaller p-values than the data structure warrants, increasing Type II error (missed real effects) without a proportional gain in error control.
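Holm's step-down procedure is uniformly more powerful than Bonferroni and equally valid under any dependence, although it does not itself exploit correlation. Permutation-based corrections do, because they estimate the joint null distribution of the test statistics from the data. The Benjamini-Hochberg FDR procedure is guaranteed to control the expected FDR under independence and under certain forms of positive dependence, which is why it is widely used in high-correlation settings like genomics; under arbitrary dependence, the more conservative Benjamini-Yekutieli variant is required.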
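The conservativeness under positive correlation can be seen in a small Monte Carlo sketch. It assumes k two-sided z-tests with all nulls true and an equicorrelated structure (every pair of test statistics shares correlation rho); that structure, the function name, and the parameter values are illustrative assumptions rather than a model of any particular study.

```python
import math
import random
from statistics import NormalDist


def simulated_bonferroni_fwer(k, rho, alpha=0.05, n_sims=20000, seed=0):
    """Estimate the FWER of Bonferroni-corrected two-sided z-tests when all
    k nulls are true and each pair of statistics has correlation rho."""
    rng = random.Random(seed)
    # Two-sided critical value for a per-test level of alpha / k.
    z_crit = NormalDist().inv_cdf(1 - alpha / (2 * k))
    w_shared, w_own = math.sqrt(rho), math.sqrt(1 - rho)
    hits = 0
    for _ in range(n_sims):
        shared = rng.gauss(0, 1)  # component common to all k statistics
        # Each statistic is standard normal; the shared part induces rho.
        if any(abs(w_shared * shared + w_own * rng.gauss(0, 1)) > z_crit
               for _ in range(k)):
            hits += 1
    return hits / n_sims


fwer_independent = simulated_bonferroni_fwer(k=20, rho=0.0)
fwer_correlated = simulated_bonferroni_fwer(k=20, rho=0.8)
print(f"Estimated FWER, independent tests: {fwer_independent:.3f}")
print(f"Estimated FWER, rho = 0.8:         {fwer_correlated:.3f}")
```

With independent tests the estimate should sit just under the nominal .05, while with strong positive correlation it should fall noticeably lower, which is the unused error budget that makes Bonferroni conservative.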