A researcher tests 20 dietary supplements for association with cancer risk, each at alpha = 0.05, and finds one significant result (p = 0.03). She reports this supplement as a confirmed risk factor. What is the fundamental problem?
A. The p-value of 0.03 is too close to 0.05 to be reliable
B. With 20 tests at alpha = 0.05, the probability of at least one false positive is about 64%, so the single significant result is likely a chance finding
C. She should have used alpha = 0.01 instead of 0.05 from the start
D. The problem is that she tested supplements rather than drugs, which have weaker effects
Answer: B
The probability of at least one false positive across k independent tests, when every null hypothesis is true, is 1 - (1 - alpha)^k. With k = 20 and alpha = 0.05, this is 1 - 0.95^20 ≈ 0.64. The expected number of false positives is 20 × 0.05 = 1, so one significant result out of 20 is exactly what you would expect by chance alone. Under a multiple-comparison correction (e.g., a Bonferroni threshold of 0.05/20 = 0.0025), the p = 0.03 result does not survive adjustment, so it should not be treated as confirmatory evidence.
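The arithmetic in this explanation can be checked with a few lines of Python. This is a minimal sketch: the function names are illustrative, and the formula assumes independent tests with every null hypothesis true.

```python
def familywise_error_rate(alpha: float, k: int) -> float:
    """Probability of at least one false positive across k independent
    tests when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_threshold(alpha: float, k: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / k


alpha, k, p_observed = 0.05, 20, 0.03

fwer = familywise_error_rate(alpha, k)
threshold = bonferroni_threshold(alpha, k)
expected_false_positives = alpha * k

print(f"FWER across {k} tests: {fwer:.3f}")
print(f"Expected false positives: {expected_false_positives:.1f}")
print(f"Bonferroni threshold: {threshold:.4f}")
print(f"p = {p_observed} survives Bonferroni: {p_observed <= threshold}")
```

Changing k shows how quickly the family-wise error rate approaches 1 as the number of uncorrected tests grows.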
Question 2 Multiple Choice
The Bonferroni correction divides the significance threshold by the number of tests (alpha/m). A study performs 1,000 genome-wide tests and applies Bonferroni. Why might this be problematic in practice?
A. Bonferroni only works for fewer than 100 tests
B. The adjusted threshold (0.05/1000 = 0.00005) is so stringent that the study has very low power to detect real but moderate effects
C. Bonferroni assumes tests are perfectly correlated, which is rarely true
D. Bonferroni increases the Type I error rate with more tests
Answer: B
Bonferroni controls the family-wise error rate (the probability of even one false positive) by dividing alpha by m. When m is large, the per-test threshold becomes extremely small, and the power to detect real effects plummets. This is why FDR-controlling procedures like Benjamini-Hochberg are preferred in high-dimensional settings: they accept some false positives in exchange for much better power to detect true effects. Option C is wrong because Bonferroni makes no assumption about how the tests are related: it rests on the union bound, which holds under any dependence structure. The price is that it is conservative (not anti-conservative), and positively correlated tests make it even more conservative.
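To get a feel for the size of the power loss, the sketch below compares the power of a single two-sided z-test at alpha = 0.05 with the same test at the Bonferroni threshold for m = 1,000. The effect size (a z-statistic centred at 3.0) is a hypothetical value chosen for illustration, not a figure from the question, and `z_test_power` is an illustrative helper built on the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist


def z_test_power(effect_z: float, alpha: float) -> float:
    """Power of a two-sided z-test when the statistic is N(effect_z, 1)."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # Probability that the statistic lands beyond either critical value.
    upper = 1.0 - std_normal.cdf(critical - effect_z)
    lower = std_normal.cdf(-critical - effect_z)
    return upper + lower


alpha, m = 0.05, 1000
effect_z = 3.0  # hypothetical moderate effect (assumption for illustration)

power_uncorrected = z_test_power(effect_z, alpha)
power_bonferroni = z_test_power(effect_z, alpha / m)

print(f"Power at alpha = {alpha}: {power_uncorrected:.2f}")
print(f"Power at alpha/m = {alpha / m:.5f}: {power_bonferroni:.2f}")
```

Under these assumptions the power falls from roughly 85% to roughly 15%. The exact figures depend entirely on the assumed effect size.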
Question 3 True / False
The Benjamini-Hochberg procedure controls the false discovery rate rather than the family-wise error rate, making it identical to having no correction at all but with a different label.
T. True
F. False
Answer: False
The Benjamini-Hochberg procedure is a genuine correction: it ranks the p-values from smallest to largest, compares the p-value at rank k to the threshold (k/m) × q, where m is the total number of tests and q is the target FDR, and rejects every hypothesis up to the largest rank that meets its threshold. It is less conservative than Bonferroni because it controls a different quantity: the expected proportion of false discoveries among rejected hypotheses, rather than the probability of any false discovery. With FDR at 5%, you expect that, on average, at most 5% of your significant results are false positives. This is much more permissive than Bonferroni but still provides meaningful error control, which is why it dominates in genomics and other high-dimensional fields.
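The rank-based comparison can be sketched in plain Python as follows. The function name `benjamini_hochberg`, the target level `q`, and the ten p-values are all made up for illustration; this is a minimal version of the step-up rule, not a replacement for a tested library routine.

```python
def benjamini_hochberg(p_values: list[float], q: float = 0.05) -> list[bool]:
    """Return a reject flag per p-value, in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k (1-based) whose p-value is at most (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Made-up p-values for illustration only.
p_values = [0.001, 0.008, 0.012, 0.030, 0.041, 0.20, 0.45, 0.62, 0.78, 0.91]

bh_rejected = benjamini_hochberg(p_values, q=0.05)
bonferroni_rejected = [p <= 0.05 / len(p_values) for p in p_values]

print("BH rejections:        ", sum(bh_rejected))          # 3
print("Bonferroni rejections:", sum(bonferroni_rejected))  # 1
```

On this toy input the procedure rejects the three smallest p-values, while Bonferroni (threshold 0.005) rejects only the first. Without any correction, five of the ten would fall below 0.05, so the procedure sits between the two extremes.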
Question 4 Short Answer
Explain the conceptual difference between controlling the family-wise error rate (FWER) and controlling the false discovery rate (FDR), and when each is appropriate.
Model answer: FWER controls the probability of making even one false positive across all tests — it answers 'what is the chance I report anything false?' FDR controls the expected proportion of false positives among the results declared significant — it answers 'among my discoveries, what fraction are likely wrong?' FWER is appropriate for confirmatory settings where any false positive has serious consequences (e.g., approving an ineffective drug). FDR is appropriate for exploratory settings where finding most true effects matters more than avoiding all false leads (e.g., identifying candidate genes for follow-up validation).
The distinction reflects different research goals. In a confirmatory clinical trial testing a small set of pre-specified endpoints, a single false positive could lead to approving a harmful or useless treatment, so FWER control (even Bonferroni) is justified. In a microarray experiment testing 20,000 genes, Bonferroni would require p < 2.5 × 10⁻⁶, which would miss most real signals of moderate size. FDR at 5% accepts that, on average, up to 1 in 20 flagged genes may be a false lead, but recovers far more true associations for downstream validation.
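A small simulation can make the trade-off concrete. Everything in the sketch below is an assumption chosen for illustration: 500 of the 20,000 genes carry a real effect, real effects shift a one-sided z-statistic to a mean of 3.5, and all tests are independent. It compares how many true and false hits pass the Bonferroni cutoff versus the Benjamini-Hochberg cutoff.

```python
import random
from statistics import NormalDist

rng = random.Random(0)  # fixed seed so the run is reproducible
std_normal = NormalDist()

m, n_true, alpha = 20_000, 500, 0.05  # hypothetical: 500 genes have a real effect
effect_z = 3.5  # hypothetical mean of the z-statistic for a real effect

# One-sided p-values: genes with a real effect first, then pure-noise genes.
z_scores = [rng.gauss(effect_z, 1.0) for _ in range(n_true)]
z_scores += [rng.gauss(0.0, 1.0) for _ in range(m - n_true)]
p_values = [1.0 - std_normal.cdf(z) for z in z_scores]
is_true_effect = [i < n_true for i in range(m)]


def count_hits(cutoff: float) -> tuple[int, int]:
    """Return (true hits, false hits) among p-values at or below cutoff."""
    true_hits = sum(p <= cutoff and t for p, t in zip(p_values, is_true_effect))
    false_hits = sum(p <= cutoff and not t for p, t in zip(p_values, is_true_effect))
    return true_hits, false_hits


bonferroni_cut = alpha / m

# BH cutoff: the largest sorted p-value p_(k) with p_(k) <= (k / m) * alpha.
sorted_p = sorted(p_values)
bh_cut = max(
    (p for k, p in enumerate(sorted_p, start=1) if p <= k / m * alpha),
    default=0.0,
)

bonf_true, bonf_false = count_hits(bonferroni_cut)
bh_true, bh_false = count_hits(bh_cut)

print(f"Bonferroni: {bonf_true} true hits, {bonf_false} false hits")
print(f"BH (FDR 5%): {bh_true} true hits, {bh_false} false hits")
```

Every p-value that passes Bonferroni also passes Benjamini-Hochberg, so the second line never reports fewer hits than the first. How large the gap is depends on the assumed number and strength of real effects.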