When multiple statistical tests are performed simultaneously, the probability that at least one test produces a false positive increases rapidly — with 20 independent tests at alpha = 0.05, the probability of at least one false positive is approximately 64%. Multiple testing corrections adjust significance thresholds or p-values to control this inflated error rate. The two main frameworks are family-wise error rate (FWER) control, which limits the probability of any false positive (e.g., Bonferroni correction), and false discovery rate (FDR) control, which limits the expected proportion of false positives among rejected hypotheses (e.g., Benjamini-Hochberg procedure). FWER methods are conservative and appropriate for confirmatory studies; FDR methods are less conservative and better suited to exploratory, high-dimensional settings like genomics.
The multiple testing problem is one of the most important concepts in applied biostatistics because it arises in nearly every real study: any time you test more than one hypothesis, compare more than two groups, or examine more than one outcome. The underlying mathematics is straightforward. If every null hypothesis is true and each test independently has a 5% false-positive rate, the probability of at least one false positive across m tests is 1 - (1 - 0.05)^m. At m = 20, this reaches 64%; at m = 100, it exceeds 99%. Without correction, a study that tests many hypotheses is almost guaranteed to find something significant by chance, even when no real effects exist.
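The formula can be checked numerically. The sketch below uses Python with only the standard library, and the function names are hypothetical. It computes 1 - (1 - alpha)^m directly and compares the result with a small Monte Carlo simulation. The simulation assumes every null hypothesis is true and the tests are independent with continuous test statistics, so each p-value is uniformly distributed on (0, 1).

```python
import random


def fwer_uncorrected(m: int, alpha: float = 0.05) -> float:
    """Closed-form P(at least one false positive) for m independent tests of true nulls."""
    # Each test avoids a false positive with probability (1 - alpha);
    # independence lets us multiply those probabilities across the m tests.
    return 1.0 - (1.0 - alpha) ** m


def fwer_simulated(m: int, alpha: float = 0.05, n_studies: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability (assumes all nulls are true)."""
    rng = random.Random(seed)
    studies_with_false_positive = 0
    for _ in range(n_studies):
        # Under a true null with a continuous test statistic, p ~ Uniform(0, 1),
        # so each test comes out "significant" with probability alpha.
        if any(rng.random() < alpha for _ in range(m)):
            studies_with_false_positive += 1
    return studies_with_false_positive / n_studies


for m in (1, 5, 20, 100):
    print(f"m = {m:>3}: exact {fwer_uncorrected(m):.3f}, simulated {fwer_simulated(m):.3f}")
```

For m = 20 the closed form gives about 0.642, and for m = 100 about 0.994, matching the percentages quoted above. The simulated values should land within a percentage point or so of these.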
The Bonferroni correction is the simplest and most conservative approach: divide the desired overall significance level by the number of tests, so that each individual test is judged at alpha/m. If you perform 20 tests and want an overall alpha of 0.05, each individual test must reach p < 0.0025. This guarantees that the probability of any false positive across all tests remains at or below 5%, and the guarantee holds whether or not the tests are independent. The cost is severe: each test now requires much stronger evidence, which reduces the power to detect real effects. Bonferroni is appropriate when the number of tests is small and every false positive carries serious consequences, for example when testing a few pre-specified secondary endpoints in a clinical trial.
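A minimal Bonferroni sketch in Python follows. The function name `bonferroni` and the p-values are hypothetical. The sketch shows the two equivalent ways the correction is usually reported. The first compares each raw p-value with alpha/m. The second multiplies each p-value by m, caps the result at 1, and compares that adjusted p-value with alpha.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction.

    Returns (per_test_threshold, adjusted_p_values, reject_flags).
    """
    m = len(p_values)
    # Per-test threshold: the overall alpha split evenly across the m tests.
    threshold = alpha / m
    # Adjusted p-values: m * p, capped at 1, for comparison against alpha itself.
    adjusted = [min(p * m, 1.0) for p in p_values]
    # Decisions here use the raw p-values against the per-test threshold.
    reject = [p <= threshold for p in p_values]
    return threshold, adjusted, reject


# Hypothetical p-values from 20 tests: five small ones, fifteen unremarkable.
p_values = [0.0004, 0.0021, 0.0030, 0.0120, 0.0400] + [0.5] * 15
threshold, adjusted, reject = bonferroni(p_values, alpha=0.05)
print(f"per-test threshold: {threshold:.4f}")
print(f"rejected: {sum(reject)} of {len(p_values)}")
```

With 20 tests the threshold is 0.0025, so only the first two p-values survive. The values 0.0030 and 0.0120 would count as significant at an uncorrected 0.05, but neither survives the correction.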
The false discovery rate framework, introduced by Benjamini and Hochberg in 1995, controls a fundamentally different quantity. Instead of asking "what is the probability of any false positive?", it asks "among the results I call significant, what proportion are false?" The Benjamini-Hochberg procedure ranks all m p-values from smallest to largest and compares each to a threshold that increases with rank: the k-th smallest p-value is compared to (k/m) × q, where q is the desired FDR level. The procedure then finds the largest k whose p-value falls at or below its threshold and rejects the hypotheses with the k smallest p-values. This includes any whose own p-values missed their individual thresholds. The result is more discoveries while the rate of false findings among them stays under control. An FDR of 5% means that if you flag 100 genes as significant, you expect on average about 5 of them to be false discoveries. That is an acceptable tradeoff in exploratory research where flagged candidates will be validated independently.
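The step-up rule can be sketched in a few lines of Python. The function `benjamini_hochberg` and the example p-values are hypothetical. In practice, a maintained routine such as statsmodels' `multipletests(..., method="fdr_bh")` would normally be used instead.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, True where the hypothesis is rejected at FDR level q.
    """
    m = len(p_values)
    # Indices of the hypotheses, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    # Reject the k_max smallest p-values, including any that individually
    # missed their own rank threshold.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical p-values from 20 tests.
p_values = [0.0004, 0.0021, 0.0030, 0.0120, 0.0400] + [0.5] * 15
print(sum(benjamini_hochberg(p_values, q=0.05)))  # 3

# Step-up in action: 0.013 > (1/4) * 0.05 = 0.0125, yet it is rejected
# because 0.03 <= (3/4) * 0.05 = 0.0375.
print(benjamini_hochberg([0.013, 0.02, 0.03, 0.9], q=0.05))  # [True, True, True, False]
```

On the 20 hypothetical p-values, BH at q = 0.05 rejects three hypotheses. Bonferroni at alpha = 0.05, with its per-test threshold of 0.0025, would reject only the first two.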
Choosing between FWER and FDR depends on the research context. Confirmatory trials with regulatory consequences demand FWER control — approving an ineffective drug based on a false positive has real costs. Exploratory studies in genomics, proteomics, or epidemiological screening benefit from FDR control because the goal is to generate candidates for follow-up, and missing true signals is more costly than including a few false ones. Many studies use a hybrid approach: FDR for initial screening, followed by FWER-controlled confirmatory analysis on the reduced candidate set.
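A two-stage workflow of this kind might look like the following Python sketch. Everything here is hypothetical and illustrative: the gene names, both sets of p-values, the screening level q = 0.10, and the function names. The sketch also assumes the confirmatory p-values come from an independent dataset, because reusing the screening data for confirmation would bias the second stage. BH handles the screening stage. Bonferroni handles the confirmation stage and counts only the surviving candidates.

```python
def bh_reject(p_values, q):
    """Benjamini-Hochberg step-up: True where rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank whose p-value clears its (rank / m) * q threshold.
    k_max = max(
        (rank for rank, i in enumerate(order, start=1) if p_values[i] <= rank / m * q),
        default=0,
    )
    rejected = set(order[:k_max])
    return [i in rejected for i in range(m)]


def two_stage(screen_p, confirm_p, q=0.10, alpha=0.05):
    """Stage 1: BH screening at level q. Stage 2: Bonferroni over the candidates only.

    screen_p:  {name: p-value} from the exploratory dataset.
    confirm_p: {name: p-value} from an independent confirmatory dataset;
               only the stage-1 candidates need entries.
    """
    names = list(screen_p)
    flags = bh_reject([screen_p[n] for n in names], q)
    candidates = [n for n, flagged in zip(names, flags) if flagged]
    if not candidates:
        return candidates, []
    # The FWER correction divides alpha by the number of candidates,
    # not by the number of hypotheses screened in stage 1.
    threshold = alpha / len(candidates)
    confirmed = [n for n in candidates if confirm_p[n] <= threshold]
    return candidates, confirmed


# Hypothetical screening results for 10 genes.
screen_p = {"g1": 0.0001, "g2": 0.0008, "g3": 0.004, "g4": 0.02, "g5": 0.3,
            "g6": 0.45, "g7": 0.6, "g8": 0.71, "g9": 0.88, "g10": 0.95}
# Hypothetical p-values for the candidates in an independent replication.
confirm_p = {"g1": 0.001, "g2": 0.03, "g3": 0.009, "g4": 0.2}

candidates, confirmed = two_stage(screen_p, confirm_p)
print("candidates:", candidates)  # ['g1', 'g2', 'g3', 'g4']
print("confirmed:", confirmed)    # ['g1', 'g3']
```

Here screening keeps g1 to g4, and the confirmatory threshold becomes 0.05 / 4 = 0.0125. Only g1 and g3 survive both stages.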