Multiple Comparisons and Type I Error Rate Control

College Depth 82 in the knowledge graph
Unlocks 33 downstream topics
statistics type-i-error multiple-comparisons correction

Core Idea

The multiple comparisons problem occurs when researchers conduct numerous statistical tests within a single study, which inflates the family-wise Type I error rate (the probability of at least one false positive) beyond the nominal alpha level. Each statistical test carries its own probability of a Type I error, so conducting many tests increases the probability that at least one will be statistically significant by chance alone. Corrections such as Bonferroni, Holm, false discovery rate (FDR) procedures, and permutation testing adjust p-values or alpha levels to maintain overall Type I error control. The appropriate severity of correction depends on whether the tests are planned (confirmatory) or exploratory.

How It's Best Learned

Simulate running multiple independent statistical tests where the null hypothesis is true and observe how often at least one reaches statistical significance.
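The simulation described above can be sketched as follows. This is a minimal illustration, not a prescribed exercise: the function name simulate_fwer, the choice of a two-sided z-test with known standard deviation, and the sample sizes are all assumptions made for the example.

```python
import random
from statistics import NormalDist


def simulate_fwer(n_tests=20, alpha=0.05, n_studies=2000, n_per_test=10, seed=0):
    """Estimate how often at least one of n_tests true-null tests is 'significant'.

    Hypothetical setup: every test is a two-sided z-test on n_per_test draws
    from N(0, 1), so the null hypothesis (mean = 0) is true for every test.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    studies_with_false_positive = 0
    for _ in range(n_studies):
        for _ in range(n_tests):
            sample = [rng.gauss(0.0, 1.0) for _ in range(n_per_test)]
            # z statistic for a known standard deviation of 1
            z = (sum(sample) / n_per_test) * n_per_test ** 0.5
            if abs(z) > z_crit:
                # One false positive is enough to count this study.
                studies_with_false_positive += 1
                break
    return studies_with_false_positive / n_studies


for k in (1, 10, 20):
    print(f"{k:>2} tests: at least one false positive in "
          f"{simulate_fwer(n_tests=k):.1%} of simulated studies")
```

Because the estimates come from random draws, they will vary slightly with the seed and the number of simulated studies, but they should sit close to the analytic values 1 − .95^k.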

Common Misconceptions

Bonferroni correction is always appropriate (actually, it can be overly conservative when tests are correlated).

Multiple comparisons corrections only apply to many p-values from the same dataset (actually, any multiple tests of related hypotheses require correction).

Explainer

From inferential statistics, you know that a Type I error (rejecting a true null hypothesis) has probability α, conventionally set at .05. This means that if the null hypothesis is genuinely true, you'll obtain a "significant" result 5% of the time purely by chance. From your work on Type I and Type II error tradeoffs, you understand that setting α defines your tolerance for false positives in a single test. The multiple comparisons problem is what happens when you apply that single-test logic across an entire family of tests, and the probability calculation that drives it follows directly from the probability foundations you already have.

Suppose you run 20 independent significance tests in a single study, each at α = .05, and all null hypotheses are actually true. What is the probability that at least one test reaches significance? Use the complement rule together with the multiplication rule for independent events: the probability that no test is significant is .95^20, so the probability of at least one is 1 − (1 − .05)^20 = 1 − .95^20 ≈ .64. With 20 independent tests of truly null effects, you'd observe at least one "significant" result about 64% of the time, in a universe of pure noise. This inflated rate is the family-wise error rate (FWER): the probability of at least one false positive across the family of tests. It grows rapidly: 10 tests yield roughly 40% FWER; 50 tests yield over 92%.
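The arithmetic above is easy to check directly. A small sketch, assuming independent tests and all null hypotheses true; the function name fwer is chosen here for illustration:

```python
def fwer(n_tests, alpha=0.05):
    """Family-wise error rate for n_tests independent tests of true nulls."""
    # P(at least one false positive) = 1 - P(no false positives)
    return 1 - (1 - alpha) ** n_tests


for k in (1, 10, 20, 50):
    print(f"{k:>2} tests: FWER = {fwer(k):.3f}")
```

The formula applies only under independence. For correlated tests, the rate of at least one false positive is no larger than this, and under positive correlation it is typically smaller.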

Bonferroni correction is the simplest and most conservative of the common solutions: divide the nominal α by the number of tests and require each individual test to reach that stricter threshold. For 20 tests, each test must clear p < .0025. This guarantees FWER ≤ .05 across the family, but at a cost: demanding much smaller p-values for each test increases the probability of Type II errors, so real effects may be missed because they don't survive the heightened bar. Bonferroni does not require the tests to be independent; its guarantee holds under any dependence structure. When tests are positively correlated (as they often are within a study, since they draw on the same participants), however, it becomes overly conservative: the tests are not providing independent chances at a false positive, so the actual FWER falls well below .05 and power is given up unnecessarily.
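As a rough sketch of the Bonferroni rule just described: the p-values below are made up for illustration, and the function names are hypothetical. The code uses p ≤ α/k as the rejection rule, which is the usual convention.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return True for each p-value that clears the Bonferroni threshold alpha / k."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def bonferroni_adjust(p_values):
    """Equivalent view: multiply each p-value by k (capped at 1), compare to alpha."""
    k = len(p_values)
    return [min(1.0, p * k) for p in p_values]


# Hypothetical p-values from a family of five tests
p_values = [0.001, 0.012, 0.014, 0.030, 0.20]
print(bonferroni_reject(p_values))
print(bonferroni_adjust(p_values))
```

With five tests the per-test threshold is .01, so only the smallest p-value survives even though four of the five are below the uncorrected .05. That is the power cost the paragraph above describes.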

The Holm procedure improves on Bonferroni by applying corrections sequentially. Rank your k p-values from smallest to largest; compare the smallest to α/k, the second-smallest to α/(k−1), and so on, stopping at the first test that fails to reach its threshold. Every test before that point is declared significant; that test and all tests after it are not. Holm controls FWER at the same level as Bonferroni but is less conservative for the larger (less significant) p-values, so you recover some statistical power without sacrificing error control.

For exploratory work where you are willing to tolerate a small proportion of false discoveries in exchange for more power to detect true ones, the false discovery rate (FDR) approach shifts the target: instead of controlling the probability of any false positive, it controls the expected proportion of significant findings that are false. The Benjamini-Hochberg procedure implements this and is standard in neuroimaging and genomics, where thousands of simultaneous tests make FWER control nearly impossible without destroying power entirely.
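The two procedures can be sketched in a few lines each. This is a minimal illustration under stated assumptions, not a reference implementation: the function names and the example p-values are hypothetical, and the Benjamini-Hochberg version is the standard step-up rule (reject the m smallest p-values, where m is the largest rank i with p(i) ≤ i·α/k).

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure; returns reject decisions in the input order."""
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])
    reject = [False] * k
    for rank, i in enumerate(order):  # rank 0 is the smallest p-value
        if p_values[i] <= alpha / (k - rank):
            reject[i] = True
        else:
            break  # first failure: this test and all larger p-values are retained
    return reject


def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure; returns decisions in the input order."""
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])
    n_reject = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / k:
            n_reject = rank  # remember the largest rank that meets its threshold
    reject = [False] * k
    for i in order[:n_reject]:
        reject[i] = True
    return reject


# Hypothetical p-values from a family of five tests
p_values = [0.001, 0.012, 0.014, 0.030, 0.20]
print("Holm:", holm_reject(p_values))
print("BH:  ", benjamini_hochberg_reject(p_values))
```

On these made-up values, Holm rejects three hypotheses and Benjamini-Hochberg rejects four, whereas a plain Bonferroni threshold of .05/5 = .01 would reject only one. The ordering reflects the different targets: Holm still controls FWER, while Benjamini-Hochberg controls the expected proportion of false discoveries.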

The underlying principle is that the right correction depends on your inferential goals and the structure of your tests. Pre-registered, theoretically motivated tests of specific hypotheses warrant less severe correction than post-hoc mining of a dataset for any significant association. When a researcher runs 50 correlations, finds 3 that survive α = .05, and reports only those 3, no correction applied to those 3 p-values can fix the problem — the issue is selective reporting, which makes the reported results uninterpretable regardless of what correction is applied. Multiple comparisons control is a statistical procedure that assumes honest reporting of the full family; it cannot substitute for transparency about how many tests were actually conducted.
