Chi-Square Test for Independence

College · Depth 52 in the knowledge graph

Core Idea

Tests whether two categorical variables are independent. χ² = Σ (Observed − Expected)² / Expected, with (rows − 1)(cols − 1) degrees of freedom. Expected counts are computed under the assumption of independence. As a rule of thumb, all expected counts should be at least 5. A large χ² indicates association.

Explainer

The chi-square test for independence asks a specific question about a contingency table: are two categorical variables statistically independent, or does knowing one variable's category tell you something about the other? For example, does a person's smoking status (yes/no) relate to their disease outcome (sick/well)? Independence, the null hypothesis in your hypothesis testing framework, has a precise probabilistic meaning: P(A and B) = P(A) · P(B) for every category A of the first variable and every category B of the second. The test constructs a statistic that measures how far the observed data deviate from what independence would predict.

The expected counts under independence are computed using a key formula: for a cell in row i and column j of an r × c table, the expected count is E_{ij} = (row i total) × (column j total) / (grand total). This formula follows directly from the independence definition. If smoking and disease are independent, the probability of being a smoker who is well should be P(smoker) × P(well). Estimating each probability from the margins, as (row total)/n and (column total)/n, and multiplying their product by the grand total n gives the expected count. Compare this to the observed count O_{ij} (what you actually see) for every cell. If the two variables are truly independent, observed and expected counts should be close.
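
The expected-count formula can be sketched in a few lines of plain Python. The 2 × 2 table here (smoker/non-smoker by sick/well) uses made-up counts chosen purely for illustration, and `expected_counts` is a hypothetical helper name, not a library function.

```python
def expected_counts(observed):
    """Return the table of expected counts under independence.

    `observed` is a list of rows, each a list of cell counts.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # E_ij = (row i total) * (column j total) / (grand total)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical counts: rows = smoker, non-smoker; columns = sick, well.
observed = [[35, 65],
            [15, 85]]
expected = expected_counts(observed)
print(expected)  # [[25.0, 75.0], [25.0, 75.0]]
```

Note that the expected table keeps the same row and column totals as the observed table; only the way the counts are spread across cells changes.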

The test statistic aggregates these cell-by-cell discrepancies: χ² = Σ (O_{ij} − E_{ij})² / E_{ij}, summed over all cells. The denominator E_{ij} standardizes the squared difference: a discrepancy of 5 contributes 25/10 = 2.5 in a cell with expected count 10 but only 25/1000 = 0.025 in a cell with expected count 1000. Large values of χ² signal systematic association between the variables. Under the null hypothesis of independence, this statistic follows approximately a chi-square distribution (your prerequisite) with (r − 1)(c − 1) degrees of freedom, and the p-value is the upper-tail probability beyond the observed χ². The degrees of freedom count how many cells are free to vary: once the marginal totals are fixed, specifying (r − 1)(c − 1) cells determines the entire table.
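
As a rough sketch of the statistic and its degrees of freedom, the following plain-Python example uses a made-up 2 × 2 table. The function names are hypothetical. The p-value helper is deliberately limited to df = 1, where the chi-square tail has a closed form via the standard library's `math.erfc`; larger tables would need a statistics library.

```python
import math


def chi_square_statistic(observed):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            # Expected count under independence for cell (i, j).
            e_ij = row_totals[i] * col_totals[j] / grand_total
            statistic += (o_ij - e_ij) ** 2 / e_ij
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


def p_value_df1(statistic):
    """Upper-tail p-value, valid only when df == 1.

    A chi-square variable with 1 df is a squared standard normal, so
    P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2))


# Hypothetical counts: rows = smoker, non-smoker; columns = sick, well.
observed = [[35, 65], [15, 85]]
statistic, df = chi_square_statistic(observed)
print(round(statistic, 3), df)        # 10.667 1
print(p_value_df1(statistic) < 0.01)  # True
```

In practice a library routine such as SciPy's `scipy.stats.chi2_contingency` handles any table size. By default it applies Yates' continuity correction to 2 × 2 tables, so its statistic would differ from the uncorrected value computed here.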

Two practical points matter. First, all expected counts should be at least 5; below this, the chi-square approximation deteriorates and Fisher's exact test is preferred. Second, the chi-square test detects association but says nothing about its direction or magnitude. A statistically significant result means that a departure from independence this large would be unlikely if the variables were truly independent; with a large sample, χ² can be significant even when the practical association is very weak. For effect size, pair the test with Cramér's V: V = √(χ² / (n · min(r−1, c−1))), which ranges from 0 (no association) to 1 (perfect association). The test gives the p-value; Cramér's V gives the strength.
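
To illustrate why the test statistic and the effect size answer different questions, the sketch below compares a made-up 2 × 2 table with a copy that has the same proportions but ten times the sample size. Both helper names are hypothetical, and the counts are invented for illustration only.

```python
import math


def chi_square_and_min_expected(observed):
    """Return (chi-square statistic, smallest expected count, sample size)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))
    return chi2, min(min(row) for row in expected), n


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V = sqrt(chi2 / (n * min(r - 1, c - 1)))."""
    return math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))


small = [[35, 65], [15, 85]]
# Same cell proportions, ten times the sample size.
large = [[10 * cell for cell in row] for row in small]

for table in (small, large):
    chi2, min_expected, n = chi_square_and_min_expected(table)
    v = cramers_v(chi2, n, len(table), len(table[0]))
    # The rule of thumb asks for every expected count to be at least 5.
    rule_ok = min_expected >= 5
    print(f"n={n}  chi2={chi2:.2f}  min expected ok={rule_ok}  V={v:.3f}")
```

With these counts, χ² grows tenfold (from about 10.67 to about 106.67) while Cramér's V stays at about 0.23 in both tables: the extra data make the result more significant without making the association any stronger.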

Prerequisite Chain

Counting to 10 → Counting to 20 → Understanding Zero → The Number Zero → Counting to Five → One-to-One Correspondence → Combining Small Groups Within 5 → Addition Within 10 → Addition Within 20 → Two-Digit Addition Without Regrouping → Two-Digit Addition with Regrouping → Addition Within 100 → Repeated Addition as Multiplication → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Two-Digit by One-Digit Division → Division with Remainders → Remainders and Quotients in Division → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Reading and Writing Decimals → Comparing and Ordering Decimals → Adding and Subtracting Decimals → Multiplying Decimals → Dividing Decimals → Dividing Fractions → Mixed Number Arithmetic → Order of Operations → Integer Order of Operations → Variable Expressions → Function Notation Review → Random Variables: Definition and Classification → Joint and Marginal Distributions → Conditional Distributions of Random Variables → Random Variables → Sampling Distributions → Hypothesis Testing: Framework and Logic → P-values and Statistical Significance → Effect Size and Practical Significance → Hypothesis Testing: Framework and Logic → Chi-Square Test for Independence

Longest path: 53 steps · 210 total prerequisite topics

Prerequisites (3)

Leads To (0)

No topics depend on this one yet.