A chi-square test of independence on a 2×2 table yields χ² = 12.4, p = .0004. A researcher concludes: 'Smoking strongly causes lung disease.' What is wrong with this conclusion?
A. Nothing; a significant chi-square result proves a strong causal relationship
B. The chi-square test only detects statistical association; it says nothing about direction, magnitude, or causation
C. The degrees of freedom for a 2×2 table should be 4, making the test invalid
D. A p-value below .001 is too small for chi-square; the test requires p > .01 to be interpretable
The chi-square test answers one question: is the pattern of observed counts unlikely under independence? A significant result is evidence that an association exists. It does not indicate how strong the association is (use an effect size such as Cramér's V), which variable influences which (observational data alone cannot establish causation), or whether the relationship is practically significant. Both 'strongly' and 'causes' in the researcher's conclusion require evidence beyond the chi-square p-value.
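As a rough illustration of why the p-value alone says nothing about strength, the sketch below recomputes the p-value for χ² = 12.4 with 1 degree of freedom and then evaluates Cramér's V for several sample sizes. The sample sizes are hypothetical, since the question does not state n, and the helper names are illustrative only.

```python
import math

def chi2_pvalue_df1(chi2):
    # With 1 degree of freedom, chi-square is the square of a standard normal,
    # so the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))

def cramers_v(chi2, n, n_rows=2, n_cols=2):
    # Cramér's V = sqrt(chi2 / (n * min(r - 1, c - 1)))
    return math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))

p_value = chi2_pvalue_df1(12.4)

# Hypothetical sample sizes: the same chi-square value implies very
# different effect sizes depending on n.
v_by_n = {n: cramers_v(12.4, n) for n in (100, 1000, 10000)}

print(f"p = {p_value:.5f}")
for n, v in v_by_n.items():
    print(f"n = {n:>5}: Cramér's V = {v:.3f}")
```

Under these assumed sample sizes, the identical test statistic corresponds to anything from a moderate association (n = 100) to a negligible one (n = 10,000).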
Question 2 Multiple Choice
A 2×2 contingency table has row totals of 40 and 60, column totals of 50 and 50, and a grand total of 100. What is the expected count for the top-left cell (row 1, column 1) under the null hypothesis of independence?
A. 25, from dividing the grand total equally among all four cells
B. 25, from multiplying the two column totals and dividing by the grand total: (50 × 50) / 100
C. 20, computed as (row 1 total × column 1 total) / grand total = (40 × 50) / 100
D. 45, from averaging the row 1 total and the column 1 total
Under independence, P(row 1 AND col 1) = P(row 1) × P(col 1) = (40/100) × (50/100). The expected count is this probability times the grand total: (40/100) × (50/100) × 100 = 40 × 50 / 100 = 20. Options A and B confuse the formula; option D is algebraically unmotivated. The expected count formula E = (row total × column total) / grand total follows directly from the definition of statistical independence.
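The expected-count formula can be checked with a few lines of Python. This is a minimal sketch that uses only the marginal totals given in the question; the variable names are illustrative.

```python
# Marginal totals from the question
row_totals = [40, 60]
col_totals = [50, 50]
grand_total = sum(row_totals)  # 100

# E[i][j] = (row i total * column j total) / grand total
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

for row in expected:
    print(row)
```

The expected counts in each row sum to that row's total, and likewise for columns, which is a quick sanity check on any expected-count table.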
Question 3 True / False
In a chi-square test for independence, expected cell counts are computed assuming the two variables are independent.
T. True
F. False
Answer: True
Expected counts represent what the data should look like if the null hypothesis (independence) is exactly true. The formula E = (row total × column total) / grand total is derived from the multiplication rule for independent events: P(A and B) = P(A) × P(B). The chi-square statistic then measures how far the observed counts deviate from these independence-predicted expected counts. Large deviations produce large χ² values, which are unlikely under the null hypothesis.
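To make the deviation measure concrete, the sketch below computes χ² = Σ (O − E)² / E for a 2×2 table. The observed counts are hypothetical, chosen only so that the margins are 40/60 and 50/50; they are not data from any study.

```python
# Hypothetical observed counts (rows sum to 40 and 60, columns to 50 and 50)
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected counts under independence
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square statistic: sum over cells of (O - E)^2 / E
chi2 = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)

print(f"chi-square = {chi2:.3f}")
```

Each cell here deviates from its expected count by 10, and the cells with smaller expected counts contribute more to the total because the squared deviation is divided by E.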
Question 4 True / False
A statistically significant chi-square test result demonstrates that the association between two categorical variables is practically important and large.
T. True
F. False
Answer: False
Statistical significance and practical importance are distinct. With a large enough sample, even a trivially small association will produce a significant chi-square result: for a fixed pattern of cell proportions, each observed-minus-expected deviation (O − E) grows in proportion to n, so each term (O − E)² / E, and therefore χ² itself, grows in proportion to n. A highly significant p-value from a study of 10,000 participants might correspond to a Cramér's V of 0.04, a negligible association by most standards. Always pair the chi-square p-value with an effect size measure such as Cramér's V to assess practical magnitude.
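The sample-size effect can be sketched numerically. The two tables below are hypothetical and have identical cell proportions; the second is simply the first multiplied by 100. The helper functions are illustrative, and the p-value shortcut is valid only for 1 degree of freedom.

```python
import math

def chi2_stat(table):
    # Pearson chi-square for an r x c table of counts
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i in range(len(row_totals))
        for j in range(len(col_totals))
    )

def cramers_v(table):
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2_stat(table) / (n * k))

def pvalue_df1(chi2):
    # Upper-tail chi-square probability, valid for 1 degree of freedom only
    return math.erfc(math.sqrt(chi2 / 2.0))

small = [[26, 24], [24, 26]]                      # hypothetical, n = 100
large = [[x * 100 for x in row] for row in small]  # same proportions, n = 10,000

for name, table in (("n = 100", small), ("n = 10,000", large)):
    c2 = chi2_stat(table)
    print(f"{name}: chi2 = {c2:.2f}, p = {pvalue_df1(c2):.5f}, V = {cramers_v(table):.3f}")
```

In this sketch χ² rises by the same factor of 100 as the sample size, while Cramér's V stays at 0.04 in both tables.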
Question 5 Short Answer
Why must all expected cell counts be at least 5 for the chi-square test to be valid, and what is the recommended alternative when this condition fails?
Model answer: The chi-square test statistic follows the chi-square distribution only approximately. The approximation rests on a large-sample normal approximation to the multinomial cell counts, which works well when expected counts are large but breaks down when they are small. When expected counts fall below about 5 (a common rule of thumb), the discrete distribution of the statistic is poorly matched by the continuous chi-square distribution, so the p-value becomes unreliable; it can be anti-conservative, making the result appear more significant than it truly is. Fisher's exact test is the recommended alternative: conditioning on the observed margins, it uses the hypergeometric distribution to compute the exact probability of the observed table and of tables at least as extreme, without relying on any large-sample approximation.
The expected count condition is a sample-size adequacy criterion specifically for the chi-square approximation. It is most commonly violated in tables with many cells (large r × c tables) or with very unequal marginal distributions, where some cells receive very few observations under independence.
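For a 2×2 table, Fisher's exact test can be sketched directly from the hypergeometric distribution. The function below is an illustrative implementation, not a library API, and it assumes one common two-sided convention: sum the probabilities of all tables with the same margins that are no more likely than the observed one. The example table is hypothetical; every expected count in it is 2, so the chi-square approximation would not be trusted.

```python
from math import comb

def fisher_exact_two_sided(table):
    # table is [[a, b], [c, d]]; margins are treated as fixed
    (a, b), (c, d) = table
    r1, r2 = a + b, c + d
    c1 = a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    lo = max(0, c1 - r2)   # smallest feasible top-left count
    hi = min(r1, c1)       # largest feasible top-left count
    p_obs = prob(a)

    # Sum probabilities of tables no more likely than the observed one;
    # the small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-9)
    )

small_table = [[3, 1], [1, 3]]  # hypothetical counts, n = 8
p_exact = fisher_exact_two_sided(small_table)
print(f"Fisher exact two-sided p = {p_exact:.4f}")
```

With margins of 4 and 4 there are only five possible tables, so the exact p-value is a short sum of hypergeometric probabilities rather than an area under a continuous curve.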