A researcher rolls a die 120 times to test whether it is fair and gets counts {18, 22, 19, 21, 20, 20}. She computes χ² = 0.5 and p ≈ 0.99 (df = 5). Her classmate says: 'A p-value that high means the test confirmed the die is fair; it proved H₀.' What is wrong with this interpretation?
A. Nothing: a p-value near 1.0 is the standard way to confirm a null hypothesis in chi-square testing
B. A high p-value means the data are consistent with H₀, but this is not the same as proving H₀ is true. Chi-square tests can only provide evidence against a null, not confirm it
C. The sample size of 120 is too large for chi-square; smaller samples are required
D. Chi-square cannot test whether a die is fair; it can only test independence
Answer: B
Hypothesis testing logic is asymmetric: a small p-value provides evidence against H₀, but a large p-value only means the data are consistent with H₀. It does not confirm H₀. A p-value this close to 1 means the observed counts are very close to what a fair die would produce, but a slightly biased die could easily produce the same data. Failing to reject H₀ is not the same as accepting it. Treating a non-significant result as proof of the null is one of the most important and persistent misconceptions in applied statistics.
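The asymmetry can be checked numerically. The sketch below recomputes the goodness-of-fit statistic from the counts in the question, then repeats the test against a slightly biased die. The biased probabilities are invented for illustration, and chi2_sf is a hand-rolled helper (a series expansion of the incomplete gamma function), not a library function.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Uses the power series for the regularized lower incomplete gamma
    function, which is adequate for the small statistics used here.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower

def goodness_of_fit(observed, probs):
    """Return (chi-square statistic, p-value) for observed counts vs. probs."""
    n = sum(observed)
    expected = [n * p for p in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, chi2_sf(stat, len(observed) - 1)

counts = [18, 22, 19, 21, 20, 20]

fair = [1 / 6] * 6
# Hypothetical biased die, chosen only for illustration (sums to 1).
biased = [0.16, 0.18, 0.16, 0.17, 0.16, 0.17]

fair_stat, fair_p = goodness_of_fit(counts, fair)
biased_stat, biased_p = goodness_of_fit(counts, biased)

print(f"fair die:   chi2 = {fair_stat:.3f}, p = {fair_p:.3f}")
print(f"biased die: chi2 = {biased_stat:.3f}, p = {biased_p:.3f}")
```

With expected counts of 20 per face, the fair-die statistic is (4 + 4 + 1 + 1 + 0 + 0)/20 = 0.5. Both hypotheses yield large p-values for the same data, so the test cannot single out the fair die as the true model.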
Question 2 Multiple Choice
A chi-square test of independence returns p = 0.008 for the relationship between neighborhood income level and access to fresh produce (three categories: high/moderate/low). What can you correctly conclude from this result alone?
A. Higher income causes better produce access, and the causal direction is confirmed
B. There is a statistically significant association between income level and produce access, but the test does not identify which income categories drive the pattern or in what direction
C. 8% of survey respondents reported limited produce access regardless of income
D. Produce access is at least 8% lower in low-income neighborhoods than high-income ones
Answer: B
Chi-square tests of independence detect whether an association exists, not its direction, magnitude, or cause. A p = 0.008 tells you the cell frequencies deviate significantly from what independence would produce, but not which cells are driving the deviation. To understand the pattern after rejecting H₀, examine each cell's (O−E)²/E contribution for its size and the sign of O−E for its direction. Causal inference requires study design beyond the test statistic.
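A per-cell breakdown of this kind might look like the following sketch. The counts are invented for illustration (the question gives no table), and the layout of three income levels by two access categories is an assumption. Because that layout has df = 2, the p-value uses the closed form exp(−χ²/2), which holds only for two degrees of freedom.

```python
import math

rows = ["high", "moderate", "low"]          # income level
cols = ["good access", "limited access"]    # produce access (assumed binary)
observed = [[60, 40], [50, 50], [35, 65]]   # invented counts

row_tot = [sum(r) for r in observed]
col_tot = [sum(c) for c in zip(*observed)]
n = sum(row_tot)

# Expected count under independence: row total * column total / grand total.
expected = [[rt * ct / n for ct in col_tot] for rt in row_tot]

# Each cell's contribution to the statistic.
contrib = [
    [(o - e) ** 2 / e for o, e in zip(orow, erow)]
    for orow, erow in zip(observed, expected)
]

chi2 = sum(sum(r) for r in contrib)
df = (len(rows) - 1) * (len(cols) - 1)
p_value = math.exp(-chi2 / 2)  # closed form valid only because df == 2

print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.4f}")
for name, orow, erow, crow in zip(rows, observed, expected, contrib):
    for col, o, e, c in zip(cols, orow, erow, crow):
        direction = "above" if o > e else "below"
        print(f"{name:>8} / {col:<14} O={o:3d} E={e:6.2f} "
              f"contribution={c:5.2f} ({direction} expected)")
```

In this made-up table the high and low rows carry nearly all of the statistic, while the moderate row contributes almost nothing, which is the kind of detail the overall p-value hides.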
Question 3 True / False
A chi-square test rejects H₀ only when the test statistic is large, and the test statistic is always non-negative.
True
False
Answer: True
Both parts are correct. The test statistic χ² = Σ(O−E)²/E is a sum of squared terms divided by positive denominators — it cannot be negative. It equals zero only when every observed count exactly matches the expected count, indicating perfect agreement with H₀. The test is one-tailed: we only reject H₀ when χ² is large (data far from H₀ predictions). There is no 'too low' rejection region, because a very small χ² just means the data fit H₀ well.
Question 4 True / False
When some expected cell counts in a chi-square test are below 5, the test statistic becomes larger, making the test more conservative and less likely to produce false positives.
True
False
Answer: False
The problem with small expected counts is not conservatism — it is that the chi-square distribution is no longer a reliable approximation to the true null distribution of the test statistic. With sparse cells, p-values become unreliable in unpredictable directions (they can be too small or too large). The conventional remedy is to use Fisher's exact test (for 2×2 tables) or to combine sparse categories. The minimum expected count of 5 is a rule of thumb for when the chi-square approximation holds.
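As a rough illustration of the Fisher remedy, the sketch below checks the expected counts of a 2×2 table and, since they fall below 5, computes a two-sided exact p-value directly from the hypergeometric distribution. The table values and function names are hypothetical. The two-sided rule used here (sum all tables no more probable than the observed one) is one common convention, not the only one.

```python
from math import comb

def expected_counts(table):
    """Expected cell counts under independence for a 2x2 table."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    return [[rt * ct / n for ct in col_tot] for rt in row_tot]

def fisher_exact_two_sided(table):
    """Two-sided Fisher exact p-value for a 2x2 table of counts."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    denom = comb(r1 + r2, c1)

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k,
        # with all row and column totals held fixed.
        return comb(r1, k) * comb(r2, c1 - k) / denom

    lo, hi = max(0, c1 - r2), min(r1, c1)
    p_obs = prob(a)
    # Sum every table that is no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(prob(k) for k in range(lo, hi + 1)
               if prob(k) <= p_obs * (1 + 1e-7))

table = [[3, 1], [1, 3]]  # hypothetical sparse table
min_expected = min(min(r) for r in expected_counts(table))
p_exact = fisher_exact_two_sided(table)

print(f"smallest expected count = {min_expected}")
print(f"Fisher exact two-sided p = {p_exact:.4f}")
```

Every expected count in this table is 2, well under the rule-of-thumb threshold, so the exact p-value of 34/70 is the one to report rather than a chi-square approximation.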
Question 5 Short Answer
Explain why the chi-square test statistic χ² = Σ(O−E)²/E is always non-negative and what it means conceptually when χ² is close to zero.
Model answer: Each term (O−E)²/E squares the discrepancy between observed and expected counts, making every term non-negative, and sums them — so the total cannot be negative. When χ² is close to zero, every category's observed count is very close to its expected count under H₀, meaning the data look almost exactly like what H₀ predicts. This provides no evidence against H₀ — the data fit the null model well. χ² grows as any category's observed count deviates from expected, accumulating evidence across all categories that the data are inconsistent with H₀.
The squaring serves two purposes: it makes all terms non-negative (so deviations in opposite directions don't cancel out) and it penalizes large deviations more than small ones. Dividing by E normalizes each term — a deviation of 10 from an expected count of 20 is more striking than a deviation of 10 from an expected count of 1,000.
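Both effects are easy to verify with a few lines of arithmetic. The sketch below uses the deviation of 10 from the paragraph above, plus a made-up two-category example to show raw deviations cancelling.

```python
def contribution(observed, expected):
    """One cell's term in the chi-square sum: (O - E)^2 / E."""
    return (observed - expected) ** 2 / expected

# Same raw deviation of 10, very different expected counts.
small_base = contribution(30, 20)       # 100 / 20
large_base = contribution(1010, 1000)   # 100 / 1000

# Made-up two-category example: raw deviations cancel, squared ones do not.
observed = [25, 15]
expected = [20, 20]
raw_sum = sum(o - e for o, e in zip(observed, expected))
chi2 = sum(contribution(o, e) for o, e in zip(observed, expected))

print(f"deviation of 10 from E=20:   {small_base}")
print(f"deviation of 10 from E=1000: {large_base}")
print(f"sum of raw deviations: {raw_sum}, chi-square: {chi2}")
```

The same deviation of 10 contributes 5.0 against an expected count of 20 but only 0.1 against an expected count of 1,000, and the raw deviations sum to zero even though the counts clearly differ from expectation.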