A researcher compares exam scores across three teaching methods and obtains F = 4.2, p = 0.02. She reports: 'Method B has the highest mean and is significantly better than Method A.' What is wrong with this conclusion?
A. Nothing, since F > 1 with p < 0.05 confirms that Method B outperforms Method A
B. The F-test only tells you that at least one group mean differs; identifying which pairs differ requires post-hoc tests
C. The conclusion is valid only if sample sizes across groups are equal
D. p = 0.02 is not small enough to reject the null hypothesis at α = 0.05
Answer: B
A significant F-statistic rejects H₀: μ₁ = μ₂ = μ₃. It tells you that at least one group mean differs from at least one other, but not which specific pairs differ. To compare Method B with Method A directly, you need a post-hoc test (such as Tukey's HSD) that adjusts for the multiple-comparison problem. Claiming that Method B beats Method A on the strength of the F-test alone is an overclaim.
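A minimal Python sketch of the post-hoc step follows. Tukey's HSD relies on the studentized range distribution, which the standard library does not provide, so this sketch swaps in a different procedure: a two-sided permutation test on the difference in means for each pair of groups, followed by Holm's step-down correction across the three comparisons. The scores, the helper names (perm_test_diff, holm, pairwise_posthoc), and the 5,000-permutation setting are illustrative assumptions, not part of the original question. (SciPy users can get Tukey's HSD itself from scipy.stats.tukey_hsd.)

```python
import itertools
import random
from statistics import fmean


def perm_test_diff(a, b, n_perm=5000, seed=0):
    """Two-sided permutation p-value for the difference in means of a and b."""
    rng = random.Random(seed)
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of observations under H0
        diff = abs(fmean(pooled[:len(a)]) - fmean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive
    return (hits + 1) / (n_perm + 1)


def holm(pvalues):
    """Holm step-down adjustment; returns adjusted p-values in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Smallest p is multiplied by m, the next by m - 1, and so on;
        # the running max keeps the adjusted values monotone.
        running_max = max(running_max, (m - rank) * pvalues[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def pairwise_posthoc(groups, n_perm=5000, seed=0):
    """Compare every pair of groups: {(name1, name2): (raw_p, holm_p)}."""
    pairs = list(itertools.combinations(groups, 2))
    raw = [perm_test_diff(groups[g1], groups[g2], n_perm, seed) for g1, g2 in pairs]
    return {pair: (p, adj) for pair, p, adj in zip(pairs, raw, holm(raw))}


# Hypothetical exam scores for three teaching methods (illustration only)
scores = {
    "A": [70, 72, 68, 75, 71, 69, 73, 70],
    "B": [78, 80, 76, 79, 82, 77, 81, 79],
    "C": [74, 73, 76, 75, 72, 77, 74, 75],
}

for (g1, g2), (p, adj) in pairwise_posthoc(scores).items():
    print(f"{g1} vs {g2}: raw p = {p:.4f}, Holm-adjusted p = {adj:.4f}")
```

Each pair gets its own p-value, and only the adjusted values support a claim such as "Method B beats Method A."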
Question 2 Multiple Choice
Why does one-way ANOVA use the ratio MS_between / MS_within rather than simply checking whether the observed differences between group means are zero?
A. To adjust for unequal group sizes before computing the test statistic
B. To compare the variation explained by group membership against the baseline noise within groups
C. To convert the test statistic to a chi-square distribution for standard tables
D. To avoid the normality assumption that would be required for direct mean comparisons
Answer: B
MS_between measures how far the group means spread around the grand mean (the signal), while MS_within measures within-group variability: the scatter that exists even if the groups truly have identical means. Comparing the between-group signal to this noise baseline is what lets the test distinguish real group differences from ordinary random variation. If MS_within is large (noisy data), only large between-group differences produce a notable F. Judging the raw differences between group means on their own would ignore this baseline noise entirely.
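To see the signal-to-noise logic concretely, here is a small Python sketch that computes MS_between, MS_within, and F from raw data. The two hypothetical datasets share the same group means (70, 72, 74) and differ only in within-group scatter; make_groups is an illustrative helper, not part of any library.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (ms_between, ms_within, F) for a list of groups of numbers."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    group_means = [fmean(g) for g in groups]
    grand_mean = fmean([x for g in groups for x in g])
    # Signal: spread of group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Noise: spread of each observation around its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between, ms_within, ms_between / ms_within


def make_groups(means, offsets):
    """Hypothetical helper: each group is its mean plus the same set of offsets."""
    return [[m + d for d in offsets] for m in means]


means = [70, 72, 74]  # identical group means in both scenarios
tight = make_groups(means, [-1, 0, 1, -1, 0, 1])
noisy = make_groups(means, [-8, 0, 8, -8, 0, 8])

for label, groups in [("tight", tight), ("noisy", noisy)]:
    msb, msw, f = one_way_anova(groups)
    print(f"{label}: MS_between = {msb:.2f}, MS_within = {msw:.2f}, F = {f:.2f}")
```

Because the group means are identical, MS_between is 24 in both cases; only MS_within changes (0.8 versus 51.2), which moves F from 30 for the tight data to about 0.47 for the noisy data.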
Question 3 True / False
Running all pairwise t-tests instead of ANOVA controls the overall Type I error rate just as effectively.
T. True
F. False
Answer: False
Each individual t-test has a false-positive rate of α. With k groups, there are k(k−1)/2 pairwise tests. If the tests were independent, the probability of at least one false positive would be 1 − (1 − α)^m, where m is the number of tests. With 5 groups (10 tests) at α = 0.05, this rises to about 40%. Pairwise tests that share groups are not truly independent, so this figure is an approximation, but the inflation is real. ANOVA instead provides a single omnibus test that holds the Type I error rate for the overall null hypothesis at α, which is a key reason to prefer it over a battery of t-tests.
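The arithmetic can be checked with a few lines of Python. This sketch keeps the independence simplification from the text, so for real pairwise tests (which share groups) the numbers are approximate; familywise_error is a hypothetical helper name.

```python
from math import comb


def familywise_error(k, alpha=0.05):
    """Number of pairwise tests among k groups and the chance of at least one
    false positive, assuming (as a simplification) independent tests."""
    m = comb(k, 2)  # k(k-1)/2 pairwise comparisons
    return m, 1 - (1 - alpha) ** m


for k in (2, 3, 5, 10):
    m, fwer = familywise_error(k)
    print(f"k = {k:2d} groups: {m:2d} tests, P(at least one false positive) = {fwer:.3f}")
```

For k = 5 the function reports 10 tests and a probability of about 0.401, the roughly 40% figure quoted in the explanation.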
Question 4 True / False
Under the null hypothesis of one-way ANOVA (all group means equal), the F-statistic should be near 1.
T. True
F. False
Answer: True
When all group means are equal (and the usual equal-variance assumption holds), MS_between and MS_within both estimate the same underlying population variance σ². Their ratio F = MS_between / MS_within should therefore be close to 1; its expected value under H₀ is df_within / (df_within − 2), slightly above 1. Values substantially above 1 are evidence against H₀: when group means truly differ, MS_between is inflated (it reflects both within-group noise and real group differences), while MS_within remains anchored to within-group variability only.
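A quick simulation in Python illustrates the claim. In the fixed-effects model the expected mean squares are E[MS_within] = σ² and E[MS_between] = σ² + Σ n_j(μ_j − μ)²/(k − 1), where μ is the size-weighted mean of the μ_j; the second term vanishes when all μ_j are equal. The sketch draws every group from the same normal population (the mean of 50, SD of 10, 3 groups of 10, and 2,000 replicates are arbitrary assumptions) and averages the resulting F values, which should land near the theoretical 27/25 = 1.08.

```python
import random
from statistics import fmean


def f_statistic(groups):
    """One-way ANOVA F = MS_between / MS_within."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    group_means = [fmean(g) for g in groups]
    grand_mean = fmean([x for g in groups for x in g])
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def simulate_null_f(k=3, n=10, reps=2000, seed=1):
    """Draw k groups of size n from the SAME normal population (H0 true)
    and return the F-statistic from each replicate."""
    rng = random.Random(seed)
    return [
        f_statistic([[rng.gauss(50, 10) for _ in range(n)] for _ in range(k)])
        for _ in range(reps)
    ]


f_values = simulate_null_f()
df_within = 3 * 10 - 3
print(f"mean simulated F = {fmean(f_values):.3f}")
print(f"theoretical E[F] = {df_within / (df_within - 2):.3f}")  # 27 / 25 = 1.08
```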
Question 5 Short Answer
Why can a large F-statistic coexist with small actual differences between group means?
Model answer: The F-statistic is a ratio of between-group variance to within-group variance. If within-group variability is very small (observations cluster tightly around their own group means), even modest differences between group means produce a large F. Sample size pushes in the same direction: because SS_between = Σ n_j(ȳ_j − ȳ)², where ȳ is the grand mean, the same gaps between group means contribute more to MS_between as the per-group sample size n_j grows. F measures signal relative to noise, not absolute effect size, so with large samples or low within-group scatter, even trivially small group differences can yield a statistically significant F.
This is why reporting effect sizes (like η² = SS_between / SS_total) alongside F and p-values matters. A significant F tells you the difference is unlikely to be due to chance alone; it does not tell you whether the difference is large enough to matter in practice. Given a large enough sample, an F-test can be significant even when the practical effect is tiny.
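The effect of sample size can be made concrete with a Python sketch. The hypothetical groups below have means 50, 50.5, and 51 and identical within-group scatter (every observation sits exactly 10 points from its group mean) at every sample size; only the number of observations per group changes. make_groups and all numbers are illustrative assumptions.

```python
from statistics import fmean


def anova_f_and_eta_squared(groups):
    """Return (F, eta_squared) for a one-way layout."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    group_means = [fmean(g) for g in groups]
    grand_mean = fmean([x for g in groups for x in g])
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    f = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    # SS_total = SS_between + SS_within in a one-way layout
    eta_squared = ss_between / (ss_between + ss_within)
    return f, eta_squared


def make_groups(means, n):
    """Hypothetical data: n (even) observations per group, alternating
    mean - 10 and mean + 10, so within-group scatter never changes."""
    return [[m + (10 if i % 2 else -10) for i in range(n)] for m in means]


means = [50.0, 50.5, 51.0]  # tiny differences relative to the scatter
for n in (10, 100, 1000, 10000):
    f, eta2 = anova_f_and_eta_squared(make_groups(means, n))
    print(f"n per group = {n:5d}: F = {f:7.3f}, eta^2 = {eta2:.5f}")
```

With this construction F works out to 0.0025 × (n − 1), so it climbs from 0.0225 at n = 10 to about 25 at n = 10,000, well past the roughly 3.0 needed for p < 0.05 with 2 and thousands of denominator degrees of freedom, while η² stays fixed at 0.5 / 300.5 ≈ 0.0017.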