A researcher studies student test scores from 50 schools and estimates the effect of a tutoring program using standard OLS regression that ignores school membership. What is the most likely statistical consequence?
A. Coefficient estimates will be biased toward zero because the tutoring effect is diluted across schools
B. Standard errors will be artificially small, leading to inflated test statistics and confidence intervals that are too narrow
C. The model will fail to converge because clustering violates the computational assumptions of OLS
D. Coefficient estimates will be too large because schools with more students receive excess influence
Answer: B
Ignoring clustering violates the independence assumption of OLS. Students within the same school share environments, teachers, and resources, so their scores are more correlated with each other than with the scores of students in other schools. Because of this within-cluster correlation, the effective sample size is smaller than the nominal N. OLS treats all N observations as independent, so it underestimates the true standard errors and overstates precision. The result is inflated test statistics and confidence intervals that are too narrow, which raises the false positive rate. The coefficient point estimates are not necessarily biased, but inference about them will be unreliable.
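The false positive inflation described above can be checked with a small simulation. This is a minimal sketch under stated assumptions, not the researcher's actual analysis: the function names are hypothetical, tutoring is assumed to be assigned to whole schools (every other school), each school has 20 students, the ICC is set to 0.25, and the true tutoring effect is zero. The naive standard error treats every student as an independent observation.

```python
import random
import statistics


def simulate_once(rng, n_schools=50, n_students=20, icc=0.25):
    """Simulate one study with NO true tutoring effect.

    Returns (estimated effect, naive standard error). Total outcome
    variance is 1, split into a school part (icc) and a student part.
    """
    sd_school = icc ** 0.5
    sd_student = (1.0 - icc) ** 0.5
    treated, control = [], []
    for school in range(n_schools):
        school_effect = rng.gauss(0.0, sd_school)  # shared by all students
        scores = [school_effect + rng.gauss(0.0, sd_student)
                  for _ in range(n_students)]
        # Assumption: treatment is assigned at the school level.
        (treated if school % 2 == 0 else control).extend(scores)
    estimate = statistics.fmean(treated) - statistics.fmean(control)
    # Naive SE: pretends all students are independent observations.
    naive_se = (statistics.variance(treated) / len(treated)
                + statistics.variance(control) / len(control)) ** 0.5
    return estimate, naive_se


def false_positive_rate(n_sims=300, seed=0):
    """Share of null simulations where the naive test rejects at the 5% level."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        estimate, naive_se = simulate_once(rng)
        if abs(estimate / naive_se) > 1.96:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    print(false_positive_rate())
```

Because the true effect is zero, a valid 5% test should reject in about 5% of simulations. Under these assumptions the naive test rejects far more often, while the average estimated effect stays near zero.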
Question 2 Multiple Choice
A researcher computes the intraclass correlation coefficient (ICC) for patient mortality across 30 hospitals and finds ICC = 0.25. What is the correct interpretation?
A. The pairwise correlation between any two patients' mortality outcomes within the same hospital is 0.25
B. 25% of the total variation in mortality outcomes is attributable to which hospital a patient is in; clustering is substantial and ignoring it will bias inference
C. The multilevel model explains 25% of the mortality variance; the remaining 75% is unexplained
D. 25% of hospitals in the study have statistically significantly above-average mortality rates
Answer: B
The ICC is the proportion of total outcome variance attributable to between-cluster differences. ICC = 0.25 means that 25% of the variation in mortality is accounted for simply by knowing which hospital a patient is in, which is a very large clustering effect. As a rule of thumb, ICC > 0.05 warrants a multilevel model; ICC = 0.25 makes it essential. Option A is close but not the best answer: in the simple two-level model the ICC is numerically equal to the expected correlation between two randomly chosen patients from the same hospital, but it is an average over the population, not the correlation between any particular pair of patients.
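The variance-share definition can be written out directly. The sketch below is illustrative only: the function names are hypothetical, the ANOVA-based estimator assumes equal-sized clusters and a continuous outcome, and a binary outcome such as mortality would normally need a different treatment that this sketch does not attempt.

```python
import statistics


def icc_from_variances(between_var, within_var):
    """ICC as the between-cluster share of total variance."""
    return between_var / (between_var + within_var)


def icc_anova(groups):
    """One-way ANOVA estimate of the ICC for equal-sized clusters.

    `groups` is a list of lists, one inner list per cluster.
    The estimate can be negative when clusters differ less than
    chance alone would predict.
    """
    k = len(groups)        # number of clusters
    m = len(groups[0])     # observations per cluster
    if any(len(g) != m for g in groups):
        raise ValueError("this sketch assumes equal cluster sizes")
    grand_mean = statistics.fmean([x for g in groups for x in g])
    group_means = [statistics.fmean(g) for g in groups]
    # Mean square between clusters and mean square within clusters.
    msb = m * sum((gm - grand_mean) ** 2 for gm in group_means) / (k - 1)
    msw = sum((x - gm) ** 2
              for g, gm in zip(groups, group_means)
              for x in g) / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)
```

For example, `icc_from_variances(0.25, 0.75)` reproduces the ICC of 0.25 from the question when a quarter of the total variance lies between hospitals.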
Question 3 True / False
Partial pooling in a hierarchical model produces better small-cluster estimates than estimating each cluster completely independently (no pooling).
T. True
F. False
Answer: True
True. When a cluster has few observations, its independent (no-pooling) estimate is highly unstable and driven by noise. Partial pooling shrinks the cluster's estimate toward the overall mean. The shrinkage is stronger the fewer observations the cluster has and the smaller the between-cluster variance is relative to the within-cluster variance. For small clusters, this trades a small bias for a large reduction in variance, yielding a lower mean squared error. The idea is formalized in the James-Stein result: in the classical normal-means setting and under squared error loss, a shrinkage estimator dominates estimating each group independently whenever there are three or more groups.
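For the simplest case, the shrinkage can be written as a weighted average. The notation here is an assumption introduced for illustration, not taken from the quiz: a two-level normal model with known between-cluster variance $\tau^2$ and within-cluster variance $\sigma^2$, where $\bar{y}_j$ is the observed mean of cluster $j$, $n_j$ is its size, and $\bar{y}$ is the overall mean.

```latex
\hat{\theta}_j = w_j \, \bar{y}_j + (1 - w_j) \, \bar{y},
\qquad
w_j = \frac{\tau^2}{\tau^2 + \sigma^2 / n_j}
```

When $n_j$ is small or $\tau^2$ is small relative to $\sigma^2$, the weight $w_j$ is close to 0 and the estimate sits near the overall mean. As $n_j$ grows, $w_j$ approaches 1 and the estimate approaches the cluster's own mean.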
Question 4 True / False
If the intraclass correlation coefficient for a dataset is 0.02, using a multilevel model instead of ordinary regression will substantially change the study's conclusions.
T. True
F. False
Answer: False
False. When the ICC is near zero, almost no variation in the outcome is attributable to cluster membership, and observations within a cluster are barely more correlated than observations from different clusters. With clusters of moderate size, OLS standard errors will then be approximately correct and the inferential gap between OLS and multilevel modeling will be negligible. The practical rule of thumb is that ICC > 0.05 warrants the multilevel approach. ICC = 0.02 indicates that clustering is unlikely to meaningfully bias inference, so the added model complexity is usually unnecessary. One caveat: the understatement of standard errors also grows with cluster size, so a small ICC can still matter when clusters are very large.
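How much a given ICC matters can be gauged with the design effect. The sketch below assumes equal cluster sizes and uses the standard approximation 1 + (m - 1) x ICC, which is not stated in the quiz itself; the function names are hypothetical.

```python
def design_effect(icc, cluster_size):
    """Variance inflation from clustering, assuming equal cluster sizes."""
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_total, icc, cluster_size):
    """Number of independent observations the clustered sample is worth."""
    return n_total / design_effect(icc, cluster_size)


def se_inflation(icc, cluster_size):
    """Factor by which the true SE exceeds the naive (independent) SE."""
    return design_effect(icc, cluster_size) ** 0.5
```

With ICC = 0.02 and 20 observations per cluster, `design_effect` gives 1.38, so the naive standard error is too small by a factor of about 1.17. With 500 observations per cluster the design effect is 10.98 and the factor rises to about 3.3, which is why the rule of thumb should be read alongside cluster size.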
Question 5 Short Answer
In your own words, explain what 'partial pooling' means in a hierarchical model and why it produces better estimates than either complete pooling or no pooling for clustered data.
Model answer: Partial pooling means cluster-specific estimates are pulled toward the overall mean rather than being estimated either all-identically (complete pooling) or fully independently (no pooling). The degree of shrinkage depends on cluster size and between-cluster variance: large, information-rich clusters are barely shrunk; small clusters are pulled substantially toward the global mean. Complete pooling ignores genuine between-cluster differences. No pooling gives unstable, noisy estimates for small clusters. Partial pooling navigates between these extremes, yielding better estimates by borrowing strength from the full dataset without erasing real cluster differences.
The intuition: if a hospital has only 5 patients in your study, its raw observed mortality rate is mostly noise. Instead of reporting that noisy rate as-is (no pooling) or ignoring the hospital's identity entirely (complete pooling), partial pooling says 'your estimate is mostly the overall mean, adjusted a little toward your 5-patient observation.' As the cluster size grows, the observation dominates and the estimate converges to the no-pooling value. When the model's assumptions hold, this shrunken estimate minimizes expected squared error.
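The comparison between the three strategies can be illustrated with a small simulation. This is a sketch under strong simplifying assumptions, not a fitted hierarchical model: outcomes are continuous and normal, every cluster has 5 observations, and the between-cluster variance `tau2` and within-cluster variance `sigma2` are treated as known, whereas a real multilevel model would estimate them from the data. All names are hypothetical.

```python
import random
import statistics


def compare_pooling(n_clusters=2000, n_per_cluster=5,
                    tau2=1.0, sigma2=4.0, seed=1):
    """Mean squared error of three estimators of the true cluster means."""
    rng = random.Random(seed)
    # True cluster means vary around an overall mean of 0.
    true_means = [rng.gauss(0.0, tau2 ** 0.5) for _ in range(n_clusters)]
    # Observed mean of each small cluster: truth plus sampling noise.
    cluster_means = [
        statistics.fmean([rng.gauss(mu, sigma2 ** 0.5)
                          for _ in range(n_per_cluster)])
        for mu in true_means
    ]
    grand_mean = statistics.fmean(cluster_means)
    # Shrinkage weight on the cluster's own mean (known variances assumed).
    weight = tau2 / (tau2 + sigma2 / n_per_cluster)
    estimates = {
        "no_pooling": cluster_means,
        "complete_pooling": [grand_mean] * n_clusters,
        "partial_pooling": [weight * ybar + (1.0 - weight) * grand_mean
                            for ybar in cluster_means],
    }
    return {
        name: statistics.fmean([(est - truth) ** 2
                                for est, truth in zip(values, true_means)])
        for name, values in estimates.items()
    }


if __name__ == "__main__":
    for name, mse in compare_pooling().items():
        print(f"{name}: {mse:.3f}")
```

Under these assumptions the partial-pooling estimates should show the lowest mean squared error of the three, with no pooling penalized by noise and complete pooling penalized by ignoring real differences between clusters.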