Questions: Multilevel Modeling for Hierarchical Data
5 questions to test your understanding
Question 1 Multiple Choice
A researcher collects data on 1,000 employees nested within 50 companies and runs ordinary linear regression to predict salary from performance ratings. What is the primary statistical problem with this approach?
A. Linear regression cannot handle more than 500 observations reliably
B. The nested structure violates the independence assumption, causing standard errors to be underestimated and Type I error to be inflated
C. Performance ratings are ordinal, making linear regression mathematically invalid
D. The 50-company sample is too small to support any regression analysis
Answer: B
Ordinary regression assumes all observations are independent. Employees within the same company share a context (similar pay scales, culture, HR policies), so their outcomes are correlated. The model treats each employee as an independent draw, but 1,000 employees in 50 companies carry far less independent information than 1,000 truly independent individuals. The consequence: standard errors are underestimated (the model 'thinks' it has more independent information than it does), t-statistics are inflated, and effects appear more statistically significant than warranted. Multilevel modeling explicitly partitions within-company and between-company variance, so its standard errors reflect the clustering.
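The inflation of Type I error can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions, not a reproduction of the researcher's analysis: it uses a hypothetical company-level predictor with no true effect (the simplest case to simulate), 50 companies of 20 employees, and an assumed ICC of 0.3. The function name and all parameter values are made up for the example. It fits a naive single-level regression to each simulated data set and counts how often the slope is declared significant at the nominal 5% level.

```python
import random
import statistics


def naive_false_positive_rate(n_groups=50, group_size=20, icc=0.3,
                              n_sims=200, seed=0):
    """Share of simulations in which naive OLS rejects a true null slope."""
    rng = random.Random(seed)
    sd_between = icc ** 0.5          # SD of the shared company effect
    sd_within = (1.0 - icc) ** 0.5   # SD of the individual-level noise
    rejections = 0
    for _ in range(n_sims):
        xs, ys = [], []
        for _ in range(n_groups):
            x_group = rng.gauss(0, 1)        # company-level predictor, no true effect
            u = rng.gauss(0, sd_between)     # effect shared by everyone in the company
            for _ in range(group_size):
                xs.append(x_group)
                ys.append(u + rng.gauss(0, sd_within))
        # Ordinary least squares for y = a + b * x, ignoring the nesting.
        n = len(xs)
        mx, my = statistics.fmean(xs), statistics.fmean(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        slope = sxy / sxx
        intercept = my - slope * mx
        rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
        naive_se = (rss / (n - 2) / sxx) ** 0.5
        if abs(slope / naive_se) > 1.96:     # nominal two-sided 5% test
            rejections += 1
    return rejections / n_sims


clustered_rate = naive_false_positive_rate(icc=0.3)
independent_rate = naive_false_positive_rate(icc=0.0)
print(f"clustered data:   {clustered_rate:.2f}")
print(f"independent data: {independent_rate:.2f}")
```

With the ICC set to zero the rejection rate should stay near the nominal 5%, while with clustering it should come out several times higher, which is the pattern the explanation above describes.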
Question 2 Multiple Choice
A researcher adds random slopes for 'training hours' to her multilevel model. A colleague insists: 'Random slopes are always better — a model that lets relationships vary across groups is more realistic.' What is the correct response?
A. The colleague is right: random slopes always improve both model fit and realism
B. Random slopes are theoretically motivated when relationships genuinely vary, but consume degrees of freedom and can be poorly estimated with small group sizes; the decision should be driven by theory and sample size
C. Random slopes are only appropriate for longitudinal data, not cross-sectional nested data
D. Random slopes should only be added when the ICC exceeds 0.5
Answer: B
Adding random slopes does allow for more realistic variation, but at a real cost. Random slopes require sufficient within-group variation in the predictor AND sufficient numbers of groups to estimate the slope variance reliably. With small group sizes or few groups, random slope estimates become unstable or the model fails to converge. The 'always add random slopes' heuristic is a common overcorrection. The decision should be: does theory predict the relationship varies across groups? Does the sample have sufficient power to estimate that variation? Model complexity should be earned, not assumed.
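One way to see the cost is to count the variance and covariance parameters the model has to estimate. The sketch below assumes an unstructured random-effects covariance matrix (every variance and every pairwise covariance estimated freely), which is a common default but is an assumption here; the function name is hypothetical.

```python
def n_covariance_params(n_random_effects: int) -> int:
    """Free parameters in an unstructured q x q random-effects covariance matrix."""
    if n_random_effects < 1:
        raise ValueError("need at least one random effect")
    q = n_random_effects
    return q * (q + 1) // 2   # q variances plus q*(q-1)/2 covariances


intercept_only = n_covariance_params(1)     # random intercept: 1 variance
with_one_slope = n_covariance_params(2)     # + slope for training hours: 3 parameters
with_two_slopes = n_covariance_params(3)    # + a second random slope: 6 parameters
print(intercept_only, with_one_slope, with_two_slopes)
```

Each added random slope brings its own variance plus a covariance with every random effect already in the model, all of which must be estimated from the between-group information alone.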
Question 3 True / False
A high intraclass correlation (ICC) indicates that knowing which group an individual belongs to substantially reduces uncertainty about their outcome, even before any predictors are added to the model.
T. True
F. False
Answer: True
The ICC is the proportion of total variance attributable to group membership. An ICC of 0.30 means 30% of outcome variance is explained simply by which group a person is in — before any individual-level predictors. Intuitively, a high ICC means groups differ greatly from each other relative to within-group spread, so group membership is highly informative. It simultaneously quantifies the severity of the independence assumption violation: a high ICC means observations within groups are strongly correlated, which is precisely the structure that ordinary regression ignores.
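In symbols, for a random-intercept model with no predictors, the definition can be written as follows. The notation (\gamma_{00}, u_{0j}, r_{ij}, \tau_{00}, \sigma^2) is a conventional choice assumed here, not taken from the quiz, and the variance values in the last line are hypothetical numbers picked to give the 0.30 mentioned above.

```latex
\[
Y_{ij} = \gamma_{00} + u_{0j} + r_{ij},
\qquad u_{0j} \sim N(0, \tau_{00}),
\qquad r_{ij} \sim N(0, \sigma^2)
\]
\[
\mathrm{ICC} = \frac{\tau_{00}}{\tau_{00} + \sigma^2},
\qquad \text{e.g.}\quad \tau_{00} = 3,\ \sigma^2 = 7
\ \Rightarrow\ \mathrm{ICC} = \frac{3}{3 + 7} = 0.30
\]
```

Here \tau_{00} is the between-group variance and \sigma^2 the within-group variance. Under this model the ICC is also the correlation between two individuals drawn from the same group.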
Question 4 True / False
A near-zero ICC means the data have negligible clustering, so it is typically safe to use ordinary regression without multilevel corrections.
T. True
F. False
Answer: False
Even a small ICC can produce consequential misestimation of standard errors when group sizes are large — because the total non-independence accumulates across many observations within groups. A seemingly small ICC of 0.05 with 50 people per group produces a design effect of approximately 1 + (50−1)×0.05 = 3.45, meaning effective sample size is less than a third of nominal. Whether the ICC is 'small enough to ignore' depends jointly on the ICC value and the group size. Moreover, cross-level interactions — often the theoretically most interesting estimates — require the multilevel framework regardless of the ICC.
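The design effect arithmetic above can be reproduced in a few lines. This is a minimal sketch assuming equal group sizes; the function names are hypothetical, and the total sample size of 2,500 (50 groups of 50) is an assumed value used only to show the effective sample size calculation.

```python
def design_effect(icc: float, group_size: float) -> float:
    """Design effect for equal-sized groups: 1 + (m - 1) * ICC."""
    return 1.0 + (group_size - 1.0) * icc


def effective_sample_size(n_total: float, icc: float, group_size: float) -> float:
    """Nominal sample size deflated by the design effect."""
    return n_total / design_effect(icc, group_size)


deff = design_effect(icc=0.05, group_size=50)        # about 3.45, as in the text
n_eff = effective_sample_size(n_total=2500, icc=0.05, group_size=50)  # hypothetical N
print(f"design effect: {deff:.2f}")
print(f"effective n:   {n_eff:.0f} of 2500")
```

Holding the ICC at 0.05 and changing `group_size` shows how quickly the design effect grows with group size, which is why the ICC alone cannot settle whether clustering is ignorable.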
Question 5 Short Answer
What is a cross-level interaction in a multilevel model? Use a concrete example to explain why it cannot be properly estimated in ordinary single-level regression.
Model answer: A cross-level interaction asks whether the effect of a Level 1 (individual-level) predictor on the outcome depends on a Level 2 (group-level) characteristic. Example: does the effect of attending a tutoring program (individual-level) on test scores vary depending on school funding levels (school-level variable)? In a multilevel model, the individual-level slope for tutoring is modeled as a function of the school's funding — a group-level moderator of an individual-level relationship. In ordinary single-level regression, you could manually multiply tutoring by school funding and include it as an interaction term, but this approach doesn't correctly partition within-school and between-school variance, producing biased standard errors for the interaction. The multilevel framework is needed because the cross-level interaction involves variance components at two distinct levels — collapsing them into a single-level analysis conflates the two sources of variance and yields misleading inferences.
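The tutoring example can be written out as a two-level model. The notation below is a conventional choice assumed for illustration and does not come from the quiz: i indexes students, j indexes schools, and the variable names are placeholders.

```latex
\[
\text{Level 1:}\quad
\mathrm{Score}_{ij} = \beta_{0j} + \beta_{1j}\,\mathrm{Tutoring}_{ij} + r_{ij}
\]
\[
\text{Level 2:}\quad
\beta_{0j} = \gamma_{00} + \gamma_{01}\,\mathrm{Funding}_{j} + u_{0j},
\qquad
\beta_{1j} = \gamma_{10} + \gamma_{11}\,\mathrm{Funding}_{j} + u_{1j}
\]
\[
\text{Combined:}\quad
\mathrm{Score}_{ij} = \gamma_{00} + \gamma_{01}\,\mathrm{Funding}_{j}
+ \gamma_{10}\,\mathrm{Tutoring}_{ij}
+ \gamma_{11}\,\mathrm{Funding}_{j}\,\mathrm{Tutoring}_{ij}
+ u_{0j} + u_{1j}\,\mathrm{Tutoring}_{ij} + r_{ij}
\]
```

The coefficient \gamma_{11} is the cross-level interaction. A single-level regression with the product term estimates the same fixed part but drops u_{0j} and u_{1j}, folding all school-level variation into one residual, which is the conflation of variance sources described in the model answer.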