A researcher analyzes repeated binary outcomes (infection yes/no at each hospital visit) using both GEE with logistic link and a mixed-effects logistic model. The GEE odds ratio is 1.5 and the mixed-effects odds ratio is 2.1 for the same predictor. Which is correct?
A. The GEE estimate is correct; the mixed-effects model is biased
B. Both are correct but answer different questions — GEE estimates the population-average effect while the mixed-effects model estimates the subject-specific effect, and these differ for nonlinear models
C. The mixed-effects estimate is always larger, indicating it has more power
D. The discrepancy indicates model misspecification in one or both
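Answer: B
For nonlinear link functions such as the logit, marginal and conditional effects differ mathematically. The GEE (marginal) odds ratio answers this question: if we compare two subpopulations that differ by one unit of X, by what factor do the odds of the population-average infection probability change? The mixed-effects (conditional) odds ratio answers a different one: for a given individual, holding that individual's random effect fixed, by what factor do the odds change per unit of X? Subject-specific log odds ratios are larger in magnitude (further from an odds ratio of 1) than marginal ones, because averaging individual-level logistic curves over between-subject heterogeneity flattens the population-average curve and attenuates its slope toward zero; the larger the random-effect variance, the larger the gap. With a log link, as in Poisson models, a random intercept shifts only the marginal intercept and leaves the slope unchanged, although the two can still differ when there are random slopes. Both estimates are valid for their own estimands; the choice depends on whether the research question is about populations or individuals.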
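To see the attenuation numerically, the sketch below assumes a random-intercept logistic model, logit P(Y=1 | x, b) = β0 + β_SS·x + b with b ~ N(0, σ²), and picks hypothetical values (β0 = −1, σ = 2, and the question's conditional odds ratio of 2.1). It integrates the random intercept out with Gauss-Hermite quadrature to obtain population-average probabilities, then compares the implied marginal log odds ratio with the commonly used approximation β_M ≈ β_SS / sqrt(1 + c²σ²), where c = 16√3/(15π). Because the integrated curve is not exactly logistic, the marginal odds ratio it yields depends somewhat on the baseline β0 and on which x values are compared.

```python
import numpy as np

def expit(eta):
    return 1.0 / (1.0 + np.exp(-eta))

def logit(p):
    return np.log(p / (1.0 - p))

def marginal_prob(eta, sigma, n_nodes=60):
    """Population-average P(Y=1) = E_b[expit(eta + b)] with b ~ N(0, sigma^2),
    computed by Gauss-Hermite quadrature (nodes/weights for weight exp(-x^2))."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    b = np.sqrt(2.0) * sigma * nodes  # change of variables to N(0, sigma^2)
    return np.sum(weights * expit(eta + b)) / np.sqrt(np.pi)

# Hypothetical random-intercept model parameters
beta0 = -1.0            # conditional intercept
beta_ss = np.log(2.1)   # subject-specific (conditional) log odds ratio
sigma = 2.0             # SD of the random intercept

# Marginal log odds ratio comparing x = 0 with x = 1, by numerical integration
p0 = marginal_prob(beta0, sigma)
p1 = marginal_prob(beta0 + beta_ss, sigma)
beta_marginal = logit(p1) - logit(p0)

# Closed-form approximation based on the probit approximation to the logistic curve
c = 16 * np.sqrt(3) / (15 * np.pi)
beta_approx = beta_ss / np.sqrt(1 + c**2 * sigma**2)

print(f"conditional OR:              {np.exp(beta_ss):.2f}")
print(f"marginal OR (quadrature):    {np.exp(beta_marginal):.2f}")
print(f"marginal OR (approximation): {np.exp(beta_approx):.2f}")
```

Setting sigma to 0 makes the two odds ratios coincide, which is a quick sanity check that the gap comes entirely from the random-effect variance.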
Question 2 Short Answer
GEE with an exchangeable working correlation structure and robust standard errors produces valid inference even if the true correlation structure is autoregressive. Why?
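Model answer: The sandwich (robust) variance estimator replaces the model-assumed covariance of the responses with the empirical covariance of the cluster-level residuals. It places this empirical "meat" between two copies of the model-based "bread" (the inverse of the sensitivity matrix built from the working correlation). Provided the mean model (link function and linear predictor) is correctly specified, the GEE point estimates remain consistent even when the working correlation is wrong: they converge to the true marginal parameters as the number of independent clusters grows. Because the meat is computed from the observed residuals, the resulting standard errors reflect the actual correlation pattern in the data, autoregressive or otherwise. A working correlation closer to the truth improves efficiency (smaller standard errors), but a wrong one does not invalidate large-sample inference.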
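The claim can be checked by simulation. The sketch below swaps in a Gaussian outcome with an identity link, not the logistic case, so that GEE with a fixed working correlation has a closed form. The true within-cluster correlation is AR(1) with ρ = 0.7, while the working correlation is exchangeable with a deliberately wrong, fixed α = 0.3 (a real GEE fit would estimate α iteratively). The sandwich covariance is bread⁻¹ · meat · bread⁻¹, where the meat is the sum of outer products of the per-cluster scores. All sizes and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2024)
n_clusters, m = 100, 5              # independent clusters, visits per cluster
beta_true = np.array([1.0, 0.5])    # intercept and slope

# True within-cluster correlation: AR(1), corr(e_j, e_k) = rho^|j-k|
rho = 0.7
lags = np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
chol_true = np.linalg.cholesky(rho ** lags)

# Working correlation: exchangeable, with alpha fixed at a deliberately wrong value
alpha = 0.3
R_inv = np.linalg.inv(np.full((m, m), alpha) + (1 - alpha) * np.eye(m))

def gee_identity(X, Y):
    """Closed-form GEE for an identity link and a fixed working correlation.
    X has shape (clusters, m, p); Y has shape (clusters, m).
    Returns the estimate and its sandwich (robust) covariance."""
    bread = np.einsum("imp,mk,ikq->pq", X, R_inv, X)       # sum_i X_i' R^-1 X_i
    beta = np.linalg.solve(bread, np.einsum("imp,mk,ik->p", X, R_inv, Y))
    resid = Y - X @ beta
    scores = np.einsum("imp,mk,ik->ip", X, R_inv, resid)    # U_i = X_i' R^-1 r_i
    meat = scores.T @ scores                               # sum_i U_i U_i'
    bread_inv = np.linalg.inv(bread)
    return beta, bread_inv @ meat @ bread_inv

n_reps, covered = 500, 0
for _ in range(n_reps):
    x = rng.normal(size=(n_clusters, m))
    X = np.stack([np.ones_like(x), x], axis=-1)             # intercept + covariate
    errors = rng.normal(size=(n_clusters, m)) @ chol_true.T  # AR(1)-correlated rows
    Y = X @ beta_true + errors
    beta_hat, cov = gee_identity(X, Y)
    se_slope = np.sqrt(cov[1, 1])
    covered += int(abs(beta_hat[1] - beta_true[1]) < 1.96 * se_slope)

coverage = covered / n_reps
print(f"Sandwich 95% CI coverage for the slope: {coverage:.3f}")
```

Coverage close to the nominal 95% despite the misspecified working correlation is what the model answer predicts; rerunning with ρ and α closer together should shrink the standard errors without changing coverage much.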
This robustness property is GEE's greatest strength. The working correlation is a device for improving efficiency, not a structural assumption required for validity. However, the sandwich estimator is a large-sample result and needs enough independent clusters to perform well; a common rule of thumb is roughly 40 or more. With few clusters it tends to underestimate the true sampling variability, so tests and confidence intervals become anti-conservative, and small-sample corrections or bias-corrected sandwich estimators (for example, the Mancl and DeRouen correction) may be needed.
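For reference, the plain sandwich estimator and one widely used bias-corrected version (Mancl and DeRouen) can be written as follows. The notation is assumed for this sketch: K independent clusters, D_i = ∂μ_i/∂β, working covariance V_i, and residual vector r_i = y_i − μ̂_i. The correction inflates each cluster's residuals through the cluster leverage H_ii, which offsets the downward bias of the plain sandwich when K is small; it is one of several available corrections.

```latex
\begin{aligned}
\widehat{\operatorname{Var}}(\hat{\beta}) &= B^{-1} M B^{-1},
  \qquad B = \sum_{i=1}^{K} D_i^{\top} V_i^{-1} D_i, \\
M_{\text{sandwich}} &= \sum_{i=1}^{K} D_i^{\top} V_i^{-1} r_i r_i^{\top} V_i^{-1} D_i, \\
M_{\text{MD}} &= \sum_{i=1}^{K} D_i^{\top} V_i^{-1} (I - H_{ii})^{-1} r_i r_i^{\top} (I - H_{ii})^{-\top} V_i^{-1} D_i,
  \qquad H_{ii} = D_i B^{-1} D_i^{\top} V_i^{-1}.
\end{aligned}
```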
Question 3 True / False
GEE is preferred over mixed-effects models when the research question is about population-average effects, the number of clusters is large, and subject-specific predictions are not needed.
T. True
F. False
Answer: True
GEE is well suited to marginal inference: it estimates how the average outcome in a population changes with a covariate, and it requires only a correctly specified mean model rather than a full distributional model. It does not estimate random effects and therefore cannot produce subject-specific predictions. Mixed-effects models are preferred when the research question involves individual-level effects or prediction, when the random-effects distribution is itself of interest, or when the data are unbalanced because missing visits may depend on previously observed outcomes: likelihood-based mixed models remain valid when data are missing at random, whereas standard (unweighted) GEE generally requires data to be missing completely at random. The choice between GEE and mixed-effects models is driven by the research question, not by which one produces the "correct" answer.