Questions: Multivariable Regression in Epidemiology
3 questions to test your understanding
Question 1 Multiple Choice
In a logistic regression of disease status (1 = case, 0 = control), the coefficient for a binary exposure variable is 0.69. After adjusting for confounders, what does this coefficient represent?
A. The adjusted risk difference for the exposure
B. The adjusted log odds ratio for the exposure
C. The probability of disease given the exposure
D. The adjusted relative risk for the exposure
Answer: B
Logistic regression models the log odds of the outcome, so its coefficients are on the log-odds scale: the coefficient of 0.69 is the natural log of the adjusted odds ratio. Exponentiating gives e^0.69 ≈ 2.0, the adjusted odds ratio itself. Logistic regression does not directly yield risk differences or relative risks (though marginal methods can approximate them).
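The arithmetic in this explanation can be checked with a short Python sketch. The function names and the two baseline risks (0.01 and 0.30) are hypothetical, chosen only for illustration. The conversion from odds ratio to risk ratio is the standard algebraic identity for a single 2x2 table, and applying it to an adjusted odds ratio is an approximation at best.

```python
import math


def odds_ratio_from_coefficient(beta: float) -> float:
    """Exponentiate a logistic regression coefficient (log odds ratio)."""
    return math.exp(beta)


def risk_ratio_from_odds_ratio(odds_ratio: float, baseline_risk: float) -> float:
    """Convert an odds ratio to a risk ratio for a given risk in the unexposed.

    Derived from odds = p / (1 - p); baseline_risk is an assumed input,
    not something a case-control logistic model estimates.
    """
    return odds_ratio / (1 - baseline_risk + baseline_risk * odds_ratio)


beta = 0.69  # coefficient from the question
adjusted_or = odds_ratio_from_coefficient(beta)

# Hypothetical baseline risks: a rare outcome and a common outcome.
rr_rare = risk_ratio_from_odds_ratio(adjusted_or, baseline_risk=0.01)
rr_common = risk_ratio_from_odds_ratio(adjusted_or, baseline_risk=0.30)

print(f"odds ratio:              {adjusted_or:.2f}")
print(f"risk ratio (risk 0.01):  {rr_rare:.2f}")
print(f"risk ratio (risk 0.30):  {rr_common:.2f}")
```

The odds ratio stays close to the risk ratio only when the outcome is rare, which is one reason option D is not a safe reading of the coefficient.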
Question 2 True / False
Adding more covariates to a multivariable regression model usually improves confounding control and should be done whenever additional variables are available.
T. True
F. False
Answer: False
This is a common and consequential misconception. Adjusting for a collider (a variable caused by both the exposure and the outcome, or by their causes) opens a non-causal path that was previously blocked and introduces bias. Adjusting for a mediator blocks part of the causal pathway of interest, so the coefficient no longer estimates the total effect and typically understates it. Including many covariates when data are sparse leads to overfitting and inflated standard errors. Covariate selection must be guided by a causal DAG or explicit epidemiologic reasoning, not by which variables happen to be available.
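Collider bias can be illustrated with a small simulation. The sketch below is a toy example under stated assumptions, not an analysis of real data: the exposure and outcome are generated independently (true odds ratio of 1), and a hypothetical collider becomes more likely when either is present. All probabilities and names are invented for illustration. Restricting to one level of the collider stands in here for adjusting for it in a model.

```python
import random


def odds_ratio(pairs):
    """Odds ratio from an iterable of (exposure, outcome) pairs coded 0/1."""
    a = b = c = d = 0
    for exposed, diseased in pairs:
        if exposed and diseased:
            a += 1  # exposed cases
        elif exposed:
            b += 1  # exposed non-cases
        elif diseased:
            c += 1  # unexposed cases
        else:
            d += 1  # unexposed non-cases
    return (a * d) / (b * c)


def simulate(n=200_000, seed=0):
    """Generate (exposure, outcome, collider) rows with no true effect."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        exposure = int(rng.random() < 0.3)
        outcome = int(rng.random() < 0.2)  # independent of exposure
        # The collider is caused by both exposure and outcome.
        p_collider = 0.05 + 0.4 * exposure + 0.4 * outcome
        collider = int(rng.random() < p_collider)
        rows.append((exposure, outcome, collider))
    return rows


rows = simulate()
crude_or = odds_ratio((e, y) for e, y, _ in rows)
stratum_or = odds_ratio((e, y) for e, y, c in rows if c == 1)

print(f"crude odds ratio:               {crude_or:.2f}")
print(f"odds ratio within collider = 1: {stratum_or:.2f}")
```

The crude odds ratio should sit near 1, while the odds ratio among records with the collider present falls well below 1, even though the exposure has no effect on the outcome in this simulated population.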
Question 3 Short Answer
What is the primary goal of multivariable regression in a causal epidemiologic study, as opposed to a predictive modeling context?
Think about your answer before reading the model answer.
Model answer: The goal is to estimate the causal effect of a specific exposure on an outcome, adjusted so that it is as free of confounding as possible. The goal is not to maximize predictive accuracy or explained variance. The set of covariates to include is determined by the causal structure (e.g., via a DAG), not by model fit statistics.
Epidemiologic regression and predictive machine learning share tools but have different objectives. In causal epidemiology, including the wrong covariates (colliders, mediators) can bias the exposure coefficient even while improving model fit, so goodness-of-fit metrics are insufficient guides to covariate selection.
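The tension between fit and bias can be seen in a toy simulation. The sketch below swaps in a linear model with a continuous outcome for simplicity (it is not the logistic setting of Question 1), and every coefficient in it is a hypothetical choice: the exposure affects the outcome directly (0.2) and through a mediator (0.8 × 0.5 = 0.4), so the total effect is 0.6. The two-covariate model is fitted by residualizing on the mediator, which gives the same exposure coefficient and residuals as ordinary least squares on both covariates.

```python
import random


def mean(values):
    return sum(values) / len(values)


def fit_simple(x, y):
    """Least-squares fit of y on x with an intercept; returns (slope, residuals)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    residuals = [(yi - my) - slope * (xi - mx) for xi, yi in zip(x, y)]
    return slope, residuals


def r_squared(y, residuals):
    """Proportion of outcome variance explained, given model residuals."""
    my = mean(y)
    total = sum((yi - my) ** 2 for yi in y)
    return 1 - sum(r * r for r in residuals) / total


rng = random.Random(1)
n = 50_000
exposure = [rng.gauss(0, 1) for _ in range(n)]
mediator = [0.8 * x + rng.gauss(0, 1) for x in exposure]
outcome = [0.2 * x + 0.5 * m + rng.gauss(0, 1) for x, m in zip(exposure, mediator)]

# Model 1: outcome ~ exposure (targets the total effect, 0.6).
coef_without_mediator, resid_1 = fit_simple(exposure, outcome)
r2_without_mediator = r_squared(outcome, resid_1)

# Model 2: outcome ~ exposure + mediator, via residualizing on the mediator.
_, exposure_resid = fit_simple(mediator, exposure)
_, outcome_resid = fit_simple(mediator, outcome)
coef_with_mediator, resid_2 = fit_simple(exposure_resid, outcome_resid)
r2_with_mediator = r_squared(outcome, resid_2)

print(f"without mediator: coef={coef_without_mediator:.2f}, R^2={r2_without_mediator:.2f}")
print(f"with mediator:    coef={coef_with_mediator:.2f}, R^2={r2_with_mediator:.2f}")
```

Adding the mediator raises R² while pulling the exposure coefficient away from the total effect of 0.6 toward the direct effect of 0.2, so the better-fitting model gives the worse answer to the causal question posed.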