A regression model predicts income from years of education. The coefficient on education is 4,200, with p < 0.001. What can you correctly conclude?
A. Each additional year of education causes a $4,200 increase in income
B. There is a statistically significant positive association between education and income in the sample
C. Education accounts for 42% of the variation in income
D. The model is well-specified because the coefficient is significant
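Answer: B
A p-value below 0.001 means a coefficient this large would be very unlikely if there were no linear association between education and income in the population; it does not show that education causes higher income. Causal claims require ruling out confounders, ideally with an experimental or quasi-experimental design. Option A reads the coefficient causally, option C treats the coefficient as a measure of explained variance (that is R², not the slope), and option D treats significance as evidence of correct specification. Each is a common over-interpretation of OLS output.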
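To see how a large, stable coefficient can arise without any causal effect, here is a minimal simulation sketch. The data-generating process is entirely hypothetical: a made-up `background` variable drives both education and income, and education is given no effect on income at all. The variable names and numeric values are assumptions chosen for illustration, not estimates from real data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical data-generating process (an assumption for illustration):
# family background drives both education and income, and education
# itself has NO causal effect on income.
background = rng.normal(size=n)
education = 12 + 2 * background + rng.normal(size=n)
income = 30_000 + 8_000 * background + rng.normal(scale=5_000, size=n)


def ols(y, *regressors):
    """Return OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Regressing income on education alone picks up the confounded association.
naive_slope = ols(income, education)[1]

# Adding the confounder as a control removes it.
adjusted_slope = ols(income, education, background)[1]

print(f"slope on education, no controls:          {naive_slope:,.0f}")
print(f"slope on education, background controlled: {adjusted_slope:,.0f}")
```

In this setup the uncontrolled slope comes out large and positive even though the true causal effect is zero, while the slope with `background` controlled sits close to zero. In real data the confounder is usually unobserved, which is why significance alone cannot support option A.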
Question 2 True / False
A regression model with R² = 0.85 is generally preferable to one with R² = 0.45 for making causal inferences in social science.
T. True
F. False
Answer: False
R² measures how much of the outcome's variance the model explains, not whether the model is correctly specified or whether its coefficients have a causal interpretation. Adding a variable, even an irrelevant one, never decreases R² and in practice almost always increases it slightly. A model with many controls can therefore have a high R² while introducing collider bias or multicollinearity that undermines inference.
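The point about irrelevant variables can be checked numerically. The sketch below uses simulated data (all values are assumptions for illustration) with one real predictor, then adds up to 20 pure-noise regressors and records R² after each addition.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# One genuine predictor; the outcome depends on it and nothing else.
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)

# Twenty regressors that are unrelated to y by construction.
noise = rng.normal(size=(n, 20))


def r_squared(y, X):
    """R² of an OLS fit of y on X, with an intercept added."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


# R² with the real predictor plus the first k noise columns, k = 0..20.
r2_values = [
    r_squared(y, np.column_stack([x, noise[:, :k]])) for k in range(21)
]

for k in (0, 5, 10, 20):
    print(f"{k:2d} noise regressors: R² = {r2_values[k]:.4f}")
```

R² creeps upward as noise columns are added, although none of them carries information about the outcome. This is one reason adjusted R² or out-of-sample fit is often reported alongside it, and neither of those speaks to causal interpretation either.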
Question 3 Short Answer
What is the difference between a confounder and a collider in regression, and why does controlling for a collider cause problems?
Model answer: A confounder is a variable that causally affects both the treatment and the outcome; controlling for it removes the spurious association it induces. A collider is a variable that is caused by both the treatment and the outcome; controlling for it opens a spurious path between them, inducing bias where none existed.
Blindly adding controls to 'improve' a regression ignores the causal structure. Controlling for a collider — a variable that is a common effect of the predictor and outcome — creates a spurious correlation between them. This is why causal diagrams (DAGs) are valuable before specifying a regression model.
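Collider bias is easy to reproduce in a simulation. In the sketch below (a hypothetical setup with assumed names and values), the predictor and outcome are generated independently, and a third variable is built as their common effect. Controlling for that variable manufactures an association that does not exist.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000

# Predictor and outcome are independent by construction.
predictor = rng.normal(size=n)
outcome = rng.normal(size=n)

# The collider is a common effect of both (hypothetical structure).
collider = predictor + outcome + rng.normal(size=n)


def ols(y, *regressors):
    """Return OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Without the collider, the slope reflects the true (null) relationship.
slope_without = ols(outcome, predictor)[1]

# With the collider as a "control", a spurious negative slope appears.
slope_with = ols(outcome, predictor, collider)[1]

print(f"slope on predictor, no controls:         {slope_without:+.3f}")
print(f"slope on predictor, collider controlled: {slope_with:+.3f}")
```

The first slope stays near zero, as it should. The second is clearly negative: among observations with the same collider value, a higher predictor must be offset by a lower outcome. Nothing in the regression output flags this, so the causal structure has to be reasoned about before choosing controls.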