Questions: Causal Inference from Observational Data
3 questions to test your understanding
Question 1 Multiple Choice
A researcher adds a new control variable to a regression model. The variable is a collider — it is caused by both the treatment variable and the outcome variable. What happens to the causal estimate?
A. It improves, because more variance is explained
B. It is unaffected, because colliders are neutral controls
C. It becomes biased, because conditioning on a collider opens a previously blocked non-causal path
D. The standard errors decrease, making the estimate more reliable
Answer: C
Conditioning on a collider opens a non-causal path between treatment and outcome that was previously blocked, inducing a spurious association and biasing the causal estimate. (Strictly speaking this is not a backdoor path, because it does not begin with an arrow into the treatment.) This is the key reason 'add more controls' is not always the right strategy. Proper causal analysis requires mapping the data-generating process (via a DAG) before deciding what to condition on.
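A small simulation can make the collider effect concrete. The sketch below is illustrative only: the linear data-generating process, the coefficient values, the sample size, and the helper name ols_coef_on_treatment are all assumptions chosen for this example, not part of the question.

```python
import numpy as np

def ols_coef_on_treatment(y, t, controls=None):
    """OLS coefficient on t when regressing y on [intercept, t, controls].

    Hypothetical helper written for this illustration.
    """
    cols = [np.ones_like(t), t]
    if controls is not None:
        cols.append(controls)
    X = np.column_stack(cols)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return float(beta[1])

rng = np.random.default_rng(0)
n = 100_000
true_effect = 1.0  # assumed causal effect of t on y

t = rng.normal(size=n)                    # treatment
y = true_effect * t + rng.normal(size=n)  # outcome
c = t + y + rng.normal(size=n)            # collider: caused by both t and y

# Without the collider, the regression recovers the assumed effect (about 1.0).
estimate_without = ols_coef_on_treatment(y, t)

# Controlling for the collider biases the estimate; in this particular
# setup the coefficient on t is pulled to roughly 0.
estimate_with = ols_coef_on_treatment(y, t, controls=c)

print(f"without collider: {estimate_without:.3f}")
print(f"with collider:    {estimate_with:.3f}")
```

The direction and size of the bias depend on the assumed coefficients. The point is only that adding the control moves the estimate away from the true effect.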
Question 2 True / False
If a study finds a statistically significant association between X and Y after controlling for most available observed confounders, we can conclude that X causes Y.
True
False
Answer: False
Unconfoundedness — the assumption that no unmeasured confounders exist — cannot be verified from data alone. Even after controlling for every observed covariate, unmeasured variables may still confound the relationship. Causal identification requires a credible design argument about the data-generating process, not just statistical significance after observed controls.
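The following sketch illustrates how a strongly significant coefficient can appear even when the true causal effect is zero. Everything in it is an assumption made for illustration: the variable names, the linear model, and the idea that one confounder (w) is observed while another (u) is not.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and classical standard errors."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, se

rng = np.random.default_rng(1)
n = 50_000

w = rng.normal(size=n)          # observed confounder
u = rng.normal(size=n)          # unmeasured confounder (hypothetical)
x = w + u + rng.normal(size=n)  # "treatment"
y = w + u + rng.normal(size=n)  # outcome: x has ZERO causal effect on y

# Analyst's regression: controls for w only, because u was never measured.
beta_obs, se_obs = ols(np.column_stack([np.ones(n), x, w]), y)
t_stat_obs = beta_obs[1] / se_obs[1]

# Infeasible "oracle" regression that also controls for u.
beta_oracle, _ = ols(np.column_stack([np.ones(n), x, w, u]), y)

print(f"controlling for w only: coef={beta_obs[1]:.3f}, t={t_stat_obs:.1f}")
print(f"controlling for w and u: coef={beta_oracle[1]:.3f}")
```

In real data the oracle regression is unavailable, which is why the observed regression alone cannot distinguish a causal effect from residual confounding.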
Question 3 Short Answer
What is the potential outcomes framework, and why is the fundamental problem of causal inference called 'fundamental'?
Think about your answer before reading the model answer below.
Model answer: The potential outcomes framework defines the causal effect of treatment T on unit i as Y_i(1) − Y_i(0): the outcome the unit would have if treated minus the outcome it would have if untreated. The fundamental problem of causal inference is that we can only ever observe one of these two outcomes for any individual; the other is an unobservable counterfactual. It is called 'fundamental' because the limitation is built into the definition rather than being a shortage of data, so it cannot be solved by collecting more observations on the same unit.
The impossibility of observing both potential outcomes simultaneously is not a measurement problem; it is a structural fact about causality. All causal inference strategies (randomization, instrumental variables, difference-in-differences, regression discontinuity) are attempts to credibly estimate the missing counterfactual under different sets of identifying assumptions. This is why each design rests on its own specific assumptions.
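A simulation is the one setting where both potential outcomes can be written down, which makes the missing-data structure visible. The sketch below assumes a made-up population with an average effect of about 2.0 and uses random assignment as the identifying design; all names and numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# In a simulation we can generate BOTH potential outcomes for every unit.
y0 = rng.normal(size=n)                         # Y_i(0): outcome if untreated
y1 = y0 + 2.0 + rng.normal(scale=0.5, size=n)   # Y_i(1): outcome if treated
true_ate = float(np.mean(y1 - y0))              # knowable only inside a simulation

# Random assignment: each unit is treated with probability 1/2.
t = rng.integers(0, 2, size=n)

# What a real dataset would contain: one potential outcome per unit.
y_obs = np.where(t == 1, y1, y0)

# The counterfactual column exists here only because we simulated it.
y_counterfactual = np.where(t == 1, y0, y1)

# Under randomization, the difference in group means estimates the average effect.
diff_in_means = float(y_obs[t == 1].mean() - y_obs[t == 0].mean())

print(f"true ATE (simulation only): {true_ate:.3f}")
print(f"difference in means:        {diff_in_means:.3f}")
```

Note that the individual effects Y_i(1) − Y_i(0) are still never recovered from y_obs and t alone. Randomization identifies only the average effect, under the assumption that assignment is independent of the potential outcomes.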