Questions: Causal Inference and the Identification Problem
3 questions to test your understanding
Question 1 (Multiple Choice)
A study finds a strong positive correlation between the number of hospitals in a city and its death rate. A researcher controls for city size and still finds the relationship. What is the most likely explanation?
A. Hospitals cause death, so patients should avoid them
B. Selection bias: sicker people travel to cities with more hospitals, so the correlation reflects who chooses to go there, not the effect of hospitals on health
C. The regression controls have solved the identification problem
D. City size is the only confounder, so the controlled estimate is causal
Answer: B
This is a classic selection bias example. People who are severely ill seek out cities with major medical centers, so the association captures who selects into treatment (living in or traveling to a hospital-dense city), not the causal effect of hospitals. Controlling for city size does not help: as long as unobserved illness severity drives both location choice and death risk, the estimate remains confounded.
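A small simulation can make this concrete. The sketch below is illustrative only: the variable names, coefficients, and the assumed true effect of -0.5 are invented for the example and are not taken from any real study. It generates cities where unobserved illness severity raises both hospital density and deaths, then compares a naive regression slope with one that controls for city size (by residualizing both variables on size, which gives the same coefficient as including size in the regression).

```python
import random

def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residualize(v, control):
    """Remove the part of v that is linearly explained by control."""
    b = slope(control, v)
    mv, mc = sum(v) / len(v), sum(control) / len(control)
    return [vi - mv - b * (ci - mc) for vi, ci in zip(v, control)]

rng = random.Random(0)
n = 20_000
TRUE_EFFECT = -0.5  # assumption for this sketch: hospitals reduce deaths

size = [rng.gauss(0, 1) for _ in range(n)]      # observed city size
severity = [rng.gauss(0, 1) for _ in range(n)]  # unobserved illness severity
hospitals = [0.5 * s + u + rng.gauss(0, 1) for s, u in zip(size, severity)]
deaths = [TRUE_EFFECT * h + 0.3 * s + 2.0 * u + rng.gauss(0, 1)
          for h, s, u in zip(hospitals, size, severity)]

# Naive estimate: regress deaths on hospitals alone.
naive = slope(hospitals, deaths)
# Controlled estimate: partial out city size from both variables first.
controlled = slope(residualize(hospitals, size), residualize(deaths, size))

print(f"true effect:         {TRUE_EFFECT:+.2f}")
print(f"naive slope:         {naive:+.2f}")
print(f"controlling for size: {controlled:+.2f}")
```

Under these assumed parameters both estimates come out positive even though the simulated effect is negative, because severity is never observed and so cannot be controlled for.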
Question 2 (True / False)
Adding more control variables to a regression generally gets you closer to estimating a causal effect.
True
False
Answer: False
Controls help only when they block backdoor paths between treatment and outcome. Controlling for a 'bad control' — a variable that is itself caused by the treatment, or a collider — can introduce new bias and move the estimate further from the truth. Identification is about the source of variation in the regressor, not the number of variables in the model.
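The collider case can be checked with a short simulation. This is a sketch under assumed parameters, with hypothetical variable names: treatment and outcome are generated independently (so the true effect is zero), and a third variable is caused by both. Controlling for that third variable is done by residualizing, which yields the same coefficient as adding it to the regression.

```python
import random

def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residualize(v, control):
    """Remove the part of v that is linearly explained by control."""
    b = slope(control, v)
    mv, mc = sum(v) / len(v), sum(control) / len(control)
    return [vi - mv - b * (ci - mc) for vi, ci in zip(v, control)]

rng = random.Random(0)
n = 20_000

# Treatment and outcome are independent: the true effect is zero.
treatment = [rng.gauss(0, 1) for _ in range(n)]
outcome = [rng.gauss(0, 1) for _ in range(n)]
# The collider is caused by both treatment and outcome.
collider = [t + y + rng.gauss(0, 1) for t, y in zip(treatment, outcome)]

no_control = slope(treatment, outcome)
with_collider = slope(residualize(treatment, collider),
                      residualize(outcome, collider))

print(f"slope without controls:     {no_control:+.2f}")
print(f"slope controlling collider: {with_collider:+.2f}")
```

Without the control the slope is close to zero, which is the truth in this setup. Adding the collider as a control produces a clearly negative slope, so the extra variable moves the estimate away from the causal effect rather than toward it.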
Question 3 (Short Answer)
Why do economists rely on 'natural experiments' rather than simply adding more control variables to estimate causal effects?
Think about your answer before reading the model answer below.
Model answer: Natural experiments provide quasi-random variation in the treatment — like a policy that affected only one group — which breaks the link between treatment and unobserved confounders. Controls alone cannot eliminate bias from variables that were never measured.
The fundamental identification problem is that unobserved variables may simultaneously affect treatment status and the outcome. Adding observed controls gives no guarantee that the backdoor paths running through those unobserved variables are blocked, and there is no way to verify from the data alone that they are. A natural experiment (e.g., a lottery, a policy cutoff, a geographic boundary) generates variation in treatment that is, by design or circumstance, unrelated to potential outcomes, which makes the 'as-good-as-random' assumption defensible.
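The lottery case can be sketched in a few lines. Everything here is an assumption made for illustration: the names, the take-up rule, and a constant treatment effect of 2.0. The lottery shifts who takes the treatment but is independent of the unobserved confounder and affects the outcome only through treatment. The code compares a naive treated-versus-untreated difference with a Wald estimate, which divides the lottery's effect on the outcome by its effect on take-up. This is one common way to use such variation, not the only one.

```python
import random

def mean(values):
    values = list(values)
    return sum(values) / len(values)

rng = random.Random(1)
n = 50_000
TRUE_EFFECT = 2.0  # assumption for this sketch: constant treatment effect

rows = []
for _ in range(n):
    u = rng.gauss(0, 1)                      # unobserved confounder
    z = 1 if rng.random() < 0.5 else 0       # lottery win, independent of u
    # Take-up depends on both the lottery and the confounder.
    d = 1 if 1.5 * z + u + rng.gauss(0, 1) > 0.75 else 0
    y = TRUE_EFFECT * d + 3.0 * u + rng.gauss(0, 1)
    rows.append((z, d, y))

# Naive comparison: treated vs. untreated, confounded by u.
naive = (mean(y for _, d, y in rows if d == 1)
         - mean(y for _, d, y in rows if d == 0))

# Wald estimate: use only the variation induced by the lottery.
reduced_form = (mean(y for z, _, y in rows if z == 1)
                - mean(y for z, _, y in rows if z == 0))
first_stage = (mean(d for z, d, _ in rows if z == 1)
               - mean(d for z, d, _ in rows if z == 0))
wald = reduced_form / first_stage

print(f"true effect: {TRUE_EFFECT:.2f}")
print(f"naive:       {naive:.2f}")
print(f"wald:        {wald:.2f}")
```

In this setup the naive difference substantially overstates the effect because people with high values of the confounder are more likely to take the treatment, while the Wald estimate lands near the assumed true effect. No control variable is involved: the improvement comes entirely from where the variation in treatment originates.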