Questions: Potential Outcomes and the Rubin Causal Model
5 questions to test your understanding
Question 1 Multiple Choice
Workers who completed a job training program earn $5,000 more per year than workers who did not. A researcher concludes the program raises wages by $5,000. The potential outcomes framework reveals this inference is flawed because:
A. The sample size is probably too small to support a $5,000 estimate with statistical significance
B. Workers who self-selected into training likely would have earned more anyway; the observed gap reflects both the program's effect and pre-existing differences between groups
C. The ATE and ATT are equal in labor market studies, so the $5,000 applies only to the treated
D. The estimate should be computed in log wages to avoid bias in the levels comparison
Answer: B
This is the selection bias problem. Workers who choose to enroll in job training differ systematically from those who don't: they may be more motivated, more educated, or better connected to labor markets. The potential outcomes decomposition shows that observed gap = ATT + selection bias, where selection bias = E[Y(0)|D=1] − E[Y(0)|D=0]. If trainees would have earned more even without the program (positive selection bias), the naive comparison overstates the effect. The $5,000 gap conflates the program's true impact with pre-existing differences between the groups, which is the identification problem that causal inference methods are designed to solve.
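The decomposition above can be checked on a toy population. The sketch below is illustrative only: the five workers, their earnings, and the function name `decompose` are made up, and it assumes we can see both potential outcomes for everyone, which is never possible in real data.

```python
from statistics import mean

# Hypothetical workers: (y0, y1, d)
#   y0 = earnings without training, y1 = earnings with training,
#   d  = 1 if the worker enrolled, 0 otherwise.
# All numbers are invented so that the naive gap comes out to $5,000.
workers = [
    (30_000, 32_000, 1),
    (34_000, 36_000, 1),
    (27_000, 29_000, 0),
    (29_000, 31_000, 0),
    (31_000, 33_000, 0),
]

def decompose(units):
    """Return (naive gap, ATT, selection bias) for a fully known population."""
    treated = [u for u in units if u[2] == 1]
    control = [u for u in units if u[2] == 0]
    # What a researcher sees: treated show y1, controls show y0.
    naive_gap = mean(y1 for _, y1, _ in treated) - mean(y0 for y0, _, _ in control)
    # Average effect among those who enrolled.
    att = mean(y1 - y0 for y0, y1, _ in treated)
    # Gap in untreated outcomes between enrollees and non-enrollees.
    selection_bias = mean(y0 for y0, _, _ in treated) - mean(y0 for y0, _, _ in control)
    return naive_gap, att, selection_bias

naive_gap, att, selection_bias = decompose(workers)
print(naive_gap, att, selection_bias)
```

In this made-up population the training effect is $2,000 for every worker, yet the naive gap is larger because enrollees would have earned more even without training.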
Question 2 Multiple Choice
In a well-executed randomized controlled trial, the observed difference in outcomes between treated and control groups estimates the ATE. Why does randomization make this possible?
A. Randomization makes the treated and control groups the same size, eliminating statistical bias
B. Randomization ensures that potential outcomes are independent of treatment assignment, so the control group's average untreated outcome equals what the treated group's untreated outcome would have been
C. Randomization eliminates measurement error in the outcome variable
D. Randomization forces the ATE and ATT to be equal by design
Answer: B
Randomization achieves {Y(0), Y(1)} ⊥ D: potential outcomes are independent of the group a unit is assigned to. This implies E[Y(0)|D=1] = E[Y(0)|D=0], so the average untreated potential outcome is the same in both groups and selection bias is zero in expectation. Independence also gives E[Y(1)|D=1] = E[Y(1)] and E[Y(0)|D=0] = E[Y(0)], so the observed difference in means equals E[Y(1)] − E[Y(0)], the ATE. Without randomization, treated units can differ systematically from untreated units in their potential outcomes (that is selection bias), which makes the naive comparison misleading. Randomization breaks this systematic relationship by assigning treatment independently of every individual characteristic.
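A small simulation can illustrate the contrast between self-selection and random assignment. This is a sketch under stated assumptions: the earnings model, the `motivation` variable, the constant $2,000 effect, and the function names are all hypothetical choices made for illustration.

```python
import random
from statistics import mean

def naive_gap(units):
    """Difference in mean observed outcomes: treated minus control."""
    treated = [y1 for y0, y1, d in units if d == 1]   # treated reveal y1
    control = [y0 for y0, y1, d in units if d == 0]   # controls reveal y0
    return mean(treated) - mean(control)

def simulate(n=20_000, effect=2_000, seed=0):
    """Compare the naive gap under self-selection and under a coin flip."""
    rng = random.Random(seed)
    self_selected, randomized = [], []
    for _ in range(n):
        motivation = rng.gauss(0, 1)          # unobserved trait (assumed)
        y0 = 30_000 + 5_000 * motivation      # untreated earnings
        y1 = y0 + effect                      # constant effect, so ATE = effect
        # Self-selection: more motivated workers enroll.
        self_selected.append((y0, y1, 1 if motivation > 0 else 0))
        # Randomization: assignment ignores motivation entirely.
        randomized.append((y0, y1, rng.randint(0, 1)))
    return naive_gap(self_selected), naive_gap(randomized)

gap_selected, gap_randomized = simulate()
print(round(gap_selected), round(gap_randomized))
```

Under these assumptions the self-selected gap is far above the true effect, while the randomized gap lands close to it, differing only by sampling noise.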
Question 3 True / False
The fundamental problem of causal inference is primarily a statistical problem — with a large enough sample, we can observe both Y_i(1) and Y_i(0) for the same individual and compute the individual treatment effect directly.
T. True
F. False
Answer: False
The fundamental problem is logical, not statistical — no sample size resolves it. Y_i(1) and Y_i(0) are mutually exclusive: a person either receives treatment or they don't at a given point in time. You observe one potential outcome; the other is counterfactual, existing only in a hypothetical world where the treatment assignment was different. This is a missing data problem at the unit level. Statistics helps estimate population averages (ATE, ATT), but individual treatment effects remain permanently unobservable. Increasing sample size gives better estimates of averages — it does not allow you to observe the counterfactual outcome for any individual.
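The missing-data view can be sketched in a few lines. The names `ObservedUnit`, `observe`, and `individual_effect` are hypothetical; the only substantive ingredient is the standard rule that the observed outcome is Y(1) for treated units and Y(0) for untreated units.

```python
from typing import NamedTuple, Optional

class ObservedUnit(NamedTuple):
    d: int                   # treatment indicator
    y: float                 # the outcome we actually see
    y0: Optional[float]      # known only when d == 0
    y1: Optional[float]      # known only when d == 1

def observe(y0, y1, d):
    """Reveal one potential outcome; the other becomes missing (None)."""
    y = d * y1 + (1 - d) * y0
    return ObservedUnit(d=d, y=y,
                        y0=y0 if d == 0 else None,
                        y1=y1 if d == 1 else None)

def individual_effect(unit):
    """Return y1 - y0 if both are known, which never happens after observe()."""
    if unit.y0 is None or unit.y1 is None:
        return None
    return unit.y1 - unit.y0

treated = observe(30_000, 32_000, d=1)
untreated = observe(30_000, 32_000, d=0)
print(individual_effect(treated), individual_effect(untreated))
```

Adding more units to the data adds more rows with one missing entry each; it never fills in the missing entry for any existing row.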
Question 4 True / False
The Average Treatment Effect (ATE) and the Average Treatment Effect on the Treated (ATT) answer different policy questions and can differ substantially in observational studies.
T. True
F. False
Answer: True
ATE = E[Y(1) − Y(0)] averages over the entire population; ATT = E[Y(1) − Y(0) | D=1] averages only over those who actually received treatment. If a job training program is especially effective for the kind of motivated workers who self-select into it, and less effective for the broader population, then ATT > ATE. The policy question determines which estimand is relevant: if you want to know the effect of mandating the program for everyone, you care about the ATE; if you want to know whether it is worth continuing for current participants, you care about the ATT. IV methods often estimate the LATE (Local Average Treatment Effect), a third estimand that covers only the 'complier' subpopulation and generally differs from both.
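A toy example with heterogeneous effects shows how the two estimands can diverge. The six people and their earnings are invented for illustration, and the functions `ate` and `att` assume both potential outcomes are visible, which real data never allow.

```python
from statistics import mean

# Hypothetical population: (y0, y1, d).
# The two enrollees gain $4,000 from training; the four others would gain $1,000.
people = [
    (30_000, 34_000, 1),
    (32_000, 36_000, 1),
    (25_000, 26_000, 0),
    (26_000, 27_000, 0),
    (27_000, 28_000, 0),
    (28_000, 29_000, 0),
]

def ate(units):
    """Average effect over everyone: E[Y(1) - Y(0)]."""
    return mean(y1 - y0 for y0, y1, _ in units)

def att(units):
    """Average effect over the treated only: E[Y(1) - Y(0) | D=1]."""
    return mean(y1 - y0 for y0, y1, d in units if d == 1)

print(ate(people), att(people))
```

Here the ATT is twice the ATE, so extrapolating the participants' gains to a mandate for everyone would overstate the benefit in this made-up population.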
Question 5 Short Answer
What is selection bias in the potential outcomes framework, and why does it cause naive comparisons of treated and untreated groups to give misleading estimates of treatment effects?
Model answer: Selection bias is the difference in untreated potential outcomes between those who received treatment and those who did not: E[Y(0)|D=1] − E[Y(0)|D=0]. It measures whether the groups would have differed in outcomes even without the treatment. When treatment is self-selected (not random), people who opt in typically differ systematically from those who don't — in motivation, health, socioeconomic status, or other factors. Naive comparison of treated vs. untreated outcomes conflates the treatment effect with these pre-existing differences, producing biased estimates. Positive selection bias (treated units have higher untreated potential outcomes) causes the observed gap to overstate the true effect.
The potential outcomes decomposition makes this concrete: E[Y|D=1] − E[Y|D=0] = ATT + (E[Y(0)|D=1] − E[Y(0)|D=0]). The second term is selection bias, which is nonzero whenever treatment assignment is correlated with untreated potential outcomes. Randomization eliminates it by making treatment assignment independent of potential outcomes. Quasi-experimental methods (DiD, RD, IV) aim for the same result under their own identifying assumptions, exploiting variation in treatment that is 'as good as random' for a particular subpopulation. Understanding selection bias motivates much of the causal inference toolkit: each method is a different strategy for removing this one term from the decomposition.
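The decomposition follows from adding and subtracting the counterfactual term E[Y(0)|D=1]. A short derivation, assuming the usual rule that the observed outcome is Y(1) for treated units and Y(0) for untreated units:

```latex
\begin{align*}
E[Y \mid D=1] - E[Y \mid D=0]
  &= E[Y(1) \mid D=1] - E[Y(0) \mid D=0] \\
  &= E[Y(1) \mid D=1] - E[Y(0) \mid D=1]
     + E[Y(0) \mid D=1] - E[Y(0) \mid D=0] \\
  &= \underbrace{E[Y(1) - Y(0) \mid D=1]}_{\text{ATT}}
     + \underbrace{E[Y(0) \mid D=1] - E[Y(0) \mid D=0]}_{\text{selection bias}}
\end{align*}
```

The added and subtracted term is the average outcome the treated group would have had without treatment, which is exactly the quantity the data cannot show.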