A researcher matches treated and control participants on their propensity scores, achieves excellent balance on all measured covariates, and concludes she has eliminated confounding. What untestable assumption is her causal interpretation resting on?
A. That her logistic regression model for the propensity score has a sufficiently high C-statistic
B. That the treated and control groups are equal in size after matching
C. That there are no unobserved variables that predict both treatment selection and the outcome (conditional independence)
D. That propensity scores were estimated on the log-odds scale rather than the probability scale
Propensity score matching removes bias from *observed* confounders only. The conditional independence assumption — also called ignorability — states that treatment assignment is independent of potential outcomes given the observed covariates. If unobserved variables (e.g., motivation, social connections) predict both who receives treatment and what outcomes they would have, those biases persist after matching. This assumption is untestable from the data itself, which is why sensitivity analysis (Rosenbaum bounds) is essential: it asks how strong an unobserved confounder would need to be to overturn the conclusion.
Question 2 Multiple Choice
A propensity score model includes many covariates and achieves excellent predictive accuracy (AUC = 0.94) but shows poor covariate balance in diagnostic checks. What should the researcher conclude and do?
A. The model is excellent; AUC of 0.94 indicates strong causal identification
B. The goal of propensity score estimation is covariate balance, not predictive accuracy; high AUC can indicate near-perfect separation that makes matching impossible, so the model should be revised
C. Switch to inverse probability weighting, which performs better when AUC is high
D. Add more covariates to increase AUC toward 1.0 for better balance
This is the central paradox of propensity score estimation. A high AUC means the model discriminates almost perfectly between treated and control units, which often means the estimated propensity scores are pushed toward 0 and 1 (near-perfect separation). Such extreme scores signal poor overlap between the treated and control groups, making good matches hard to find and IPW unstable because a few units receive very large weights. The goal is adequate covariate balance (commonly assessed with standardized mean differences), not predictive accuracy. Sometimes dropping covariates that strongly predict treatment but are unrelated to the outcome improves balance and downstream causal inference; dropping a true confounder would instead reintroduce bias.
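The instability of IPW under poor overlap can be illustrated with a minimal sketch. The propensity scores below are made-up numbers, and the function names are hypothetical helpers, not part of any library. Kish's effective sample size is used here as one rough summary of how much information survives weighting; it is a swapped-in diagnostic for illustration, not one named in the question.

```python
def ate_weights(treated, scores):
    """Inverse probability weights targeting the ATE:
    1/e for treated units, 1/(1 - e) for control units."""
    return [1.0 / e if t else 1.0 / (1.0 - e) for t, e in zip(treated, scores)]


def effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


# Hypothetical data: three treated units followed by three controls.
treated = [1, 1, 1, 0, 0, 0]

# Scores in the middle of the range: every unit has plausible counterparts.
moderate_scores = [0.60, 0.50, 0.40, 0.60, 0.50, 0.40]

# Near-perfect separation, with one control whose score is 0.95.
# That single control receives a weight of 1 / (1 - 0.95) = 20.
extreme_scores = [0.99, 0.98, 0.97, 0.95, 0.02, 0.01]

ess_moderate = effective_sample_size(ate_weights(treated, moderate_scores))
ess_extreme = effective_sample_size(ate_weights(treated, extreme_scores))

print(f"ESS with moderate scores: {ess_moderate:.2f} of {len(treated)} units")
print(f"ESS with extreme scores:  {ess_extreme:.2f} of {len(treated)} units")
```

With the extreme scores, one heavily weighted control dominates the weighted sample, so the effective sample size collapses even though the number of units is unchanged.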
Question 3 True / False
By default, propensity score matching estimates the average treatment effect on the treated (ATT) — the effect for those who actually received treatment — rather than the average treatment effect (ATE) for the full population.
T. True
F. False
Answer: True
1:1 nearest-neighbor matching finds a comparison unit for each treated unit, so the matched sample resembles the treated group and the estimand is naturally the ATT: what was the effect of treatment for those who received it? This is often the policy-relevant question ('was this program effective for the people it actually served?'), but it is not the same as the ATE ('what would happen if we assigned everyone to treatment rather than no one?'). To estimate the ATE, the matched or weighted sample must represent the full population, treated and untreated alike, which may require different matching strategies (such as full matching) or IPW with weights that target the ATE rather than the ATT.
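The difference between the two estimands shows up directly in the standard IPW weight formulas. The sketch below is illustrative only: `ipw_weight` is a hypothetical helper, and `e` stands for a unit's estimated propensity score.

```python
def ipw_weight(treated, e, estimand="ATE"):
    """Return the inverse probability weight for one unit.

    ATE: treated units get 1/e, controls get 1/(1 - e), so both groups
         are reweighted to resemble the full population.
    ATT: treated units keep weight 1, controls get e/(1 - e), so the
         controls are reweighted to resemble the treated group.
    """
    if not 0.0 < e < 1.0:
        raise ValueError("propensity score must be strictly between 0 and 1")
    if estimand == "ATE":
        return 1.0 / e if treated else 1.0 / (1.0 - e)
    if estimand == "ATT":
        return 1.0 if treated else e / (1.0 - e)
    raise ValueError("estimand must be 'ATE' or 'ATT'")


# A control unit that looks like a typical treated unit (e = 0.8) is
# upweighted under both estimands; a control with e = 0.1 counts for
# little under ATT because few treated units resemble it.
for e in (0.1, 0.5, 0.8):
    print(e, ipw_weight(False, e, "ATE"), ipw_weight(False, e, "ATT"))
```

Under ATT weighting the treated group is left as it is, which mirrors what 1:1 matching on treated units does.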
Question 4 True / False
Propensity score matching eliminates the need for sensitivity analysis because conditioning on observed covariates removes the threat of hidden confounding.
T. True
F. False
Answer: False
This is a common and dangerous misconception about propensity scores. Conditioning on observed covariates balances only those covariates; it cannot address unobserved variables that predict both treatment and outcome. The conditional independence assumption (no hidden confounders) is untestable from the data, so the researcher cannot know whether it holds. Sensitivity analysis (for example, Rosenbaum bounds) is essential precisely because it quantifies how much hidden bias would be needed to overturn the finding. A fragile finding, one that would be overturned by a small unobserved confounder, should be interpreted with much more caution.
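As a rough illustration of what a Rosenbaum-style sensitivity analysis computes, the sketch below handles the simplest case: matched pairs with a binary outcome, analyzed with McNemar's test. It assumes a sensitivity parameter `gamma` (at least 1) that bounds how much hidden bias could change the odds of treatment within a pair. The function name and the pair counts are hypothetical.

```python
from math import comb


def rosenbaum_upper_p(discordant_pairs, treated_events, gamma):
    """Upper bound on the one-sided McNemar p-value under hidden bias.

    discordant_pairs: matched pairs in which exactly one unit had the outcome.
    treated_events:   discordant pairs in which the treated unit had it.
    gamma:            assumed maximum odds ratio of treatment assignment
                      within a pair due to an unobserved confounder
                      (gamma = 1 means no hidden bias).
    """
    if gamma < 1:
        raise ValueError("gamma must be at least 1")
    # Worst-case probability that the treated unit is the one with the event.
    p_plus = gamma / (1.0 + gamma)
    # Binomial upper tail: P(X >= treated_events), X ~ Binomial(D, p_plus).
    return sum(
        comb(discordant_pairs, k)
        * p_plus ** k
        * (1.0 - p_plus) ** (discordant_pairs - k)
        for k in range(treated_events, discordant_pairs + 1)
    )


# Hypothetical result: 20 discordant pairs, treated unit had the event in 16.
for gamma in (1.0, 1.5, 2.0):
    bound = rosenbaum_upper_p(20, 16, gamma)
    print(f"gamma = {gamma}: worst-case p-value = {bound:.4f}")
```

The value of `gamma` at which the worst-case p-value crosses the chosen significance level indicates how strong an unobserved confounder would need to be to overturn the conclusion; a finding that fails at a `gamma` close to 1 is fragile.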
Question 5 Short Answer
Why is the goal of propensity score estimation covariate balance rather than predictive accuracy, and how does this affect model-building strategy?
Model answer: The purpose of the propensity score is not to predict treatment accurately but to create a matched or weighted sample in which treated and control groups look similar on observed covariates — mimicking the balance that randomization would produce. A high-accuracy model can actually hurt this goal if it achieves accuracy by identifying variables that perfectly separate treated from control units (poor overlap), making good matches impossible. Model-building strategy should be guided by balance diagnostics: estimate propensity scores, check standardized mean differences before and after matching, revise the model if balance is inadequate, and iterate. The model is a means to balance, not an end in itself.
This reframing, from 'predict treatment' to 'achieve balance', changes what 'a good model' means. Adding variables that strongly predict treatment but have no relation to the outcome will increase AUC, yet it can worsen overlap and harm balance and downstream causal inference. The workflow is diagnostic and iterative: balance checks should be run after each modeling choice, not just at the end.
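The balance-checking step of that workflow can be sketched as follows. The ages and propensity scores are made-up toy values, the function names are hypothetical, and the greedy 1:1 matching without replacement is only one of several possible matching schemes. For simplicity the standardized mean difference here pools the two group variances in whichever sample it is given; some analysts keep the pre-matching standard deviation in the denominator instead.

```python
from statistics import mean, variance


def smd(treated_values, control_values):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_sd = ((variance(treated_values) + variance(control_values)) / 2) ** 0.5
    return (mean(treated_values) - mean(control_values)) / pooled_sd


def nearest_neighbor_match(treated_scores, control_scores):
    """Greedy 1:1 matching on the propensity score, without replacement.

    Returns, for each treated unit in order, the index of its matched control.
    """
    available = list(range(len(control_scores)))
    matches = []
    for score in treated_scores:
        best = min(available, key=lambda j: abs(control_scores[j] - score))
        matches.append(best)
        available.remove(best)
    return matches


# Hypothetical toy data: one covariate (age) and estimated propensity scores.
treated_age = [52, 48, 55]
treated_ps = [0.70, 0.60, 0.75]
control_age = [30, 50, 35, 54, 47, 28]
control_ps = [0.20, 0.65, 0.30, 0.72, 0.58, 0.15]

matches = nearest_neighbor_match(treated_ps, control_ps)
matched_control_age = [control_age[j] for j in matches]

smd_before = smd(treated_age, control_age)
smd_after = smd(treated_age, matched_control_age)

print(f"SMD for age before matching: {smd_before:.2f}")
print(f"SMD for age after matching:  {smd_after:.2f}")
```

In a real analysis this check would be repeated for every covariate after each change to the propensity score model, and the model revised until the differences are acceptably small.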