Questions: Matching, Stratification, and Weighting: Creating Comparable Groups
5 questions to test your understanding
Question 1 Multiple Choice
A researcher uses propensity score matching to compare earnings outcomes between participants and non-participants in a job training program. After matching, the treated and control groups are nearly identical on age, education, prior earnings, and region. The researcher concludes she has obtained an unbiased causal estimate. What is the most serious threat to this conclusion?
A. The propensity score model might be misspecified, assigning slightly wrong weights
B. Unmeasured confounders, such as motivation or employer discrimination, could still differ between groups, causing bias that matching on observed covariates cannot remove
C. The sample size is too small to detect effects reliably
D. Propensity score matching cannot be used for labor market studies; it is designed for medical trials
Answer: B
Matching and weighting methods identify causal effects only under the unconfoundedness assumption: every variable that influences both treatment selection and the outcome must be measured and conditioned on. Even perfect balance on observed covariates does not help if important confounders, such as motivation, social networks, or employer attitudes, were never measured. Those unmeasured variables remain as selection bias. This is the fundamental limitation of these methods: they are only as good as the covariate set you bring to them. Options A and C describe real concerns, but both are less severe than unmeasured confounding. Option D is simply false: propensity score methods are widely used in labor market studies.
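The point can be illustrated with a small simulation. This is a minimal sketch under invented assumptions: the variable names, probabilities, and coefficients are all hypothetical, and exact stratification on a single binary covariate stands in for matching. An observed covariate `x` and an unmeasured confounder `u` both drive treatment and the outcome, and the true treatment effect is set to 2.0.

```python
import random
from statistics import mean

def simulate(n=20000, true_effect=2.0, seed=0):
    """Generate (x, u, t, y) rows; all parameters are made up for illustration."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random() < 0.5            # observed covariate (e.g. has a degree)
        u = rng.random() < 0.5            # unmeasured confounder (e.g. motivation)
        p_treat = 0.2 + 0.2 * x + 0.4 * u  # both x and u raise the chance of treatment
        t = rng.random() < p_treat
        y = 10 + 3 * x + 4 * u + true_effect * t + rng.gauss(0, 1)
        rows.append((x, u, t, y))
    return rows

def stratified_effect(rows, key):
    """Treated-minus-control difference within each stratum, averaged by stratum size."""
    strata = {}
    for row in rows:
        strata.setdefault(key(row), []).append(row)
    total = 0.0
    for members in strata.values():
        treated = [y for (_, _, t, y) in members if t]
        control = [y for (_, _, t, y) in members if not t]
        total += (mean(treated) - mean(control)) * len(members)
    return total / len(rows)

rows = simulate()
# Conditioning only on the observed covariate: x is perfectly balanced
# within strata, but u is not, so the estimate is biased.
est_observed_only = stratified_effect(rows, key=lambda r: r[0])
# Conditioning on u as well, which is impossible in practice because u is unmeasured.
est_with_u = stratified_effect(rows, key=lambda r: (r[0], r[1]))

print(f"conditioning on x only:  {est_observed_only:.2f}")
print(f"conditioning on x and u: {est_with_u:.2f}")
```

With these made-up parameters, the estimate that conditions only on `x` lands well above the true effect of 2.0, while the estimate that also conditions on `u` recovers it closely. In a real study only the first estimate is available.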
Question 2 Multiple Choice
Inverse probability weighting (IPW) and exact matching both aim to estimate causal effects from observational data. What is the key difference in how they create comparability?
A. IPW requires a randomized experiment; exact matching works on observational data
B. Exact matching selects a subset of similar paired units; IPW reweights the full sample so that the control group's covariate distribution resembles that of the treated group
C. IPW can only be used for binary outcomes; matching works for continuous outcomes
D. Exact matching uses the propensity score as a summary; IPW requires matching on each covariate individually
Answer: B
Exact matching and IPW differ in strategy, not goal. Exact matching finds control units with the same covariate values as each treated unit and pairs them, discarding unmatched cases. IPW keeps the entire sample but assigns weights, so that control units who 'look like' treated units receive high weight in the analysis and those who do not receive low weight. Both rest on the same identifying assumption (unconfoundedness), but they use different portions of the data and make different trade-offs between efficiency and bias. Option D reverses the description: propensity score matching uses the propensity score as a summary, while exact matching matches on the individual covariates.
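The two strategies can be compared on a toy dataset. This is a minimal sketch under stated assumptions: the data are invented, the propensity score is estimated as the treated share within each covariate cell, the IPW weights target the effect on the treated (controls weighted by e / (1 - e)), and each treated unit is compared with the average of all controls in its cell rather than with a single partner.

```python
from collections import defaultdict

# (covariate cell, treated flag, outcome); invented illustration data
data = [
    ("young", 1, 12.0), ("young", 1, 14.0), ("young", 0, 10.0),
    ("old", 1, 20.0), ("old", 0, 17.0), ("old", 0, 19.0), ("old", 0, 18.0),
    ("mid", 0, 15.0),  # no treated counterpart in this cell
]

def exact_matching_att(rows):
    """Compare each treated unit with the mean of controls in the same cell."""
    controls = defaultdict(list)
    for cell, t, y in rows:
        if t == 0:
            controls[cell].append(y)
    diffs = []
    for cell, t, y in rows:
        if t == 1 and controls[cell]:  # treated units without a match are dropped
            diffs.append(y - sum(controls[cell]) / len(controls[cell]))
    return sum(diffs) / len(diffs)

def ipw_att(rows):
    """Reweight controls by e / (1 - e) so they resemble the treated group."""
    n = defaultdict(int)
    n_treated = defaultdict(int)
    for cell, t, _ in rows:
        n[cell] += 1
        n_treated[cell] += t
    propensity = {cell: n_treated[cell] / n[cell] for cell in n}

    treated_outcomes = [y for _, t, y in rows if t == 1]
    weighted_sum = weight_total = 0.0
    for cell, t, y in rows:
        if t == 0:
            e = propensity[cell]
            w = e / (1 - e)  # a control's own cell always has e < 1
            weighted_sum += w * y
            weight_total += w
    treated_mean = sum(treated_outcomes) / len(treated_outcomes)
    return treated_mean - weighted_sum / weight_total

print(exact_matching_att(data))
print(ipw_att(data))
```

In this fully discrete example both functions return the same number, because the cell-level propensity score carries exactly the information used for matching. The routes differ: matching never uses the "mid" control, whereas IPW keeps it in the sample and gives it a weight of zero. With continuous covariates or a modeled propensity score the two estimates generally diverge.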
Question 3 True / False
After successfully matching treated and control units on all measured covariates using propensity score matching, the resulting estimate may still be biased if important confounders were not measured before the study.
T. True
F. False
Answer: True
This is the central limitation of all observational matching methods. Covariate balance after matching confirms that the groups are comparable on *measured* variables, but unmeasured confounders remain unaffected. The unconfoundedness assumption — that conditioning on observed covariates is sufficient to remove selection bias — cannot be tested from the data; it is a substantive claim about the data-generating process. If a researcher forgot to measure motivation, social networks, or any variable correlated with both treatment and outcome, the causal estimate absorbs that bias regardless of how well the matching performed on measured variables.
Question 4 True / False
Achieving good covariate balance after propensity score matching is sufficient to verify that the unconfoundedness assumption holds.
T. True
F. False
Answer: False
Covariate balance (treated and control groups being similar on measured covariates) is a diagnostic for the *quality of the matching*, not a test of unconfoundedness. Unconfoundedness requires that *all* confounders — measured and unmeasured — are balanced. Balance checks can only speak to measured variables. The unconfoundedness assumption is fundamentally untestable from observed data alone: you cannot check whether an unmeasured variable you didn't collect is unbalanced. Good balance after matching is necessary but not sufficient — it must be supplemented by substantive arguments that the measured covariate set is complete.
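A balance check of the kind described here is often reported as a standardized mean difference (SMD). The sketch below is one common formulation, not the only one. The age values are invented, and the 0.1 cutoff mentioned in the comment is a widely used rule of thumb rather than a formal test.

```python
from math import sqrt
from statistics import mean, variance

def standardized_mean_difference(treated, control):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd

# Hypothetical ages in the matched treated and control groups
age_treated = [34, 41, 29, 50, 38]
age_control = [35, 40, 30, 49, 37]

smd_age = standardized_mean_difference(age_treated, age_control)
# An absolute SMD below about 0.1 is commonly read as adequate balance.
print(f"SMD for age: {smd_age:.3f}")
```

The function can only be called on covariates that exist in the dataset. There is no column to pass in for an unmeasured confounder, which is why a clean balance table cannot establish unconfoundedness.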
Question 5 Short Answer
Why is the unconfoundedness assumption (also called ignorability) the central identifying assumption in matching and weighting methods, and why can it not be tested using the observed data?
Model answer: Unconfoundedness requires that all variables jointly influencing treatment selection and the outcome have been measured and included in the covariate set. Without this, selection bias remains in the comparison even after perfect covariate balance on measured variables — unmeasured confounders still create systematic differences between groups. It cannot be tested from observed data because testing it would require knowing the potential outcomes for units under conditions they did not experience (the fundamental problem of causal inference). You can check balance on measured covariates, conduct sensitivity analyses, or argue substantively that the covariate set is complete, but no statistical test can verify the assumption from the data alone.
Students often think that a good propensity score model and post-matching balance checks are sufficient validation of the causal design. The key insight is that matching 'buys' comparability only on what was measured — it cannot help with what was never observed. The assumption is substantive, not statistical, and its plausibility depends entirely on domain knowledge about what drives treatment selection.