A propensity score model for statin use includes age, sex, cholesterol, diabetes, and smoking. After matching, treated and control groups have excellent balance on all five variables. Does this guarantee an unbiased treatment effect estimate?
A. Yes. If all included covariates are balanced, the comparison is as good as a randomized trial.
B. No. Balance on observed covariates does not ensure balance on unmeasured confounders (e.g., health consciousness, diet quality) that may still bias the estimate.
C. Yes. Propensity score matching eliminates all confounding by design.
D. No. Propensity score methods can never produce valid causal estimates.
Answer: B
Propensity score methods achieve balance only on the observed covariates included in the propensity model. They cannot address unmeasured confounders, that is, variables that affect both treatment and outcome but were not measured. If health-conscious people are more likely both to take statins and to have better outcomes (through diet, exercise, etc.), this confounding persists even after perfect propensity score matching. The no-unmeasured-confounders assumption is the fundamental limitation of these methods, and it cannot be tested from the data alone.
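The point can be illustrated with a small constructed example. The sketch below is only an illustration under stated assumptions: the population, the treatment probabilities in `P_TREAT`, the outcome formula, and all names are hypothetical, and exact stratification on a single binary measured covariate `x` stands in for propensity score matching (it balances `x` perfectly). The unmeasured trait `u` plays the role of health consciousness.

```python
from collections import defaultdict

TRUE_EFFECT = 1.0  # assumed true treatment effect, identical for everyone

# Hypothetical P(treated | x, u): both the measured covariate x and the
# unmeasured trait u raise the chance of receiving treatment.
P_TREAT = {(0, 0): 0.2, (1, 0): 0.5, (0, 1): 0.6, (1, 1): 0.9}


def build_population(n_per_cell=1000):
    """Build a deterministic population with exact cell counts (no randomness)."""
    rows = []
    for (x, u), p in P_TREAT.items():
        n_treated = round(n_per_cell * p)
        for i in range(n_per_cell):
            t = 1 if i < n_treated else 0
            # Assumed outcome model: u affects the outcome as well as treatment.
            y = TRUE_EFFECT * t + 1.0 * x + 2.0 * u
            rows.append({"x": x, "u": u, "t": t, "y": y})
    return rows


def stratified_effect(rows, keys):
    """Average the treated-minus-control mean difference over strata of `keys`."""
    strata = defaultdict(lambda: {0: [], 1: []})
    for r in rows:
        strata[tuple(r[k] for k in keys)][r["t"]].append(r["y"])
    total = len(rows)
    estimate = 0.0
    for groups in strata.values():
        n = len(groups[0]) + len(groups[1])
        diff = sum(groups[1]) / len(groups[1]) - sum(groups[0]) / len(groups[0])
        estimate += diff * n / total  # weight each stratum by its size
    return estimate


population = build_population()
naive = stratified_effect(population, ["x"])        # adjusts for x only
oracle = stratified_effect(population, ["x", "u"])  # adjusts for x and u
print(f"adjusting for x only:  {naive:.2f}")
print(f"adjusting for x and u: {oracle:.2f}")
```

In this constructed population the estimate that adjusts only for `x` comes out at roughly 1.89, well above the assumed true effect of 1.0, even though `x` is perfectly balanced within strata. Adjusting for `u` as well recovers 1.0, but that is possible only because the example pretends `u` was measured.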
Question 2 Short Answer
IPTW creates a pseudo-population where treatment is independent of measured confounders. A treated subject with propensity score 0.9 receives a weight of 1/0.9 ≈ 1.11, while a treated subject with propensity score 0.1 receives a weight of 1/0.1 = 10. Why does the subject with the lower propensity get a higher weight?
Model answer: A treated subject with propensity score 0.1 is unusual, because most similar people did not receive treatment. This subject is underrepresented among the treated relative to the overall population. Weighting by 1/0.1 = 10 means this subject 'represents' 10 similar people in the population, about 9 of whom would not have been treated. The weighting creates a pseudo-population where treatment assignment is independent of the covariates, mimicking randomization. Subjects who are uncommon in their treatment group receive higher weights because each one must stand in for the many similar subjects who ended up in the other group.
The intuition is analogous to survey weighting: if a demographic group is undersampled, each observed member of that group gets a higher weight to represent the unobserved members. IPTW does the same for treatment groups, making each group representative of the full population by upweighting subjects whose treatment status is surprising given their characteristics. Extreme weights arise when a treated subject has a propensity score near 0 or a control subject has one near 1. A few such subjects can dominate the estimate and make it unstable, which is why weight trimming or stabilized weights are often used.
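The weights described above can be written out in a few lines. This is a minimal sketch, not a reference implementation: the function names, the marginal treatment probability `p_treated`, and the cap value are assumptions chosen for illustration, and capping weights at a fixed value is only one of several trimming conventions.

```python
def iptw_weight(treated, ps):
    """Unstabilized ATE weight: 1/ps for treated, 1/(1 - ps) for controls."""
    if not 0.0 < ps < 1.0:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    return 1.0 / ps if treated else 1.0 / (1.0 - ps)


def stabilized_weight(treated, ps, p_treated):
    """Stabilized weight: scale by the marginal probability of the observed arm.

    p_treated is the overall proportion treated (an input assumed known here).
    """
    if treated:
        return p_treated * iptw_weight(True, ps)
    return (1.0 - p_treated) * iptw_weight(False, ps)


def cap_weights(weights, cap):
    """Truncate weights at `cap` so no single subject dominates the estimate."""
    return [min(w, cap) for w in weights]


# The two treated subjects from the question.
w_common = iptw_weight(True, 0.9)  # treatment was expected for this subject
w_rare = iptw_weight(True, 0.1)    # treatment was surprising for this subject
print(round(w_common, 2), round(w_rare, 2))

# Assuming half the sample is treated, stabilization shrinks the large weight.
print(round(stabilized_weight(True, 0.1, p_treated=0.5), 2))

# A hypothetical cap of 5 limits the influence of the surprising subject.
print(cap_weights([w_common, w_rare], cap=5.0))
```

Stabilized weights leave the target population unchanged while reducing the variance of the weights. Capping trades a small amount of bias for stability, so the cap is a judgment call that should be reported alongside the results.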
Question 3 True / False
Propensity score matching and IPTW estimate the same causal effect.
T. True
F. False
Answer: False
Propensity score matching typically estimates the Average Treatment Effect on the Treated (ATT), the effect among those who actually received treatment, because each treated subject is matched to a similar control and unmatched controls are discarded. IPTW with standard weights (1/e for treated subjects and 1/(1 − e) for controls, where e is the propensity score) estimates the Average Treatment Effect (ATE), the effect if the entire population were treated versus untreated. IPTW can also estimate the ATT when ATT weights are used instead: 1 for treated subjects and e/(1 − e) for controls. The choice between ATT and ATE depends on the research question: ATT is relevant when you want to know whether treatment helped those who received it; ATE is relevant for policy decisions about treating everyone.
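The difference between the two estimands shows up only when the treatment effect varies across subjects. The sketch below constructs a hypothetical population in which treatment helps only subjects with `x = 1`, who are also the ones most likely to be treated. All names, counts, and outcome values are assumptions made for illustration, and the propensity scores are taken as known rather than estimated.

```python
def ate_weight(t, ps):
    """ATE weights: 1/ps for treated, 1/(1 - ps) for controls."""
    return 1.0 / ps if t else 1.0 / (1.0 - ps)


def att_weight(t, ps):
    """ATT weights: 1 for treated, ps/(1 - ps) for controls."""
    return 1.0 if t else ps / (1.0 - ps)


def weighted_effect(rows, weight_fn):
    """Difference of weighted outcome means, treated minus control."""
    sums = {0: [0.0, 0.0], 1: [0.0, 0.0]}  # per arm: [sum of w*y, sum of w]
    for t, ps, y in rows:
        w = weight_fn(t, ps)
        sums[t][0] += w * y
        sums[t][1] += w
    return sums[1][0] / sums[1][1] - sums[0][0] / sums[0][1]


# Hypothetical population of (treated, propensity score, outcome) rows.
# Subjects with x = 1 are treated 80% of the time and gain 2.0 from treatment;
# subjects with x = 0 are treated 20% of the time and gain nothing.
rows = []
for x, ps, n in [(1, 0.8, 500), (0, 0.2, 500)]:
    n_treated = round(n * ps)
    effect = 2.0 if x == 1 else 0.0
    for i in range(n):
        t = 1 if i < n_treated else 0
        rows.append((t, ps, 1.0 * x + effect * t))

ate = weighted_effect(rows, ate_weight)
att = weighted_effect(rows, att_weight)
print(f"ATE estimate: {ate:.2f}")
print(f"ATT estimate: {att:.2f}")
```

By construction the ATE is 1.0 (half the population gains 2.0) and the ATT is 1.6 (80% of treated subjects have `x = 1`), and the two weighting schemes recover these respective values. If the effect were the same for everyone, the two estimands would coincide.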