Question 1 Multiple Choice
After a propensity-score-matched study finds no difference in mortality between treated and untreated groups, a reviewer argues that propensity scoring 'essentially randomized' the groups so the result can be trusted as causal evidence. What is the most important flaw in this reasoning?
A. Propensity scores cannot be used for binary outcomes like mortality
B. Matching reduces sample size and thus statistical power
C. Propensity scores only balance measured covariates, leaving unmeasured confounding intact
D. The propensity model must be estimated with logistic regression specifically
Answer: C
Unlike randomization, which balances both observed and unobserved covariates in expectation by design, propensity score methods balance only the confounders that were measured and included in the model. Unmeasured confounding is not addressed, however well the propensity model is specified, except to the extent that an unmeasured factor happens to be correlated with the measured ones. This is the fundamental limitation that prevents propensity score analyses from claiming the same causal guarantees as an RCT.
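To make the limitation concrete, here is a minimal simulation sketch using only the Python standard library. Everything in it is hypothetical: the variable names, the probabilities, and the sample size are invented for illustration. Treatment has no effect on death by construction, a measured covariate `x` is balanced by exact stratification (which stands in for propensity score matching here, since the score would depend on `x` alone), and an unmeasured covariate `u` drives both treatment and death.

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible

def simulate(n=20000):
    """Simulate subjects; treatment has NO true effect on death."""
    rows = []
    for _ in range(n):
        x = random.random() < 0.5  # measured confounder
        u = random.random() < 0.5  # unmeasured confounder (never recorded)
        # Both x and u raise the chance of treatment and of death.
        treated = random.random() < 0.2 + 0.3 * x + 0.4 * u
        died = random.random() < 0.1 + 0.1 * x + 0.3 * u
        rows.append((x, treated, died))  # u is deliberately dropped
    return rows

def risk_difference(rows, x_value):
    """Treated-minus-control mortality within one stratum of x."""
    treated = [d for x, t, d in rows if x == x_value and t]
    control = [d for x, t, d in rows if x == x_value and not t]
    return sum(treated) / len(treated) - sum(control) / len(control)

rows = simulate()
rd_x0 = risk_difference(rows, False)
rd_x1 = risk_difference(rows, True)
print(f"risk difference within x=0: {rd_x0:.3f}")
print(f"risk difference within x=1: {rd_x1:.3f}")
```

Within each stratum the groups are perfectly balanced on `x`, yet treated subjects still show clearly higher mortality, purely because `u` differs between the groups. Adjusting for what was measured cannot reveal or remove that gap.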
Question 2 Multiple Choice
A researcher assesses covariate balance before and after propensity-score matching using p-values from t-tests, declaring 'balance achieved' when most p-values become non-significant. What is wrong with this approach?
A. Balance should be assessed using standardized mean differences, not p-values
B. Matching can only be validated using propensity score distribution histograms
C. Only the treated group should be checked for balance after matching
D. P-values are appropriate for continuous covariates but not for categorical ones
Answer: A
P-values are sensitive to sample size, not just covariate imbalance. After matching, sample size is typically reduced, which inflates p-values even when meaningful imbalance remains. Standardized mean differences (SMDs) quantify the magnitude of imbalance independent of sample size and are the appropriate tool for assessing whether matching succeeded in creating comparable groups.
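As a rough sketch of the SMD calculation for a continuous covariate, the function below divides the difference in group means by the pooled standard deviation. The ages are made-up numbers, and the 0.1 cutoff mentioned afterwards is a widely used rule of thumb rather than a fixed standard.

```python
import math
from statistics import mean, variance

def standardized_mean_difference(treated, control):
    """SMD = (mean_t - mean_c) / sqrt((var_t + var_c) / 2)."""
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd

# Hypothetical ages in a small matched sample (illustrative values only).
treated_age = [62, 65, 70, 58, 66, 71]
control_age = [60, 61, 66, 55, 63, 68]

smd = standardized_mean_difference(treated_age, control_age)
print(f"SMD for age: {smd:.2f}")
```

With only six subjects per group, a t-test on these ages would be unlikely to reach significance, yet the SMD is far above 0.1, which is the situation the explanation above warns about.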
Question 3 True / False
A propensity score model that perfectly predicts treatment assignment would be the ideal tool for causal inference in an observational study.
T. True
F. False
Answer: False
A perfect predictor of treatment assignment means there is no overlap: every subject is deterministically treated or untreated based on measured covariates. This violates the positivity assumption, because no subject has a comparable counterpart in the other group from which the counterfactual outcome could be estimated. Perfect prediction destroys the possibility of causal inference rather than enabling it. Causal estimation requires common support, that is, regions of the covariate space where both treated and untreated individuals exist.
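One simple way to look for common support is to compare the range of estimated propensity scores in the two groups. The helper below is an illustrative sketch with invented scores. A range check is only a crude diagnostic, since it can miss gaps inside the range.

```python
def common_support(ps_treated, ps_control):
    """Return (low, high) of the overlapping propensity score range, or None."""
    low = max(min(ps_treated), min(ps_control))
    high = min(max(ps_treated), max(ps_control))
    return (low, high) if low <= high else None

# Hypothetical scores from an ordinary, imperfect propensity model.
overlapping = common_support([0.35, 0.60, 0.80], [0.20, 0.45, 0.70])

# A model that predicts treatment perfectly: scores of 1 and 0 only.
perfect = common_support([1.0, 1.0, 1.0], [0.0, 0.0, 0.0])

print(overlapping)
print(perfect)
```

The imperfect model leaves a shared range in which treated and untreated subjects can be compared. The perfect predictor leaves none, so there is nobody to match.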
Question 4 True / False
In a propensity-score-matched analysis, covariate balance should be assessed after matching, not only before.
T. True
F. False
Answer: True
Pre-matching balance assessment describes the confounding problem; post-matching balance assessment tells you whether the solution worked. The purpose of matching is to create comparable groups, so checking balance after matching — using standardized mean differences across all covariates included in the propensity model — is the diagnostic that validates the analysis. Pre-match imbalance is expected; post-match imbalance indicates the matching failed.
Question 5 Short Answer
Why can propensity score methods never fully replicate the causal guarantees of a randomized controlled trial, no matter how well the propensity model is specified?
Model answer: Because propensity scores only balance covariates that were measured and included in the model. A randomized trial balances all confounders — measured and unmeasured — by design, because treatment assignment is independent of all subject characteristics. In an observational study, unmeasured variables (unknown to the researcher or simply not collected) may still differ between groups after propensity-score adjustment, leaving residual confounding that no statistical method can remove.
This is the irreducible limitation of all observational study designs. The quality of a propensity score analysis is bounded by the quality and completeness of the measured confounders. Sensitivity analyses (e.g., E-values) can quantify how strong unmeasured confounding would need to be to explain away a finding, but they cannot rule out its existence.
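For a risk ratio, the E-value proposed by VanderWeele and Ding is RR + sqrt(RR × (RR − 1)), with a protective estimate (RR below 1) inverted first. The function name and the example value below are illustrative only.

```python
import math

def e_value(rr):
    """E-value for an observed risk ratio (point estimate)."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr  # protective effects are inverted before applying the formula
    return rr + math.sqrt(rr * (rr - 1))

# Hypothetical observed risk ratio of 2.0.
ev = e_value(2.0)
print(f"E-value: {ev:.2f}")
```

Read this as: an unmeasured confounder would need to be associated with both treatment and outcome by a risk ratio of at least this size, beyond the measured covariates, to fully explain away the observed association. It says nothing about whether such a confounder exists.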