Questions: Treatment Effect Heterogeneity and Conditional Average Treatment Effects
5 questions to test your understanding
Question 1 Multiple Choice
A randomized trial of a new medication finds an ATE of +3 points on a symptom scale (p < 0.001). A policymaker concludes the medication should be given to all patients. What information is missing from this conclusion?
A. The ATE is always the right summary: randomized trials give unbiased causal estimates, so no further analysis is needed
B. The ATE may mask substantial heterogeneity: the drug might have large benefits for some subgroups and zero or negative effects for others, making universal prescription suboptimal
C. The ATE cannot be trusted unless the study included at least 10,000 patients
D. The ATE estimate is biased without propensity score adjustment in a randomized trial
Answer: B
An ATE of +3 could reflect a +10 effect for 30% of patients and zero effect for the other 70%, or large negative effects for some subgroups masked by large positive effects for others. Knowing who benefits is essential for targeting treatment, and heterogeneity analysis (CATE estimation) is the tool for uncovering this. Option A misses the point: even an unbiased ATE averages over heterogeneous individuals. Options C and D are wrong: there is no 10,000-patient threshold for trusting an ATE, and randomization already removes confounding, so propensity score adjustment is not required for an unbiased estimate. Neither option addresses the heterogeneity question.
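The arithmetic behind this explanation can be checked directly. The sketch below is illustrative only: the "concentrated" mix is the 30% / 70% example from the explanation, while the "offsetting" and "uniform" mixes are assumed numbers chosen to give the same average.

```python
def average_effect(subgroups):
    """Population ATE as the share-weighted mean of subgroup effects."""
    return sum(share * effect for share, effect in subgroups)

# Each entry is (share of patients, true effect on the symptom scale).
concentrated = [(0.30, 10.0), (0.70, 0.0)]   # example from the explanation
offsetting = [(0.50, 9.0), (0.50, -3.0)]     # assumed: benefit for half masks harm for half
uniform = [(1.00, 3.0)]                      # assumed: everyone gains the same amount

for name, groups in [("concentrated", concentrated),
                     ("offsetting", offsetting),
                     ("uniform", uniform)]:
    print(f"{name}: ATE = {average_effect(groups):.1f}")
```

All three populations report the same ATE of +3, yet universal prescription is only clearly sensible in the uniform case.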
Question 2 Multiple Choice
A researcher uses a causal forest to discover that a job training program substantially benefits workers over 40 but has no effect on workers under 30. The analysis uses the full dataset. What is the most important next step before acting on this finding?
A. Report the finding immediately: machine learning methods like causal forests are designed to control for overfitting
B. Validate the subgroup finding on a held-out sample or new study, because exploratory CATE estimates are vulnerable to overfitting and spurious patterns
C. Use a larger set of covariates to confirm that age is the key moderator
D. Switch to a linear regression with an age × treatment interaction to confirm the causal forest result
Answer: B
Causal forests reduce overfitting through honest splitting, but any subgroup finding discovered in-sample is still vulnerable to false positives, particularly when many potential moderators are explored. Exploratory CATE findings require out-of-sample validation before being treated as established: the causal forest output is a hypothesis, not a confirmed result. Option A overstates what honest splitting guarantees. A larger covariate set (option C) adds more candidate moderators and so tends to worsen the overfitting problem. A linear regression with an interaction term (option D) is a useful sensitivity check, but it reuses the same data and does not substitute for replication.
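A minimal sketch of why held-out validation matters, using a simulated randomized trial rather than a causal forest. Everything here is an assumption for illustration: 20 hypothetical subgroups, a constant true effect of 1.0 in every subgroup (so no real heterogeneity), and a simple difference in means as the effect estimate.

```python
import random
import statistics

N_GROUPS = 20       # hypothetical number of candidate subgroups
TRUE_EFFECT = 1.0   # assumed constant effect: no real heterogeneity

def simulate_trial(n, rng):
    """Randomized trial: rows of (subgroup, treated, outcome)."""
    rows = []
    for _ in range(n):
        group = rng.randrange(N_GROUPS)
        treated = rng.randint(0, 1)
        outcome = TRUE_EFFECT * treated + rng.gauss(0.0, 3.0)
        rows.append((group, treated, outcome))
    return rows

def subgroup_effects(rows):
    """Difference in mean outcomes (treated minus control) per subgroup."""
    effects = {}
    for g in range(N_GROUPS):
        treated = [y for grp, t, y in rows if grp == g and t == 1]
        control = [y for grp, t, y in rows if grp == g and t == 0]
        effects[g] = statistics.mean(treated) - statistics.mean(control)
    return effects

rng = random.Random(0)
rows = simulate_trial(4000, rng)
# Rows are independent draws, so a plain split gives two random halves.
discovery, holdout = rows[:2000], rows[2000:]

discovery_effects = subgroup_effects(discovery)
holdout_effects = subgroup_effects(holdout)

# Pick the "most responsive" subgroup using the discovery half only.
best = max(discovery_effects, key=discovery_effects.get)
print(f"best subgroup in discovery half: {best}")
print(f"  discovery estimate: {discovery_effects[best]:.2f}")
print(f"  holdout estimate:   {holdout_effects[best]:.2f}")
```

Because the winning subgroup is selected for having the largest estimate, its discovery-half estimate is biased upward. The holdout estimate for the same subgroup is not subject to that selection and will typically sit closer to the true effect of 1.0.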
Question 3 True / False
The Local Average Treatment Effect (LATE) estimated by instrumental variables is a specific form of treatment effect heterogeneity — it estimates the causal effect for one particular subpopulation.
T. True
F. False
Answer: True
LATE is the treatment effect for 'compliers': individuals whose treatment status changes in response to the instrument. Always-takers and never-takers contribute nothing to the estimate because the instrument doesn't change their treatment. (This interpretation also relies on monotonicity, i.e. there are no 'defiers' who do the opposite of what the instrument encourages.) The LATE may differ substantially from the ATE if compliers are systematically different from the broader population. This makes IV estimates an example of treatment effect heterogeneity: the effect estimate is implicitly conditioned on a specific subgroup, not the full population.
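The gap between LATE and ATE can be seen in a small simulation. This is a sketch under assumed numbers, not a general IV implementation: the population shares and effects below are hypothetical, the instrument is randomly assigned, and there are no defiers. The estimator is the standard Wald ratio (effect of the instrument on the outcome divided by its effect on treatment uptake).

```python
import random

# Assumed population: (compliance type, share, true treatment effect).
TYPES = [("complier", 0.50, 2.0),
         ("always_taker", 0.25, 6.0),
         ("never_taker", 0.25, 0.0)]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def simulate_iv(n, rng):
    """Rows of (instrument z, treatment d, outcome y)."""
    rows = []
    for _ in range(n):
        z = rng.randint(0, 1)            # randomly assigned instrument
        u = rng.random()
        if u < 0.50:
            d, effect = z, 2.0           # complier: follows the instrument
        elif u < 0.75:
            d, effect = 1, 6.0           # always-taker: treated regardless
        else:
            d, effect = 0, 0.0           # never-taker: untreated regardless
        y = effect * d + rng.gauss(0.0, 1.0)
        rows.append((z, d, y))
    return rows

def wald_estimate(rows):
    """Reduced form divided by first stage."""
    reduced_form = (mean(y for z, d, y in rows if z == 1)
                    - mean(y for z, d, y in rows if z == 0))
    first_stage = (mean(d for z, d, y in rows if z == 1)
                   - mean(d for z, d, y in rows if z == 0))
    return reduced_form / first_stage

true_ate = sum(share * effect for _, share, effect in TYPES)
late_hat = wald_estimate(simulate_iv(100_000, random.Random(1)))
print(f"true ATE: {true_ate:.2f}, complier effect: 2.00, Wald estimate: {late_hat:.2f}")
```

Under these assumptions the Wald estimate lands near the complier effect of 2.0 rather than the population ATE of 2.5, because always-takers and never-takers have the same treatment status in both instrument arms and cancel out of the comparison.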
Question 4 True / False
Finding that a treatment effect estimate is larger for women than men in an exploratory subgroup analysis is sufficient evidence to conclude there is genuine treatment effect heterogeneity.
T. True
F. False
Answer: False
Exploratory subgroup differences are vulnerable to overfitting, especially when many subgroups are examined without pre-specification. A difference found in-sample may reflect noise rather than true heterogeneity. To conclude genuine heterogeneity exists, the finding should be pre-specified, replicated in held-out data or an independent study, or confirmed using robust CATE methods with appropriate cross-validation. A single exploratory comparison — even one with a nominally significant interaction — is insufficient.
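A rough calculation shows how quickly false positives accumulate when many subgroups are examined. The sketch assumes independent tests at a nominal 5% level and no true heterogeneity anywhere. Real subgroup tests are usually correlated, so treat the numbers as indicative only. The Bonferroni threshold is included as one common correction, not as the method this quiz prescribes.

```python
def prob_any_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n independent null tests is 'significant'."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test threshold that keeps the familywise error rate at or below alpha."""
    return alpha / n_tests

for k in (1, 5, 10, 20):
    print(f"{k:>2} subgroup tests: "
          f"P(at least one false positive) = {prob_any_false_positive(k):.2f}, "
          f"Bonferroni threshold = {bonferroni_alpha(k):.4f}")
```

With 20 exploratory comparisons, the chance of at least one nominally significant difference is roughly 64% even when no subgroup truly differs, which is why a single in-sample finding is weak evidence.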
Question 5 Short Answer
Why might policymakers need CATE estimates rather than just the ATE, even when the ATE is positive and statistically significant?
Model answer: A positive ATE tells policymakers the treatment works on average, but not who benefits. If effects are heterogeneous — large benefits for some groups, no effect or harm for others — universal deployment misallocates resources and may expose non-beneficiaries to costs or side effects without gain. CATE estimates identify which subgroups drive the effect, allowing targeted deployment: treat those with high predicted benefit, withhold treatment from those with near-zero or negative predicted effects. This is especially important when treatment is costly, has side effects, or when only a subset of the population will be targeted by an intervention anyway.
The distinction between ATE and CATE is essentially the difference between 'does this work?' and 'for whom does this work?' Policy design often requires the latter. An average effect is a useful starting point but not a sufficient basis for individualized treatment decisions or efficient resource allocation.
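The resource-allocation argument can be made concrete with a toy comparison of two policies. All numbers are hypothetical: three subgroups with assumed CATEs, and a per-person treatment cost expressed on the same scale as the benefit (a strong simplification). The CATEs are treated as known. In practice they are estimates and carry their own uncertainty.

```python
# Hypothetical subgroups: (label, population share, assumed CATE).
SUBGROUPS = [("high responders", 0.30, 10.0),
             ("low responders", 0.50, 1.0),
             ("harmed", 0.20, -2.5)]
COST = 2.0  # assumed per-person cost, in the same units as the benefit

def ate(subgroups):
    """Share-weighted average of subgroup effects."""
    return sum(share * cate for _, share, cate in subgroups)

def net_value(subgroups, cost, treat_rule):
    """Average net benefit per person when treat_rule(cate) selects who is treated."""
    return sum(share * (cate - cost)
               for _, share, cate in subgroups if treat_rule(cate))

treat_all = net_value(SUBGROUPS, COST, lambda cate: True)
targeted = net_value(SUBGROUPS, COST, lambda cate: cate > COST)

print(f"ATE: {ate(SUBGROUPS):.2f}")
print(f"net value, treat everyone: {treat_all:.2f}")
print(f"net value, treat only if CATE > cost: {targeted:.2f}")
```

Here the ATE is positive, so treating everyone beats treating no one, but the targeted rule does better still because it avoids spending on the two groups whose benefit is below the cost.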