Difference-in-differences (DiD) estimates causal effects by comparing the change in outcomes over time between a group affected by a treatment or policy (treatment group) and a group not affected (control group). The treatment effect is the difference in the before-to-after change between groups: (Y_treatment_after - Y_treatment_before) - (Y_control_after - Y_control_before). DiD removes both time-invariant group differences (the treatment group may have been sicker all along) and common time trends (both groups may have been improving). The critical assumption is parallel trends: in the absence of treatment, both groups would have experienced the same change over time. DiD is widely used in health policy evaluation — assessing the effects of smoking bans, Medicaid expansions, or new hospital regulations — because these policies create natural experiments where randomization is impossible.
Many of the most important questions in health policy cannot be studied with randomized trials. You cannot randomly assign states to expand Medicaid, randomly impose smoking bans, or randomly close hospitals. But these policy changes create natural experiments — situations where some populations are exposed to a policy and others are not, with the timing and location of the change determined by political or administrative processes rather than by health characteristics. Difference-in-differences exploits this structure.
The DiD logic is simple but powerful. Compare the treatment group's outcome before and after the policy to get the within-group change. Do the same for the control group. Subtract. The first differencing (before vs. after) removes time-invariant differences between groups. The second differencing (treatment vs. control change) removes common time trends. What remains — the difference of differences — is attributable to the policy, provided the parallel trends assumption holds.
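The two differencing steps can be sketched in a few lines of Python. This is a minimal illustration, not a full estimator: the function name `did_estimate` and all numbers are hypothetical. It works on four group-period mean outcomes and checks that a constant level difference and a common time trend both cancel out.

```python
def did_estimate(treat_before, treat_after, control_before, control_after):
    """Difference-in-differences from four group-period mean outcomes."""
    treat_change = treat_after - treat_before        # within-group change, treatment
    control_change = control_after - control_before  # within-group change, control
    return treat_change - control_change             # difference of the two changes


# Hypothetical mean outcomes (illustrative numbers only).
base = did_estimate(treat_before=50.0, treat_after=44.0,
                    control_before=40.0, control_after=38.0)

# Shift the treatment group's level by a constant in both periods:
# a time-invariant group difference, removed by the before/after differencing.
level_shift = did_estimate(50.0 + 10.0, 44.0 + 10.0, 40.0, 38.0)

# Add the same extra change to both groups' "after" values:
# a common time trend, removed by the treatment/control differencing.
common_trend = did_estimate(50.0, 44.0 - 5.0, 40.0, 38.0 - 5.0)

print(base, level_shift, common_trend)  # -4.0 -4.0 -4.0
```

All three calls return the same estimate. The estimate would change only if the groups' changes over time differed for some reason other than the policy, which is exactly what the parallel trends assumption rules out.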
Consider evaluating a state-level smoking ban. You observe lung cancer rates in the ban state and in several non-ban states for years before and after implementation. The ban state may have always had higher cancer rates (population differences), and cancer rates may have been declining nationally (secular trend). DiD removes both: (ban state change) minus (non-ban state change) = policy effect. If non-ban states' rates declined by 3% and the ban state's rates declined by 8%, DiD attributes the additional 5 percentage points of decline to the ban.
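The arithmetic of the smoking-ban example can be written out directly. The snippet below is only a sketch using the percent changes quoted above; the variable names are illustrative. It makes explicit that the non-ban change serves as the stand-in for what the ban state would have experienced without the ban.

```python
# Percent changes in lung cancer rates, taken from the example in the text.
ban_state_change = -8.0   # ban state: rates declined by 8%
non_ban_change = -3.0     # non-ban states: rates declined by 3%

# Under parallel trends, the non-ban change is the assumed counterfactual:
# the change the ban state would have seen without the ban.
counterfactual_ban_change = non_ban_change

# DiD effect: observed change minus counterfactual change, in percentage points.
did_effect = ban_state_change - counterfactual_ban_change
print(did_effect)  # -5.0
```

The negative sign indicates a reduction: the estimate is a 5 percentage point larger decline in the ban state than the parallel trends counterfactual implies.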
The parallel trends assumption is the backbone of the method and deserves scrutiny. It states that without the policy, the treatment and control groups would have experienced the same change in outcomes over time. This is about trends, not levels: groups can start at different baselines. The assumption cannot be tested directly, because the treatment group's untreated outcome after the policy is never observed, but it is supported (not proven) by showing that pre-intervention trends were parallel. Event-study plots are the diagnostic standard: they show the treatment-control difference at each time point, with time measured relative to the intervention date and one pre-intervention period (commonly the last) serving as the reference. Flat pre-intervention differences support the assumption; diverging pre-trends undermine it. Extensions like triple-difference (DDD), synthetic control methods, and staggered adoption designs address complications that arise when the simple two-group, two-period framework does not fit the data.
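The quantities behind an event-study plot can be sketched as follows. This is a simplified illustration under stated assumptions: the function `event_study_gaps`, the period labels, and the data are all hypothetical, and a real analysis would typically estimate these gaps by regression with confidence intervals rather than from raw means.

```python
def event_study_gaps(treated, control, reference_period):
    """Treatment-control gap in each period, relative to the gap in a reference period.

    treated, control: dicts mapping period -> mean outcome (hypothetical inputs).
    """
    ref_gap = treated[reference_period] - control[reference_period]
    return {t: (treated[t] - control[t]) - ref_gap for t in sorted(treated)}


# Hypothetical yearly means. The policy is assumed to start in period 0,
# and period -1 (the last pre-policy year) is used as the reference.
treated = {-3: 60, -2: 58, -1: 56, 0: 50, 1: 47}
control = {-3: 45, -2: 43, -1: 41, 0: 39, 1: 37}

gaps = event_study_gaps(treated, control, reference_period=-1)
pre_gaps = {t: g for t, g in gaps.items() if t < 0}

print(gaps)      # {-3: 0, -2: 0, -1: 0, 0: -4, 1: -5}
print(pre_gaps)  # {-3: 0, -2: 0, -1: 0}
```

In this made-up data the pre-period gaps are all zero, which is the flat pattern that supports parallel trends, and the gaps turn negative only once the policy starts. Pre-period gaps that drift steadily away from zero would be the warning sign described above.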