A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Difference-in-Differences

College · Depth 118 in the knowledge graph

48 topics build on this

660 prerequisites beneath it

Core Idea

Difference-in-differences (DiD) estimates causal treatment effects by comparing the pre-to-post change in the treatment group to the pre-to-post change in a comparison group. The estimator is β̂_DiD = (Ȳ_treated,post − Ȳ_treated,pre) − (Ȳ_control,post − Ȳ_control,pre), which differences out both pre-existing differences in levels and aggregate time trends. The critical identifying assumption is parallel trends: in the absence of treatment, the treatment and control groups would have followed the same trajectory. This assumption cannot be tested directly, because the treated group's untreated outcome after treatment is never observed, but it becomes more plausible when the data show parallel trends before treatment.
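The step from the estimator to a causal effect can be written out in potential-outcomes notation. The following is a sketch under stated assumptions, not a full proof: D = 1 marks the treatment group, Y_t(d) is the potential outcome in period t under treatment status d (this notation is introduced here for illustration), nobody is treated or reacts to treatment before the post period, and sample means stand in for population expectations.

```latex
\begin{aligned}
\hat{\beta}_{DiD}
  &= (\bar{Y}_{treated,post} - \bar{Y}_{treated,pre})
   - (\bar{Y}_{control,post} - \bar{Y}_{control,pre}) \\[4pt]
E[\hat{\beta}_{DiD}]
  &= E[Y_{post}(1) - Y_{pre}(0) \mid D = 1]
   - E[Y_{post}(0) - Y_{pre}(0) \mid D = 0] \\
  &= E[Y_{post}(1) - Y_{post}(0) \mid D = 1] \\
  &\quad + \underbrace{E[Y_{post}(0) - Y_{pre}(0) \mid D = 1]
   - E[Y_{post}(0) - Y_{pre}(0) \mid D = 0]}_{=\,0 \text{ under parallel trends}} \\
  &= E[Y_{post}(1) - Y_{post}(0) \mid D = 1]
\end{aligned}
```

The last line is the average treatment effect on the treated. The bracketed term is exactly the parallel trends condition: if the two groups' untreated trajectories differ, that difference is added to the estimate as bias.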

How It's Best Learned

Replicate Card and Krueger's (1994) minimum wage study using New Jersey and Pennsylvania as treatment and control — this is the canonical DiD application in labor economics.

Common Misconceptions

Parallel trends is an assumption about counterfactual outcomes, not about the levels of outcomes before treatment — groups can differ in levels.
With staggered treatment timing across units, a simple two-way fixed effects (TWFE) DiD regression can be biased when treatment effects vary across units or over time; the recent literature on heterogeneous treatment effects addresses this.

Explainer

You already know from the potential outcomes framework that the fundamental problem of causal inference is that we can never observe the same unit in both the treated and untreated states at the same time. The naive fix — compare treated units to untreated units after treatment — fails because the groups may differ for reasons unrelated to treatment. Difference-in-differences solves this by using time to construct the missing counterfactual. Instead of asking "what would the treated group have looked like untreated?", DiD asks "how did the treated group's trajectory differ from the control group's trajectory during the same period?"

The estimator is literally two differences stacked. First, you difference within each group: compute the before-to-after change for the treatment group and the before-to-after change for the control group. Then you difference the two differences. This double-differencing cancels out anything that was stable over time within each group (fixed differences in levels) and anything that affected both groups equally across time (common time trends). What remains is the portion of the treated group's change that cannot be explained by the time trend alone — the treatment effect.
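The double difference described above can be sketched in a few lines of Python. The numbers are hypothetical and chosen only to make the arithmetic easy to follow; the names `cells` and `did_estimate` are illustrative, not from any library.

```python
from statistics import mean

# Hypothetical outcomes (illustrative numbers, not real data),
# keyed by (group, period).
cells = {
    ("treated", "pre"):  [10.0, 12.0, 11.0],
    ("treated", "post"): [15.0, 17.0, 16.0],
    ("control", "pre"):  [20.0, 22.0, 21.0],
    ("control", "post"): [22.0, 24.0, 23.0],
}

def did_estimate(cells):
    """Difference of the two within-group before-to-after changes."""
    m = {key: mean(values) for key, values in cells.items()}
    change_treated = m[("treated", "post")] - m[("treated", "pre")]
    change_control = m[("control", "post")] - m[("control", "pre")]
    return change_treated - change_control

estimate = did_estimate(cells)

# A fixed level gap added to one group, or a common shock added to both
# groups in the post period, should leave the estimate unchanged.
level_shift = {
    key: [v + 100.0 for v in vals] if key[0] == "control" else vals
    for key, vals in cells.items()
}
common_shock = {
    key: [v + 7.0 for v in vals] if key[1] == "post" else vals
    for key, vals in cells.items()
}

print(estimate, did_estimate(level_shift), did_estimate(common_shock))
```

Here the treated group changes by 5 and the control group by 2, so the estimate is 3. The two modified datasets give the same answer, which is the cancellation of fixed level differences and common time shocks at work.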

The identifying assumption — parallel trends — is the load-bearing pillar of every DiD study. It says that in the absence of treatment, the treatment and control groups would have moved in parallel over time. This is explicitly a claim about a counterfactual you cannot observe. What you can do is check pre-treatment periods: if the two groups were trending in parallel before treatment, it is more plausible they would have continued to do so. The canonical example, Card and Krueger (1994), compared fast-food employment in New Jersey (raised minimum wage) and Pennsylvania (did not) before and after the policy change. The DiD estimate found no negative employment effect — a landmark result precisely because the research design was credible.
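A rough version of the pre-period check described above is to track the treated-minus-control gap in each period relative to the last pre-treatment period. The sketch below uses hypothetical group means, with periods labelled so that treatment starts at period 0; the function name `gap_changes` and the choice of period -1 as the reference are assumptions made for illustration.

```python
# Hypothetical group means by period (illustrative numbers, not real data).
# Negative periods are before treatment; treatment starts at period 0.
treated_means = {-3: 10.0, -2: 11.0, -1: 12.0, 0: 16.0}
control_means = {-3: 20.0, -2: 21.0, -1: 22.0, 0: 23.0}

def gap_changes(treated, control, reference=-1):
    """Treated-minus-control gap in each period, relative to the reference period."""
    base = treated[reference] - control[reference]
    return {t: (treated[t] - control[t]) - base for t in sorted(treated)}

gaps = gap_changes(treated_means, control_means)

# Gaps near zero in the pre-periods are consistent with parallel pre-trends.
pre_gaps = {t: g for t, g in gaps.items() if t < 0}

print(pre_gaps)
print(gaps[0])
```

In this made-up example the pre-period gaps are all zero and the gap opens only at period 0. Flat pre-period gaps support the assumption but do not prove it, and a real analysis would also attach standard errors to each gap rather than eyeballing point values.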

In practice, DiD is implemented as a regression. You create a treatment dummy (1 = treatment group), a post dummy (1 = after treatment), and their interaction. The coefficient on the interaction term is the DiD estimate; in the two-group, two-period case with no other regressors it equals the difference of the four cell means exactly. This connects directly to your dummy variable knowledge: the interaction isolates the group-period cell where treatment occurred. Adding covariates and unit fixed effects (which you know from fixed effects models) can control for observed confounders and absorb time-invariant unit-level heterogeneity, which strengthens the design. The key caution from the misconceptions is worth internalizing: parallel trends is a claim about counterfactual *trends*, not about levels, so the groups can be very different in absolute terms before treatment begins.
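The regression form can be sketched without any external library by solving the normal equations directly. This is a minimal illustration on hypothetical data, not a production estimator: the helper `ols` is written here for the example, and in applied work a regression package that also reports standard errors would be the usual choice.

```python
# Hypothetical observations (illustrative numbers, not real data):
# each row is (treated dummy, post dummy, outcome).
rows = [
    (1, 0, 10.0), (1, 0, 12.0), (1, 0, 11.0),
    (1, 1, 15.0), (1, 1, 17.0), (1, 1, 16.0),
    (0, 0, 20.0), (0, 0, 22.0), (0, 0, 21.0),
    (0, 1, 22.0), (0, 1, 24.0), (0, 1, 23.0),
]

def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: bring the largest remaining entry into position.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# Design matrix columns: intercept, treated, post, treated x post.
X = [[1.0, d, t, d * t] for d, t, _ in rows]
y = [outcome for _, _, outcome in rows]

intercept, b_treated, b_post, b_did = ols(X, y)
print(intercept, b_treated, b_post, b_did)
```

With these numbers the intercept is the control group's pre-period mean (21), the treated coefficient is the pre-period level gap (-10), the post coefficient is the control group's change (2), and the interaction coefficient is the DiD estimate (3), up to floating-point rounding.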

Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Dividing Integers → Unit Rates → Proportions → Percent Concept → Converting Between Fractions, Decimals, and Percents → Operations with Rational Numbers → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → 
Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Sine, Cosine, and Tangent Ratios → Trigonometric Ratios Review → Radian Measure → Converting Between Degrees and Radians → The Unit Circle → Graphing Sine and Cosine → Graphing Tangent and Reciprocal Trigonometric Functions → Derivatives of Trigonometric Functions → Antiderivatives → Indefinite Integrals → Basic Integration Rules → Riemann Sums → Definite Integral Definition → Probability Density Functions and Continuous Distributions → Cumulative Distribution Functions → Continuous Random Variables → Probability Density Functions → Expected Value → Weak Law of Large Numbers → Probability Axioms and Rules → Conditional Probability → Independence of Events → Sampling Distributions → Standard Error of Estimators → Hypothesis Testing: Framework and Logic → P-values and Statistical Significance → Effect Size and Practical Significance → Hypothesis Testing: Framework and Logic → Z-Tests and T-Tests for Means → One-Sample Z-Test for Means → One-Sample and Two-Sample T-Tests → Inference in Linear Regression → Prediction Intervals in Regression → Linear Regression Basics → Residuals and Goodness of Fit (R²) → Simple (Bivariate) OLS Regression → Classical OLS Assumptions (Gauss-Markov) → Multiple Regression → Interpreting Regression Coefficients → Hypothesis Testing in Regression → F-Test and Joint Significance → R-Squared and Model Fit → Multicollinearity → Robust Standard Errors → Panel Data: Structure and Advantages → Fixed Effects Models → Difference-in-Differences

Longest path: 119 steps · 660 total prerequisite topics

Prerequisites (6)

Causal Inference and the Identification Problem (hard)
Potential Outcomes and the Rubin Causal Model (hard)
Dummy Variables and Categorical Regressors (hard)
Fixed Effects Models (soft)
Selection Bias (soft)
Causal Inference from Observational Data (soft)

Leads To (4)

Interrupted Time Series Design (soft)
Parallel Trends Assumption: Validity and Testing (hard)
Synthetic Control Methods for Policy Evaluation (soft)
Synthetic Control and Comparative Case Studies (soft)