Reduced Form and First-Stage Equations

College · Depth 88 in the knowledge graph
Unlocks 5 downstream topics
instrumental-variables reduced-form first-stage

Core Idea

The first-stage equation (X regressed on Z) is the reduced form for X: it shows how exogenous variation in Z translates into variation in the endogenous X. A weak first stage (a small partial R² or F-statistic for the excluded instruments, or a small t-statistic when there is a single instrument) signals weak instruments; the usual rule of thumb for instrument strength is a first-stage F-statistic above 10.

Explainer

From your study of two-stage least squares (2SLS), you know the core idea: when X is endogenous (correlated with the error in the structural equation), you find an instrument Z that affects X but has no direct effect on y. The first-stage equation is the regression of X on Z and any other exogenous controls: X = π₀ + π₁Z + controls + v. It is also called the reduced form for X because it expresses the endogenous variable purely as a function of exogenous variables; no endogenous regressors appear on the right-hand side.
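A minimal simulation sketch of the first stage, assuming a made-up data-generating process: π₀ = 1, π₁ = 0.8, a structural slope of 1.5, and an unobserved shock `c` shared by the first-stage error v and the structural error u so that X is endogenous. All of these values, and the helper names `ols_slope` and `simulate`, are illustrative choices rather than anything from a real study. It fits the first-stage slope of X on Z and, for contrast, the naive OLS slope of y on X, in plain Python so the regression arithmetic stays visible.

```python
import random

def ols_slope(regressor, outcome):
    """Slope from a simple regression of outcome on regressor, with an intercept."""
    n = len(regressor)
    mx, my = sum(regressor) / n, sum(outcome) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(regressor, outcome))
    sxx = sum((a - mx) ** 2 for a in regressor)
    return sxy / sxx

def simulate(n=20_000, pi0=1.0, pi1=0.8, beta1=1.5, seed=0):
    """Hypothetical DGP: X is endogenous because the first-stage error v and the
    structural error u share an unobserved shock c; Z is independent of c."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)           # instrument
        c = rng.gauss(0, 1)            # unobserved confounder
        v = c + rng.gauss(0, 1)        # first-stage error
        u = c + rng.gauss(0, 1)        # structural error, correlated with v
        xi = pi0 + pi1 * zi + v        # first stage: X = pi0 + pi1*Z + v
        yi = 2.0 + beta1 * xi + u      # structural equation
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y

z, x, y = simulate()
pi1_hat = ols_slope(z, x)    # first-stage (reduced form for X) slope
beta_ols = ols_slope(x, y)   # naive OLS, contaminated because cov(X, u) != 0
print(f"first-stage slope: {pi1_hat:.3f}")
print(f"naive OLS slope:   {beta_ols:.3f}")
```

The first-stage slope should land near the true 0.8, because Z is independent of the shared shock. The OLS slope does not: in this design its large-sample limit is 1.5 + cov(X, u)/var(X) = 1.5 + 1/2.64 ≈ 1.88.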

The reduced form for y is what you get by substituting the first stage into the structural equation. Ignoring controls for simplicity, plugging X = π₀ + π₁Z + v into y = β₀ + β₁X + u gives y = (β₀ + β₁π₀) + β₁π₁Z + (u + β₁v). In practice you regress y directly on Z and the same controls, bypassing X entirely. The coefficient on Z in this regression, γ₁ = β₁π₁, captures the total effect of the instrument on the outcome, which under the exclusion restriction runs entirely through X. Dividing the reduced-form coefficient by the first-stage coefficient, β₁ = γ₁/π₁, recovers the structural effect of X on y. With one instrument and one endogenous regressor this ratio (indirect least squares) is numerically identical to the 2SLS estimate, and with a binary instrument it is exactly the Wald estimator. This makes the logic of instrumental variables transparent: the instrument matters for the outcome only because it shifts X, so the IV estimate recovers the causal effect of X by scaling the outcome shift by the X shift.
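The ratio logic can be checked numerically. This sketch uses a hypothetical simulation with a binary instrument Z (think of a randomized encouragement; that interpretation, the seed, and the helper `mean_by_group` are assumptions for illustration), a true first-stage slope of 0.8, and a hypothetical structural slope β₁ = 1.5, so the reduced-form slope should be close to 1.5 × 0.8 = 1.2. It computes the first-stage slope, the reduced-form slope, their ratio, and the Wald estimator written as a ratio of differences in group means.

```python
import random

def ols_slope(regressor, outcome):
    """Slope from a simple regression of outcome on regressor, with an intercept."""
    n = len(regressor)
    mx, my = sum(regressor) / n, sum(outcome) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(regressor, outcome))
    sxx = sum((a - mx) ** 2 for a in regressor)
    return sxy / sxx

def mean_by_group(values, groups, g):
    """Mean of `values` over observations whose group label equals g."""
    picked = [v for v, k in zip(values, groups) if k == g]
    return sum(picked) / len(picked)

rng = random.Random(1)
n, pi1, beta1 = 20_000, 0.8, 1.5
z, x, y = [], [], []
for _ in range(n):
    zi = 1 if rng.random() < 0.5 else 0          # binary instrument
    c = rng.gauss(0, 1)                           # unobserved confounder
    xi = 1.0 + pi1 * zi + c + rng.gauss(0, 1)     # first stage
    yi = 2.0 + beta1 * xi + c + rng.gauss(0, 1)   # structural eq.; u shares c with v
    z.append(zi)
    x.append(xi)
    y.append(yi)

first_stage = ols_slope(z, x)     # estimates pi1
reduced_form = ols_slope(z, y)    # estimates beta1 * pi1
beta_iv = reduced_form / first_stage

# Wald estimator: ratio of differences in group means
wald = (mean_by_group(y, z, 1) - mean_by_group(y, z, 0)) / (
    mean_by_group(x, z, 1) - mean_by_group(x, z, 0)
)
beta_ols = ols_slope(x, y)

print(f"first stage {first_stage:.3f}, reduced form {reduced_form:.3f}")
print(f"IV ratio {beta_iv:.3f}, Wald {wald:.3f}, OLS {beta_ols:.3f}")
```

Because an OLS slope on a 0/1 regressor is exactly the difference in group means, the ratio and the Wald expression agree up to floating-point rounding. Both should sit near 1.5, while OLS of y on X is pulled upward by the shared shock.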

Instrument strength is not a minor technical detail; it determines whether your IV estimates are reliable at all. A weak instrument is one that barely shifts X, so the first-stage F-statistic on the excluded instruments is small. The commonly cited threshold is F > 10 (Staiger and Stock, 1997). When instruments are weak, even small violations of the exclusion restriction get amplified in the IV estimate, because any direct effect of Z on y is divided by a first-stage coefficient close to zero. Finite-sample bias can also be severe: the IV estimate is pulled toward the OLS estimate and may end up no better than OLS. To check instrument strength, run the first-stage regression and examine the F-statistic on the excluded instruments; with a single instrument, this F is the square of the instrument's t-statistic. A large first-stage F is necessary but not sufficient for valid IV: the instrument must also satisfy the exclusion restriction (Z ↛ y except through X), which is a theoretical judgment and cannot be tested when you have exactly one instrument.
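A small Monte Carlo sketch of why the first-stage F matters. It assumes homoskedastic errors and hypothetical design values: first-stage slopes of 0.5 (strong) and 0.02 (weak), 500 observations per sample, and 200 replications. The helper names are illustrative. With one excluded instrument, `first_stage_f` computes F as the squared conventional t-statistic on Z; with heteroskedastic or clustered errors a robust version would be needed, which this sketch does not implement.

```python
import random
import statistics

def first_stage_f(z, x):
    """F for a single excluded instrument: squared t-statistic on Z in the
    regression of X on Z, using the homoskedastic standard error."""
    n = len(z)
    mz, mx = sum(z) / n, sum(x) / n
    szz = sum((a - mz) ** 2 for a in z)
    pi1 = sum((a - mz) * (b - mx) for a, b in zip(z, x)) / szz
    pi0 = mx - pi1 * mz
    ssr = sum((b - pi0 - pi1 * a) ** 2 for a, b in zip(z, x))
    se = (ssr / (n - 2) / szz) ** 0.5
    return (pi1 / se) ** 2

def iv_estimate(z, x, y):
    """Just-identified IV estimate: cov(Z, y) / cov(Z, X)."""
    n = len(z)
    mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
    czy = sum((a - mz) * (c - my) for a, c in zip(z, y))
    czx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    return czy / czx

def one_sample(rng, pi1, n=500, beta1=1.5):
    """One hypothetical sample; the shared shock c makes X endogenous."""
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)
        c = rng.gauss(0, 1)
        xi = pi1 * zi + c + rng.gauss(0, 1)       # first stage
        yi = beta1 * xi + c + rng.gauss(0, 1)     # structural equation
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y

def summarize(pi1, reps=200, seed=0):
    """Median first-stage F, median IV estimate, and IQR of IV estimates."""
    rng = random.Random(seed)
    fs, ivs = [], []
    for _ in range(reps):
        z, x, y = one_sample(rng, pi1)
        fs.append(first_stage_f(z, x))
        ivs.append(iv_estimate(z, x, y))
    q1, _, q3 = statistics.quantiles(ivs, n=4)
    return statistics.median(fs), statistics.median(ivs), q3 - q1

strong = summarize(pi1=0.5)
weak = summarize(pi1=0.02)
print("strong: median F %.1f, median IV %.2f, IQR %.2f" % strong)
print("weak:   median F %.1f, median IV %.2f, IQR %.2f" % weak)
```

In the strong design the median F should be far above 10 and the IV estimates should cluster around the true 1.5. In the weak design the median F should be well below 10 and the interquartile range of the IV estimates several times wider. That instability is what the F > 10 rule of thumb is meant to flag.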

The distinction between first-stage and reduced-form equations also clarifies the overidentification test (the topic this builds toward). When you have more instruments than endogenous variables, you can test whether all instruments give the same IV estimate; if they do not, at least one instrument may be violating the exclusion restriction. Concretely, for each instrument the ratio of its reduced-form coefficient to its first-stage coefficient must point to the same structural coefficient for the overidentifying restrictions to hold. Agreement is not proof of validity, though: instruments that are invalid in the same way can still produce matching estimates. Thinking systematically in terms of first-stage and reduced-form regressions gives you a coherent framework for designing, diagnosing, and stress-testing any instrumental variables strategy.
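The idea of comparing instrument-by-instrument estimates can be sketched by simulation. This shows only the intuition behind overidentification tests, not the formal Sargan or Hansen J statistic. The design is hypothetical: two independent instruments with first-stage slopes 0.8 and 0.6, a structural slope of 1.5, and an illustrative parameter `direct_effect` that lets Z2 enter the outcome equation directly, violating its exclusion restriction.

```python
import random

def iv_ratio(z, x, y):
    """Just-identified IV with one instrument: reduced-form slope divided by
    first-stage slope, which equals cov(Z, y) / cov(Z, X)."""
    n = len(z)
    mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
    czy = sum((a - mz) * (c - my) for a, c in zip(z, y))
    czx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    return czy / czx

def per_instrument_estimates(direct_effect, n=20_000, beta1=1.5, seed=2):
    """Simulate two instruments; `direct_effect` is a hypothetical
    exclusion-restriction violation in which Z2 also enters the y equation."""
    rng = random.Random(seed)
    z1, z2, x, y = [], [], [], []
    for _ in range(n):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)             # instruments Z1, Z2
        c = rng.gauss(0, 1)                                 # confounder
        xi = 0.8 * a + 0.6 * b + c + rng.gauss(0, 1)        # first stage
        yi = beta1 * xi + direct_effect * b + c + rng.gauss(0, 1)
        z1.append(a)
        z2.append(b)
        x.append(xi)
        y.append(yi)
    return iv_ratio(z1, x, y), iv_ratio(z2, x, y)

valid = per_instrument_estimates(direct_effect=0.0)
invalid = per_instrument_estimates(direct_effect=0.3)
print("both instruments valid: Z1 -> %.3f, Z2 -> %.3f" % valid)
print("Z2 violates exclusion:  Z1 -> %.3f, Z2 -> %.3f" % invalid)
```

When both instruments are valid, the two ratios should agree closely. With `direct_effect=0.3`, the Z1 ratio stays near 1.5 but the Z2 ratio drifts toward 1.5 + 0.3/0.6 = 2.0. That is the kind of disagreement an overidentification test is designed to detect.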

Prerequisite Chain

Counting to 10 → Counting to 20 → Understanding Zero → The Number Zero → Counting to Five → One-to-One Correspondence → Combining Small Groups Within 5 → Addition Within 10 → Addition Within 20 → Two-Digit Addition Without Regrouping → Two-Digit Addition with Regrouping → Addition Within 100 → Repeated Addition as Multiplication → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Two-Digit by One-Digit Division → Division with Remainders → Remainders and Quotients in Division → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Reading and Writing Decimals → Comparing and Ordering Decimals → Adding and Subtracting Decimals → Multiplying Decimals → Dividing Decimals → Dividing Fractions → Mixed Number Arithmetic → Order of Operations → Integer Order of Operations → Variable Expressions → Combining Like Terms → One-Step Equations → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Trigonometric Ratios Review → Radian Measure → Converting Between Degrees and Radians → The Unit Circle → Graphing Sine and Cosine → Graphing Tangent and Reciprocal Trigonometric Functions → Derivatives of Trigonometric Functions → Antiderivatives → Indefinite Integrals → Basic Integration Rules → Riemann Sums → Definite Integral Definition → Probability Density Functions and Continuous Distributions → Cumulative Distribution Functions → Continuous Random Variables → Normal Distribution → Central Limit Theorem → Confidence Intervals for Means → Z-Tests and T-Tests for Means → One-Sample Z-Test for Means → One-Sample and Two-Sample T-Tests → One-Way ANOVA → F-Test and Joint Significance → R-Squared and Model Fit → Omitted Variable Bias → Causal Inference and the Identification Problem → Potential Outcomes and the Rubin Causal Model → Selection Bias → Instrumental Variables → Two-Stage Least Squares (2SLS) → Reduced Form and First-Stage Equations

Longest path: 89 steps · 430 total prerequisite topics
