
A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Reduced Form and First-Stage Equations

College · Depth 121 in the knowledge graph

5 topics build on this

594 prerequisites beneath it



Core Idea

The first-stage equation (X regressed on Z and any controls) is the reduced form for X, showing how exogenous variation in Z translates into variation in the endogenous X. A weak first stage (a low partial R² or small t-statistics on the instruments) indicates weak instruments; a common rule of thumb is a first-stage F-statistic above 10 on the excluded instruments.

Explainer

From your study of two-stage least squares (2SLS), you know the core idea: when X is endogenous (correlated with the error in the structural equation), you find an instrument Z that shifts X, is uncorrelated with the structural error, and has no direct effect on y. The first-stage equation is the regression of X on Z (and any other controls): X = π₀ + π₁Z + controls + v. This is sometimes called the reduced form for X because it expresses the endogenous variable purely as a function of exogenous variables: no endogenous regressors appear on the right-hand side.

The reduced form for y is what you get by substituting the first-stage relationship into the structural equation: regress y directly on Z and controls, bypassing X entirely. If the structural equation is y = β₀ + β₁X + u (controls omitted for brevity), substituting the first stage gives y = (β₀ + β₁π₀) + β₁π₁Z + (u + β₁v), so the reduced-form coefficient on Z equals β₁π₁. Under the exclusion restriction, that coefficient captures the total effect of the instrument on the outcome, all of it working through X. With one instrument and one endogenous regressor, dividing the reduced-form coefficient on Z by the first-stage coefficient on Z therefore gives the IV estimate of the structural effect of X on y. This ratio is the indirect least squares estimator, known as the Wald estimator when Z is binary, and it makes the logic of instrumental variables transparent. The instrument matters to the outcome only because it shifts X; the IV estimate recovers the causal effect of X by scaling the outcome shift by the X shift.
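The ratio logic can be checked numerically. The following is a minimal sketch on simulated data, not a recommended estimation workflow: the coefficients (0.8 in the first stage, 2.0 in the structural equation), the sample size, and the helper `ols` are all assumptions made for illustration. It shows that the reduced-form coefficient divided by the first-stage coefficient matches a hand-rolled 2SLS estimate in the single-instrument case.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Simulated data; every coefficient below is an illustrative assumption.
z = rng.normal(size=n)                  # instrument
u = rng.normal(size=n)                  # unobserved confounder (makes x endogenous)
x = 0.8 * z + u + rng.normal(size=n)    # first stage: true pi_1 = 0.8
y = 2.0 * x + u + rng.normal(size=n)    # structural equation: true effect = 2.0


def ols(dep, *regressors):
    """Return OLS coefficients of dep on a constant plus the regressors."""
    design = np.column_stack([np.ones(len(dep)), *regressors])
    coef, *_ = np.linalg.lstsq(design, dep, rcond=None)
    return coef


pi_0, pi_1 = ols(x, z)       # first stage: x on z
gamma_1 = ols(y, z)[1]       # reduced form: y on z
ratio = gamma_1 / pi_1       # reduced form divided by first stage

# Manual 2SLS for comparison: regress y on the first-stage fitted values.
x_hat = pi_0 + pi_1 * z
beta_2sls = ols(y, x_hat)[1]

# Naive OLS of y on x, which is contaminated by the confounder u.
beta_ols = ols(y, x)[1]

print(f"first-stage coefficient  : {pi_1:.3f}")
print(f"reduced-form coefficient : {gamma_1:.3f}")
print(f"ratio (IV estimate)      : {ratio:.3f}")
print(f"manual 2SLS              : {beta_2sls:.3f}")
print(f"naive OLS                : {beta_ols:.3f}")
```

The second-stage standard errors from a manual regression like this are not valid, so the sketch only compares point estimates; a dedicated IV routine should be used for inference.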

Instrument strength is not a minor technical detail: it determines whether your IV estimates are reliable at all. A weak instrument is one where Z barely shifts X once the controls are accounted for, which shows up as a small first-stage F-statistic on the excluded instruments. The commonly cited rule of thumb is F > 10 (from Staiger and Stock, 1997). When instruments are weak, even small violations of the exclusion restriction are amplified in the IV estimate, because the bias they induce is divided by a first-stage coefficient close to zero. Finite-sample bias can also be severe: 2SLS is biased toward the OLS estimate, conventional standard errors understate the uncertainty, and the IV estimate can end up less reliable than OLS. You can check instrument strength by running the first-stage regression and examining the F-statistic for the joint significance of the excluded instruments (not the overall regression F, which also reflects the controls). A strong first stage is necessary but not sufficient for valid IV: the instrument must also satisfy the exclusion restriction (Z ↛ y except through X), which rests on a theoretical argument and cannot be tested when the model is exactly identified, that is, when there are only as many instruments as endogenous regressors.
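A first-stage F-statistic on the excluded instruments can be computed by comparing the first stage with and without the instruments. The sketch below assumes homoskedastic errors (the classical F formula), simulated data, and a hypothetical helper named `first_stage_f`; with heteroskedastic or clustered data a robust variant would be needed instead.

```python
import numpy as np


def rss(dep, design):
    """Residual sum of squares from an OLS fit of dep on design."""
    coef, *_ = np.linalg.lstsq(design, dep, rcond=None)
    resid = dep - design @ coef
    return float(resid @ resid)


def first_stage_f(x, instruments, controls):
    """Classical F-statistic for the excluded instruments in the first stage.

    Compares the restricted model (constant + controls) with the
    unrestricted model (constant + controls + instruments).
    """
    n = len(x)
    restricted = np.column_stack([np.ones(n), controls])
    unrestricted = np.column_stack([restricted, instruments])
    q = unrestricted.shape[1] - restricted.shape[1]   # number of instruments
    df_resid = n - unrestricted.shape[1]
    rss_r = rss(x, restricted)
    rss_u = rss(x, unrestricted)
    return ((rss_r - rss_u) / q) / (rss_u / df_resid)


rng = np.random.default_rng(1)
n = 500
z = rng.normal(size=n)   # instrument
w = rng.normal(size=n)   # exogenous control

# Two illustrative first stages: one strong, one weak (assumed coefficients).
x_strong = 0.8 * z + 0.5 * w + rng.normal(size=n)
x_weak = 0.02 * z + 0.5 * w + rng.normal(size=n)

f_strong = first_stage_f(x_strong, z, w)
f_weak = first_stage_f(x_weak, z, w)

print(f"first-stage F, strong instrument: {f_strong:.1f}")
print(f"first-stage F, weak instrument  : {f_weak:.1f}")
```

With a single instrument, this F-statistic equals the square of the instrument's t-statistic in the first stage.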

The distinction between first-stage and reduced-form equations also clarifies the overidentification test (the topic this builds toward). When you have more instruments than endogenous variables, you can test whether the instruments agree on the IV estimate; if they do not, at least one instrument may be violating the exclusion restriction. In reduced-form terms, with one endogenous regressor, each instrument's reduced-form coefficient divided by its first-stage coefficient must equal the same structural coefficient for the overidentifying restrictions to hold. The test can only detect disagreement among instruments, so a failure to reject does not prove that every instrument is valid. Thinking systematically in terms of first-stage and reduced-form regressions gives you a coherent framework for designing, diagnosing, and stress-testing any instrumental variables strategy.
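The idea that each instrument should point to the same structural coefficient can be illustrated informally. The sketch below is not the Hansen J-test itself: it only compares per-instrument ratios of reduced-form to first-stage coefficients on simulated data. The function name `per_instrument_ratios`, the coefficients, and the size of the exclusion-restriction violation are all assumptions chosen for illustration.

```python
import numpy as np


def per_instrument_ratios(direct_effect_z2, n=5_000, seed=0):
    """Reduced-form / first-stage ratio for each of two instruments.

    direct_effect_z2 is the direct effect of z2 on y; any nonzero value
    violates the exclusion restriction for z2.
    """
    rng = np.random.default_rng(seed)
    z1 = rng.normal(size=n)
    z2 = rng.normal(size=n)
    u = rng.normal(size=n)                               # confounder
    x = 0.8 * z1 + 0.6 * z2 + u + rng.normal(size=n)     # first stage
    y = 2.0 * x + direct_effect_z2 * z2 + u + rng.normal(size=n)

    design = np.column_stack([np.ones(n), z1, z2])
    pi, *_ = np.linalg.lstsq(design, x, rcond=None)      # first-stage coefficients
    gamma, *_ = np.linalg.lstsq(design, y, rcond=None)   # reduced-form coefficients
    return gamma[1:] / pi[1:]                            # one ratio per instrument


valid = per_instrument_ratios(direct_effect_z2=0.0)
invalid = per_instrument_ratios(direct_effect_z2=0.5)

print("both instruments valid :", np.round(valid, 3))
print("z2 affects y directly  :", np.round(invalid, 3))
```

When both instruments are valid, the two ratios should land close to the true effect of 2.0. When z2 has a direct effect on y, its ratio drifts away from the ratio implied by z1, which is the kind of disagreement a formal overidentification test is designed to pick up.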


Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Dividing Integers → Unit Rates → Proportions → Percent Concept → Converting Between Fractions, Decimals, and Percents → Operations with Rational Numbers → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → 
Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Sine, Cosine, and Tangent Ratios → Trigonometric Ratios Review → Radian Measure → Converting Between Degrees and Radians → The Unit Circle → Graphing Sine and Cosine → Graphing Tangent and Reciprocal Trigonometric Functions → Derivatives of Trigonometric Functions → Antiderivatives → Indefinite Integrals → Basic Integration Rules → Riemann Sums → Definite Integral Definition → Probability Density Functions and Continuous Distributions → Cumulative Distribution Functions → Continuous Random Variables → Probability Density Functions → Expected Value → Weak Law of Large Numbers → Probability Axioms and Rules → Conditional Probability → Independence of Events → Sampling Distributions → Standard Error of Estimators → Hypothesis Testing: Framework and Logic → P-values and Statistical Significance → Effect Size and Practical Significance → Hypothesis Testing: Framework and Logic → Z-Tests and T-Tests for Means → One-Sample Z-Test for Means → One-Sample and Two-Sample T-Tests → Inference in Linear Regression → Prediction Intervals in Regression → Linear Regression Basics → Residuals and Goodness of Fit (R²) → Simple (Bivariate) OLS Regression → Classical OLS Assumptions (Gauss-Markov) → Multiple Regression → Interpreting Regression Coefficients → Hypothesis Testing in Regression → F-Test and Joint Significance → R-Squared and Model Fit → Omitted Variable Bias → Causal Inference and the Identification Problem → Potential Outcomes and the Rubin Causal Model → Selection Bias → Instrumental Variables → Instrumental Variables: Validity Assumptions → Two-Stage Least Squares (2SLS) → Reduced Form and First-Stage Equations

Longest path: 122 steps · 594 total prerequisite topics

Prerequisites (1)

Two-Stage Least Squares (2SLS) · hard

Leads To (1)

Test of Overidentification: Hansen J-Test · soft