Endogeneity

College · Depth 83 in the knowledge graph
Unlocks 44 downstream topics
endogeneity simultaneity measurement-error bias

Core Idea

Endogeneity is a general term for any situation where a regressor is correlated with the error term, making OLS biased and inconsistent. There are three main sources: omitted variable bias (a confound is excluded from the model), simultaneity (y and x are jointly determined, as when price and quantity are both endogenous in a supply-demand system), and measurement error (x is measured with noise, attenuating its coefficient toward zero via 'attenuation bias'). Endogeneity is the central identification problem in applied economics, and most advanced methods — instrumental variables, panel fixed effects, regression discontinuity, difference-in-differences — are designed to address specific forms of it.

How It's Best Learned

Work through three separate examples, one for each source of endogeneity, and derive the direction of bias for each. The supply-demand simultaneity example is essential for macroeconomics applications.

Explainer

From your study of OLS assumptions, you know that the zero-conditional-mean assumption E(u|X) = 0 is what makes OLS unbiased. Endogeneity is the collective name for anything that violates this assumption — any situation where your regressor X is correlated with the error term u. When Cov(X, u) ≠ 0, OLS does not recover the true causal effect; instead it picks up a blend of the causal effect and the confounding relationship between X and u. The bias does not shrink with sample size — endogeneity is a consistency problem, not just a precision problem.
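The claim that the bias does not shrink with sample size can be made concrete for the one-regressor case. The notation below (intercept β0, slope β1, sample moments with hats) is assumed for illustration and is not taken from the text above; it is a sketch of the standard probability-limit argument, not a full proof.

```latex
% One-regressor model; symbols \beta_0, \beta_1 are illustrative assumptions.
% Requires amsmath for \operatorname and aligned.
\[
\begin{aligned}
Y &= \beta_0 + \beta_1 X + u, \\
\hat{\beta}_1
  &= \frac{\widehat{\operatorname{Cov}}(X, Y)}{\widehat{\operatorname{Var}}(X)}
   = \beta_1 + \frac{\widehat{\operatorname{Cov}}(X, u)}{\widehat{\operatorname{Var}}(X)}
   \;\xrightarrow{\;p\;}\;
   \beta_1 + \frac{\operatorname{Cov}(X, u)}{\operatorname{Var}(X)} .
\end{aligned}
\]
```

If Cov(X, u) = 0 the second term vanishes and OLS is consistent. If Cov(X, u) ≠ 0 the second term is a fixed number that more data only estimates more precisely, so the estimate settles on the wrong value.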

The three sources each have a distinct mechanism. You already understand the first from omitted variable bias: if a variable Z affects Y and is correlated with X but is left out of the model, Z ends up in the error term, making u correlated with X. The classic example is estimating the returns to education: ability raises wages and is positively correlated with education, so omitting ability inflates the estimated education coefficient. The sign of the bias follows a simple rule: it is the product of the sign of Z's effect on Y and the sign of Z's correlation with X. Second, simultaneity arises when X and Y jointly determine each other. Estimating a demand curve from market data is the canonical case: price and quantity are set together by the intersection of supply and demand, so a demand shock moves the equilibrium price and price is correlated with the demand error. Running OLS of quantity on price recovers neither the supply curve nor the demand curve; it gives a blend of the two slopes that depends on how much each curve shifts.
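Both mechanisms can be checked with a small simulation. The sketch below uses made-up parameter values (a true return to education of 1.0, an ability effect of 2.0, a demand slope of -1.0 and a supply slope of 0.5); the function names and all numbers are illustrative assumptions, not estimates from any real data.

```python
import numpy as np


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


def simulate_ovb(n=200_000, seed=0):
    """Returns (slope omitting ability, slope controlling for ability)."""
    rng = np.random.default_rng(seed)
    ability = rng.normal(size=n)                 # the omitted variable Z
    educ = 0.5 * ability + rng.normal(size=n)    # Z is correlated with X
    wage = 1.0 * educ + 2.0 * ability + rng.normal(size=n)

    short = ols_slope(educ, wage)                # ability left in the error
    X = np.column_stack([np.ones(n), educ, ability])
    long = np.linalg.lstsq(X, wage, rcond=None)[0][1]
    return short, long


def simulate_simultaneity(n=200_000, seed=1):
    """Returns the OLS slope of quantity on price in a market equilibrium."""
    rng = np.random.default_rng(seed)
    u_d = rng.normal(size=n)                     # demand shock
    u_s = rng.normal(size=n)                     # supply shock
    b, c = 1.0, 0.5                              # demand slope -b, supply slope +c
    # Market clearing: -b*price + u_d = c*price + u_s
    price = (u_d - u_s) / (b + c)
    quantity = c * price + u_s
    return ols_slope(price, quantity)


if __name__ == "__main__":
    short, long = simulate_ovb()
    print(f"education slope, ability omitted:  {short:.3f}")
    print(f"education slope, ability included: {long:.3f}")
    print(f"OLS slope of quantity on price:    {simulate_simultaneity():.3f}")
```

With these assumed values the short regression should land near 1.8 rather than 1.0 (positive effect times positive correlation gives upward bias), and the price-quantity slope should land near -0.25, which matches neither the demand slope nor the supply slope.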

The third source, measurement error in X, is subtler but important. Suppose you're estimating the effect of true ability (X*) on wages, but you only observe test scores (X = X* + v), where v is random noise uncorrelated with X* and with u (classical measurement error). Substituting X* = X − v into the true model Y = βX* + u gives Y = βX + (u − βv), so the noise enters the error term of the regression you actually run. Because X contains +v while the composite error contains −βv, the covariance between X and the composite error is −β·Var(v), which always has the opposite sign to β. The result is attenuation bias: your estimated coefficient is biased toward zero, so you understate the true relationship. The estimate converges to β multiplied by the reliability ratio Var(X*)/Var(X), the fraction of X's variance that is true signal rather than noise.
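The attenuation result can be verified numerically. The sketch below assumes classical measurement error (noise independent of the true regressor and of the wage error), and the true coefficient of 2.0 and noise standard deviation of 1.0 are arbitrary illustrative choices.

```python
import numpy as np


def simulate_attenuation(n=200_000, beta=2.0, noise_sd=1.0, seed=0):
    """Returns (slope using X*, slope using noisy X, sample reliability ratio)."""
    rng = np.random.default_rng(seed)
    x_true = rng.normal(size=n)                  # X*, unobserved in practice
    v = rng.normal(scale=noise_sd, size=n)       # classical measurement error
    x_obs = x_true + v                           # what the researcher observes
    y = beta * x_true + rng.normal(size=n)

    slope_true = np.cov(x_true, y)[0, 1] / np.var(x_true, ddof=1)
    slope_obs = np.cov(x_obs, y)[0, 1] / np.var(x_obs, ddof=1)
    reliability = np.var(x_true, ddof=1) / np.var(x_obs, ddof=1)
    return slope_true, slope_obs, reliability


if __name__ == "__main__":
    slope_true, slope_obs, reliability = simulate_attenuation()
    print(f"slope using true X*:      {slope_true:.3f}")
    print(f"slope using noisy X:      {slope_obs:.3f}")
    print(f"beta * reliability ratio: {2.0 * reliability:.3f}")
```

With equal signal and noise variances the reliability ratio is about 0.5, so the slope on the noisy regressor should come out near half the true coefficient.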

All the major tools of applied econometrics (instrumental variables, difference-in-differences, regression discontinuity, panel fixed effects) exist specifically to address one or more forms of endogeneity. Instrumental variables estimation finds an instrument Z (not the omitted variable discussed earlier) that shifts X but has no direct effect on Y and no correlation with the error, allowing you to use only the exogenous variation in X for identification. Panel fixed effects remove time-invariant omitted variables by subtracting each unit's own average over time from its observations, so anything constant within a unit drops out. Understanding endogeneity is therefore not an isolated topic; it is the central diagnostic question behind every causal regression design. Before trusting any OLS estimate, ask: is there any reason my regressor might be correlated with the error? If yes, identify the source and select the appropriate remedy.
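Two of these remedies are easy to sketch on simulated data. The example below is a minimal illustration under assumed data-generating processes: a valid instrument in the first function and a time-invariant unit effect in the second. The function names, coefficients, and sample sizes are hypothetical, and the IV estimate is computed as the simple ratio Cov(Z, Y)/Cov(Z, X), which applies to the one-instrument, one-regressor case.

```python
import numpy as np


def iv_demo(n=200_000, beta=1.0, seed=0):
    """Returns (OLS slope, IV slope) when an unobserved confounder drives X and Y."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                       # instrument: moves x, unrelated to the error
    c = rng.normal(size=n)                       # unobserved confounder
    x = 0.8 * z + c + rng.normal(size=n)
    y = beta * x + 2.0 * c + rng.normal(size=n)

    ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]  # uses only z-driven variation in x
    return ols, iv


def fe_demo(n_units=2_000, n_periods=10, beta=1.0, seed=0):
    """Returns (pooled OLS slope, within/fixed-effects slope) on a balanced panel."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=(n_units, 1))            # time-invariant unit effect
    x = a + rng.normal(size=(n_units, n_periods))
    y = beta * x + 2.0 * a + rng.normal(size=(n_units, n_periods))

    pooled = np.cov(x.ravel(), y.ravel())[0, 1] / np.var(x.ravel(), ddof=1)
    # Within transformation: subtract each unit's own time average.
    x_w = x - x.mean(axis=1, keepdims=True)
    y_w = y - y.mean(axis=1, keepdims=True)
    within = (x_w * y_w).sum() / (x_w ** 2).sum()
    return pooled, within


if __name__ == "__main__":
    ols, iv = iv_demo()
    pooled, within = fe_demo()
    print(f"IV demo: OLS = {ols:.3f}, IV = {iv:.3f}")
    print(f"FE demo: pooled OLS = {pooled:.3f}, within = {within:.3f}")
```

In both demos the naive estimate should sit well above the true coefficient of 1.0, while the IV and within estimates should be close to it. Neither remedy works if its own assumption fails: the instrument must really be uncorrelated with the error, and the omitted variable must really be constant within each unit.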
