Question 1 Multiple Choice
A researcher uses quarter of birth as an instrument for years of schooling in a wage regression. The first-stage F-statistic is 3.2. What is the core problem with proceeding to 2SLS estimation?
A. A weak instrument always violates the exclusion restriction, making 2SLS inconsistent
B. In finite samples, weak instruments cause 2SLS to inherit OLS's endogeneity bias, defeating the purpose of IV
C. The F-statistic must exceed 100 for valid IV estimation; 3.2 is marginally too low
D. Weak instruments only bias 2SLS in small samples; with large samples the problem disappears
Answer: B
Weak instruments (the usual rule of thumb treats a first-stage F below 10 as weak) pull the 2SLS estimate toward the OLS estimate, so it inherits much of the endogeneity bias IV was meant to remove. The whole point of IV is to use a 'clean' source of variation in X that is uncorrelated with the error. If the instrument barely moves X, the first stage adds little information, and the second-stage estimate is dominated by the same endogeneity that contaminates OLS. A larger sample does not fix this by itself: what matters is the strength of the first stage, not the number of observations, so a truly weak instrument can leave 2SLS badly biased even in a very large sample.
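The pull toward OLS can be illustrated with a small Monte Carlo sketch. Everything here is an assumption made for illustration: the data-generating process, the parameter values (`beta`, `rho`, `pi`), and the function names are hypothetical and not taken from any particular study. With a single instrument, 2SLS reduces to the ratio Cov(Z, Y) / Cov(Z, X), and medians are reported because that estimator has no finite mean.

```python
import random
import statistics


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def simulate(pi, n=300, reps=400, beta=1.0, rho=0.8, seed=0):
    """Return (median OLS, median IV) over repeated samples.

    Assumed model (illustrative only):
        x = pi * z + v
        y = beta * x + u,  with corr(u, v) = rho, so x is endogenous
    """
    rng = random.Random(seed)
    ols, iv = [], []
    for _ in range(reps):
        z = [rng.gauss(0, 1) for _ in range(n)]
        v = [rng.gauss(0, 1) for _ in range(n)]
        e = [rng.gauss(0, 1) for _ in range(n)]
        u = [rho * vi + (1 - rho ** 2) ** 0.5 * ei for vi, ei in zip(v, e)]
        x = [pi * zi + vi for zi, vi in zip(z, v)]
        y = [beta * xi + ui for xi, ui in zip(x, u)]
        ols.append(cov(x, y) / cov(x, x))  # OLS slope
        iv.append(cov(z, y) / cov(z, x))   # just-identified IV (= 2SLS)
    return statistics.median(ols), statistics.median(iv)


# pi = 0.017 is a very weak first stage; pi = 0.5 is a strong one.
for label, pi in (("weak", 0.017), ("strong", 0.5)):
    med_ols, med_iv = simulate(pi)
    print(f"{label:6s} median OLS = {med_ols:.2f}, median IV = {med_iv:.2f}")
```

Under these assumptions the true coefficient is 1.0. With the weak first stage, the median IV estimate should sit well above 1.0, in the direction of the OLS bias, while with the strong first stage it should be close to 1.0.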
Question 2 Multiple Choice
A researcher has two instruments for one endogenous variable and runs the Sargan-Hansen overidentification test, which they pass. What does passing this test establish?
A. Both instruments are exogenous: the test directly confirms the exclusion restriction holds for each
B. The instruments are mutually consistent, but this does not confirm that any single instrument is truly exogenous
C. The instruments are relevant: a passing overidentification test implies strong first-stage F-statistics
D. The 2SLS estimates are unbiased in finite samples for this specification
Answer: B
The Sargan-Hansen test checks whether multiple instruments give mutually consistent estimates. If two instruments disagree about the coefficient on X, at least one must be invalid. But if they agree, it only means they tell the same story, not that the story is correct. If both instruments share the same violation of the exclusion restriction (both affect Y through the same back door), they can be mutually consistent and both wrong. Passing the test is weak, indirect evidence for exogeneity, not a confirmation of it. It also says nothing about instrument relevance or finite-sample bias.
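A simulation sketch can make the "mutually consistent and both wrong" case concrete. The setup below is hypothetical: two instruments with equal first-stage coefficients, direct effects `gamma1` and `gamma2` on Y (the exclusion-restriction violations), and a hand-rolled Sargan statistic computed as n times the R-squared from regressing the 2SLS residuals on the instruments. The 3.841 cutoff is the 5% critical value of a chi-squared distribution with one degree of freedom.

```python
import random
import statistics


def demean(a):
    m = sum(a) / len(a)
    return [ai - m for ai in a]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def project(t, z1, z2):
    """Fitted values from regressing t on z1 and z2 (all already demeaned)."""
    s11, s12, s22 = dot(z1, z1), dot(z1, z2), dot(z2, z2)
    s1t, s2t = dot(z1, t), dot(z2, t)
    det = s11 * s22 - s12 ** 2
    a1 = (s22 * s1t - s12 * s2t) / det
    a2 = (s11 * s2t - s12 * s1t) / det
    return [a1 * p + a2 * q for p, q in zip(z1, z2)]


def tsls_and_sargan(y, x, z1, z2):
    """Return (2SLS slope, Sargan statistic) for one endogenous regressor."""
    y, x, z1, z2 = (demean(s) for s in (y, x, z1, z2))
    xhat = project(x, z1, z2)                # first stage
    beta = dot(xhat, y) / dot(xhat, x)       # second stage
    resid = [yi - beta * xi for yi, xi in zip(y, x)]
    fitted = project(resid, z1, z2)          # residuals on instruments
    return beta, len(y) * dot(fitted, fitted) / dot(resid, resid)


def sargan_experiment(gamma1, gamma2, n=300, reps=200, beta=1.0, seed=0):
    """Return (mean 2SLS estimate, share of samples where Sargan rejects at 5%)."""
    rng = random.Random(seed)
    betas, rejections = [], 0
    for _ in range(reps):
        z1 = [rng.gauss(0, 1) for _ in range(n)]
        z2 = [rng.gauss(0, 1) for _ in range(n)]
        v = [rng.gauss(0, 1) for _ in range(n)]
        u = [0.5 * vi + rng.gauss(0, 1) for vi in v]   # endogeneity
        x = [a + b + vi for a, b, vi in zip(z1, z2, v)]
        # gamma1, gamma2 are direct effects of the instruments on y
        y = [beta * xi + gamma1 * a + gamma2 * b + ui
             for xi, a, b, ui in zip(x, z1, z2, u)]
        b_hat, stat = tsls_and_sargan(y, x, z1, z2)
        betas.append(b_hat)
        if stat > 3.841:
            rejections += 1
    return statistics.mean(betas), rejections / reps


for label, g1, g2 in (("shared violation", 0.5, 0.5), ("one invalid", 0.0, 0.5)):
    mean_b, rate = sargan_experiment(g1, g2)
    print(f"{label:17s} mean 2SLS = {mean_b:.2f}, rejection rate = {rate:.2f}")
```

In this design the true coefficient is 1.0. When only one instrument is invalid, the two instruments disagree and the test should reject in most samples. When both share the same violation, the test should reject at roughly its nominal rate even though the 2SLS estimate is centered well away from 1.0.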
Question 3 True / False
The relevance condition for an instrumental variable can be tested empirically using the first-stage regression, but the exogeneity condition (exclusion restriction) cannot be directly tested when there is only one instrument.
T. True
F. False
Answer: True
True. Relevance, Cov(Z, X) ≠ 0, is directly testable: regress X on Z and check the first-stage F-statistic. Exogeneity, the requirement that Z affects Y only through X, is a statement about the structural error u, which is unobservable, so you cannot estimate the correlation between Z and u. The 2SLS residuals are no substitute: with a single instrument they are uncorrelated with Z by construction. This is why the credibility of an IV study depends heavily on institutional knowledge and theoretical reasoning: you must argue from first principles that there is no plausible back-door path from Z to Y other than through X.
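The relevance check can be sketched in a few lines. This is a minimal illustration for one instrument and no other covariates, where the first-stage F-statistic equals R² / (1 − R²) × (n − 2); the function names and the simulated data are hypothetical.

```python
import random


def first_stage_f(x, z):
    """F-statistic for H0: b = 0 in the first stage x = a + b*z + error.

    With a single instrument and an intercept, F = R^2 / (1 - R^2) * (n - 2).
    """
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    sxx = sum((xi - mx) ** 2 for xi in x)
    szx = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
    r2 = szx ** 2 / (szz * sxx)
    return r2 / (1 - r2) * (n - 2)


def simulate_first_stage(pi, n, seed):
    """Draw (x, z) from the assumed first stage x = pi*z + noise."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [pi * zi + rng.gauss(0, 1) for zi in z]
    return x, z


for label, pi in (("strong", 0.5), ("weak", 0.02)):
    x, z = simulate_first_stage(pi, n=500, seed=1)
    print(f"{label:6s} first-stage F = {first_stage_f(x, z):.1f}")
```

No analogous function can be written for exogeneity, because it would need the unobserved error u as an input.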
Question 4 True / False
An instrument with a strong first-stage relationship (high F-statistic) guarantees that the 2SLS estimator is consistent, regardless of whether the exclusion restriction holds.
T. True
F. False
Answer: False
False. Relevance and exogeneity are both necessary conditions for IV validity; neither alone is sufficient. A strong but endogenous instrument, one that correlates with the error term u through some direct effect on Y, will produce an inconsistent 2SLS estimate that does not converge to the true causal effect even in infinite samples. In fact, an endogenous instrument can produce estimates that are worse than OLS: with a single instrument, the large-sample bias of IV is Cov(Z, u) / Cov(Z, X), which can exceed the OLS bias Cov(X, u) / Var(X). Strength (relevance) makes the estimator precise; cleanliness (exogeneity) makes it accurate.
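A short simulation sketch shows that sample size does not rescue a strong but endogenous instrument. The model and the parameter values (`pi`, `delta`, `rho`) are assumptions chosen for illustration. Under them, the large-sample IV limit is beta + delta / pi = 1.5, while the OLS limit is beta + (pi × delta + rho) / (pi² + 1) = 1.35, so IV is the more biased of the two.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def estimates(n, beta=1.0, pi=1.0, delta=0.5, rho=0.2, seed=0):
    """Return (OLS, IV) slopes for one simulated sample of size n.

    Assumed model (illustrative only):
        x = pi * z + v                      strong first stage
        u = delta * z + rho * v + noise     delta != 0 violates exclusion
        y = beta * x + u
    """
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    v = [rng.gauss(0, 1) for _ in range(n)]
    x = [pi * zi + vi for zi, vi in zip(z, v)]
    u = [delta * zi + rho * vi + rng.gauss(0, 1) for zi, vi in zip(z, v)]
    y = [beta * xi + ui for xi, ui in zip(x, u)]
    return cov(x, y) / cov(x, x), cov(z, y) / cov(z, x)


for n in (1_000, 10_000, 100_000):
    ols, iv = estimates(n)
    print(f"n = {n:>7}: OLS = {ols:.3f}, IV = {iv:.3f} (true beta = 1.0)")
```

As n grows, the IV estimate should settle near 1.5 rather than near the true value of 1.0: more data makes the estimate more precise, not more accurate.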
Question 5 Short Answer
Explain why the exclusion restriction cannot be directly tested, and what kinds of evidence researchers typically use to argue that an instrument satisfies it.
Model answer: The exclusion restriction requires that Z affects Y only through X; formally, Cov(Z, u) = 0, where u is the structural error term. Since u is unobservable (it contains all unmeasured determinants of Y), you cannot directly estimate Cov(Z, u). Researchers argue for the exclusion restriction using: (1) institutional knowledge, meaning a detailed story about the mechanism by which Z affects X, with no plausible back door to Y; (2) falsification tests using outcomes that should not be affected by Z if the restriction holds; (3) comparisons of IV estimates across subsamples where the back-door pathway should differ in magnitude; (4) overidentification tests when multiple instruments are available (though these only test mutual consistency). The exclusion restriction is ultimately a theoretical, not a statistical, claim.
This is why IV credibility depends so heavily on the quality of the 'story' behind the instrument. The best IV studies use quasi-experimental variation (natural experiments, policy discontinuities, or lottery assignments) where the exclusion restriction is credible by design rather than just by assumption.