Questions: Weak Instruments: Diagnosis and Solutions
5 questions to test your understanding
Question 1 Multiple Choice
A researcher has an endogenous regressor and two instruments. The first-stage F-statistic is 4.2. What is the most appropriate response?
A. Proceed with 2SLS, since two instruments are stronger than one
B. Switch to OLS, since weak instruments make IV invalid anyway
C. Report 2SLS results with a warning about weak instruments
D. Use LIML or Anderson-Rubin inference, which remain valid under weak instruments
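Answer: D
F = 4.2 is well below the conventional rule of thumb of F > 10 (Staiger and Stock, 1997; in the Stock-Yogo framework this corresponds to roughly 10% relative bias). Standard 2SLS is problematic here because the estimator is pulled toward the OLS estimate and its usual confidence intervals are unreliable. Having more instruments (option A) does not by itself add strength; extra weak instruments worsen first-stage overfitting. Switching to OLS (option B) abandons the endogeneity correction entirely. Reporting 2SLS with a warning (option C) leaves both the biased point estimate and the misleading standard errors in place. LIML is approximately median-unbiased under weak instruments and outperforms 2SLS in most simulations; Anderson-Rubin confidence sets remain valid because they invert a test whose null distribution does not depend on first-stage strength.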
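The two diagnostics named in this explanation can be sketched in a few lines. The following is a minimal illustration, assuming a single instrument, an intercept as the only control, and homoskedastic errors (the question itself has two instruments, which would need a joint F-test instead). The function names, the simulated data, and the 3.84 cutoff (the large-sample 5% chi-square critical value with one degree of freedom) are assumptions made for the example.

```python
import random


def slope_f_stat(dep, reg):
    """F-statistic (squared t) for the slope in OLS of dep on reg plus an intercept."""
    n = len(dep)
    mean_d = sum(dep) / n
    mean_r = sum(reg) / n
    s_rr = sum((r - mean_r) ** 2 for r in reg)
    s_rd = sum((r - mean_r) * (d - mean_d) for r, d in zip(reg, dep))
    s_dd = sum((d - mean_d) ** 2 for d in dep)
    slope = s_rd / s_rr
    rss = s_dd - slope * s_rd          # residual sum of squares
    sigma2 = rss / (n - 2)             # homoskedastic error variance estimate
    return slope ** 2 * s_rr / sigma2


def first_stage_f(x, z):
    """First-stage F: regress the endogenous regressor x on the instrument z."""
    return slope_f_stat(x, z)


def anderson_rubin_stat(y, x, z, beta0):
    """AR statistic for H0: beta = beta0. Regress y - beta0 * x on z and test the slope."""
    resid = [yi - beta0 * xi for yi, xi in zip(y, x)]
    return slope_f_stat(resid, z)


def anderson_rubin_set(y, x, z, grid, crit=3.84):
    """Grid values of beta0 that the AR test does not reject (assumed cutoff: 3.84)."""
    return [b for b in grid if anderson_rubin_stat(y, x, z, b) <= crit]


# Hypothetical weak-instrument data: true beta = 1, small first-stage coefficient.
rng = random.Random(0)
n, beta, pi, rho = 500, 1.0, 0.1, 0.8
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
v = [rho * ui + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for ui in u]
x = [pi * zi + vi for zi, vi in zip(z, v)]
y = [beta * xi + ui for xi, ui in zip(x, u)]

grid = [i / 10 for i in range(-50, 51)]
accepted = anderson_rubin_set(y, x, z, grid)
print("first-stage F:", round(first_stage_f(x, z), 2))
print("grid points not rejected by AR:", len(accepted), "of", len(grid))
```

With a weak first stage the set of non-rejected values is typically wide, and it can even be unbounded or empty; that width is an honest reflection of how little the instrument identifies.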
Question 2 Multiple Choice
Why does adding more weak instruments to a 2SLS regression typically worsen bias rather than helping?
A. More instruments reduce the degrees of freedom available for the second stage
B. More excluded instruments overfit the first stage, increasing the share of noise in the instrumented regressor
C. The exclusion restriction becomes harder to satisfy with more instruments
D. More instruments raise the Cragg-Donald F-statistic above the Stock-Yogo threshold, falsely clearing the diagnostic
Answer: B
The first stage regresses the endogenous variable on the instruments and controls. With many weak instruments, the first stage is highly overfit: the fitted values are driven by noise rather than genuine signal. That noise is the endogenous part of the regressor, the part correlated with the structural error, so the second stage uses a prediction that has reabsorbed much of the endogeneity it was meant to remove. This biases 2SLS toward OLS. The Cragg-Donald F-statistic actually tends to fall with many weak instruments, flagging the problem rather than hiding it.
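A small Monte Carlo sketch can illustrate the overfitting mechanism. It assumes a hypothetical design in which the instruments are a full set of group dummies, so the first-stage fitted values are simply group means of x. Only a coarse two-way split actually shifts x; finer splits add excluded instruments without adding signal. All names and parameter values are illustrative assumptions.

```python
import random
import statistics


def ols_slope(y, x):
    """OLS slope of y on x with an intercept."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    num = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    den = sum((xi - xbar) ** 2 for xi in x)
    return num / den


def grouped_2sls(y, x, groups):
    """2SLS with group dummies as instruments: OLS of y on the group means of x."""
    n = len(y)
    sums = {}
    for yi, xi, g in zip(y, x, groups):
        sx, sy, c = sums.get(g, (0.0, 0.0, 0))
        sums[g] = (sx + xi, sy + yi, c + 1)
    xbar, ybar = sum(x) / n, sum(y) / n
    num = sum(c * (sx / c - xbar) * (sy / c - ybar) for sx, sy, c in sums.values())
    den = sum(c * (sx / c - xbar) ** 2 for sx, sy, c in sums.values())
    return num / den


def median_estimates(n_groups, reps=300, n=200, beta=1.0, delta=0.2, rho=0.8, seed=0):
    """Median 2SLS and OLS estimates across simulated samples (true beta is 1.0)."""
    rng = random.Random(seed)
    groups = [i % n_groups for i in range(n)]
    iv, ols = [], []
    for _ in range(reps):
        u = [rng.gauss(0, 1) for _ in range(n)]
        x = []
        for g, ui in zip(groups, u):
            # v is correlated with u, which makes x endogenous
            v = rho * ui + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
            # only membership in the lower half of the groups shifts x
            x.append((delta if g < n_groups / 2 else -delta) + v)
        y = [beta * xi + ui for xi, ui in zip(x, u)]
        iv.append(grouped_2sls(y, x, groups))
        ols.append(ols_slope(y, x))
    return statistics.median(iv), statistics.median(ols)


for g in (2, 20):
    iv_med, ols_med = median_estimates(g)
    print(f"{g - 1:2d} excluded instruments: median 2SLS = {iv_med:.2f}, median OLS = {ols_med:.2f}")
```

The total first-stage signal is the same in both runs, so any movement of the median 2SLS estimate toward the OLS estimate in the 19-instrument case comes from the extra, uninformative instruments alone.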
Question 3 True / False
A high first-stage F-statistic (e.g., F > 50) guarantees that 2SLS estimates are unbiased.
T. True
F. False
Answer: False
A high F-statistic indicates a strong instrument (high relevance), but relevance is only one of two IV requirements. The exclusion restriction — that the instrument affects the outcome only through the endogenous regressor — is not tested by the F-statistic and cannot be tested from the data alone when the model is exactly identified. A strong instrument that violates the exclusion restriction produces precise but biased estimates. The F-statistic addresses weak-instruments bias specifically, not exclusion restriction violations.
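The point that strength says nothing about validity can be checked with a short simulation. This is a sketch under assumed parameter values: a very strong single instrument z, with a hypothetical direct effect `gamma` of z on the outcome that violates the exclusion restriction. In this setup the IV estimate converges to beta + gamma / pi rather than beta.

```python
import random


def iv_and_first_stage_f(y, x, z):
    """Just-identified IV estimate and first-stage F (one instrument plus intercept)."""
    n = len(y)
    zbar, xbar, ybar = sum(z) / n, sum(x) / n, sum(y) / n
    s_zz = sum((zi - zbar) ** 2 for zi in z)
    s_zx = sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x))
    s_zy = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    s_xx = sum((xi - xbar) ** 2 for xi in x)
    iv = s_zy / s_zx                       # Wald / IV estimator
    pi_hat = s_zx / s_zz                   # first-stage slope
    rss = s_xx - pi_hat * s_zx             # first-stage residual sum of squares
    f_stat = pi_hat ** 2 * s_zz / (rss / (n - 2))
    return iv, f_stat


def simulate(gamma, n=5000, beta=1.0, pi=1.0, rho=0.5, seed=0):
    """gamma is the (assumed) direct effect of z on y; gamma = 0 means a valid instrument."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]
    x = [pi * zi + rho * ui + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
         for zi, ui in zip(z, u)]
    y = [beta * xi + gamma * zi + ui for xi, zi, ui in zip(x, z, u)]
    return iv_and_first_stage_f(y, x, z)


for gamma in (0.0, 0.5):
    iv, f_stat = simulate(gamma)
    print(f"gamma = {gamma}: first-stage F = {f_stat:.0f}, IV estimate = {iv:.3f} (true beta = 1.0)")
```

Both runs share the same first stage, so the F-statistic is identical; only the estimate moves when the exclusion restriction fails.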
Question 4 True / False
Under weak instruments, 2SLS estimates tend to be biased toward the OLS estimate rather than toward zero.
T. True
F. False
Answer: True
The purpose of 2SLS is to isolate exogenous variation and so produce consistent estimates where OLS is biased. With weak instruments, the first stage captures little genuine exogenous variation; most of the variation in the instrumented regressor comes from the endogenous component. The second stage then behaves much like OLS on the original endogenous regressor and inherits its bias. The bias is toward the OLS estimate, not toward zero, which is insidious: the estimates look plausible but remain contaminated by endogeneity.
Question 5 Short Answer
Why does the first-stage F-statistic, rather than the first-stage R² or individual t-statistics, serve as the standard diagnostic for instrument weakness?
Model answer: With multiple instruments, a researcher could have several instruments each with modest individual t-statistics that jointly provide strong first-stage prediction — or many instruments with seemingly adequate individual t-statistics that together overfit the first stage. The F-statistic tests the joint significance of all excluded instruments, capturing overall explanatory power. Crucially, Stock-Yogo (2005) derived critical values directly from the F-statistic's relationship to 2SLS relative bias, making it the natural scale for the question 'how much does weakness contaminate my estimate?' R² measures variance explained but doesn't translate to bias properties.
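The formal result is that a given F-value bounds the maximum bias of 2SLS relative to the OLS bias. The F > 10 rule of thumb corresponds approximately to 2SLS bias of no more than 10% of the OLS bias. The Stock-Yogo relative-bias critical values depend on the number of instruments and are tabulated only for models with at least two overidentifying restrictions; with a single instrument, their size-based critical values (which bound the distortion of the Wald test) apply instead. This directly answers the researcher's question. The Kleibergen-Paap statistic is the heteroskedasticity- and cluster-robust analogue of the F-statistic. The Stock-Yogo critical values were derived under homoskedastic errors, so comparing the Kleibergen-Paap statistic against them is a common heuristic rather than a formal result.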
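The link between F and relative bias can be written as a back-of-the-envelope approximation. The notation below is assumed for illustration and does not come from the quiz: \pi is the vector of first-stage coefficients, Z the instrument matrix (after partialling out controls), \sigma_v^2 the first-stage error variance, K the number of excluded instruments, and \mu^2 the concentration parameter. The approximation is heuristic and is most accurate with several instruments.

```latex
\mu^2 = \frac{\pi' Z' Z \pi}{\sigma_v^2},
\qquad
\mathbb{E}[F] \approx 1 + \frac{\mu^2}{K},
\qquad
\frac{\mathbb{E}[\hat\beta_{\mathrm{2SLS}}] - \beta}
     {\operatorname{plim}\hat\beta_{\mathrm{OLS}} - \beta}
\approx \frac{1}{1 + \mu^2 / K}
\approx \frac{1}{\mathbb{E}[F]}
```

Read this way, F = 10 suggests a relative bias of roughly 1/10 = 10%, while F = 4.2 suggests roughly 1/4.2, or about 24%.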