A wage regression on years of education and a cognitive test score yields R² = 0.85 and an F-statistic with p < 0.001, but neither coefficient is individually significant (both t-test p-values exceed 0.3). The VIF for each variable is 12. What is the most likely explanation?
A. The model is misspecified: both variables are irrelevant and should be dropped
B. Multicollinearity is inflating standard errors, making individual coefficients imprecise even though the variables jointly explain wages well
C. OLS estimates are biased because education and test score are correlated
D. The sample size is too small for regression to produce reliable results
Answer: B
The pattern of a significant F-statistic and a high R² alongside individually insignificant t-statistics is the diagnostic fingerprint of multicollinearity. The F-test asks whether the regressors jointly explain variation in wages (they do), while each t-test asks whether one regressor's effect can be separately identified (it cannot, because the correlated variables move together). VIFs of 12 confirm it: each coefficient's standard error is √12 ≈ 3.5 times as large as it would be if the regressors were uncorrelated. Crucially, the OLS estimates are still unbiased; the problem is precision, not bias.
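As a rough check on the arithmetic in this explanation, the sketch below backs out what a VIF of 12 implies. The function names are illustrative, and the implied-correlation step assumes the model has exactly two regressors, as in the question.

```python
import math

def r2_from_vif(vif):
    """Auxiliary R-squared implied by a VIF, from VIF = 1 / (1 - R^2)."""
    return 1.0 - 1.0 / vif

def se_inflation(vif):
    """Standard-error multiplier relative to uncorrelated regressors."""
    return math.sqrt(vif)

vif = 12.0
aux_r2 = r2_from_vif(vif)        # share of one regressor explained by the other
inflation = se_inflation(vif)    # how much wider the standard error becomes

# Assumption: exactly two regressors, so the auxiliary R-squared
# equals the squared correlation between them.
implied_corr = math.sqrt(aux_r2)

print(f"auxiliary R^2  = {aux_r2:.3f}")
print(f"SE inflation   = {inflation:.2f}")
print(f"|corr(x1, x2)| = {implied_corr:.3f}")
```

Under these assumptions the auxiliary R² is about 0.92 and the two regressors have a correlation of roughly 0.96 in absolute value, which is why their separate effects are hard to pin down.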
Question 2 Multiple Choice
A researcher notices severe multicollinearity between two regressors and drops one of them to reduce the standard errors. What is the most important risk of this approach?
A. The remaining variable's coefficient will have higher variance without the dropped variable's stabilizing influence
B. If the dropped variable truly belongs in the model, omitting it introduces omitted variable bias: the remaining coefficient absorbs part of the dropped variable's effect
C. OLS standard errors will increase further because the model now has fewer regressors
D. The model's R² will fall below the threshold needed for the results to be publishable
Answer: B
Dropping a correlated variable does reduce standard errors, but at the cost of bias if the dropped variable actually affects the outcome. The surviving coefficient then picks up part of the omitted variable's effect (to the extent the two are correlated), making it a biased estimate of the true partial effect. Multicollinearity inflates standard errors without biasing estimates; omitted variable bias biases estimates without necessarily inflating standard errors. Giving up unbiasedness to gain precision is often the worse trade.
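A small simulation can illustrate how the surviving coefficient absorbs the omitted effect. This is a minimal sketch on made-up data: the true effects (1.0 and 2.0), the strength of the correlation, and all helper names are assumptions chosen for illustration.

```python
import random

def demean(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def simple_slope(y, x):
    """OLS slope of y on x (intercept included via demeaning)."""
    yd, xd = demean(y), demean(x)
    return dot(xd, yd) / dot(xd, xd)

def two_regressor_slopes(y, x1, x2):
    """OLS slopes of y on x1 and x2, solving the 2x2 normal equations."""
    yd, a, b = demean(y), demean(x1), demean(x2)
    s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
    s1y, s2y = dot(a, yd), dot(b, yd)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

random.seed(0)
n = 5000
beta1, beta2 = 1.0, 2.0  # hypothetical true partial effects
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.9 * v + 0.3 * random.gauss(0, 1) for v in x1]  # strongly correlated with x1
y = [beta1 * u + beta2 * v + random.gauss(0, 1) for u, v in zip(x1, x2)]

b1_long, b2_long = two_regressor_slopes(y, x1, x2)  # both regressors included
b1_short = simple_slope(y, x1)                      # x2 dropped
delta = simple_slope(x2, x1)                        # slope of x2 on x1

print(f"long regression:  b1 = {b1_long:.2f}, b2 = {b2_long:.2f}")
print(f"short regression: b1 = {b1_short:.2f}")
print(f"b1_long + b2_long * delta = {b1_long + b2_long * delta:.2f}")
```

In this setup the short-regression coefficient settles near 1.0 + 2.0 × 0.9 = 2.8 rather than the true 1.0, and it equals `b1_long + b2_long * delta` up to rounding, which is the usual omitted variable bias decomposition.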
Question 3 True / False
Multicollinearity violates the Gauss-Markov assumptions, causing OLS coefficient estimates to become biased and inconsistent.
T. True
F. False
Answer: False
This is a common and important misconception. Imperfect multicollinearity does NOT violate any Gauss-Markov assumption, so the OLS estimator remains BLUE (Best Linear Unbiased Estimator) even when the correlation among regressors is severe. What changes is the precision of the estimates: standard errors inflate, confidence intervals widen, and t-statistics shrink. The coefficient estimates themselves remain unbiased; they are just imprecisely estimated. Only *perfect* multicollinearity (one regressor is an exact linear combination of the others) breaks OLS, because X'X becomes singular and the estimator cannot be computed.
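The difference between perfect and merely severe multicollinearity can be seen in the determinant of X'X. The sketch below uses two tiny made-up columns and, to keep the matrix 2×2, assumes a model with no intercept; the function name is illustrative.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def gram_determinant(x1, x2):
    """Determinant of X'X for the two-column matrix X = [x1, x2] (no intercept)."""
    return dot(x1, x1) * dot(x2, x2) - dot(x1, x2) ** 2

x1 = [1, 2, 3, 4, 5]
exact_copy = [2 * v for v in x1]  # perfect multicollinearity: x2 = 2 * x1
near_copy = [2, 4, 6, 8, 11]      # highly, but not perfectly, collinear

det_perfect = gram_determinant(x1, exact_copy)
det_near = gram_determinant(x1, near_copy)

print("det(X'X), perfect collinearity:", det_perfect)
print("det(X'X), near collinearity:   ", det_near)
```

With the exact copy the determinant is zero, so X'X has no inverse and the normal equations have no unique solution. With the near copy the determinant is small but positive: OLS is still defined, just imprecise.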
Question 4 True / False
A high Variance Inflation Factor (VIF) for a regressor indicates that much of that variable's variation is explained by the other regressors, leaving little independent variation for OLS to use in identifying its effect.
T. True
F. False
Answer: True
VIF_j = 1 / (1 − R²_j), where R²_j is the R-squared from regressing variable j on all other regressors. A high R²_j means the other variables almost fully predict j, so j has little variation that is 'uniquely its own.' OLS needs independent variation in a regressor to estimate its partial effect; when that variation is thin, the coefficient estimate is based on few effective comparisons and is therefore imprecise. This is why VIF directly measures the precision loss from multicollinearity: the sampling variance of coefficient j is VIF_j times what it would be if j were uncorrelated with the other regressors.
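The VIF formula can be applied directly to data. The sketch below is a minimal illustration on simulated values: the variable names and the noise level are assumptions, and with only one other regressor the auxiliary R² reduces to a squared correlation.

```python
import random

def r_squared(y, x):
    """R-squared from a simple regression of y on x (with intercept)."""
    my, mx = sum(y) / len(y), sum(x) / len(x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

def vif_two_regressors(xj, other):
    """VIF_j = 1 / (1 - R^2_j) when `other` is the only other regressor."""
    return 1.0 / (1.0 - r_squared(xj, other))

random.seed(1)
n = 2000
test_score = [random.gauss(0, 1) for _ in range(n)]
# Hypothetical data: education tracks test score closely.
education = [t + 0.3 * random.gauss(0, 1) for t in test_score]
# A regressor unrelated to test score, for contrast.
unrelated = [random.gauss(0, 1) for _ in range(n)]

vif_high = vif_two_regressors(education, test_score)
vif_low = vif_two_regressors(unrelated, test_score)

print(f"VIF, education given test score: {vif_high:.1f}")
print(f"VIF, unrelated regressor:        {vif_low:.3f}")
```

The noise level here was picked so that the population VIF is close to 12; the unrelated regressor has a VIF close to 1, meaning almost all of its variation is its own.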
Question 5 Short Answer
Explain why multicollinearity inflates standard errors but does not bias OLS coefficient estimates. What specific information is the data 'lacking' that causes the precision problem?
Model answer: Bias requires that the estimator systematically over- or under-estimates the true coefficient on average; multicollinearity does not cause this because no Gauss-Markov assumption is violated. The precision problem arises because OLS must find observations where one regressor varies while the other holds roughly constant — comparisons that are rare when variables are highly correlated. With little independent variation to work with, OLS produces wide confidence intervals. The estimates are still centered on the truth (unbiased), but they are noisily estimated.
Think of it this way: to estimate the effect of education holding test score fixed, you need observations where education differs but test scores are similar. If education and test score always move together in your data, such observations are scarce — you have thin 'identifying variation.' OLS uses all available data and still produces unbiased estimates in expectation, but the sampling variance around those estimates is high. The data isn't wrong; it just doesn't contain enough of the right comparisons to answer the fine-grained question the model is asking.
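The claim that estimates stay centered on the truth while becoming noisier can be checked with a small Monte Carlo experiment. This is a sketch under assumed settings (true coefficients of 1.0, 100 observations per sample, 400 replications); none of these numbers come from the question.

```python
import math
import random
import statistics

def slope_on_x1(y, x1, x2):
    """OLS coefficient on x1 from regressing y on x1 and x2 (with intercept)."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    return (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)

def simulate(rho, reps=400, n=100, beta1=1.0, beta2=1.0, seed=0):
    """Mean and standard deviation of the x1 estimate across repeated samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        x1 = [rng.gauss(0, 1) for _ in range(n)]
        # x2 has unit variance and population correlation rho with x1.
        x2 = [rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for a in x1]
        y = [beta1 * a + beta2 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
        estimates.append(slope_on_x1(y, x1, x2))
    return statistics.mean(estimates), statistics.stdev(estimates)

mean_low, sd_low = simulate(rho=0.0)    # uncorrelated regressors
mean_high, sd_high = simulate(rho=0.95) # severe multicollinearity

print(f"rho = 0.00: mean = {mean_low:.3f}, sd = {sd_low:.3f}")
print(f"rho = 0.95: mean = {mean_high:.3f}, sd = {sd_high:.3f}")
```

In both designs the average estimate should sit close to the true value of 1.0, while the spread of the estimates is roughly three times larger when the correlation is 0.95, in line with √VIF = √(1 / (1 − 0.95²)) ≈ 3.2.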