Question 1 Multiple Choice
A researcher estimates a regression of household consumption on income and finds evidence of heteroskedasticity. What is the most important consequence for their results?
A. The coefficient estimates β̂ are now biased and no longer point at the true population parameters
B. The coefficient estimates β̂ remain unbiased, but the standard errors are wrong, making t-statistics and confidence intervals unreliable
C. The R² statistic becomes meaningless under heteroskedasticity
D. OLS will fail to converge and produce no estimates at all
Answer: B
Heteroskedasticity does NOT bias OLS coefficients: β̂ remains unbiased and consistent. What breaks is the variance formula for β̂. The standard OLS formula for Var(β̂) assumes homoskedasticity; when that assumption fails, the formula gives wrong standard errors. Typically these are underestimated, which makes t-statistics too large and results look more statistically significant than they really are, although the error can run in either direction. The distinction between biased estimates and wrong inference is the central practical lesson of this topic.
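The claim can be checked with a small Monte Carlo experiment. The sketch below is illustrative only: the data-generating process (true slope of 2, error standard deviation proportional to x squared), the sample sizes, and the function name `simulate` are all assumptions chosen for the demonstration, not part of the quiz. It compares the average slope estimate with the true slope, and the actual sampling spread of the slope with the average standard error reported by the conventional formula.

```python
import numpy as np

def simulate(n_reps=2000, n=200, beta0=1.0, beta1=2.0, seed=0):
    """Monte Carlo of the OLS slope and its conventional SE under fan-out errors."""
    rng = np.random.default_rng(seed)
    slopes = np.empty(n_reps)
    conv_ses = np.empty(n_reps)
    for r in range(n_reps):
        x = rng.uniform(1.0, 10.0, size=n)
        # Assumed heteroskedasticity: error standard deviation grows with x**2
        u = rng.normal(0.0, 0.1 * x**2)
        y = beta0 + beta1 * x + u

        X = np.column_stack([np.ones(n), x])
        XtX_inv = np.linalg.inv(X.T @ X)
        b = XtX_inv @ X.T @ y

        # Conventional (homoskedasticity-based) standard error of the slope
        resid = y - X @ b
        sigma2_hat = resid @ resid / (n - 2)
        slopes[r] = b[1]
        conv_ses[r] = np.sqrt(sigma2_hat * XtX_inv[1, 1])
    # Mean estimate, true sampling spread, average reported SE
    return slopes.mean(), slopes.std(ddof=1), conv_ses.mean()

mean_slope, true_sd, mean_conv_se = simulate()
print(f"mean slope estimate (true value 2.0): {mean_slope:.3f}")
print(f"actual std. dev. of slope estimates:  {true_sd:.3f}")
print(f"average conventional standard error:  {mean_conv_se:.3f}")
```

Under these assumptions the mean estimate should sit close to 2.0 while the conventional standard error comes out smaller than the actual spread of the estimates, which is the pattern option B describes.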
Question 2 Multiple Choice
A regression of firm profits on revenue shows residuals that are small for small firms but very large for large firms. What is the most likely explanation?
A. The regression model is misspecified and should include a quadratic revenue term
B. Heteroskedasticity driven by scale: variance in profit grows with firm size because large firms have more discretion in how they allocate revenue
C. The large firms are outliers that should be removed before estimation
D. The error variance is constant — large residuals for large firms simply reflect larger absolute values, not different variance
Answer: B
This is the classic scale-driven heteroskedasticity pattern. Large firms generate larger absolute errors because discretion in allocating revenue grows with firm size: two firms with the same large revenue might have very different profits. The variance of the error genuinely differs across the range of X. Option D confuses levels with variance: residuals that are systematically larger in absolute value for large firms are exactly what heteroskedasticity looks like. Option A is possible but not the most likely explanation given a systematic fan-out pattern. Option C is wrong because the large firms are a systematic feature of the data, not isolated outliers.
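A quick way to see the fan-out pattern numerically is to compare the spread of the residuals at the two ends of the revenue range. The following is a minimal sketch on simulated data; the revenue range, the coefficients, and the assumption that the error standard deviation is proportional to revenue are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 3000
revenue = rng.uniform(1.0, 100.0, size=n)  # hypothetical units
# Assumed data-generating process: error spread proportional to revenue
profit = 0.5 + 0.1 * revenue + rng.normal(0.0, 0.02 * revenue)

# Fit profit on a constant and revenue by least squares
X = np.column_stack([np.ones(n), revenue])
coef, *_ = np.linalg.lstsq(X, profit, rcond=None)
resid = profit - X @ coef

# Compare residual spread in the bottom and top thirds of revenue
lo_cut, hi_cut = np.quantile(revenue, [1 / 3, 2 / 3])
sd_small = resid[revenue <= lo_cut].std(ddof=1)
sd_large = resid[revenue >= hi_cut].std(ddof=1)

print(f"residual std. dev., small firms: {sd_small:.3f}")
print(f"residual std. dev., large firms: {sd_large:.3f}")
```

A residual standard deviation that is several times larger for the top third than for the bottom third points to variance that changes with X, rather than a handful of outliers.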
Question 3 True / False
In a regression with heteroskedasticity, OLS coefficient estimates are biased toward zero.
T. True
F. False
Answer: False
Heteroskedasticity does not bias OLS coefficient estimates. Given the other standard assumptions (a linear model, random sampling, no perfect collinearity), unbiasedness requires only that errors have zero conditional mean, E(u|X) = 0, which is a separate assumption from homoskedasticity. Heteroskedasticity violates the 'Best' part of Gauss-Markov (OLS is no longer the minimum-variance linear unbiased estimator) and breaks the standard error formula, but the estimates themselves remain unbiased and consistent. Bias would require the errors to be systematically correlated with X; that is endogeneity, a different violation.
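The argument can be written out in two lines. The matrix notation below (X for the regressor matrix, y for the outcome vector, u for the error vector) is an assumption introduced here for the derivation; the quiz itself only uses β̂ and E(u|X).

```latex
\begin{align*}
\hat{\beta} &= (X'X)^{-1}X'y = \beta + (X'X)^{-1}X'u, \\
E[\hat{\beta} \mid X] &= \beta + (X'X)^{-1}X'\,E[u \mid X] = \beta .
\end{align*}
```

The second line uses only E(u|X) = 0. The conditional variance of u never appears, so it can differ across observations without affecting the result.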
Question 4 True / False
Heteroskedasticity typically causes OLS standard errors to be underestimated, making t-statistics appear larger than they should be.
T. True
F. False
Answer: True
When residual variance grows with X (the common fan-out pattern), the true variance of β̂ is typically larger than what the standard OLS formula reports. The standard formula assumes constant variance, so it applies a single pooled estimate of the error variance to every observation. When the observations far from the mean of X, which carry the most weight in determining the slope, are also the noisiest, that pooled figure understates their contribution and the formula underestimates uncertainty. The result is standard errors that are too small, t-statistics that are too large, and p-values that are too small: spuriously significant results. This is why using robust standard errors is standard practice.
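For a regression with a single regressor, the comparison can be made explicit. The notation is an assumption added here for illustration: σ_i² is the error variance of observation i, and SST_x is the total sum of squares of x.

```latex
\begin{align*}
\operatorname{Var}(\hat{\beta}_1 \mid X)
  &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 \,\sigma_i^2}{\mathrm{SST}_x^{2}},
  \qquad \mathrm{SST}_x = \sum_{i=1}^{n} (x_i - \bar{x})^2, \\
\text{conventional formula:}\quad
\widehat{\operatorname{Var}}(\hat{\beta}_1)
  &= \frac{\hat{\sigma}^2}{\mathrm{SST}_x}.
\end{align*}
```

If every σ_i² equals a common σ², the first expression collapses to σ²/SST_x and the conventional formula is appropriate. Otherwise the true variance weights each σ_i² by (x_i − x̄)², while the conventional formula uses roughly an unweighted average of the σ_i². The conventional figure is therefore too small whenever the observations farthest from x̄ tend to have the largest error variances, and too large in the opposite case.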
Question 5 Short Answer
Why does heteroskedasticity break statistical inference (standard errors, t-tests) without biasing the OLS coefficient estimates themselves?
Model answer: OLS minimizes the sum of squared residuals, which finds the best linear predictor of Y given X regardless of whether error variance is constant. Unbiasedness requires only that errors average to zero conditional on X — a property preserved under heteroskedasticity. But the formula for the standard error of β̂ is derived under the assumption that all errors have the same variance σ². When variance differs across X values, this formula gives the wrong answer. The coefficient converges to the right value; it's the uncertainty measure around that coefficient that is miscalculated.
Think of it this way: the coefficient captures the average relationship between X and Y, and OLS finds that average correctly. But standard errors measure how much you should trust your coefficient estimate — and that depends on how noisy each data point is. Heteroskedasticity means some data points are much noisier than others. The standard formula treats all observations as equally informative, which misrepresents actual uncertainty. Robust standard errors correct this by accounting for the actual pattern of residual variance.
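The correction described above can be sketched directly with NumPy. The quiz does not say which robust estimator it has in mind; the code below assumes the HC1 variant of the heteroskedasticity-robust (sandwich) estimator, and both the function name `ols_with_robust_se` and the simulated data are hypothetical.

```python
import numpy as np

def ols_with_robust_se(X, y):
    """Return OLS coefficients, conventional SEs, and HC1 robust SEs."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    # Conventional SEs: one pooled error variance for all observations
    sigma2_hat = resid @ resid / (n - k)
    se_conv = np.sqrt(np.diag(sigma2_hat * XtX_inv))

    # HC1 sandwich: each observation contributes its own squared residual
    meat = X.T @ (X * resid[:, None] ** 2)
    cov_hc1 = (n / (n - k)) * XtX_inv @ meat @ XtX_inv
    se_robust = np.sqrt(np.diag(cov_hc1))
    return beta, se_conv, se_robust

# Simulated fan-out data (assumed process: error std. dev. grows with x**2)
rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(1.0, 10.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1 * x**2)
X = np.column_stack([np.ones(n), x])

beta, se_conv, se_robust = ols_with_robust_se(X, y)
print(f"slope estimate:        {beta[1]:.3f}")
print(f"conventional SE:       {se_conv[1]:.4f}")
print(f"robust (HC1) SE:       {se_robust[1]:.4f}")
```

The coefficient vector is identical under both approaches; only the standard errors change. In practice a library routine would normally be used instead of hand-rolled matrix algebra, for example fitting with `cov_type="HC1"` in statsmodels.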