A researcher runs OLS regression and detects heteroskedasticity using a Breusch-Pagan test. What is the most accurate conclusion about their results?
A. The coefficient estimates β̂ are biased and must be re-estimated using WLS or GLS
B. The coefficient estimates β̂ are still unbiased, but the standard errors are incorrect and inference is invalid
C. Both the coefficient estimates and standard errors are unreliable and the regression must be discarded
D. The regression is fine as long as the sample size is large enough for asymptotic normality to hold
Heteroskedasticity does not bias β̂. Bias is E[β̂] − β, and whether it equals zero depends on E[u|x] = 0, a condition on the mean of the errors that is unrelated to their variance. What heteroskedasticity destroys is efficiency (OLS is no longer BLUE) and the validity of the usual standard error formula (which assumes constant σ²). The practical remedy is heteroskedasticity-robust standard errors, which correct the inference without changing β̂ at all.
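The unbiasedness claim can be checked with a small Monte Carlo experiment. The sketch below is illustrative only: the data-generating process (true slope 0.35, intercept 2.0, x drawn uniformly from 1 to 10, error standard deviation 0.5·x) and the function names are assumptions chosen for the example, not something taken from the question.

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def simulate_slopes(true_slope=0.35, n=200, reps=2000, seed=0):
    """Re-estimate the slope on many independent heteroskedastic samples."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        x = [rng.uniform(1.0, 10.0) for _ in range(n)]
        # Assumed design: the error has mean zero given x (so E[u|x] = 0),
        # but its standard deviation grows in proportion to x.
        y = [2.0 + true_slope * xi + rng.gauss(0.0, 0.5 * xi) for xi in x]
        slopes.append(ols_slope(x, y))
    return slopes


slopes = simulate_slopes()
mean_slope = sum(slopes) / len(slopes)
print(f"true slope = 0.35, average estimate = {mean_slope:.3f}")
```

Across replications the average estimate should sit very close to the true slope even though the error variance is far from constant, which is what unbiasedness means.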
Question 2 Multiple Choice
A regression of food expenditure on income yields β̂_income = 0.35. The residual plot shows a clear fan shape (wider spread at higher incomes). A colleague argues the estimate of 0.35 is biased by the heteroskedasticity. What is wrong with this claim?
A. Nothing: a fan-shaped residual plot is evidence of both heteroskedasticity and omitted variable bias
B. The colleague is right that 0.35 is biased, but only if the fan shape is statistically significant by a formal test
C. Heteroskedasticity affects standard errors and inference, not the point estimate β̂; 0.35 remains an unbiased estimate
D. The claim would be correct only if the heteroskedasticity were correlated with the regressor (income)
Unbiasedness of OLS depends on E[u|x] = 0 (mean independence of errors from regressors), which is entirely separate from the variance condition Var(u|x) = σ². The fan shape indicates that error variance increases with income — a violation of homoskedasticity — but this says nothing about whether errors have zero mean given income. The coefficient estimate of 0.35 remains unbiased. What the fan shape invalidates is the standard error (and therefore the t-statistic and p-value) attached to that estimate.
Question 3 True / False
Under heteroskedasticity, OLS coefficient estimates remain unbiased, but the reported standard errors are typically too small, causing t-statistics to be inflated and p-values to be too low.
T. True
F. False
Answer: True
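This is the typical consequence of heteroskedasticity. The standard OLS variance formula Var(β̂) = σ²(X'X)⁻¹ assumes constant σ² across all observations. When variance is not constant, this formula no longer equals the true sampling variance of β̂. In the common case where error variance is larger for observations far from the regressor's mean, it underestimates that variance, producing standard errors that are too small; under other variance patterns it can overestimate instead, which is why the statement says "typically". Standard errors that are too small mean larger t-statistics and smaller p-values, so you find apparent statistical significance that isn't really there. Robust standard errors correct this by estimating the sampling variance from each observation's squared residual rather than from a single pooled σ².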
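For reference, the breakdown can be written out in matrix form. This is a sketch under the usual linear-model assumptions (y = Xβ + u with E[u|X] = 0). The symbols Ω for Var(u|X), σ_i² for the variance of observation i, and û_i for the OLS residual are notation introduced here, not used in the text above.

```latex
\begin{aligned}
\hat\beta &= (X'X)^{-1}X'y = \beta + (X'X)^{-1}X'u \\
\operatorname{Var}(\hat\beta \mid X) &= (X'X)^{-1}\,X'\Omega X\,(X'X)^{-1},
  \qquad \Omega = \operatorname{Var}(u \mid X) \\
\text{homoskedasticity: } \Omega = \sigma^2 I
  &\;\Longrightarrow\; \operatorname{Var}(\hat\beta \mid X) = \sigma^2 (X'X)^{-1} \\
\text{heteroskedasticity: } \Omega = \operatorname{diag}(\sigma_1^2,\dots,\sigma_n^2)
  &\;\Longrightarrow\; \operatorname{Var}(\hat\beta \mid X)
   = (X'X)^{-1}\Big(\sum_{i=1}^{n} \sigma_i^2\, x_i x_i'\Big)(X'X)^{-1} \\
\text{White (HC0) estimator: }\;
  \widehat{\operatorname{Var}}(\hat\beta)
  &= (X'X)^{-1}\Big(\sum_{i=1}^{n} \hat u_i^{\,2}\, x_i x_i'\Big)(X'X)^{-1}
\end{aligned}
```

The first line shows why β̂ stays unbiased: its expectation depends only on E[u|X]. The remaining lines show that the familiar σ²(X'X)⁻¹ is a special case that holds only when Ω is proportional to the identity matrix.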
Question 4 True / False
Heteroskedasticity causes OLS to produce biased estimates of the regression coefficients.
T. True
F. False
Answer: False
This is the most common misconception about heteroskedasticity. Bias in OLS (E[β̂] ≠ β) results from violations of the zero-mean error condition (E[u|x] ≠ 0), such as omitted variable bias or endogeneity — not from non-constant variance. Heteroskedasticity only violates the equal-variance assumption, leaving the zero-mean condition intact. β̂ remains an unbiased estimator of β; what fails is the efficiency guarantee (OLS is no longer BLUE) and the validity of standard error formulas.
Question 5 Short Answer
Why does heteroskedasticity make OLS standard errors invalid even though it does not bias the coefficient estimates? What specifically breaks down in the standard error formula?
Model answer: The standard OLS standard error formula is derived assuming Var(u|x) = σ² — the same constant variance for every observation. Under heteroskedasticity, this assumption is violated: some observations have larger error variance than others. The formula plugs in a single estimated σ² as if it applied everywhere, but the true sampling variance of β̂ depends on the pattern of error variances across observations. The result is a mismatch: the formula produces a number, but that number is not the actual variance of the estimator. It typically underestimates the true variance, making confidence intervals too narrow and p-values too small.
Robust standard errors solve this by directly estimating the true sampling variance of β̂ from the squared residuals, without assuming any particular structure for how variance varies. They are consistent estimators of Var(β̂) under general forms of heteroskedasticity. The coefficient β̂ itself is unchanged — only the uncertainty quantification around it is corrected.
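The contrast between the two formulas can be sketched for a regression with one regressor. The code below is a minimal illustration, not a reference implementation: it uses the HC0 (White) form of the robust standard error, and the simulated design (x uniform on 0 to 10, error standard deviation 0.05·x², slope 0.35) and the function names are assumptions chosen so that variance rises steeply with x.

```python
import math
import random


def slope_with_standard_errors(x, y):
    """Return (slope, classical SE, HC0 robust SE) for a regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Classical formula: one pooled variance estimate s^2 for every observation.
    s2 = sum(e * e for e in resid) / (n - 2)
    se_classical = math.sqrt(s2 / sxx)
    # HC0 formula: each observation contributes its own squared residual.
    se_robust = math.sqrt(sum((d * e) ** 2 for d, e in zip(dx, resid))) / sxx
    return slope, se_classical, se_robust


def monte_carlo(n=300, reps=500, seed=1):
    """Compare average reported SEs with the actual spread of the slope."""
    rng = random.Random(seed)
    slopes, classical, robust = [], [], []
    for _ in range(reps):
        x = [rng.uniform(0.0, 10.0) for _ in range(n)]
        # Assumed design: error standard deviation grows with x squared.
        y = [1.0 + 0.35 * xi + rng.gauss(0.0, 0.05 * xi ** 2) for xi in x]
        b, se_c, se_r = slope_with_standard_errors(x, y)
        slopes.append(b)
        classical.append(se_c)
        robust.append(se_r)
    mean_b = sum(slopes) / reps
    empirical_sd = math.sqrt(sum((b - mean_b) ** 2 for b in slopes) / (reps - 1))
    return empirical_sd, sum(classical) / reps, sum(robust) / reps


empirical_sd, avg_classical, avg_robust = monte_carlo()
print(f"actual SD of slope estimates: {empirical_sd:.4f}")
print(f"average classical SE:         {avg_classical:.4f}")
print(f"average robust (HC0) SE:      {avg_robust:.4f}")
```

In this design the classical standard error should come out noticeably below the actual spread of the slope estimates, while the robust one should track it closely. The slope itself is computed once and is identical under both formulas; only the reported uncertainty differs.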