A dataset contains household income surveys where wealthier households have far more variable income reports. You run OLS and find heteroskedasticity. What does WLS do differently from using robust standard errors?
A. WLS corrects the coefficient estimates; robust SEs correct only the standard errors
B. WLS re-weights observations to restore efficiency, producing BLUE estimates; robust SEs correct standard errors without changing the estimates or their efficiency
C. WLS removes high-variance observations; robust SEs keep them but downweight their influence
D. WLS and robust standard errors are equivalent approaches that produce identical results
Both approaches address heteroskedasticity, but they work differently. Robust standard errors leave the OLS coefficient estimates unchanged and correct only the standard errors for inference. WLS re-weights the data — giving low weight to high-variance observations — to produce a new estimator that is BLUE (Best Linear Unbiased Estimator) when the variance structure is correctly specified. WLS estimates are more efficient than OLS under heteroskedasticity; robust SEs make OLS inference valid without improving efficiency. The tradeoff: WLS is better when the variance model is correct; robust SEs are safer when it's not.
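The contrast can be seen on a single simulated dataset. The sketch below is illustrative only: it assumes a simple regression with one regressor, an error standard deviation proportional to x (so the correct weights are 1/x²), and made-up coefficient values. The helper names `fit_line` and `ols_slope_ses` are hypothetical, and HC0 (White) is just one of several robust variance estimators.

```python
import random

def fit_line(x, y, w=None):
    """Weighted least squares for y = a + b*x; w=None gives plain OLS."""
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b

def ols_slope_ses(x, y):
    """Classical and HC0 (White) standard errors for the OLS slope."""
    n = len(x)
    a, b = fit_line(x, y)
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    classical = (sum(e * e for e in resid) / (n - 2) / sxx) ** 0.5
    hc0 = sum((xi - xbar) ** 2 * e * e for xi, e in zip(x, resid)) ** 0.5 / sxx
    return classical, hc0

# Assumed data-generating process: sd(error) = 0.3 * x, so Var(error) is proportional to x^2.
random.seed(0)
n = 500
x = [random.uniform(1, 10) for _ in range(n)]
y = [2.0 + 0.5 * xi + random.gauss(0, 0.3 * xi) for xi in x]

# Robust SEs: same OLS coefficients, only the standard error changes.
a_ols, b_ols = fit_line(x, y)
se_classical, se_hc0 = ols_slope_ses(x, y)

# WLS: a different estimator, so the coefficients themselves change.
a_wls, b_wls = fit_line(x, y, w=[1.0 / xi ** 2 for xi in x])

print(f"OLS slope {b_ols:.4f}  classical SE {se_classical:.4f}  HC0 SE {se_hc0:.4f}")
print(f"WLS slope {b_wls:.4f}")
```

Switching from the classical to the HC0 standard error leaves `b_ols` untouched, whereas `b_wls` is a different number because WLS is a different estimator.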
Question 2 Multiple Choice
In feasible WLS, you estimate weights from the data rather than knowing the true variance function. What is the main risk of this two-stage procedure?
A. The coefficient estimates become biased because estimated weights introduce endogeneity
B. The efficiency gain disappears entirely if the variance model is misspecified
C. Estimated weights introduce additional uncertainty that can distort standard errors in finite samples, and misspecification of the variance model can reduce efficiency below OLS
D. Feasible WLS always produces larger standard errors than OLS, making it conservative
Feasible WLS uses the data twice — once to estimate the variance function, once to run WLS — which introduces additional uncertainty. In large samples this usually doesn't matter much, but in finite samples the estimated weights add noise. More importantly, if the variance model is misspecified (e.g., variance is modeled as a linear function of X when it's actually quadratic), feasible WLS can be less efficient than OLS, not more. This is why verifying that WLS residuals look more homoskedastic than OLS residuals is an important diagnostic step.
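A minimal sketch of the two-stage procedure follows. It assumes a particular variance model, Var(error) proportional to x^gamma, estimated by regressing the log of the squared OLS residuals on log(x). That functional form, the simulated data, and the helper name `fit_line` are assumptions made for illustration; other variance models are equally possible.

```python
import math
import random

def fit_line(x, y, w=None):
    """Weighted least squares for y = a + b*x; w=None gives plain OLS."""
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b

# Simulated data where the true variance is proportional to x^2 (gamma = 2).
random.seed(1)
n = 2000
x = [random.uniform(1, 10) for _ in range(n)]
y = [2.0 + 0.5 * xi + random.gauss(0, 0.3 * xi) for xi in x]

# Stage 1: run OLS, then fit the assumed variance model
#          log(e_i^2) = c + gamma * log(x_i) to the squared residuals.
a0, b0 = fit_line(x, y)
log_e2 = [math.log((yi - a0 - b0 * xi) ** 2) for xi, yi in zip(x, y)]
log_x = [math.log(xi) for xi in x]
c_hat, gamma_hat = fit_line(log_x, log_e2)

# Stage 2: WLS with estimated weights 1 / x^gamma_hat.
w_hat = [xi ** (-gamma_hat) for xi in x]
a_fwls, b_fwls = fit_line(x, y, w=w_hat)

print(f"estimated gamma: {gamma_hat:.3f} (true value in this simulation: 2)")
print(f"feasible WLS slope: {b_fwls:.4f}")
```

Because `gamma_hat` is itself an estimate, the weights carry sampling noise, and if the assumed power form were wrong the second stage would be using the wrong weights.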
Question 3 True / False
WLS assigns higher weight to observations with high variance because they contain more information about the true relationship.
T. True
F. False
Answer: False
This is exactly backwards. WLS assigns lower weight (w_i = 1/σ²_i) to high-variance observations, because high variance means an observation carries less precise information about the true relationship. A noisily measured data point should pull the regression line less than a precisely measured one. Plain OLS weights every observation equally, so noisy points have as much say as precise ones; the coefficient estimates remain unbiased, but they vary more from sample to sample than they need to.
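A small Monte Carlo experiment illustrates the effect of down-weighting noisy observations. This is a sketch under assumed conditions: the error standard deviation is taken to be 0.3·x, the weights 1/x² are therefore treated as known, and the sample size, replication count, and helper names are arbitrary choices for illustration.

```python
import random

def fit_line(x, y, w=None):
    """Weighted least squares for y = a + b*x; w=None gives plain OLS."""
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b

def mean(v):
    return sum(v) / len(v)

def variance(v):
    m = mean(v)
    return sum((vi - m) ** 2 for vi in v) / (len(v) - 1)

random.seed(2)
ols_slopes, wls_slopes = [], []
for _ in range(400):                      # 400 simulated samples
    x = [random.uniform(1, 10) for _ in range(100)]
    y = [2.0 + 0.5 * xi + random.gauss(0, 0.3 * xi) for xi in x]
    w = [1.0 / xi ** 2 for xi in x]       # w_i = 1 / sigma_i^2, up to a constant
    ols_slopes.append(fit_line(x, y)[1])
    wls_slopes.append(fit_line(x, y, w)[1])

print(f"mean slope      OLS {mean(ols_slopes):.4f}   WLS {mean(wls_slopes):.4f}")
print(f"slope variance  OLS {variance(ols_slopes):.6f}   WLS {variance(wls_slopes):.6f}")
```

Both estimators centre on the true slope of 0.5 used in the simulation, but the WLS slopes are spread less widely across samples.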
Question 4 True / False
WLS is a special case of Generalized Least Squares (GLS) applicable when errors are heteroskedastic but uncorrelated across observations.
T. True
F. False
Answer: True
GLS handles the general case where the error covariance matrix Ω is any positive definite matrix. WLS is the special case where Ω is diagonal — errors are uncorrelated across observations but have different variances on the diagonal. The GLS transformation multiplies by Ω^{-1/2}; for the diagonal WLS case, this is simply dividing each observation i by its standard deviation σ_i, which is equivalent to multiplying by the square root of the weight. After this transformation, the rescaled errors are homoskedastic and OLS on the transformed data is efficient.
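The equivalence between the transformation and the weighted fit can be checked numerically. The sketch below assumes the standard deviations σ_i are known (here an arbitrary made-up function of x) and uses hypothetical helper names. Note that the constant term is divided by σ_i as well, so the transformed regression has no separate intercept column of ones.

```python
import random

def wls_direct(x, y, sigma):
    """Minimise sum of w_i * (y_i - a - b*x_i)^2 with w_i = 1 / sigma_i^2."""
    w = [1.0 / s ** 2 for s in sigma]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b

def ols_on_transformed(x, y, sigma):
    """Divide y, x and the constant by sigma_i, then run OLS on the result."""
    c = [1.0 / s for s in sigma]                  # transformed constant column
    xt = [xi / s for xi, s in zip(x, sigma)]
    yt = [yi / s for yi, s in zip(y, sigma)]
    # Normal equations for yt = a*c + b*xt (two regressors, no extra intercept).
    scc = sum(ci * ci for ci in c)
    scx = sum(ci * xi for ci, xi in zip(c, xt))
    sxx = sum(xi * xi for xi in xt)
    scy = sum(ci * yi for ci, yi in zip(c, yt))
    sxy = sum(xi * yi for xi, yi in zip(xt, yt))
    det = scc * sxx - scx ** 2
    a = (scy * sxx - scx * sxy) / det
    b = (scc * sxy - scx * scy) / det
    return a, b

random.seed(3)
x = [random.uniform(1, 10) for _ in range(50)]
sigma = [0.5 + 0.2 * xi for xi in x]              # assumed known standard deviations
y = [2.0 + 0.5 * xi + random.gauss(0, s) for xi, s in zip(x, sigma)]

a_direct, b_direct = wls_direct(x, y, sigma)
a_trans, b_trans = ols_on_transformed(x, y, sigma)
print(a_direct, b_direct)
print(a_trans, b_trans)
```

The two routes agree up to floating-point rounding, which is the diagonal-Ω case of the GLS transformation described above.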
Question 5 Short Answer
Explain intuitively why WLS assigns higher weight to low-variance observations, and what problem this solves.
Model answer: Low-variance observations are precisely measured — they tell us a lot about the true relationship between X and Y. High-variance observations are noisy — they tell us less. OLS treats all observations equally, so a handful of imprecise, noisy points can pull the fitted line away from the true relationship. WLS corrects this by giving each observation influence proportional to its precision (inverse variance). The result is a fitted line that is more tightly governed by informative data, achieving the minimum variance among all linear unbiased estimators — BLUE — when the variance structure is correctly specified.
This is the core intuition behind WLS: weight by precision, not by count. Suppose you measure a table ten times with a precise ruler and once with a less precise tape measure: the ruler readings should dominate your estimate of the length, but the tape measure reading still carries some information and should not be ignored. WLS formalizes this intuition in a regression framework.
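The simplest instance of precision weighting is an inverse-variance weighted mean, which is WLS with only an intercept. The numbers below are entirely hypothetical: the readings are invented, and the ruler is assumed to have a standard deviation of 1 mm per reading against 5 mm for the tape measure.

```python
# Hypothetical readings of a table length, in millimetres.
ruler = [1200.4, 1199.8, 1200.9, 1199.5, 1200.2,
         1200.6, 1199.7, 1200.1, 1200.3, 1199.9]
tape = 1203.0
sd_ruler, sd_tape = 1.0, 5.0        # assumed measurement standard deviations

readings = ruler + [tape]
sds = [sd_ruler] * len(ruler) + [sd_tape]
weights = [1.0 / s ** 2 for s in sds]   # precision = inverse variance

# Precision-weighted mean versus the equal-weight mean.
weighted_mean = sum(w * r for w, r in zip(weights, readings)) / sum(weights)
simple_mean = sum(readings) / len(readings)

# Variance of each estimator under the assumed standard deviations.
var_weighted = 1.0 / sum(weights)
var_simple = sum(s ** 2 for s in sds) / len(readings) ** 2

print(f"equal-weight mean     {simple_mean:.3f}  variance {var_simple:.4f}")
print(f"precision-weighted    {weighted_mean:.3f}  variance {var_weighted:.4f}")
```

The tape measure reading keeps a small positive weight rather than being dropped, and under these assumed precisions the weighted mean has the smaller variance.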