A researcher regresses student test scores on class size, finding smaller classes improve scores significantly. A colleague argues this is biased because wealthier school districts tend to have both smaller classes and higher-scoring students. The colleague is describing:
A. Sampling error: the sample is not large enough to detect the true class-size effect
B. Reverse causality: higher test scores cause districts to reduce class sizes
C. Omitted variable bias: family wealth affects test scores and correlates with class size, biasing the coefficient
D. Multicollinearity: class size and wealth are too correlated to separate their effects
The correct answer is C. Family wealth (or socioeconomic status) raises test scores and is correlated with class size, because wealthier districts have more resources per student and so run smaller classes. These are the two conditions for omitted variable bias (OVB): the omitted variable affects Y and is correlated with the regressor of interest. With wealth left out of the model, the class-size coefficient absorbs part of the wealth effect and overstates the benefit of smaller classes. Multicollinearity is a different problem: it arises when correlated regressors are both included, and it inflates standard errors rather than biasing the coefficient.
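To make the direction of the bias concrete, here is a minimal simulation sketch. It is not the researcher's data: the true class-size effect of -0.2, the wealth effect of 5, and all noise levels are assumptions chosen purely for illustration, and `ols_slope` is a hypothetical helper.

```python
import random

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)
n = 20_000
true_class_effect = -0.2  # assumed: each extra student lowers scores by 0.2 points
wealth_effect = 5.0       # assumed: wealth raises scores

wealth = [rng.gauss(0, 1) for _ in range(n)]
# Wealthier districts get smaller classes (negative correlation with class size).
class_size = [25 - 3 * w + rng.gauss(0, 2) for w in wealth]
score = [70 + true_class_effect * c + wealth_effect * w + rng.gauss(0, 5)
         for c, w in zip(class_size, wealth)]

# Regress scores on class size alone, leaving wealth out of the model.
short_slope = ols_slope(class_size, score)
print(f"true effect: {true_class_effect}, estimate without wealth: {short_slope:.3f}")
```

Under these assumptions the estimate without wealth is noticeably more negative than the true -0.2, so smaller classes look more effective than they really are.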
Question 2 Multiple Choice
A study omits ability from a wage regression on years of schooling. Ability has a positive effect on wages and is positively correlated with schooling. According to the OVB formula, the schooling coefficient is biased in which direction?
A. Downward: ability's positive influence inflates the education coefficient negatively
B. Upward: the coefficient is too large because it also captures ability's positive effect on wages
C. Upward: because ability negatively affects wages and negatively correlates with schooling
D. Not biased: correlation between ability and schooling doesn't affect the schooling coefficient
The correct answer is B. OVB formula: bias = (effect of the omitted variable on Y) × (slope from regressing the omitted variable on X, which has the same sign as their correlation). Ability has a positive effect on wages (+) and is positively correlated with schooling (+). Positive × positive = upward bias. The schooling coefficient is too large because it absorbs part of ability's positive wage effect. This is the canonical example from the explainer. Option D is a common misconception: an effect on Y alone does not bias the coefficient; correlation with X is precisely what causes the bias.
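The formula can be checked numerically. The sketch below is a simulation under assumed parameters (a true schooling effect of 0.08, an ability effect of 0.5, and made-up noise levels), not an estimate from real wage data. It compares the slope from the regression that omits ability with the value the formula predicts, where `delta` is the slope from regressing ability on schooling.

```python
import random

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)
n = 20_000
beta_school = 0.08   # assumed true effect of schooling on (log) wages
beta_ability = 0.5   # assumed true effect of ability on (log) wages

ability = [rng.gauss(0, 1) for _ in range(n)]
# Schooling rises with ability, so the two are positively correlated.
schooling = [12 + 1.5 * a + rng.gauss(0, 2) for a in ability]
wage = [1.0 + beta_school * s + beta_ability * a + rng.gauss(0, 0.3)
        for s, a in zip(schooling, ability)]

short_slope = ols_slope(schooling, wage)   # regression that omits ability
delta = ols_slope(schooling, ability)      # slope of ability on schooling
predicted = beta_school + beta_ability * delta  # true effect + OVB term

print(f"short regression: {short_slope:.3f}, formula prediction: {predicted:.3f}")
```

With these assumed values, the population slope of ability on schooling is 1.5 / (1.5² + 2²) = 0.24, so the formula predicts a coefficient near 0.08 + 0.5 × 0.24 = 0.20, well above the true 0.08.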
Question 3 True / False
Omitted variable bias cannot be eliminated by collecting a larger sample of the same kind of data — it requires either measuring the omitted variable or using an alternative identification strategy.
T. True
F. False
Answer: True
OVB is a structural problem, not a sampling problem. The OLS estimator in a misspecified model converges to the wrong quantity as sample size grows — it is homing in on a biased value, not fluctuating around the right one. The explainer states: 'A million observations... all omitting ability, will give you a million-observation estimate of the same biased number.' More data sharpens the wrong estimate. Fixes require changing the information set: include the omitted variable, use an instrumental variable, or exploit a suitable research design.
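The point about sample size can be seen in a small simulation sketch. The data-generating values are assumptions for illustration (true schooling effect 0.08, ability effect 0.5), and `short_regression_estimate` is a hypothetical helper that always omits ability.

```python
import random

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

TRUE_EFFECT = 0.08  # assumed true effect of schooling on (log) wages

def short_regression_estimate(n, seed):
    """Simulate n workers and regress wages on schooling, omitting ability."""
    rng = random.Random(seed)
    ability = [rng.gauss(0, 1) for _ in range(n)]
    schooling = [12 + 1.5 * a + rng.gauss(0, 2) for a in ability]
    wage = [1.0 + TRUE_EFFECT * s + 0.5 * a + rng.gauss(0, 0.3)
            for s, a in zip(schooling, ability)]
    return ols_slope(schooling, wage)

# Grow the sample a hundredfold while keeping the same misspecified model.
estimates = {n: short_regression_estimate(n, seed=1) for n in (500, 5_000, 50_000)}
for n, estimate in estimates.items():
    print(f"n = {n:>6}: estimate = {estimate:.3f} (true effect = {TRUE_EFFECT})")
```

Under these assumptions the estimates settle near 0.20 rather than 0.08 as n grows: the extra observations reduce the noise around the biased value, not the bias itself.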
Question 4 True / False
If an omitted variable affects the outcome Y but is uncorrelated with the included regressor X, omitting it biases the OLS coefficient on X.
T. True
F. False
Answer: False
OVB requires both conditions: the omitted variable must affect Y AND be correlated with X. The Common Misconceptions section states: 'If the omitted variable is uncorrelated with the regressor of interest, omitting it biases the standard errors but not the coefficient.' With zero correlation to X, the omitted variable's effect on Y goes entirely into the error term. That raises the error variance, so the coefficient on X is estimated less precisely (its standard errors are larger), but the coefficient itself remains unbiased.
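A short simulation sketch shows the contrast between the two cases. Every number here is an assumption for illustration: the true coefficient on X is 2, the omitted variable's effect on Y is 3, and the omitted variable is built either independently of X or with a slope of 0.8 on X.

```python
import random

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(2)
n = 20_000
true_beta = 2.0      # assumed true coefficient on x
omitted_effect = 3.0 # assumed effect of the omitted variable on y

x = [rng.gauss(0, 1) for _ in range(n)]
z_uncorrelated = [rng.gauss(0, 1) for _ in range(n)]       # unrelated to x
z_correlated = [0.8 * xi + rng.gauss(0, 0.6) for xi in x]  # moves with x

def outcome(x_values, z_values):
    """Generate y from x and z; z is then left out of the regression."""
    return [true_beta * xi + omitted_effect * zi + rng.gauss(0, 1)
            for xi, zi in zip(x_values, z_values)]

slope_uncorrelated = ols_slope(x, outcome(x, z_uncorrelated))
slope_correlated = ols_slope(x, outcome(x, z_correlated))

print(f"omitted z uncorrelated with x: {slope_uncorrelated:.3f}")
print(f"omitted z correlated with x:   {slope_correlated:.3f}")
```

In this setup the first slope stays close to the true value of 2, while the second is pulled toward 2 + 3 × 0.8 = 4.4, which is what the OVB formula predicts.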
Question 5 Short Answer
A friend argues that a very large, carefully designed survey can fix any regression problem, including omitted variable bias. How would you explain why this claim is wrong?
Model answer: OVB is an identification problem, not a sampling problem. The estimator is converging to the wrong value as sample size grows — more observations just sharpen the wrong estimate. No amount of data can correct for a variable that belongs in the model but isn't there. Fixing OVB requires changing what information enters the model: measuring and including the omitted variable, using an instrumental variable that isolates variation in X uncorrelated with the omission, or exploiting a research design that makes the omission irrelevant.
This is why OVB is described as the fundamental obstacle to causal inference with observational data. Sampling problems (imprecision, noise) improve with more data. Identification problems (converging to the wrong value) do not; they require a different strategy entirely. In this sense, more observations cannot substitute for the variable or research design that is missing.
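As one illustration of an alternative strategy, the sketch below simulates an instrumental variable. It rests on strong assumptions that hold only because the data are constructed this way: the hypothetical `instrument` shifts schooling, is unrelated to ability, and affects wages only through schooling. In real data those conditions have to be argued for. The IV estimate uses the simple ratio cov(instrument, wage) / cov(instrument, schooling).

```python
import random

def cov(x, y):
    """Sample covariance of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

rng = random.Random(3)
n = 20_000
true_effect = 0.08  # assumed true effect of schooling on (log) wages

ability = [rng.gauss(0, 1) for _ in range(n)]     # unobserved by the analyst
instrument = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical, independent of ability
schooling = [12 + 1.0 * z + 1.5 * a + rng.gauss(0, 1)
             for z, a in zip(instrument, ability)]
wage = [1.0 + true_effect * s + 0.5 * a + rng.gauss(0, 0.3)
        for s, a in zip(schooling, ability)]

# OLS of wage on schooling, with ability omitted: biased upward.
ols_estimate = cov(schooling, wage) / cov(schooling, schooling)
# IV uses only the variation in schooling that is driven by the instrument.
iv_estimate = cov(instrument, wage) / cov(instrument, schooling)

print(f"true: {true_effect}, OLS: {ols_estimate:.3f}, IV: {iv_estimate:.3f}")
```

Under these assumptions the OLS estimate sits well above 0.08 while the IV estimate lands close to it, even though ability is never measured.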