Questions: Regression Diagnostics: Checking Assumptions and Violations
5 questions to test your understanding
Question 1 Multiple Choice
A residual plot shows a clear fan shape — residuals spread out as fitted values increase. The most important consequence of ignoring this is:
A. The coefficient estimates will be biased toward zero
B. The model will underfit the data and miss real patterns
C. The standard errors, p-values, and confidence intervals will be unreliable, even though the coefficient estimates themselves may still be correct
D. The R² will be inflated, making the model appear stronger than it is
Answer: C
A fan shape signals heteroscedasticity: non-constant variance across fitted values. Under heteroscedasticity, OLS coefficient estimates remain unbiased (they still estimate the conditional mean correctly), but the usual standard errors are wrong. This invalidates conventional inference: p-values may indicate significance where none exists, confidence intervals will be too narrow or too wide, and hypothesis tests will have incorrect Type I error rates. The usual remedy is robust (sandwich) standard errors rather than refitting the model, which is why the key insight is that coefficients and standard errors are affected separately.
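The separation between coefficients and standard errors can be illustrated with a small simulation. The following is a minimal sketch, not a substitute for a statistics library: the data are simulated (the true slope of 2 and the noise pattern are assumptions chosen for illustration), and the robust estimate shown is the basic HC0 sandwich form for a one-predictor regression.

```python
import math
import random

def simple_ols(x, y):
    """Fit y = a + b*x by OLS; return intercept, slope, and residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return intercept, slope, residuals

def slope_standard_errors(x, residuals):
    """Return (classical SE, HC0 sandwich SE) for the slope."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    # Classical formula: assumes one common error variance for all points.
    sigma2 = sum(e ** 2 for e in residuals) / (n - 2)
    classical = math.sqrt(sigma2 / sxx)
    # HC0 formula: each observation contributes its own squared residual.
    meat = sum((xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, residuals))
    robust = math.sqrt(meat / sxx ** 2)
    return classical, robust

# Simulated fan-shaped data (hypothetical): the noise SD grows with x.
rng = random.Random(0)
x = [rng.uniform(1, 10) for _ in range(2000)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 0.1 * xi ** 2) for xi in x]

intercept, slope, residuals = simple_ols(x, y)
classical_se, robust_se = slope_standard_errors(x, residuals)
print(f"slope={slope:.3f}  classical SE={classical_se:.4f}  robust SE={robust_se:.4f}")
```

In this setup the slope estimate stays close to the true value while the two standard errors disagree, which is the pattern described in answer C.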
Question 2 Multiple Choice
A researcher finds a variance inflation factor (VIF) of 15 for one predictor. The most appropriate interpretation is:
A. The predictor must be dropped immediately; a VIF above 10 invalidates any model
B. Robust standard errors should be applied to address the inflated variance
C. This predictor's coefficient estimate is unstable due to high correlation with other predictors; standard errors are inflated and the model specification should be reconsidered
D. VIF measures collinearity but has no effect on OLS coefficient estimates or standard errors
Answer: C
A high VIF means the predictor is highly collinear with the other predictors, which makes it difficult for the model to disentangle their individual effects. This inflates the standard errors, so the coefficient estimate is unstable (small changes in the data produce large changes in the estimate) and harder to interpret. Option B is wrong because robust SE addresses heteroscedasticity, not multicollinearity. Option D is wrong because collinearity does inflate standard errors, which is exactly what VIF quantifies. Option A overstates the case: the appropriate response depends on whether prediction or interpretation is the goal.
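For reference, the VIF of predictor j is 1 / (1 - R_j²), where R_j² comes from regressing predictor j on the other predictors. The sketch below works through that arithmetic for a VIF of 15 and then computes a VIF on simulated data. It assumes only two predictors, in which case R_j² equals their squared correlation; the data and variable names are hypothetical.

```python
import math
import random

def pearson_r(a, b):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

def vif_from_r_squared(r_squared):
    """VIF = 1 / (1 - R_j^2) for the auxiliary regression of predictor j."""
    return 1.0 / (1.0 - r_squared)

# What a VIF of 15 implies:
implied_r_squared = 1.0 - 1.0 / 15.0   # share of the predictor explained by the others
se_inflation = math.sqrt(15.0)         # SE multiplier relative to an uncorrelated predictor
print(f"R_j^2 = {implied_r_squared:.3f}, SE inflated by a factor of {se_inflation:.2f}")

# Simulated pair of nearly redundant predictors (hypothetical data).
rng = random.Random(1)
x1 = [rng.gauss(0, 1) for _ in range(1000)]
x2 = [xi + rng.gauss(0, 0.25) for xi in x1]

r = pearson_r(x1, x2)
vif = vif_from_r_squared(r ** 2)
print(f"correlation = {r:.3f}, VIF = {vif:.1f}")
```

With more than two predictors, R_j² would have to come from a full auxiliary regression rather than a single pairwise correlation.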
Question 3 True / False
If OLS regression assumptions are violated, the coefficient estimates are typically biased.
T. True
F. False
Answer: False
This overstates the consequences. Heteroscedasticity and autocorrelation, for example, leave coefficient estimates unbiased (OLS still correctly estimates the conditional mean), but they invalidate standard errors and inference. Only specific violations produce biased estimates: omitted variables (a relevant predictor that is correlated with the included ones is left out), endogeneity (a predictor correlated with the error), or measurement error in predictors. Knowing *which* assumption is violated and *what its specific consequences are* is exactly the point of regression diagnostics; a single blanket response ('my estimates are biased') is often incorrect.
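The contrast between these two kinds of violation can be checked by repeated simulation. The sketch below is illustrative only: all coefficients (a true slope of 2, an omitted predictor with coefficient 3 that loads 0.8 on the included one) are assumptions chosen so that the expected bias is easy to compute as 3 × 0.8 = 2.4.

```python
import random

def ols_slope(x, y):
    """Slope from a one-predictor OLS fit of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx

TRUE_SLOPE = 2.0
rng = random.Random(42)
hetero_slopes = []
omitted_slopes = []

for _ in range(500):
    x1 = [rng.uniform(1, 10) for _ in range(200)]

    # Case 1: heteroscedastic noise, but the mean function is correct.
    y_het = [1.0 + TRUE_SLOPE * a + rng.gauss(0, 0.5 * a) for a in x1]
    hetero_slopes.append(ols_slope(x1, y_het))

    # Case 2: x2 affects y and is correlated with x1, but is left out of the fit.
    x2 = [0.8 * a + rng.gauss(0, 1) for a in x1]
    y_omit = [1.0 + TRUE_SLOPE * a + 3.0 * b + rng.gauss(0, 1)
              for a, b in zip(x1, x2)]
    omitted_slopes.append(ols_slope(x1, y_omit))

mean_hetero = sum(hetero_slopes) / len(hetero_slopes)
mean_omitted = sum(omitted_slopes) / len(omitted_slopes)
print(f"heteroscedastic case: mean slope = {mean_hetero:.3f} (true value 2.0)")
print(f"omitted-variable case: mean slope = {mean_omitted:.3f} (true value 2.0)")
```

Averaged over replications, the heteroscedastic fits center on the true slope, while the fits that omit x2 center near 4.4.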
Question 4 True / False
A Q-Q plot of regression residuals is used to assess the normality assumption: points that deviate from the 45-degree reference line indicate non-normality.
T. True
F. False
Answer: True
A Q-Q plot compares the quantiles of the observed residual distribution with the quantiles expected from a theoretical normal distribution. When standardized residuals are normally distributed, the points fall approximately on the 45-degree line. Systematic deviations, such as S-curves (heavy or light tails), a consistent bow to one side (skewness), or gaps, indicate non-normality. For large samples, normality violations matter less (by the central limit theorem), but in small samples, non-normal residuals can distort p-values and confidence intervals substantially.
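The coordinates behind a normal Q-Q plot are straightforward to compute by hand. The sketch below is a minimal version using only Python's standard library; the plotting positions (i + 0.5) / n are one common convention among several, and the two residual samples are simulated stand-ins rather than real model output.

```python
import random
from statistics import NormalDist, mean, stdev

def qq_points(residuals):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    n = len(residuals)
    mu = mean(residuals)
    sd = stdev(residuals)
    # Standardize so that the 45-degree line is the right reference.
    observed = sorted((r - mu) / sd for r in residuals)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, observed))

def max_deviation(points):
    """Largest vertical distance between the points and the 45-degree line."""
    return max(abs(obs - theo) for theo, obs in points)

rng = random.Random(7)
normal_resid = [rng.gauss(0, 1) for _ in range(500)]        # roughly normal
skewed_resid = [rng.expovariate(1.0) for _ in range(500)]   # right-skewed

normal_dev = max_deviation(qq_points(normal_resid))
skewed_dev = max_deviation(qq_points(skewed_resid))
print(f"max deviation, normal residuals: {normal_dev:.2f}")
print(f"max deviation, skewed residuals: {skewed_dev:.2f}")
```

The maximum deviation is only a crude summary; in practice the shape of the departure from the line (curved, S-shaped, isolated points) is what gets read off the plot.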
Question 5 Short Answer
Why is it important to identify *which* OLS assumption has been violated before choosing a remedy, rather than applying a single catch-all fix?
Model answer: Different assumption violations have different consequences and require different remedies. Heteroscedasticity leaves coefficients unbiased but invalidates standard errors — the remedy is robust (sandwich) standard errors, not model re-specification. Nonlinearity biases coefficient estimates — the remedy is transformations, polynomial terms, or interaction effects. Multicollinearity inflates standard errors without biasing estimates — robust SE doesn't help; reconsidering which variables to include might. Clustered observations require cluster-robust SE or multilevel models. Applying the wrong remedy can be harmful: using robust SE when the real problem is omitted variable bias leaves the bias in place while giving false confidence that inference is valid. The diagnostic step tells you what is wrong; understanding why guides the appropriate substantive fix.
Regression diagnostics connect technical checks to substantive modeling decisions. Violations often signal that the model is misspecified — missing a nonlinear relationship, ignoring clustering structure, or including highly redundant predictors. The remedy therefore depends on understanding the research context, not just applying a statistical correction formula.