Questions: Regression Diagnostics and Residual Analysis
5 questions to test your understanding
Question 1 Multiple Choice
You fit a linear regression and examine the residuals vs. fitted plot. The residuals form a fan shape — small near low fitted values and large near high fitted values. What assumption is violated, and what is the appropriate response?
A. Linearity is violated; add a polynomial term to the model
B. Homoscedasticity is violated; use robust standard errors or transform the response
C. Independence is violated; use a time-series model
D. Normality is violated; use a non-parametric regression
Answer: B
A fan or funnel shape in the residuals vs. fitted plot is the signature of heteroscedasticity: the error variance grows with the fitted value instead of staying constant. Remedies include heteroscedasticity-robust standard errors (which leave the coefficient estimates unchanged but correct the inference), a variance-stabilizing transformation of the response (e.g., taking its log), or weighted least squares. A curved pattern, not a fan, would instead indicate nonlinearity; each plot pattern points to a specific violated assumption.
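To make the fan concrete, here is a minimal sketch, assuming plain Python with only the standard library and simulated data. It fits a straight line, sorts the residuals by fitted value, and compares the residual variance in the upper half with the lower half. This is a crude split-sample check, loosely in the spirit of the Goldfeld-Quandt test but not that test itself; the helper names `fit_simple_ols` and `spread_ratio` are hypothetical.

```python
import random
import statistics

# Illustrative helpers (hypothetical names, not a library API).


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def spread_ratio(x, y):
    """Residual variance in the upper half of fitted values divided by that in the lower half.

    A ratio well above 1 is the numeric counterpart of a fan that opens
    toward high fitted values.
    """
    b0, b1 = fit_simple_ols(x, y)
    # (fitted value, residual) pairs, ordered by fitted value
    pairs = sorted((b0 + b1 * xi, yi - (b0 + b1 * xi)) for xi, yi in zip(x, y))
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[half:]]
    return statistics.pvariance(high) / statistics.pvariance(low)


random.seed(1)
x = [random.uniform(1, 10) for _ in range(400)]
y_fan = [2 + 3 * xi + random.gauss(0, 0.5 * xi) for xi in x]  # error SD grows with x
y_flat = [2 + 3 * xi + random.gauss(0, 2.0) for xi in x]  # constant error SD

fan_ratio = spread_ratio(x, y_fan)
flat_ratio = spread_ratio(x, y_flat)
print(f"fan-shaped data:   {fan_ratio:.2f}")
print(f"constant variance: {flat_ratio:.2f}")
```

The fan-shaped data give a ratio several times larger than 1, while the constant-variance control stays near 1. The check only detects the pattern; formal tests (e.g., Breusch-Pagan) and the remedies in the explanation would be the next step.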
Question 2 Multiple Choice
An observation has predictor values far from the center of the data (high leverage) but a residual very close to zero. What is its likely effect on the regression?
A. It is highly influential and will distort the coefficient estimates
B. It will inflate standard errors for all coefficients
C. It will likely have minimal distorting influence despite its unusual predictor position
D. It should be removed because high-leverage points are always problematic
Answer: C
Leverage measures how unusual an observation's predictor values are; it captures the potential for influence. Actual influence requires both high leverage and a large residual. An observation that fits the model well (small residual) exerts little pull on the fitted line, even from an extreme position. Cook's distance formalizes this by combining leverage and residual size into a single influence measure. A high-leverage, low-residual point can even stabilize the fit by confirming the trend at the extremes of the predictor range.
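A minimal sketch of this distinction, assuming a single predictor so that leverage has the closed form hᵢᵢ = 1/n + (xᵢ − x̄)²/Sxx, and using the standard expression Dᵢ = eᵢ²/(p s²) · hᵢᵢ/(1 − hᵢᵢ)², where p is the number of coefficients and s² the residual mean square. The data are made up for illustration (a fixed noise pattern around y = 1 + 2x) and the function names are hypothetical. The same extreme x value is added once on the trend and once far off it.

```python
# Illustrative helpers (hypothetical names, not a library API).


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def leverage_and_cooks_distance(x, y):
    """Leverage h_ii and Cook's distance D_i for simple linear regression (p = 2 coefficients)."""
    n, p = len(x), 2
    b0, b1 = fit_simple_ols(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - p)  # residual mean square
    # With one predictor, h_ii = 1/n + (x_i - x_bar)^2 / Sxx: it depends only on x.
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # D_i = e_i^2 / (p * s^2) * h_ii / (1 - h_ii)^2
    cooks = [e * e / (p * s2) * h / (1 - h) ** 2 for e, h in zip(residuals, leverage)]
    return leverage, cooks


# Ten points around y = 1 + 2x with a fixed noise pattern, so the output is reproducible.
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1]
base_x = list(range(10))
base_y = [1 + 2 * xi + ei for xi, ei in zip(base_x, noise)]

# One extreme-x point: first on the trend (y = 61), then far off it (y = 20).
x = base_x + [30]
h_on, d_on = leverage_and_cooks_distance(x, base_y + [61])
h_off, d_off = leverage_and_cooks_distance(x, base_y + [20])

print(f"on trend:  leverage {h_on[-1]:.2f}, Cook's D {d_on[-1]:.2f}")
print(f"off trend: leverage {h_off[-1]:.2f}, Cook's D {d_off[-1]:.2f}")
```

Both versions of the extreme point have identical leverage (about 0.89), because leverage depends only on x. Cook's distance stays small for the on-trend point and becomes very large for the off-trend one, well past common rules of thumb such as Dᵢ > 1 or Dᵢ > 4/n.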
Question 3 True / False
A Q-Q plot showing heavy-tailed deviations from the reference line indicates that OLS coefficient estimates are biased.
T. True
F. False
Answer: False
Heavy tails in the residuals violate the normality assumption, but OLS coefficient estimates remain unbiased: unbiasedness requires only that the errors have conditional mean zero given the predictors (E[ε | X] = 0, one of the Gauss-Markov conditions), not any particular error distribution. What suffers is inference. The exact t and F distributions behind p-values and confidence intervals assume normal errors, so those p-values and intervals become unreliable under severe non-normality, especially in small samples. In large samples, the Central Limit Theorem makes t- and F-based inference approximately valid under mild departures from normality (given finite error variance).
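A small Monte Carlo check of the unbiasedness claim, as a sketch under stated assumptions: Student's t errors with 3 degrees of freedom stand in for "heavy-tailed", the design is fixed at x = 0, 1, ..., 29, and only the Python standard library is used. The helper names `ols_slope` and `student_t` are hypothetical. A t variate is drawn as Z / sqrt(χ²/df), with χ²(df) generated as a Gamma(df/2, scale 2) variate.

```python
import math
import random
import statistics

# Illustrative helpers (hypothetical names, not a library API).


def ols_slope(x, y):
    """Least-squares slope of y on x (model with intercept)."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


def student_t(df):
    """One draw from Student's t with `df` degrees of freedom: Z / sqrt(chi2 / df)."""
    z = random.gauss(0.0, 1.0)
    chi2 = random.gammavariate(df / 2, 2.0)  # chi-square(df) as Gamma(shape df/2, scale 2)
    return z / math.sqrt(chi2 / df)


random.seed(42)
TRUE_INTERCEPT, TRUE_SLOPE = 1.0, 2.0
x = list(range(30))

slopes = []
for _ in range(2000):
    # Heavy-tailed errors: t with 3 df has finite variance but far more extreme draws than a normal.
    y = [TRUE_INTERCEPT + TRUE_SLOPE * xi + student_t(3) for xi in x]
    slopes.append(ols_slope(x, y))

mean_slope = statistics.fmean(slopes)
print(f"average OLS slope over 2000 samples: {mean_slope:.4f} (true slope {TRUE_SLOPE})")
```

The average estimate lands very close to the true slope, as unbiasedness predicts. The simulation says nothing about inference; checking the coverage of nominal 95% intervals would be a separate experiment.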
Question 4 True / False
A curved (bent) pattern in the residuals vs. fitted plot, rather than a random horizontal band, is evidence that the linearity assumption may be violated.
T. True
F. False
Answer: True
The residuals vs. fitted plot should show a structureless horizontal band centered at zero if the model is correctly specified. A systematic curve means the average residual is not zero at every fitted value: the linear model over-predicts in some regions and under-predicts in others. This is the visual signature of a nonlinear true relationship. Typical remedies are adding a polynomial term or transforming a predictor.
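A numeric version of "reading the bend", as a minimal sketch assuming simulated data whose true mean is quadratic in x (y = 1 + 0.5x² plus normal noise) and only the Python standard library. It averages the residuals within three consecutive bins of fitted values, first for a straight line in x, then after transforming the predictor to x², which is the appropriate transformation only because of how the data were generated. The helper names are hypothetical.

```python
import random
import statistics

# Illustrative helpers (hypothetical names, not a library API).


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def binned_residual_means(x, y, n_bins=3):
    """Fit y on x, order residuals by fitted value, and average them within consecutive bins."""
    b0, b1 = fit_simple_ols(x, y)
    pairs = sorted((b0 + b1 * xi, yi - (b0 + b1 * xi)) for xi, yi in zip(x, y))
    size = len(pairs) // n_bins
    means = []
    for b in range(n_bins):
        stop = (b + 1) * size if b < n_bins - 1 else len(pairs)  # last bin takes the remainder
        means.append(statistics.fmean([r for _, r in pairs[b * size:stop]]))
    return means


random.seed(0)
x = [i / 10 for i in range(101)]  # 0.0, 0.1, ..., 10.0
y = [1 + 0.5 * xi ** 2 + random.gauss(0, 1) for xi in x]  # true mean is quadratic in x

linear_means = binned_residual_means(x, y)  # misspecified: straight line in x
transformed_means = binned_residual_means([xi ** 2 for xi in x], y)  # predictor transformed to x^2

print("straight line in x:", [round(m, 2) for m in linear_means])
print("line in x^2:       ", [round(m, 2) for m in transformed_means])
```

For the straight-line fit the bin means come out positive, negative, positive, which is the U-shaped bend a residual plot would show; after the transformation they hover near zero. Adding an x² term alongside x would work similarly but needs a multiple-regression solver, which this sketch avoids.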
Question 5 Short Answer
Why do regression diagnostics examine residuals (yᵢ − ŷᵢ) rather than the true errors εᵢ?
Model answer: The true errors εᵢ = yᵢ − Xᵢβ are unobservable because β (the true population coefficient vector) is unknown. Residuals replace β with its estimate β̂, giving eᵢ = yᵢ − Xᵢβ̂ = yᵢ − ŷᵢ, which can be computed from the data and serves as an estimate of εᵢ. If the model assumptions hold, residuals approximate the true errors and should show no systematic patterns.
Residuals are imperfect proxies. They are not independent (when the model includes an intercept, they sum to zero by construction), and each residual has variance σ²(1 − hᵢᵢ), where hᵢᵢ is the observation's leverage. That is smaller than the error variance σ², so residuals are compressed toward zero, most strongly at high-leverage points. This is why diagnostics often use standardized or studentized residuals, which rescale each residual by an estimate of its own standard deviation. Despite these imperfections, residual plots are remarkably informative, because systematic patterns in the true errors carry through to the residuals.
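These properties can be checked numerically. The sketch below assumes a simple linear regression with an intercept, simulated normal errors with σ = 1, and one deliberately extreme x value; it uses only the Python standard library, and the function names are hypothetical. It confirms that the residuals sum to zero, that the variance of the high-leverage residual across repeated samples matches σ²(1 − hᵢᵢ) rather than σ², and computes internally studentized residuals eᵢ / (s·sqrt(1 − hᵢᵢ)) (naming conventions for "standardized" vs. "studentized" vary between texts).

```python
import math
import random
import statistics

# Illustrative helpers (hypothetical names, not a library API).


def fit_with_diagnostics(x, y):
    """Simple linear regression with an intercept; returns (leverages h_ii, residuals e_i)."""
    n = len(x)
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return leverage, residuals


def internally_studentized(leverage, residuals, p=2):
    """Rescale each residual by its estimated SD: e_i / (s * sqrt(1 - h_ii))."""
    s = math.sqrt(sum(e * e for e in residuals) / (len(residuals) - p))
    return [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]


random.seed(7)
SIGMA = 1.0
x = list(range(10)) + [30]  # the last point sits far from the others (high leverage)

# One dataset: with an intercept, the residuals sum to zero up to rounding error.
y = [1 + 2 * xi + random.gauss(0, SIGMA) for xi in x]
h, e = fit_with_diagnostics(x, y)
r = internally_studentized(h, e)
print(f"sum of residuals: {sum(e):.2e}")

# Many datasets: the high-leverage residual varies far less than the true error does.
last = []
for _ in range(5000):
    y_sim = [1 + 2 * xi + random.gauss(0, SIGMA) for xi in x]
    last.append(fit_with_diagnostics(x, y_sim)[1][-1])
empirical_var = statistics.fmean([v * v for v in last])  # E[e_i] = 0, so the mean square estimates Var(e_i)
theoretical_var = SIGMA ** 2 * (1 - h[-1])
print(f"Var(e_last): simulated {empirical_var:.3f}, sigma^2 (1 - h) = {theoretical_var:.3f}, sigma^2 = {SIGMA ** 2}")
```

With hᵢᵢ ≈ 0.89 for the extreme point, its residual has roughly one ninth of the error variance, which is exactly the compression that studentizing undoes.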