A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Regression Diagnostics and Residual Analysis

College · Depth 84 in the knowledge graph

198 topics build on this

330 prerequisites beneath it

Core Idea

Residuals (yᵢ − ŷᵢ) show how the data depart from the fitted model. The standard diagnostic plots are residuals vs. fitted (linearity, homoscedasticity), Q-Q (normality), scale-location (constant variance), and residuals vs. leverage (influential points). Influential observations and outliers require investigation.

Explainer

You've learned the OLS assumptions: linearity, independence, homoscedasticity (constant error variance), normally distributed errors, and no perfect multicollinearity. But fitting a regression model doesn't verify those assumptions — it merely produces estimates regardless of whether the assumptions hold. Regression diagnostics are the tools for checking whether your data actually satisfies what the model requires. The core insight is that the residuals (yᵢ − ŷᵢ) are observable proxies for the unobservable true errors εᵢ. If the model is correctly specified, residuals should look like a random, structureless cloud. Any pattern you find is evidence of a violated assumption.
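As a minimal sketch of this idea, the following Python computes residuals for a one-predictor least-squares fit using only the standard library. The data are made up for illustration (y is exactly quadratic in x, but a straight line is fitted), and the helper names fit_simple_ols and residuals are hypothetical, not part of any library.

```python
from statistics import fmean

def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line for one predictor."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def residuals(x, y):
    """Residuals e_i = y_i - yhat_i from the fitted line."""
    b0, b1 = fit_simple_ols(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Made-up data: the true relationship is quadratic, the model is linear.
x = [0, 1, 2, 3, 4, 5, 6]
y = [xi ** 2 for xi in x]
e = residuals(x, y)   # [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]
```

The residuals sum to zero, as they must when the model includes an intercept, yet they are positive at both ends and negative in the middle. That U-shape is the kind of structure a correctly specified model should not leave behind.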

The residuals vs. fitted plot is the first thing to examine. Plot the residuals on the y-axis against the fitted values ŷᵢ on the x-axis. Under correct specification, you should see a horizontal band centered at zero with uniform spread. Two warning signs: a curved or bent pattern (suggesting the linearity assumption is violated — your relationship is nonlinear and you need a transformed predictor or polynomial term) and a fan or funnel shape (suggesting heteroscedasticity — variance grows or shrinks with the fitted value). The scale-location plot reinforces the heteroscedasticity check by plotting √|standardized residuals| vs. fitted values; a horizontal trend line is what you want.
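The scale-location check can also be approximated numerically. The sketch below, for a single predictor, builds the plot's coordinates (fitted value, √|standardized residual|) and then compares the average height in the upper half of fitted values with the lower half. The function names, the half-versus-half comparison, and the data are assumptions made for illustration; they are not a standard test.

```python
from math import sqrt
from statistics import fmean

def scale_location_points(x, y):
    """Return (fitted, sqrt(|standardized residual|)) pairs for a one-predictor OLS fit."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    s = sqrt(sum(e * e for e in resid) / (n - 2))         # residual standard error
    lev = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]   # leverage h_i
    std_resid = [e / (s * sqrt(1 - h)) for e, h in zip(resid, lev)]
    return [(f, sqrt(abs(r))) for f, r in zip(fitted, std_resid)]

def spread_ratio(points):
    """Mean height in the upper half (by fitted value) over the lower half."""
    pts = sorted(points)
    half = len(pts) // 2
    low = fmean([v for _, v in pts[:half]])
    high = fmean([v for _, v in pts[-half:]])
    return high / low

# Made-up data whose scatter around the line grows with x (a funnel shape).
x = list(range(1, 11))
y = [2 * xi + (0.3 * xi if i % 2 == 0 else -0.3 * xi) for i, xi in enumerate(x)]
ratio = spread_ratio(scale_location_points(x, y))
```

A ratio near 1 corresponds to the horizontal trend line described above; here the ratio comes out well above 1, which is the numeric counterpart of the funnel.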

The Q-Q (quantile-quantile) plot checks normality of the residuals. The standardized residuals are sorted and plotted against the theoretical quantiles of a standard normal distribution. If the residuals are normally distributed, the points fall along the 45-degree reference line. Heavy tails show up as S-curves at the extremes; skewness appears as a systematic bow. Perfect normality is rarely achieved, and OLS inference is reasonably robust to mild departures, but severe non-normality, especially in small samples, undermines hypothesis tests and confidence intervals.
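The coordinates of a Q-Q plot are simple to compute by hand. The sketch below pairs sorted, standardized values with standard-normal quantiles. Two simplifications are assumed here: the residuals are standardized by their sample mean and standard deviation rather than adjusted for leverage, and the plotting positions (i − 0.5)/n are one common convention among several. The residual values are made up, and qq_points is a hypothetical helper name.

```python
from statistics import NormalDist, fmean, stdev

def qq_points(values):
    """Return (theoretical quantile, sample quantile) pairs for a normal Q-Q plot."""
    n = len(values)
    mean, sd = fmean(values), stdev(values)
    sample = sorted((v - mean) / sd for v in values)
    # Plotting positions (i - 0.5) / n: an assumed convention, not the only one.
    theo = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theo, sample))

# Made-up residuals with one large positive value (right skew).
resid = [-1.2, -1.0, -0.9, -0.8, -0.6, -0.5, -0.3, 0.1, 0.7, 4.5]
pts = qq_points(resid)

# Points where the sample quantile lies above the reference line y = x.
above_line = [(t, s) for t, s in pts if s > t]
```

For these values both the smallest and the largest points sit above the reference line, which is the bowed pattern associated with right skew.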

The residuals vs. leverage plot is conceptually different from the others. Leverage measures how far an observation's predictor values are from the center of the predictor space: a high-leverage point has unusual X values and can exert disproportionate influence on the fitted line. But high leverage alone is not a problem; it only becomes problematic when paired with a large residual. Cook's distance combines both into a single influence measure: it quantifies how much the estimated coefficients would change if the observation were deleted. Points in the upper-right or lower-right corner of the leverage plot (high leverage and a large positive or negative residual) are influential and deserve individual investigation. Are they data entry errors, genuinely unusual cases, or observations from a different population? Each of these has a different remediation, so checking the data before automatically removing points is essential.
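For a single predictor, leverage and Cook's distance have closed forms, so a short sketch can show how the two quantities interact. It assumes the usual formulas hᵢ = 1/n + (xᵢ − x̄)²/Sxx and Dᵢ = eᵢ² / (p · MSE) · hᵢ / (1 − hᵢ)², with p = 2 coefficients (intercept and slope). The data and the function name influence_measures are made up for illustration.

```python
from statistics import fmean

def influence_measures(x, y):
    """Return (leverage, Cook's distance) per observation for a one-predictor OLS fit."""
    n, p = len(x), 2                      # p counts the intercept and the slope
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in resid) / (n - p)
    out = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - x_bar) ** 2 / sxx              # leverage
        cooks = (e * e / (p * mse)) * h / (1 - h) ** 2   # Cook's distance
        out.append((h, cooks))
    return out

# Made-up data: six points close to y = x, plus one far-out x value that
# does not follow that trend.
x = [1, 2, 3, 4, 5, 6, 20]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0, 5.0]
measures = influence_measures(x, y)
most_influential = max(range(len(x)), key=lambda i: measures[i][1])
```

In this example the last observation has by far the largest Cook's distance even though its raw residual is not the largest, because it pulls the fitted line toward itself. This is why leverage and residual size are read together rather than separately.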

Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Dividing Integers → Unit Rates → Proportions → Percent Concept → Converting Between Fractions, Decimals, and Percents → Operations with Rational Numbers → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → 
Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Sine, Cosine, and Tangent Ratios → Trigonometric Ratios Review → Vectors in Two Dimensions → Vector Operations: Addition, Subtraction, and Scalar Multiplication → Dot Product (Inner Product in R^n) → Inner Product Spaces → Orthogonality → Orthogonal Projections → Orthogonal Projections and Least Squares Approximation → Linear Regression and Least Squares Estimation → Introduction to Multiple Linear Regression → Assumptions in Linear Regression → Regression Diagnostics and Residual Analysis

Longest path: 85 steps · 330 total prerequisite topics

Prerequisites (2)

Assumptions in Linear Regression (hard) · Least Squares Estimation (hard)

Leads To (2)

Graphical Diagnostics: Residual Plots and QQ Plots (hard) · Inference in Linear Regression (hard)