A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Least Squares Regression: Fundamentals and Derivation

Graduate · Depth 116 in the knowledge graph

12 topics build on this

658 prerequisites beneath it


Core Idea

Ordinary least squares (OLS) minimizes the sum of squared residuals to estimate regression coefficients. The OLS estimator has a closed-form solution and is the foundation of econometric analysis, with well-understood statistical properties that depend on assumptions about the data-generating process.

How It's Best Learned

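Work through the matrix-form derivation that minimizes the sum of squared residuals. Compare OLS with other loss functions, such as absolute or fourth-power loss, to see why quadratic loss is the one whose first-order conditions are linear in the coefficients, which is what produces the least squares normal equations.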
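One way to start that comparison is with the simplest possible model, a single constant prediction c. The sketch below uses a small hypothetical sample with one outlier and a brute-force grid search (chosen for transparency, not efficiency) to find the c that minimizes each loss. Squared loss picks the mean, absolute loss picks the median, and fourth-power loss is pulled even further toward the outlier.

```python
import numpy as np

# Hypothetical sample with one large outlier (illustrative data only).
y = np.array([1.0, 2.0, 2.5, 3.0, 10.0])

# Candidate constant predictions c, spaced 0.001 apart.
grid = np.linspace(0.0, 12.0, 12001)

def best_constant(loss):
    """Return the grid value c that minimizes sum(loss(y - c))."""
    totals = np.array([loss(y - c).sum() for c in grid])
    return grid[np.argmin(totals)]

c_sq = best_constant(lambda r: r**2)        # quadratic loss (least squares)
c_abs = best_constant(lambda r: np.abs(r))  # absolute loss
c_4 = best_constant(lambda r: r**4)         # fourth-power loss

print(f"squared:  {c_sq:.3f}  (sample mean   = {y.mean():.3f})")
print(f"absolute: {c_abs:.3f}  (sample median = {np.median(y):.3f})")
print(f"fourth:   {c_4:.3f}")
```

For squared loss the first-order condition, −2Σ(yᵢ − c) = 0, is linear in c and solves in closed form to the mean; the same linearity, with many coefficients, is what gives the normal equations. The fourth-power condition is cubic in c, and absolute loss is not differentiable at the data points.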

Common Misconceptions

OLS does not require normally distributed errors; normality is needed only for exact finite-sample inference (t and F tests). Conversely, minimizing squared residuals alone does not ensure unbiasedness: additional assumptions about the regressors are required, most importantly that they are exogenous, E[ε | X] = 0.
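A minimal simulation sketch of the first point, assuming a hypothetical data-generating process with intercept 1, slope 2, and mean-zero exponential errors (strongly skewed, so clearly non-normal) drawn independently of x. Averaged over many replications, the OLS estimates center on the true coefficients, so normality is not what delivers unbiasedness. The sketch says nothing about exact t or F inference, which is where normality matters in small samples.

```python
import numpy as np

rng = np.random.default_rng(0)
beta_true = np.array([1.0, 2.0])  # hypothetical intercept and slope
n, reps = 200, 2000

estimates = np.empty((reps, 2))
for r in range(reps):
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    # Skewed, non-normal errors with mean zero, generated independently of x.
    eps = rng.exponential(scale=1.0, size=n) - 1.0
    y = X @ beta_true + eps
    # OLS via the normal equations.
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ y)

print("average OLS estimate:", estimates.mean(axis=0))  # close to [1, 2]
```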

Explainer

Ordinary least squares (OLS) finds the line (or hyperplane) through data that minimizes the total squared distance between observed outcomes and predicted values. You already know from bivariate regression that this produces a fitted line ŷ = β₀ + β₁x; what this topic adds is the formal derivation in matrix notation and a deeper understanding of why squared residuals — rather than absolute values or fourth powers — are the natural loss function to minimize.
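As a baseline before the matrix derivation, here is the familiar bivariate case: a short sketch on hypothetical data that computes β₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and β₀ = ȳ − β₁x̄, then cross-checks them against NumPy's `np.polyfit` with degree 1, which also minimizes squared residuals and returns the coefficients highest power first.

```python
import numpy as np

# Hypothetical bivariate data (illustrative only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Textbook bivariate OLS formulas.
x_bar, y_bar = x.mean(), y.mean()
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b0 = y_bar - b1 * x_bar

# np.polyfit(deg=1) fits the same least squares line; it returns [slope, intercept].
slope, intercept = np.polyfit(x, y, deg=1)

print(f"formulas: b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"polyfit:  b0 = {intercept:.4f}, b1 = {slope:.4f}")
```

With these numbers, Σ(xᵢ − x̄)(yᵢ − ȳ) = 19.9 and Σ(xᵢ − x̄)² = 10, so b₁ = 1.99 and b₀ = 6.02 − 1.99 × 3 = 0.05.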

Matrix notation replaces observation-by-observation sums with a few compact expressions. Stack your observations into an n×k matrix X (rows are observations, columns are regressors, including a column of ones for the constant), stack your outcomes into an n×1 vector y, and write the model as y = Xβ + ε. The OLS objective is to choose β to minimize the scalar (y − Xβ)'(y − Xβ), which is the sum of squared residuals. Expanding gives y'y − 2β'X'y + β'X'Xβ; its gradient with respect to β is −2X'y + 2X'Xβ, and setting that to zero gives the normal equations: X'Xβ = X'y. When X has full column rank (no regressor is an exact linear combination of the others), X'X is invertible and positive definite, so the normal equations have a unique solution and it is a minimum: the OLS estimator β̂ = (X'X)⁻¹X'y. This closed-form solution is what makes OLS analytically tractable; many other estimators require iterative numerical methods.
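A sketch of the derivation in code, assuming a hypothetical design with a constant and two simulated regressors (k = 3) and made-up true coefficients. It solves the normal equations X'Xβ = X'y with `np.linalg.solve`, which avoids forming (X'X)⁻¹ explicitly, cross-checks the result against `np.linalg.lstsq` (which works from X directly), and confirms that the gradient −2X'(y − Xβ̂) vanishes at the solution.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100
# Hypothetical n x k design: constant, a normal regressor, a uniform regressor.
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.uniform(size=n)])
beta_true = np.array([0.5, -1.0, 2.0])  # assumed values for the simulation
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# Solve the normal equations X'X b = X'y (no explicit matrix inverse).
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check with NumPy's least squares solver, which minimizes ||y - Xb||^2.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Gradient of (y - Xb)'(y - Xb) evaluated at beta_hat: -2 X'(y - X beta_hat).
gradient = -2 * X.T @ (y - X @ beta_hat)

print("beta_hat:", beta_hat)
print("matches lstsq:", np.allclose(beta_hat, beta_lstsq))  # True
print("max |gradient|:", np.abs(gradient).max())           # effectively zero
```

Solving the linear system rather than inverting X'X is the usual numerical choice; the two are algebraically identical, but the explicit inverse is slower and less accurate when X'X is ill-conditioned.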

The geometric interpretation, which your linear algebra prerequisite prepared you for, is illuminating. OLS projects the vector y onto the column space of X. The fitted values ŷ = Xβ̂ are the orthogonal projection of y onto that subspace, and the residuals ê = y − ŷ are perpendicular to every column of X (X'ê = 0 by construction; this is just the normal equations rewritten). This orthogonality is more than a mathematical curiosity: OLS forces the residuals to be orthogonal to the regressors in the sample, which is the sample counterpart of the population condition E[X'ε] = 0 that the OLS assumptions impose on the true errors. When the OLS assumptions hold, this projection has desirable statistical properties; when they fail, the projection is still well-defined geometrically, but the statistical interpretation breaks down.
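The projection view can be checked numerically. This sketch, on simulated data, builds the projection (or "hat") matrix P = X(X'X)⁻¹X' and verifies that Py equals Xβ̂, that P is symmetric and idempotent (projecting twice changes nothing), and that the residuals are orthogonal to every column of X and to the fitted values. The outcome y is deliberately pure noise, unrelated to X, to illustrate that the geometry holds whether or not any statistical assumption does.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = rng.normal(size=n)  # pure noise: the projection is still well-defined

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
P = X @ np.linalg.solve(X.T @ X, X.T)  # hat matrix X (X'X)^-1 X', shape (n, n)
y_hat = P @ y
e_hat = y - y_hat

print(np.allclose(y_hat, X @ beta_hat))            # P y is the OLS fit
print(np.allclose(P @ P, P), np.allclose(P, P.T))  # idempotent and symmetric
print(np.allclose(X.T @ e_hat, 0.0))               # X'e = 0
print(np.isclose(y_hat @ e_hat, 0.0))              # fitted values orthogonal to residuals
```

Because X includes a column of ones, X'ê = 0 also implies that the residuals sum to zero.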

A critical point that addresses a common misconception: minimizing squared residuals *mechanically* always produces a solution (provided X'X is invertible), but that solution has good statistical properties only when the assumptions from your OLS prerequisite are satisfied. Unbiasedness requires strict exogeneity, E[ε | X] = 0; consistency can rest on the weaker condition that the regressors are uncorrelated with the error term, E[X'ε] = 0. OLS is a procedure. The Gauss-Markov theorem (which this topic builds toward) states that if, in addition, the errors are homoskedastic and uncorrelated with one another, OLS is the Best Linear Unbiased Estimator (BLUE). You can run OLS on any data; whether the coefficients mean what you think they mean depends entirely on whether the world cooperates with the assumptions.
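To see the procedure run happily while the assumptions fail, consider a hypothetical data-generating process in which an unobserved factor u enters both the regressor and the error: x = w + u and ε = u + v, with w, u, v independent standard normals and a true slope of 2. Then Cov(x, ε) = Var(u) = 1 and Var(x) = 2, so the OLS slope converges to 2 + 1/2 = 2.5 rather than 2. The sketch below uses a large simulated sample so the estimate sits close to that limit.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000
beta_true = 2.0  # hypothetical true slope

# Hypothetical DGP: unobserved u drives both x and the error term.
u = rng.normal(size=n)
x = rng.normal(size=n) + u    # Var(x) = 2
eps = u + rng.normal(size=n)  # Cov(x, eps) = 1, so E[X'eps] != 0
y = 1.0 + beta_true * x + eps

X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # OLS still "works" mechanically
print("estimated slope:", beta_hat[1])        # about 2.5, not 2.0
```

Nothing in the OLS output flags the problem: X'ê = 0 holds exactly in the sample, because OLS imposes it, even though the population condition it is supposed to mirror is false.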


Prerequisite Chain

Longest path: 117 steps · 658 total prerequisite topics

Prerequisites (12)

Simple (Bivariate) OLS Regression (hard)
Classical OLS Assumptions (Gauss-Markov) (hard)
Linear Transformations (hard)
Matrix Operations (hard)
Least Squares Approximation and Normal Equations (hard)
Optimization in Multiple Variables (soft)
Matrix Representation of Linear Transformations (soft)
Linear Regression and Least Squares Estimation (soft)
Interaction Terms in Regression (soft)
Polynomial Regression and Nonlinear Functional Forms (soft)
Feasible GLS (FGLS) with Estimated Covariance Structure (soft)
Weighted Least Squares (WLS) (soft)

Leads To (3)

Estimator Properties: Consistency, Unbiasedness, and Efficiency (soft)
Gauss-Markov Theorem and OLS Efficiency (hard)
Simple Linear Regression Estimation (hard)