A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Orthogonal Projections and Least Squares Approximation

College · Depth 80 in the knowledge graph

928 topics build on this

325 prerequisites beneath it

Core Idea

The orthogonal projection of b onto a subspace W, written proj_W(b), is the point in W closest to b. For an orthonormal basis {u₁, ..., uₖ} of W, proj_W(b) = Σᵢ ⟨b, uᵢ⟩uᵢ. If W is the column space of a matrix A with linearly independent columns, proj_W(b) = A(AᵀA)⁻¹Aᵀb. Least squares minimizes ||Ax − b||²; every minimizer x* satisfies the normal equations AᵀAx* = Aᵀb, and Ax* is the projection of b onto the column space of A.

Explainer

From Gram-Schmidt, you know how to convert a basis into an orthonormal basis: a set of mutually perpendicular unit vectors. Orthogonal projections are what make orthonormal bases so powerful. The idea is geometric: given a vector b and a subspace W, the orthogonal projection proj_W(b) is the unique point in W that is closest to b. Equivalently, it is the unique point of W for which the error vector b − proj_W(b) is perpendicular to every vector in W.

When W has an orthonormal basis {u₁, ..., uₖ}, the projection formula is remarkably clean: proj_W(b) = Σᵢ ⟨b, uᵢ⟩uᵢ. Each term ⟨b, uᵢ⟩uᵢ is the shadow of b onto one basis direction, and the full projection just sums these shadows. This works because orthonormality decouples the directions: there is no "cross-talk" between basis vectors, so you can handle each coordinate independently. This is exactly what Gram-Schmidt was buying you all along.
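The sum-of-shadows formula can be sketched in a few lines of NumPy. This is an illustrative sketch, not a reference implementation: the function name `project_onto_orthonormal` and the example vectors are assumptions made up for this illustration, and the function assumes the columns of `U` are already orthonormal.

```python
import numpy as np

def project_onto_orthonormal(b, U):
    """Project b onto the span of the columns of U.

    Assumes the columns of U are orthonormal. Computes
    sum_i <b, u_i> u_i, written in matrix form as U @ (U.T @ b).
    """
    coeffs = U.T @ b      # one inner product <b, u_i> per basis vector
    return U @ coeffs     # add up the shadows coeffs[i] * u_i

# Hypothetical example: W is the plane in R^3 spanned by
# u1 = (1, 1, 0) / sqrt(2) and u2 = (0, 0, 1).
u1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)
u2 = np.array([0.0, 0.0, 1.0])
U = np.column_stack([u1, u2])

b = np.array([3.0, 1.0, 2.0])
p = project_onto_orthonormal(b, U)
error = b - p

print(p)            # the projection of b onto W
print(U.T @ error)  # inner products of the error with u1 and u2
```

For this choice of b the projection works out to (2, 2, 2) and the error to (1, −1, 0), which has zero inner product with both basis vectors, as the perpendicularity condition requires.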

Least squares is what you turn to when you want to solve Ax = b but no exact solution exists, because the right-hand side b lies outside the column space of A. Since you cannot hit b exactly, the best you can do is find the x that makes Ax as close to b as possible. The closest point to b in the column space of A is exactly the orthogonal projection of b onto that column space. At the minimizer x*, the residual b − Ax* must therefore be perpendicular to every column of A, that is, Aᵀ(b − Ax*) = 0. Rearranging gives the normal equations AᵀAx* = Aᵀb. When A has linearly independent columns, AᵀA is invertible and the unique solution is x* = (AᵀA)⁻¹Aᵀb.
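As a small worked instance, the sketch below fits a line y = c₀ + c₁t through three points that do not lie on a common line, so Ax = b has no exact solution. The data and variable names are hypothetical, chosen only to keep the arithmetic simple. It solves the normal equations directly and then cross-checks against NumPy's `np.linalg.lstsq`.

```python
import numpy as np

# Hypothetical data: three points (t, y) that are not collinear.
t = np.array([0.0, 1.0, 2.0])
b = np.array([6.0, 0.0, 0.0])

# Each row of A is (1, t_i), so A @ [c0, c1] evaluates c0 + c1 * t_i.
A = np.column_stack([np.ones_like(t), t])

# Solve the normal equations (A^T A) x = A^T b.
x_star = np.linalg.solve(A.T @ A, A.T @ b)

# The residual should be perpendicular to every column of A.
residual = b - A @ x_star

# Cross-check with NumPy's built-in least-squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_star)          # coefficients c0, c1 of the best-fit line
print(A.T @ residual)  # should be (numerically) zero
print(x_lstsq)         # should agree with x_star
```

Here AᵀA = [[3, 3], [3, 5]] and Aᵀb = (6, 0), which gives x* = (5, −3), the line y = 5 − 3t. Forming AᵀA explicitly is fine for a small, well-conditioned example like this one; for larger or ill-conditioned problems a QR-based solver or `np.linalg.lstsq` is usually the safer choice.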

When A has linearly independent columns, the matrix P = A(AᵀA)⁻¹Aᵀ is called the projection matrix (or the hat matrix in statistics). It satisfies P² = P (applying the projection twice gives the same result as applying it once) and Pᵀ = P (it is symmetric). These two properties, idempotence and symmetry, completely characterize orthogonal projection matrices: any matrix satisfying P² = P and Pᵀ = P projects orthogonally onto its own column space. Least squares is ubiquitous: it underlies linear regression, Fourier series approximation, and signal processing, and it appears wherever you need the best approximation to something you cannot represent exactly.
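The two defining properties are easy to check numerically. The sketch below builds P for a small hypothetical matrix A with linearly independent columns (the matrix and the vector b are assumptions for illustration) and tests idempotence and symmetry up to floating-point tolerance.

```python
import numpy as np

# Hypothetical 3x2 matrix with linearly independent columns.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])

# Projection (hat) matrix onto the column space of A.
# The explicit inverse is used only to mirror the formula P = A (A^T A)^{-1} A^T.
P = A @ np.linalg.inv(A.T @ A) @ A.T

b = np.array([6.0, 0.0, 0.0])
p = P @ b  # projection of b onto col(A)

print(np.allclose(P @ P, P))  # idempotent: projecting twice changes nothing
print(np.allclose(P.T, P))    # symmetric
print(p)                      # closest point to b in col(A)
print(A.T @ (b - p))          # error is perpendicular to the columns of A
```

For this A and b the projection is (5, 2, −1), and the error (1, −2, 1) is orthogonal to both columns of A.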

Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Dividing Integers → Unit Rates → Proportions → Percent Concept → Converting Between Fractions, Decimals, and Percents → Operations with Rational Numbers → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → 
Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Sine, Cosine, and Tangent Ratios → Trigonometric Ratios Review → Vectors in Two Dimensions → Vector Operations: Addition, Subtraction, and Scalar Multiplication → Dot Product (Inner Product in R^n) → Inner Product Spaces → Orthogonality → Orthogonal Projections → Orthogonal Projections and Least Squares Approximation

Longest path: 81 steps · 325 total prerequisite topics

Prerequisites (2)

Gram-Schmidt Process and QR Decomposition (hard)
Orthogonal Projections (soft)

Leads To (1)

Linear Regression and Least Squares Estimation (hard)