Questions: Linear Regression and Least Squares Estimation
3 questions to test your understanding
Question 1 Multiple Choice
In the linear algebra view of regression, the fitted values ŷ = Xβ* are best described as:
A. The average of the observed response values
B. The orthogonal projection of y onto the column space of X
C. The maximum-likelihood estimate under any error distribution
D. The solution minimizing the sum of absolute residuals
Answer: B
The fitted values are ŷ = Xβ* = X(XᵀX)⁻¹Xᵀy = Py, where P = X(XᵀX)⁻¹Xᵀ is the orthogonal projection matrix onto col(X) (assuming X has full column rank, so that XᵀX is invertible). This projection is the core geometric insight: ŷ is the point in the column space of X closest to y, the residual y − ŷ is perpendicular to that subspace, and its length ||y − ŷ|| is the distance from y to col(X).
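The projection view can be checked numerically. The sketch below assumes NumPy and a small, randomly generated full-column-rank design; the names `P`, `beta_star`, and `y_hat` are illustrative only. It forms P explicitly (fine for a toy example, not something to do at scale) and confirms that Py equals Xβ*, that P is symmetric and idempotent, and that the residual is orthogonal to the columns of X.

```python
import numpy as np

# Hypothetical toy data: 6 observations, an intercept column plus one predictor.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(6), rng.normal(size=6)])
y = rng.normal(size=6)

# Projection ("hat") matrix onto col(X); assumes X has full column rank.
# Forming P explicitly is for illustration only.
P = X @ np.linalg.inv(X.T @ X) @ X.T

# Least-squares coefficients and fitted values.
beta_star, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_star

print(np.allclose(P @ y, y_hat))          # fitted values equal the projection Py
print(np.allclose(P, P.T))                # P is symmetric
print(np.allclose(P @ P, P))              # P is idempotent: projecting twice changes nothing
print(np.allclose(X.T @ (y - y_hat), 0))  # residual is orthogonal to every column of X
```

Each check should print True; symmetry plus idempotence is exactly what makes P an orthogonal (rather than oblique) projection.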
Question 2 True / False
Solving the normal equations (XᵀX)β = Xᵀy directly by inverting XᵀX is generally the preferred numerical method for computing β* in practice.
T. True
F. False
Answer: False
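Forming XᵀX squares the condition number of X (in the 2-norm, κ(XᵀX) = κ(X)²), so solving the normal equations, and especially inverting XᵀX explicitly, loses roughly twice as many digits of accuracy as necessary when X is ill-conditioned (e.g., when predictors are nearly collinear). A QR decomposition of X is preferred because it works with X directly and never forms XᵀX, avoiding that extra squaring of the condition number. Most regression software uses QR or SVD internally: R's lm() uses a QR decomposition, and NumPy's numpy.linalg.lstsq uses an SVD-based LAPACK solver.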
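To see the condition-number squaring in practice, here is a minimal sketch assuming NumPy and a hypothetical design with two nearly collinear predictors (the 1e-6 perturbation and the true coefficients are made up for illustration). It compares three routes to β*: the normal equations, a thin QR of X, and `np.linalg.lstsq`. Because y is generated without noise, `beta_true` is the exact least-squares solution, so each printed error reflects only numerical loss.

```python
import numpy as np

# Hypothetical design with two nearly collinear predictors (illustrative values).
rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 1e-6 * rng.normal(size=n)   # x2 is almost identical to x1
X = np.column_stack([np.ones(n), x1, x2])
beta_true = np.array([1.0, 2.0, 3.0])
y = X @ beta_true                      # noise-free, so beta_true solves the problem exactly

# Forming X^T X squares the condition number.
print(f"cond(X)     = {np.linalg.cond(X):.2e}")
print(f"cond(X^T X) = {np.linalg.cond(X.T @ X):.2e}")

# 1) Normal equations: form X^T X and solve (numerically the weakest route).
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# 2) Thin QR of X: X = QR, then solve the small triangular system R beta = Q^T y.
Q, R = np.linalg.qr(X)                 # reduced QR is NumPy's default mode
beta_qr = np.linalg.solve(R, Q.T @ y)

# 3) SVD-based least squares.
beta_svd, *_ = np.linalg.lstsq(X, y, rcond=None)

for name, b in [("normal eqs", beta_normal), ("QR", beta_qr), ("lstsq/SVD", beta_svd)]:
    print(f"{name:10s} error = {np.linalg.norm(b - beta_true):.2e}")
```

On a typical run, cond(XᵀX) comes out close to cond(X)², and the normal-equations error is several orders of magnitude larger than the QR or SVD errors; exact figures depend on the platform's BLAS/LAPACK build.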
Question 3 Short Answer
What geometric property must the residual vector r = y − Xβ* satisfy, and why does this property uniquely define the least-squares solution?
Model answer: The residuals must be orthogonal to every column of X; equivalently, Xᵀr = 0. This is the perpendicularity condition: the point of col(X) closest to y is the foot of the perpendicular dropped from y onto that subspace, so the residual vector must be perpendicular to the subspace.
The condition Xᵀ(y − Xβ*) = 0 is exactly the normal equations XᵀXβ* = Xᵀy rearranged. Conversely, if some β leaves a residual with a nonzero component inside col(X), moving Xβ in the direction of that component reduces ||y − Xβ||², so that β is not the minimizer. Concretely, for any β, y − Xβ = r + X(β* − β), and the two terms are orthogonal, so ||y − Xβ||² = ||r||² + ||X(β* − β)||². The second term is strictly positive whenever Xβ ≠ Xβ*, so the fitted values ŷ = Xβ* are always unique, and β* itself is unique when X has full column rank (then X(β* − β) = 0 forces β = β*). Orthogonality of residuals is both the geometric definition of the projection and the algebraic optimality condition.
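The orthogonality argument, including the Pythagorean split of the squared error, can be verified on a toy problem. This sketch assumes NumPy, a random full-column-rank X, and a hypothetical alternative coefficient vector `beta_other`; it checks Xᵀr* = 0 and that any other β pays exactly ||X(β* − β)||² in extra squared error.

```python
import numpy as np

# Hypothetical toy problem: 20 observations, 3 columns, full column rank.
rng = np.random.default_rng(2)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)

beta_star, *_ = np.linalg.lstsq(X, y, rcond=None)
r_star = y - X @ beta_star

# Optimality condition: X^T r* = 0 (the normal equations rearranged).
print(np.allclose(X.T @ r_star, 0))

# For any other beta, y - X beta = r* + X(beta* - beta); the two pieces are
# orthogonal, so ||y - X beta||^2 = ||r*||^2 + ||X(beta* - beta)||^2.
beta_other = beta_star + rng.normal(size=3)
lhs = np.sum((y - X @ beta_other) ** 2)
rhs = np.sum(r_star ** 2) + np.sum((X @ (beta_star - beta_other)) ** 2)

# The identity holds, and the alternative fit is strictly worse.
print(np.isclose(lhs, rhs), lhs > np.sum(r_star ** 2))
```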