A 1000×500 matrix A has 50 nonzero singular values, the rest being zero. What does this immediately tell you about A?
A. A is invertible, since it has more rows than columns
B. A has rank 50: it maps its 500-dimensional input space into a 50-dimensional subspace
C. A is numerically unstable because its condition number is 500/50 = 10
D. A can only be decomposed if it is first converted to a square matrix
Option B is correct. The rank of a matrix equals the number of nonzero singular values, so this matrix, despite having 500 columns, has rank only 50: its column space is 50-dimensional, and the remaining 450 dimensions of the 500-dimensional input space form the null space. Option A is wrong because a 1000×500 matrix is not square and therefore has no inverse, regardless of rank. Option C is wrong because the condition number is a ratio of singular values (σ₁/σₙ), not the ratio of the total count to the nonzero count. Option D is wrong because SVD works for any matrix without modification.
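The rank claim can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the random rank-50 matrix, the seed, and the tolerance rule are illustrative choices, not part of the question.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

# Build a 1000x500 matrix of rank 50 as a product of two thin random factors.
left = rng.standard_normal((1000, 50))
right = rng.standard_normal((50, 500))
A = left @ right

# Singular values only; min(1000, 500) = 500 values are returned, largest first.
s = np.linalg.svd(A, compute_uv=False)

# In floating point the "zero" singular values come out tiny rather than
# exactly zero, so count the ones above a tolerance (an assumed rule of thumb).
tol = s.max() * max(A.shape) * np.finfo(A.dtype).eps
rank = int(np.sum(s > tol))
nullity = A.shape[1] - rank  # dimension of the null space (rank-nullity theorem)

print("rank:", rank)
print("nullity:", nullity)
```

Counting singular values above a tolerance, rather than testing for exact zeros, is the usual way to read rank off a computed SVD.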
Question 2 Multiple Choice
You want the best rank-3 approximation to a 100×100 image matrix A. SVD gives you singular values σ₁ ≥ σ₂ ≥ ... ≥ σ₁₀₀. Which approximation is mathematically optimal?
A. A₃ = σ₁u₁v₁ᵀ + σ₂u₂v₂ᵀ + σ₃u₃v₃ᵀ, using the three largest singular values
B. A₃ = σ₉₈u₉₈v₉₈ᵀ + σ₉₉u₉₉v₉₉ᵀ + σ₁₀₀u₁₀₀v₁₀₀ᵀ, using the three smallest singular values
C. The average of all rank-1 terms: (1/100)∑σᵢuᵢvᵢᵀ
D. The choice depends on the application; SVD does not define a canonical best approximation
Option A is correct. Keeping the k largest singular values and their corresponding outer products gives the best rank-k approximation to A in both the Frobenius and spectral norms; this is the Eckart–Young theorem. The largest singular values mark the directions along which A carries the most energy (the greatest variance, when A is a centered data matrix), so discarding the smallest ones loses the least information. This is the mathematical foundation of PCA, image compression, and recommender systems: you keep the 'big pieces' and throw away noise.
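The Eckart–Young statement can be illustrated with a short experiment. This is a sketch under stated assumptions: a random 100×100 matrix stands in for the image, and `rank_k_from` is a hypothetical helper introduced only for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((100, 100))  # stand-in for the image matrix

U, s, Vt = np.linalg.svd(A)  # s is returned in descending order


def rank_k_from(indices):
    """Sum of the rank-1 terms sigma_i * u_i * v_i^T over the given indices."""
    idx = np.asarray(indices)
    return (U[:, idx] * s[idx]) @ Vt[idx, :]


A3_largest = rank_k_from([0, 1, 2])      # option A: three largest singular values
A3_smallest = rank_k_from([97, 98, 99])  # option B: three smallest singular values

err_largest = np.linalg.norm(A - A3_largest, "fro")
err_smallest = np.linalg.norm(A - A3_smallest, "fro")

# Eckart-Young: the Frobenius error equals the root-sum-square of the discarded
# singular values, and the spectral error equals the first discarded one.
fro_predicted = np.sqrt(np.sum(s[3:] ** 2))
spec_error = np.linalg.norm(A - A3_largest, 2)

print(err_largest, fro_predicted)
print(spec_error, s[3])
print(err_largest < err_smallest)
```

The truncation error depends only on the singular values that were dropped, which is why keeping the largest ones minimizes it.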
Question 3 True / False
SVD can be applied to any matrix — rectangular or square, symmetric or not — whereas eigendecomposition requires a square matrix.
T. True
F. False
Answer: True
This is SVD's key advantage over eigendecomposition. Eigendecomposition A = PΛP⁻¹ requires A to be square and diagonalizable; for P to be real and orthogonal, A must also be symmetric. SVD A = UΣVᵀ works for any real m×n matrix: U is m×m orthogonal, Σ is m×n diagonal with nonnegative entries, and Vᵀ is n×n orthogonal. This generality, combined with numerical stability, is why SVD is the decomposition of choice in applications like least squares, PCA, and pseudoinverse computation.
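The shapes of the three factors are easy to confirm on a small rectangular matrix. This is a minimal sketch assuming NumPy; the 5×3 size and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))  # rectangular: m = 5, n = 3

# full_matrices=True returns the square orthogonal factors U (m x m) and V^T (n x n).
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# NumPy returns the singular values as a vector; embed them in an m x n Sigma.
Sigma = np.zeros(A.shape)
Sigma[: len(s), : len(s)] = np.diag(s)

print(U.shape, Sigma.shape, Vt.shape)  # (5, 5) (5, 3) (3, 3)
reconstructs = np.allclose(U @ Sigma @ Vt, A)
u_orthogonal = np.allclose(U.T @ U, np.eye(5))
v_orthogonal = np.allclose(Vt @ Vt.T, np.eye(3))

# Eigendecomposition is not defined for a non-square matrix; NumPy rejects it.
try:
    np.linalg.eig(A)
    eig_failed = False
except np.linalg.LinAlgError:
    eig_failed = True

print(reconstructs, u_orthogonal, v_orthogonal, eig_failed)
```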
Question 4 True / False
The singular values of a matrix A are the same as the eigenvalues of A.
T. True
F. False
Answer: False
Singular values and eigenvalues are related but distinct. The singular values σᵢ are the square roots of the eigenvalues of AᵀA (AAᵀ has the same nonzero eigenvalues), so they are always real and nonnegative. The eigenvalues of A itself exist only when A is square, and they can be negative or complex. For a symmetric positive definite matrix, singular values and eigenvalues coincide, but in general they differ. Confusing the two leads to errors when assessing numerical stability, because the condition number is defined from singular values, not eigenvalues.
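Three small matrices make the distinction concrete. The matrices below are illustrative choices, not taken from the question; this is a sketch assuming NumPy.

```python
import numpy as np

# A 90-degree rotation: eigenvalues are +i and -i, singular values are both 1.
R = np.array([[0.0, -1.0],
              [1.0, 0.0]])
eig_R = np.linalg.eigvals(R)
sv_R = np.linalg.svd(R, compute_uv=False)

# A shear: both eigenvalues equal 1, yet the singular values are sqrt(2) +/- 1.
B = np.array([[1.0, 2.0],
              [0.0, 1.0]])
eig_B = np.linalg.eigvals(B)
sv_B = np.linalg.svd(B, compute_uv=False)

# Singular values are the square roots of the eigenvalues of B^T B.
sv_from_gram = np.sqrt(np.sort(np.linalg.eigvalsh(B.T @ B))[::-1])

# Symmetric positive definite: eigenvalues and singular values coincide.
S = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eig_S = np.sort(np.linalg.eigvalsh(S))[::-1]
sv_S = np.linalg.svd(S, compute_uv=False)

print(eig_R, sv_R)
print(eig_B, sv_B, sv_from_gram)
print(eig_S, sv_S)
```

The shear is the case worth remembering: its eigenvalues suggest a perfectly conditioned matrix, while its singular values give a condition number of about 5.8.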
Question 5 Short Answer
Why is SVD described geometrically as 'rotation, then scaling, then rotation,' and why does this interpretation matter?
Model answer: Any matrix A = UΣVᵀ acts in three steps: Vᵀ rotates (or reflects) the input, Σ stretches or shrinks along each coordinate axis, and U rotates (or reflects) the output. This matters because every linear transformation, no matter how complex, reduces to these three operations. The singular values (the scaling factors) show how much the map amplifies each direction, which gives the rank (how many directions have nonzero scaling), the condition number (ratio of largest to smallest nonzero scaling), and the directions to keep for a low-rank approximation.
The geometric decomposition makes SVD interpretable rather than just computational. It explains why the best rank-k approximation keeps the k largest singular values — those are the k directions in which A 'stretches most,' carrying the most information. And it generalizes the eigendecomposition's 'rotate-scale-unrotate' story to non-symmetric and rectangular matrices by allowing the two rotations to be different.
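The three-step picture can be traced on the unit circle. This is a minimal sketch assuming NumPy; the 2×2 matrix is an arbitrary non-symmetric example, and `step1`, `step2`, `step3` are names introduced here for illustration.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 1.0]])  # arbitrary non-symmetric example
U, s, Vt = np.linalg.svd(A)

# Points on the unit circle, one per column.
theta = np.linspace(0.0, 2.0 * np.pi, 200)
X = np.vstack([np.cos(theta), np.sin(theta)])

step1 = Vt @ X              # rotate/reflect the input: still the unit circle
step2 = np.diag(s) @ step1  # axis-aligned scaling: an ellipse with semi-axes s
step3 = U @ step2           # rotate/reflect the output: the final ellipse

# The three steps together reproduce the action of A.
same_as_A = np.allclose(step3, A @ X)

# Every image vector has length between the smallest and largest singular value.
lengths = np.linalg.norm(A @ X, axis=0)

# The first right singular vector is stretched by s[0] onto the first left one.
top_direction_ok = np.allclose(A @ Vt[0], s[0] * U[:, 0])

print(same_as_A, top_direction_ok)
print(lengths.min(), lengths.max(), s)
```

Plotting `X`, `step1`, `step2`, and `step3` in sequence shows the circle staying a circle, becoming an axis-aligned ellipse, and then turning into its final orientation.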