Questions: Regularization Theory (Tikhonov, Spectral)
4 questions to test your understanding
Question 1 Multiple Choice
Tikhonov regularization solves min_f (1/n)||y - Kf||^2 + lambda||f||^2, where K is the kernel matrix. In the eigendecomposition of K with eigenvalues sigma_i, what does the regularization parameter lambda do to each eigencomponent of the solution?
A. It sets eigencomponents with sigma_i < lambda to exactly zero, acting as a hard threshold
B. It shrinks each eigencomponent by a factor of sigma_i / (sigma_i + lambda), attenuating small eigenvalues more than large ones
C. It adds lambda to each eigenvalue uniformly, shifting the entire spectrum
D. It inverts the effect of eigenvalue decay, amplifying small eigenvalues to prevent information loss
Answer: B
Writing K = sum_i sigma_i u_i u_i^T, the Tikhonov coefficients along each eigenvector are alpha_i = (u_i^T y) / (sigma_i + lambda), with the factor n from the 1/n loss term absorbed into lambda. The fitted values K*alpha therefore have components sigma_i / (sigma_i + lambda) * (u_i^T y), so the filter factor is sigma_i / (sigma_i + lambda). For eigencomponents where sigma_i >> lambda, the filter is approximately 1 (no shrinkage). For sigma_i << lambda, the filter is approximately sigma_i / lambda, which is heavily attenuated. This is a soft filter rather than a hard threshold: all components are retained, but those with small eigenvalues (typically high-frequency, noise-dominated directions) are suppressed. Option A describes truncated SVD (hard spectral regularization), which is a different spectral method.
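The filter factor can be checked numerically. The following is a minimal NumPy sketch, not a reference implementation: the matrix K is a hypothetical random positive semidefinite matrix standing in for a kernel matrix, lambda is assumed to have already absorbed the factor n, and the helper name tikhonov_filter is made up for illustration.

```python
import numpy as np

def tikhonov_filter(sigma, lam):
    """Filter factor sigma / (sigma + lam), applied elementwise."""
    return sigma / (sigma + lam)

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
K = A @ A.T                      # hypothetical PSD stand-in for a kernel matrix
y = rng.standard_normal(n)
lam = 0.5                        # assumed to include the factor n

# Direct solve: alpha = (K + lam I)^{-1} y, fitted values K alpha
alpha_direct = np.linalg.solve(K + lam * np.eye(n), y)
fitted_direct = K @ alpha_direct

# Spectral route: project y on the eigenvectors, shrink, map back
sigma, U = np.linalg.eigh(K)
coeffs = U.T @ y
fitted_spectral = U @ (tikhonov_filter(sigma, lam) * coeffs)

# Largest discrepancy between the two routes (should be at rounding level)
max_diff = np.max(np.abs(fitted_direct - fitted_spectral))
print("max difference:", max_diff)
print("filter factors:", tikhonov_filter(sigma, lam))
```

The printed filter factors sit close to 1 for the large eigenvalues and drop toward sigma_i / lambda for the small ones, which is the behavior option B describes.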
Question 2 True / False
Increasing the Tikhonov regularization parameter lambda always increases the bias of the solution.
T. True
F. False
Answer: True
Larger lambda applies stronger shrinkage, pushing the solution toward zero (or toward the prior). Every filter factor sigma_i / (sigma_i + lambda) moves further below 1 as lambda grows, so the expected regularized solution deviates more from the target that the unregularized solution would recover given unlimited data. This bias is the price of stabilization: the regularized solution is more biased but less sensitive to noise in the training data. In the limit lambda -> infinity, the solution is zero (maximum bias, zero variance). In the limit lambda -> 0, the solution approaches the unregularized one (minimum bias, maximum variance and instability). The optimal lambda balances these extremes.
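The tradeoff can be made concrete in the eigenbasis. The sketch below assumes a fixed-design model y = f* + noise with independent noise of variance noise_var, and uses the filter sigma / (sigma + lambda). The eigenvalues, the target coefficients c, and the function name bias_variance are hypothetical values chosen for illustration.

```python
import numpy as np

def bias_variance(sigma, c, noise_var, lam):
    """Squared bias and total variance of the fitted values.

    sigma: eigenvalues of the kernel matrix
    c: coefficients of the noiseless target in the eigenbasis
    """
    g = sigma / (sigma + lam)                   # Tikhonov filter factors
    bias_sq = np.sum((1.0 - g) ** 2 * c ** 2)   # shrinkage of the signal
    variance = noise_var * np.sum(g ** 2)       # noise passed by the filter
    return bias_sq, variance

sigma = np.array([10.0, 1.0, 0.1, 0.01])     # hypothetical eigenvalues
c = np.array([1.0, 0.5, 0.25, 0.125])        # hypothetical target coefficients
noise_var = 0.1
lams = [0.0, 0.01, 0.1, 1.0, 10.0]

results = [bias_variance(sigma, c, noise_var, lam) for lam in lams]
for lam, (b, v) in zip(lams, results):
    print(f"lambda={lam:<5} bias^2={b:.4f} variance={v:.4f}")
```

Under these assumptions the squared bias grows and the variance falls as lambda increases, and the squared bias is exactly zero at lambda = 0.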
Question 3 True / False
Spectral regularization methods (Tikhonov, truncated SVD, Landweber iteration) all operate by modifying the eigenvalues of the kernel matrix, but they differ in the shape of the filter function.
T. True
F. False
Answer: True
All spectral regularization methods can be characterized by a filter function g_lambda(sigma) applied to the eigenvalues sigma of the kernel matrix. Tikhonov uses g(sigma) = sigma / (sigma + lambda), a smooth decay. Truncated SVD uses g(sigma) = 1 if sigma > threshold and 0 otherwise, a hard cutoff. Landweber iteration (iterative regularization) uses g(sigma) = 1 - (1 - sigma)^t for unit step size and eigenvalues scaled to lie in (0, 1], or 1 - (1 - eta*sigma)^t for a general step size eta; it gradually includes more eigencomponents as the iteration count t increases, so 1/t plays the role of lambda. Each filter shape makes a different tradeoff between bias and stability, and the unifying eigenvalue perspective reveals them as a single family of methods parameterized by the filter function.
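The three filters can be written side by side, and the Landweber filter can be checked against the iteration it summarizes. This is a sketch under stated assumptions: the function names are hypothetical, K is a random positive semidefinite matrix rescaled so its largest eigenvalue is 1, and the iteration shown is the variant alpha <- alpha + eta * (y - K alpha), which is the one that yields the filter 1 - (1 - eta*sigma)^t on the fitted values.

```python
import numpy as np

def tikhonov_filter(sigma, lam):
    return sigma / (sigma + lam)

def tsvd_filter(sigma, threshold):
    # 1 for eigenvalues above the threshold, 0 otherwise
    return (sigma > threshold).astype(float)

def landweber_filter(sigma, t, eta=1.0):
    # Requires eta * sigma <= 1 for the iteration to be stable
    return 1.0 - (1.0 - eta * sigma) ** t

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))
K = A @ A.T
K = K / np.linalg.eigvalsh(K).max()   # rescale so the largest eigenvalue is 1
y = rng.standard_normal(n)
sigma, U = np.linalg.eigh(K)

# Run t Landweber steps from alpha = 0
t, eta = 20, 1.0
alpha = np.zeros(n)
for _ in range(t):
    alpha = alpha + eta * (y - K @ alpha)
fitted_iterative = K @ alpha

# Same fitted values obtained by filtering the eigencomponents of y
fitted_filter = U @ (landweber_filter(sigma, t, eta) * (U.T @ y))

print("max difference:", np.max(np.abs(fitted_iterative - fitted_filter)))
```

Stopping the iteration early leaves the small-eigenvalue components mostly unfitted, which is why the iteration count acts as the regularization parameter.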
Question 4 Short Answer
Explain why learning from finite data is an ill-posed inverse problem and how Tikhonov regularization makes it well-posed.
Think about your answer, then reveal below.
Model answer: Learning from data is an inverse problem because we observe outputs (labels) and must infer the function that produced them, the reverse of evaluation. It is ill-posed in Hadamard's sense, which requires that a solution exist, be unique, and depend continuously on the data. With finite data, many functions fit the labels equally well, and the solution does not depend continuously on the data. Concretely, small perturbations to the training labels can cause arbitrarily large changes in the learned function, especially along directions corresponding to small eigenvalues of the kernel matrix, where the inverse multiplies noise by 1/sigma. Tikhonov regularization adds lambda*||f||^2 to the objective, which makes the minimizer unique and prevents the solution from amplifying small-eigenvalue components. The resulting filter sigma / (sigma + lambda) damps the contribution of small eigenvalues, bounding the sensitivity of the solution to data perturbations. This makes the problem well-posed: the regularized solution depends continuously on the data, and since the inverse of K + lambda*I has norm at most 1/lambda, a label perturbation changes the coefficients by at most its size divided by lambda.
The analogy to matrix inversion is direct: inverting a near-singular matrix turns each small singular value sigma into an enormous factor 1/sigma. Adding lambda*I (Tikhonov regularization of the normal equations) bounds the smallest eigenvalue below by lambda, so the condition number is at most (sigma_max + lambda) / lambda and the matrix becomes well-conditioned. The same principle applies in function space.
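The matrix analogy is easy to reproduce numerically. The sketch below builds a hypothetical 3 x 3 symmetric positive semidefinite matrix with one eigenvalue near zero, then compares the condition number and the response to a small perturbation with and without lambda*I. All numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = np.array([1.0, 1e-2, 1e-8])           # hypothetical spectrum, one tiny eigenvalue
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
K = Q @ np.diag(sigma) @ Q.T                  # near-singular symmetric PSD matrix
lam = 1e-2
K_reg = K + lam * np.eye(3)

cond_plain = np.linalg.cond(K)                # roughly sigma_max / sigma_min
cond_reg = np.linalg.cond(K_reg)              # (sigma_max + lam) / (sigma_min + lam)

# Perturb the data along the eigenvector with the smallest eigenvalue
delta = 1e-6 * Q[:, 2]
change_plain = np.linalg.norm(np.linalg.solve(K, delta))
change_reg = np.linalg.norm(np.linalg.solve(K_reg, delta))

print(f"condition number: {cond_plain:.3g} -> {cond_reg:.3g}")
print(f"solution change:  {change_plain:.3g} -> {change_reg:.3g}")
```

In this setup the unregularized solve amplifies the perturbation by about 1/sigma_min, while the regularized change stays within norm(delta) / lambda.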