Explain why the representer theorem makes kernel methods computationally tractable despite working in potentially infinite-dimensional function spaces.
Think about your answer, then reveal below.
Model answer: Without the representer theorem, optimizing over an RKHS would require searching an infinite-dimensional space — computationally impossible. The theorem restricts the search to the n-dimensional subspace spanned by kernel functions at the training points, reducing the problem to finding n coefficients alpha_1, ..., alpha_n. The optimization then involves only the n-by-n kernel matrix K_{ij} = k(x_i, x_j), which is finite and computable. For kernel ridge regression, the solution is alpha = (K + lambda*I)^{-1} * y, a standard linear algebra problem. The computational cost is O(n^3) for the matrix inversion, independent of the RKHS dimension. This is the practical miracle of kernel methods: the infinite-dimensional function space collapses to an n-dimensional problem, and the kernel trick means you never need to compute the (possibly infinite-dimensional) feature vectors explicitly.
The O(n^3) cost is also the main limitation of kernel methods for large datasets. Methods like Nystrom approximation and random Fourier features address this by approximating the kernel matrix, but the representer theorem remains the foundational result explaining why exact kernel methods are possible at all.