The reproducing property states f(x) = <f, k(x, ·)> for any f in the RKHS. What does this property actually guarantee that a generic Hilbert space of functions does not?
A. It guarantees that all functions in the space are differentiable
B. It guarantees that evaluating a function at a point is a bounded (continuous) operation: small changes to f produce small changes to f(x). This fails in spaces like L^2, where functions are only defined up to sets of measure zero
C. It guarantees that the kernel function is unique for each RKHS
D. It guarantees that the inner product can be computed in closed form
In L^2 (square-integrable functions), you cannot meaningfully evaluate a function at a single point: changing a function on a set of measure zero does not change its L^2 norm, so 'the value at point x' is not well-defined, let alone a continuous functional. In an RKHS, point evaluation is a bounded linear functional: by the reproducing property and the Cauchy-Schwarz inequality, |f(x)| = |<f, k(x,·)>| <= ||f|| * ||k(x,·)||, so the function value at x is controlled by the RKHS norm of f. Conversely, by the Riesz representation theorem, bounded point evaluation implies that a reproducing kernel exists, so the two properties are equivalent. This is precisely what makes RKHS functions suitable for learning: we need to evaluate predictions at specific test points.
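The continuity claim can be checked numerically on functions of the form f = sum_i alpha_i k(x_i, ·). The following is a minimal sketch, assuming NumPy and an RBF kernel; the centers, coefficients, length scale, and helper names (`rbf`, `evaluate`, `rkhs_norm`) are all illustrative choices, not part of the quiz. It perturbs f slightly to get g and compares |f(x) - g(x)| with the bound ||f - g|| * sqrt(k(x,x)).

```python
import numpy as np

def rbf(a, b, ell=0.5):
    """RBF kernel matrix between two 1-D arrays of points."""
    a = np.asarray(a, dtype=float)[:, None]
    b = np.asarray(b, dtype=float)[None, :]
    return np.exp(-(a - b) ** 2 / (2.0 * ell ** 2))

rng = np.random.default_rng(0)
centers = rng.uniform(-1.0, 1.0, size=8)   # hypothetical centers x_i
K = rbf(centers, centers)                  # Gram matrix k(x_i, x_j)

alpha = rng.normal(size=8)                 # f = sum_i alpha_i k(x_i, .)
beta = alpha + 1e-3 * rng.normal(size=8)   # g: a small perturbation of f

def evaluate(coef, x):
    # Reproducing property on the span: <f, k(x, .)> = sum_i coef_i k(x_i, x)
    return rbf(np.atleast_1d(x), centers) @ coef

def rkhs_norm(coef):
    # ||f||^2 = coef^T K coef for f in the span of the k(x_i, .)
    return float(np.sqrt(max(coef @ K @ coef, 0.0)))

x = 0.3
gap_at_x = float(abs(evaluate(alpha, x) - evaluate(beta, x))[0])
bound = rkhs_norm(alpha - beta) * float(np.sqrt(rbf([x], [x])[0, 0]))
print(f"|f(x) - g(x)| = {gap_at_x:.3e}, bound = {bound:.3e}")
```

The gap at x should never exceed the bound, which is the Cauchy-Schwarz inequality applied to f - g. No analogous check is possible in L^2, where the value at a single point carries no information about the norm.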
Question 2 True / False
Every positive definite kernel defines a unique RKHS, and every RKHS has a unique reproducing kernel.
T. True
F. False
Answer: True
This is the Moore-Aronszajn theorem: there is a one-to-one correspondence between symmetric positive definite kernels and RKHSs. Given a positive definite kernel k, there exists a unique RKHS H_k for which k is the reproducing kernel. Conversely, given an RKHS, its reproducing kernel is uniquely determined. This bijection is fundamental: choosing a kernel is exactly the same as choosing a function space with a particular geometry. The RBF kernel defines one RKHS (of very smooth functions), the polynomial kernel defines another (of polynomials of degree at most the kernel's degree), and each has different properties for learning.
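The hypothesis of the theorem, positive definiteness, can be spot-checked on a finite sample: every Gram matrix built from the kernel must be symmetric with no negative eigenvalues. The sketch below assumes NumPy; the sample points, kernel parameters, and function names (`rbf_gram`, `poly_gram`) are illustrative. A finite check like this can expose a function that is not a valid kernel, but it cannot prove positive definiteness.

```python
import numpy as np

def rbf_gram(X, ell=1.0):
    """Gram matrix of the RBF kernel exp(-||x - y||^2 / (2 ell^2))."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * ell ** 2))

def poly_gram(X, degree=3, c=1.0):
    """Gram matrix of the polynomial kernel (x . y + c)^degree."""
    return (X @ X.T + c) ** degree

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))   # 30 arbitrary sample points in the plane

min_ratio = {}
for name, G in [("rbf", rbf_gram(X)), ("poly", poly_gram(X))]:
    eig = np.linalg.eigvalsh(G)   # eigenvalues of a symmetric matrix, ascending
    # Smallest eigenvalue relative to the largest; round-off can make it
    # slightly negative, so compare against a small tolerance rather than 0.
    min_ratio[name] = float(eig.min() / eig.max())
    print(f"{name}: min/max eigenvalue ratio = {min_ratio[name]:.2e}")
```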
Question 3 True / False
The RKHS norm ||f|| measures function complexity in a way that directly relates to generalization. For a bounded kernel, a function with small RKHS norm is guaranteed to have small pointwise values.
T. True
F. False
Answer: True
By the reproducing property and the Cauchy-Schwarz inequality, |f(x)| = |<f, k(x,·)>| <= ||f|| * ||k(x,·)|| = ||f|| * sqrt(k(x,x)). So if the kernel is bounded (k(x,x) <= K for all x, as with the RBF kernel where k(x,x) = 1), then |f(x)| <= sqrt(K) * ||f||. Functions with small RKHS norm are 'simple': they have bounded pointwise values and are smooth in a sense defined by the kernel. This is why penalizing RKHS norm (as in kernel ridge regression or SVMs) is a principled regularizer: it constrains the learned function to be simple in the geometry defined by the kernel.
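The effect of penalizing the RKHS norm can be seen in a small kernel ridge regression experiment. This is a minimal sketch, assuming NumPy, an RBF kernel (so k(x,x) = 1), synthetic sine data, and the objective sum_i (y_i - f(x_i))^2 + lam * ||f||^2, whose minimizer has coefficients alpha = (K + lam I)^{-1} y. The data, length scale, and lam values are illustrative choices.

```python
import numpy as np

def rbf(a, b, ell=0.3):
    """RBF kernel matrix between two 1-D arrays; k(x, x) = 1 for every x."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * ell ** 2))

rng = np.random.default_rng(2)
x_train = np.sort(rng.uniform(0.0, 1.0, size=20))
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.normal(size=20)
x_grid = np.linspace(0.0, 1.0, 200)   # points where |f| is inspected

K = rbf(x_train, x_train)
results = []
for lam in [1e-3, 1e-1, 10.0]:
    # Kernel ridge regression coefficients: alpha = (K + lam I)^{-1} y
    alpha = np.linalg.solve(K + lam * np.eye(len(x_train)), y_train)
    norm = float(np.sqrt(alpha @ K @ alpha))               # RKHS norm of the fit
    sup = float(np.max(np.abs(rbf(x_grid, x_train) @ alpha)))  # max |f| on grid
    results.append((lam, norm, sup))
    print(f"lam={lam:g}  ||f||={norm:.3f}  max|f(x)|={sup:.3f}")
```

As lam grows, the RKHS norm of the fitted function shrinks, and because k(x,x) = 1 the largest value of |f| on the grid stays below that norm.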
Question 4 Short Answer
Explain why Mercer's theorem is important for connecting the abstract RKHS theory to the practical kernel trick used in algorithms like SVMs.
Model answer: Mercer's theorem states that any continuous, symmetric positive definite kernel k(x,y) on a compact domain can be decomposed as k(x,y) = sum_i lambda_i * phi_i(x) * phi_i(y), where lambda_i are non-negative eigenvalues and phi_i are the corresponding eigenfunctions of the kernel's integral operator, orthonormal in L^2; the series converges absolutely and uniformly. This decomposition provides an explicit feature map: each input x maps to the (possibly infinite-dimensional) vector (sqrt(lambda_1)*phi_1(x), sqrt(lambda_2)*phi_2(x), ...), and the kernel evaluation k(x,y) equals the inner product of these feature vectors. This connects the abstract RKHS (a space of functions defined by the kernel) to the concrete feature-map picture (the kernel computes dot products in a feature space). Mercer's theorem thus gives the claim that 'kernels implicitly compute inner products in a feature space' an explicit, rigorous form. The canonical feature map x -> k(x,·) supplied by the reproducing property yields the same inner-product identity, but without the eigenfunction coordinates.
Mercer's theorem also reveals the spectral structure of the kernel — the eigenvalue decay rate determines the effective dimensionality of the feature space and the smoothness properties of functions in the RKHS. Fast eigenvalue decay means the RKHS contains only smooth functions; slow decay allows rougher functions.
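A finite-sample analogue of this decomposition is the eigendecomposition of a Gram matrix. The sketch below assumes NumPy and an RBF kernel on an evenly spaced grid; the grid, length scales, threshold, and names (`rbf_gram`, `effective_rank`) are illustrative. It is not Mercer's theorem itself, which concerns the integral operator on the whole domain, but it shows the same two ideas: features built from eigenpairs reproduce the kernel values, and a smoother kernel has faster eigenvalue decay.

```python
import numpy as np

def rbf_gram(x, ell):
    """Gram matrix of the RBF kernel on a 1-D array of points."""
    return np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * ell ** 2))

x = np.linspace(0.0, 1.0, 50)

# Feature map from the eigendecomposition K = V diag(lambda) V^T.
K = rbf_gram(x, ell=0.2)
eigvals, eigvecs = np.linalg.eigh(K)       # eigenvalues in ascending order
eigvals = np.clip(eigvals, 0.0, None)      # remove tiny negative round-off
Phi = eigvecs * np.sqrt(eigvals)           # row i is the feature vector of x[i]
reconstruction_ok = bool(np.allclose(Phi @ Phi.T, K, atol=1e-8))
print("inner products of features reproduce K:", reconstruction_ok)

def effective_rank(ell, rel_tol=1e-6):
    """Number of eigenvalues above rel_tol times the largest eigenvalue."""
    ev = np.linalg.eigvalsh(rbf_gram(x, ell))
    return int(np.sum(ev > rel_tol * ev.max()))

rank_wide = effective_rank(ell=1.0)      # wide kernel: very smooth functions
rank_narrow = effective_rank(ell=0.05)   # narrow kernel: rougher functions
print("effective rank, ell=1.0: ", rank_wide)
print("effective rank, ell=0.05:", rank_narrow)
```

The wide kernel needs far fewer eigen-directions to represent its Gram matrix than the narrow one, which is the finite-sample counterpart of fast versus slow eigenvalue decay.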