The spectral theorem decomposes a compact self-adjoint operator into eigenvalues and eigenvectors, but the eigenvectors are abstract elements of a Hilbert space $H$. When the operator is the integral operator of a continuous kernel, the decomposition becomes a concrete and uniformly convergent series of continuous functions, the content of Mercer's theorem. This post makes the spectral theorem explicit for kernels and uses the same eigen-data to build the reproducing kernel Hilbert space, the setting that grounds the Karhunen-Loève expansion and the kernel methods of learning theory [@reedSimon1980; @conway1990]. The domain is a compact interval $I = [a, b]$ with Lebesgue measure, and $H = L^2(I)$.
# The integral operator
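A kernel is a continuous function $K : I \times I \to \mathbb{R}$. It acts on $L^2(I)$ by
$$(T_K f)(x) = \int_I K(x, y)\, f(y)\, dy, \tag{1}$$
and the structure of $K$ transfers directly to $T_K$.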
A kernel is symmetric when $K(x, y) = K(y, x)$ and positive when $\langle T_K f, f \rangle \ge 0$ for all $f \in L^2(I)$.
For a symmetric kernel, $T_K$ is a compact self-adjoint operator on $L^2(I)$, and it is positive when $K$ is positive.
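The kernel is bounded on the compact square $I \times I$, so $K \in L^2(I \times I)$, which makes $T_K$ a Hilbert-Schmidt operator. Fix an orthonormal basis $(e_j)$ of $L^2(I)$. The products $(e_j \otimes e_k)(x, y) = e_j(x)\, e_k(y)$ form an orthonormal basis of $L^2(I \times I)$, so $K = \sum_{j,k} c_{jk}\, e_j \otimes e_k$ with $\sum_{j,k} |c_{jk}|^2 = \|K\|_{L^2(I \times I)}^2 < \infty$. Truncating the expansion to $j, k \le m$ gives a kernel $K^{(m)}$ whose operator has finite-dimensional range, and $\|T_K - T_{K^{(m)}}\| \le \|K - K^{(m)}\|_{L^2(I \times I)} \to 0$, the operator norm being bounded by the Hilbert-Schmidt norm. So $T_K$ is a norm limit of finite-rank operators, hence compact. Self-adjointness is $\langle T_K f, g \rangle = \langle f, T_K g \rangle$ by symmetry and Fubini, and positivity holds by hypothesis on $K$.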
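By the spectral theorem and its positive corollary, a symmetric positive kernel gives an orthonormal sequence of eigenfunctions $(\phi_n)$ with eigenvalues $\lambda_n > 0$ tending to $0$ and $T_K f = \sum_n \lambda_n \langle f, \phi_n \rangle\, \phi_n$ for every $f \in L^2(I)$.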
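A numerical sketch can make the eigen-data tangible. The example below is an illustration rather than part of the argument: it assumes the kernel $K(x, y) = \min(x, y)$ on $[0, 1]$ (a choice made here, not taken from the post), whose eigenvalues are known in closed form as $1 / ((n - \tfrac12)^2 \pi^2)$, and it approximates the integral operator by a midpoint-rule matrix. The helper name `nystrom_eigs` is hypothetical.

```python
import numpy as np

def nystrom_eigs(kernel, m=400):
    """Midpoint-rule discretisation of the integral operator on [0, 1].

    Returns the grid, the eigenvalues in descending order, and the
    eigenvectors rescaled to approximate L^2-normalised eigenfunctions.
    """
    x = (np.arange(m) + 0.5) / m             # midpoints of m equal cells
    w = 1.0 / m                              # quadrature weight of each cell
    A = kernel(x[:, None], x[None, :]) * w   # symmetric: K symmetric, equal weights
    vals, vecs = np.linalg.eigh(A)           # eigh returns ascending eigenvalues
    vals, vecs = vals[::-1], vecs[:, ::-1]   # reorder to descending
    phi = vecs / np.sqrt(w)                  # now sum(phi[:, k]**2) * w == 1
    return x, vals, phi

# Assumed example kernel: K(x, y) = min(x, y).
x, lam, phi = nystrom_eigs(np.minimum)

# Closed-form eigenvalues of the min kernel, for comparison.
exact = 1.0 / ((np.arange(1, 6) - 0.5) ** 2 * np.pi ** 2)
print("discretised:", lam[:5])
print("closed form:", exact)
```

The discretised eigenvalues are nonnegative and decay toward zero, matching the behaviour the spectral theorem predicts for a positive compact operator. The midpoint rule is only one possible quadrature; a finer grid or a higher-order rule would tighten the agreement.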
# Continuity of the eigenfunctions
The eigenfunctions are a priori only $L^2$ classes. For nonzero eigenvalues they have continuous representatives, because the operator smooths.
The operator $T_K$ maps $L^2(I)$ into the continuous functions $C(I)$, and every eigenfunction $\phi_n$ with $\lambda_n \neq 0$ has a continuous representative, namely $\lambda_n^{-1} T_K \phi_n$.
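The kernel is uniformly continuous on the compact square, so for $\varepsilon > 0$ there is $\delta > 0$ with $|K(x, y) - K(x', y)| < \varepsilon$ whenever $|x - x'| < \delta$, uniformly in $y$. Then by Cauchy-Schwarz
$$|(T_K f)(x) - (T_K f)(x')| \le \int_I |K(x, y) - K(x', y)|\, |f(y)|\, dy \le \varepsilon\, |I|^{1/2}\, \|f\|_{L^2(I)}, \tag{2}$$
so $T_K f$ is continuous. Since $\phi_n = \lambda_n^{-1} T_K \phi_n$ lies in the range of $T_K$, the function $\lambda_n^{-1} T_K \phi_n$ is a continuous representative of $\phi_n$ when $\lambda_n \neq 0$.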
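The smoothing effect can be observed numerically. The sketch below assumes the example kernel $K(x, y) = \min(x, y)$ on $[0, 1]$ (an illustrative choice, not from the post) and applies the operator to a step function with a jump. For this kernel $|K(x, y) - K(x', y)| \le |x - x'|$, so the Cauchy-Schwarz estimate becomes a Lipschitz bound with constant $\|f\|_{L^2}$. The function name `Tf` is hypothetical.

```python
import numpy as np

m = 2000
y = (np.arange(m) + 0.5) / m           # midpoint quadrature nodes on [0, 1]
w = 1.0 / m                            # quadrature weight
f = np.where(y < 0.5, 1.0, -1.0)       # discontinuous L^2 function, jump at 1/2

def Tf(x):
    """Midpoint-rule approximation of (T_K f)(x) for K(x, y) = min(x, y)."""
    return np.sum(np.minimum(x, y) * f) * w

xs = np.linspace(0.0, 1.0, 1001)
g = np.array([Tf(x) for x in xs])      # values of T_K f on a fine grid

# Cauchy-Schwarz with |min(x, y) - min(x', y)| <= |x - x'| gives
# |Tf(x) - Tf(x')| <= |x - x'| * ||f||_2, so difference quotients are bounded.
f_norm = np.sqrt(np.sum(f ** 2) * w)
slopes = np.abs(np.diff(g)) / np.diff(xs)
print("largest difference quotient:", slopes.max())
print("Lipschitz bound ||f||_2    :", f_norm)
```

Although `f` jumps at $1/2$, the difference quotients of the output stay below the bound, which is the discrete shadow of the continuity statement.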
From here $\phi_n$ denotes the continuous representative, and $\phi_n(x)$, $\phi_n(y)$ are genuine pointwise values.
# Mercer's theorem
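Let $K$ be a continuous symmetric positive kernel on $I \times I$. Then
$$K(x, y) = \sum_{n} \lambda_n\, \phi_n(x)\, \phi_n(y), \tag{3}$$
the series converging absolutely for each $(x, y)$ and uniformly on $I \times I$.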
Write $K_N(x, y) = \sum_{n \le N} \lambda_n\, \phi_n(x)\, \phi_n(y)$ for the partial sums.
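Convergence in $L^2(I \times I)$. By the spectral theorem the closure of the range of $T_K$ is the closed span of $(\phi_n)$, and self-adjointness gives $\ker T_K = (\operatorname{ran} T_K)^{\perp}$, so any direction orthogonal to all $\phi_n$ lies in $\ker T_K$. Extend $(\phi_n)$ by an orthonormal basis of $\ker T_K$ to a complete orthonormal basis $(e_j)$ of $L^2(I)$; then $(e_j \otimes e_k)$ is a complete orthonormal basis of $L^2(I \times I)$. The coefficient $\langle K, e_j \otimes e_k \rangle = \langle T_K e_k, e_j \rangle$ vanishes unless $j = k$ with $e_j = \phi_n$, in which case it equals $\lambda_n$. Hence $K = \sum_n \lambda_n\, \phi_n \otimes \phi_n$ in $L^2(I \times I)$, so the partial sums satisfy $K_N \to K$ in $L^2(I \times I)$.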
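The diagonal is bounded. The remainder $R_N = K - K_N$ has integral operator $T_{R_N} = T_K - \sum_{n \le N} \lambda_n \langle \cdot, \phi_n \rangle\, \phi_n$, which is positive because $\langle T_{R_N} f, f \rangle = \sum_{n > N} \lambda_n \langle f, \phi_n \rangle^2 \ge 0$. A continuous positive kernel is nonnegative on the diagonal, since a strictly negative value $R_N(x_0, x_0) < 0$ would persist on a square $J \times J$ by continuity and, with $f = \mathbf{1}_J$, make $\langle T_{R_N} f, f \rangle = \iint_{J \times J} R_N < 0$. Thus $\sum_{n \le N} \lambda_n \phi_n(x)^2 = K(x, x) - R_N(x, x) \le K(x, x)$ for all $x$ and $N$, and the increasing sums converge pointwise to a limit $\sum_n \lambda_n \phi_n(x)^2 \le K(x, x)$.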
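Convergence in $y$ for fixed $x$. For $M < N$, Cauchy-Schwarz on the increment gives
$$|K_N(x, y) - K_M(x, y)| = \Big| \sum_{M < n \le N} \lambda_n \phi_n(x) \phi_n(y) \Big| \le \Big( \sum_{M < n \le N} \lambda_n \phi_n(x)^2 \Big)^{1/2} \Big( \sum_{M < n \le N} \lambda_n \phi_n(y)^2 \Big)^{1/2} \le \Big( \sum_{M < n \le N} \lambda_n \phi_n(x)^2 \Big)^{1/2} C^{1/2}, \tag{4}$$
where $C = \max_{t \in I} K(t, t)$ and the second factor uses the diagonal bound at $y$. Since $\sum_n \lambda_n \phi_n(x)^2 < \infty$, the sums $K_N(x, \cdot)$ are uniformly Cauchy in $y$, converging uniformly to a continuous function $\tilde K(x, \cdot)$ with $\tilde K(x, y) = \sum_n \lambda_n \phi_n(x) \phi_n(y)$.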
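A pointwise inequality. The remainder $R_N = K - K_N$ is a continuous kernel with positive operator, so its values obey $R_N(x, y)^2 \le R_N(x, x)\, R_N(y, y)$ for all $x, y \in I$. To see this, evaluate the positivity on $f = \alpha\, |J_x|^{-1} \mathbf{1}_{J_x} + \beta\, |J_y|^{-1} \mathbf{1}_{J_y}$ for intervals $J_x \ni x$ and $J_y \ni y$ in $I$, shrinking to those points. Continuity sends the quadratic form $\langle T_{R_N} f, f \rangle$ to $\alpha^2 R_N(x, x) + 2 \alpha \beta\, R_N(x, y) + \beta^2 R_N(y, y) \ge 0$ for all real $\alpha, \beta$, and a positive semidefinite binary form has nonpositive discriminant, $4 R_N(x, y)^2 - 4 R_N(x, x)\, R_N(y, y) \le 0$, which is the inequality.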
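Identification. For each $x$ the uniform limit gives $K_N(x, \cdot) \to \tilde K(x, \cdot)$ uniformly in $y$, with $\tilde K(x, \cdot)$ continuous. Separately, $K_N \to K$ in $L^2(I \times I)$ forces, along a subsequence, $K_{N_j}(x, \cdot) \to K(x, \cdot)$ in $L^2(I)$ for almost every $x$, so $\tilde K(x, y) = K(x, y)$ for almost every $y$ whenever $x$ lies in a set $D \subset I$ of full measure, which is dense. For $x \in D$ the a.e.-$y$ equality between continuous functions holds everywhere, so at $y = x$ we get $\sum_n \lambda_n \phi_n(x)^2 = \tilde K(x, x) = K(x, x)$, whence the diagonal remainder $R_N(x, x) = K(x, x) - K_N(x, x) \to 0$. Fixing any $y \in I$ and any $x \in D$, the diagonal bound keeps the first factor bounded, so the pointwise inequality gives $R_N(y, x)^2 \le R_N(y, y)\, R_N(x, x) \le K(y, y)\, R_N(x, x) \to 0$, so $\tilde K(y, x) = K(y, x)$. The continuous function $\tilde K(y, \cdot) - K(y, \cdot)$ thus vanishes on the dense set $D$, hence everywhere, giving $\tilde K(y, x) = K(y, x)$ for every $x \in I$. So $K_N(x, y) \to K(x, y)$ for all $(x, y) \in I \times I$, and in particular $\sum_n \lambda_n \phi_n(x)^2 = K(x, x)$ for every $x \in I$.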
Uniformity. The continuous increasing sums $\sum_{n \le N} \lambda_n \phi_n(x)^2$ now converge to the continuous limit $K(x, x)$ on the compact interval $I$, so by Dini's theorem the convergence is uniform in $x$. The bound Equation (4) then has its first factor uniformly small and its second bounded by $\max_{t \in I} K(t, t)^{1/2}$, making $K_N \to K$ uniformly on $I \times I$. The same bound with absolute values term by term gives absolute convergence at each point.
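The uniform convergence can be checked on a case where the eigen-data are explicit. The sketch below assumes the kernel $K(x, y) = \min(x, y)$ on $[0, 1]$, an example chosen here rather than taken from the post, with eigenfunctions $\sqrt{2} \sin((n - \tfrac12) \pi x)$ and eigenvalues $1 / ((n - \tfrac12)^2 \pi^2)$. It measures the largest error of the truncated series on a grid; the helper name `mercer_partial_sum` is hypothetical.

```python
import numpy as np

def mercer_partial_sum(x, N):
    """N-term Mercer sum for K = min on [0, 1], evaluated on the grid x."""
    n = np.arange(1, N + 1)
    freq = (n - 0.5) * np.pi
    lam = 1.0 / freq ** 2                            # closed-form eigenvalues
    Phi = np.sqrt(2.0) * np.sin(np.outer(freq, x))   # Phi[n-1, i] = phi_n(x_i)
    return (Phi.T * lam) @ Phi                       # sum_n lam_n phi_n(x_i) phi_n(x_j)

x = np.linspace(0.0, 1.0, 201)
K = np.minimum.outer(x, x)                           # exact kernel on the grid

# Largest absolute error over the whole grid for increasing truncation levels.
sup_err = {N: np.abs(K - mercer_partial_sum(x, N)).max() for N in (10, 50, 200)}
for N, err in sup_err.items():
    print(N, err, N * err)   # N * err staying roughly constant suggests O(1/N) decay
```

For this kernel the worst error sits on the diagonal at $x = 1$, consistent with the pointwise inequality, which bounds every off-diagonal remainder by diagonal ones. The slow $O(1/N)$ decay reflects the kink of $\min(x, y)$ along the diagonal; smoother kernels would be expected to converge faster.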
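The diagonal of Equation (3) integrates to the trace identity.
$$\sum_n \lambda_n = \int_I K(x, x)\, dx. \tag{5}$$
Integrate $K(x, x) = \sum_n \lambda_n \phi_n(x)^2$ over $I$. Uniform convergence permits term-by-term integration, and $\int_I \phi_n(x)^2\, dx = 1$ leaves $\sum_n \lambda_n$.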
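A quick numerical check of the trace identity, again under the assumption of the example kernel $K(x, y) = \min(x, y)$ on $[0, 1]$ (not from the post), for which $\int_0^1 K(x, x)\, dx = \tfrac12$. The sketch looks at both the closed-form eigenvalues and a midpoint-rule discretisation, where the identity reduces to the fact that a matrix's eigenvalues sum to its trace.

```python
import numpy as np

# Analytic side: closed-form eigenvalues of K = min on [0, 1].
n = np.arange(1, 100001)
lam = 1.0 / ((n - 0.5) ** 2 * np.pi ** 2)
print("partial sum of eigenvalues:", lam.sum())    # approaches 1/2 from below

# Discrete side: midpoint-rule matrix of the integral operator.
m = 400
x = (np.arange(m) + 0.5) / m
A = np.minimum.outer(x, x) / m
eigs = np.linalg.eigvalsh(A)
print("sum of matrix eigenvalues :", eigs.sum())
print("matrix trace              :", np.trace(A))  # midpoint rule for int K(x, x) dx
```

The partial sum falls short of $\tfrac12$ only by the tail of the series, while the matrix trace is the midpoint-rule quadrature of the diagonal $K(x, x) = x$.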
# Reproducing kernels
Mercer's eigen-data assembles a Hilbert space of functions in which $K$ does the evaluating.
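The reproducing kernel Hilbert space of a positive kernel $K$ is
$$\mathcal{H}_K = \Big\{ f = \sum_n a_n \phi_n \;:\; \sum_n \frac{a_n^2}{\lambda_n} < \infty \Big\}, \tag{6}$$
with inner product $\langle f, g \rangle_{\mathcal{H}_K} = \sum_n a_n b_n / \lambda_n$ for $f = \sum_n a_n \phi_n$ and $g = \sum_n b_n \phi_n$. The sum runs over the nonzero-eigenvalue modes, so the norm is definite.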
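Writing $K_x = K(x, \cdot) = \sum_n \lambda_n \phi_n(x)\, \phi_n$ from Mercer, its coordinates are $a_n = \lambda_n \phi_n(x)$, so $\|K_x\|_{\mathcal{H}_K}^2 = \sum_n \lambda_n \phi_n(x)^2 = K(x, x) < \infty$, putting every $K_x$ in $\mathcal{H}_K$. The defining property follows by matching coordinates.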
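For every $f \in \mathcal{H}_K$ and every $x \in I$, the reproducing property $f(x) = \langle f, K_x \rangle_{\mathcal{H}_K}$ holds.
The coordinates of $K_x = K(x, \cdot)$ are $\lambda_n \phi_n(x)$, so for $f = \sum_n a_n \phi_n$ we get $\langle f, K_x \rangle_{\mathcal{H}_K} = \sum_n a_n \lambda_n \phi_n(x) / \lambda_n = \sum_n a_n \phi_n(x) = f(x)$, the last equality being the expansion of $f$ evaluated at $x$, which converges absolutely since $\sum_n |a_n \phi_n(x)| \le \big( \sum_n a_n^2 / \lambda_n \big)^{1/2} \big( \sum_n \lambda_n \phi_n(x)^2 \big)^{1/2} = \|f\|_{\mathcal{H}_K}\, K(x, x)^{1/2}$ by Cauchy-Schwarz.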
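The coordinate computation above can be sketched in a few lines. The example assumes the kernel $K(x, y) = \min(x, y)$ on $[0, 1]$ with its closed-form eigen-data (an illustrative choice, not from the post) and truncates everything to the first `N` modes, so it demonstrates the algebra rather than the full infinite-dimensional space. The names `phi` and `inner_H` are hypothetical helpers.

```python
import numpy as np

N = 50                                   # retained modes (truncation, for illustration)
n = np.arange(1, N + 1)
freq = (n - 0.5) * np.pi
lam = 1.0 / freq ** 2                    # eigenvalues of K = min on [0, 1]

def phi(x):
    """Vector (phi_1(x), ..., phi_N(x)) of eigenfunction values at the point x."""
    return np.sqrt(2.0) * np.sin(freq * x)

def inner_H(a, b):
    """RKHS inner product in coordinates: sum_n a_n b_n / lam_n."""
    return np.sum(a * b / lam)

rng = np.random.default_rng(0)
a = lam * rng.standard_normal(N)         # coordinates of some f in the span of N modes

x0 = 0.3
k_x0 = lam * phi(x0)                     # coordinates of K_x, truncated to N modes
f_x0 = np.sum(a * phi(x0))               # pointwise value f(x0) = sum_n a_n phi_n(x0)

print(inner_H(a, k_x0), f_x0)            # reproducing property: the two values agree
print(inner_H(k_x0, k_x0), x0)           # squared norm of K_x approximates K(x0, x0) = x0
```

The first pair agrees to rounding error because `f` lives in the retained modes; the second pair differs by the Mercer truncation error, which shrinks as `N` grows.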
The map $x \mapsto K_x = K(x, \cdot)$ is the feature map, embedding the index set $I$ into $\mathcal{H}_K$ so that the kernel is the inner product of features, $K(x, y) = \langle K_x, K_y \rangle_{\mathcal{H}_K}$, the identity at the root of every kernel method. Mercer's theorem is what makes this embedding concrete, turning the abstract spectral decomposition of an integral operator into an explicit basis of continuous features. Applied to a covariance kernel, the eigenfunctions are the principal modes of the process and the eigenvalues their variances, and the uniform series Equation (3) is precisely the covariance side of the Karhunen-Loève expansion, the representation in which a process becomes a sum of independent coordinates.