Mercer's Theorem and Reproducing Kernels

The spectral theorem decomposes a compact self-adjoint operator into eigenvalues and eigenvectors, but the eigenvectors are abstract elements of $L^2$. When the operator is the integral operator of a continuous kernel, the decomposition becomes a concrete, uniformly convergent series of continuous functions; this is the content of Mercer's theorem. This post makes the spectral theorem explicit for kernels and uses the same eigen-data to build the reproducing kernel Hilbert space, the setting that grounds the Karhunen-Loeve expansion and the kernel methods of learning theory [@reedSimon1980; @conway1990]. The domain is a compact interval $[a,b]$ with Lebesgue measure, and $L^2=L^2([a,b])$ is the space of real-valued square-integrable functions on it.

#The integral operator

A kernel is a continuous function $K:[a,b]^2\to\R$ . It acts on $L^2$ by

$$(T_K f)(s)=\int_a^b K(s,t)f(t)\,dt, \tag{1}$$

and the structure of $K$ transfers directly to $T_K$ .

Definition 1

A kernel $K$ is symmetric when $K(s,t)=K(t,s)$ and positive when $\ip{T_K f}{f}=\int\!\!\int K(s,t)f(s)f(t)\,ds\,dt\ge 0$ for all $f\in L^2$ .

Proposition 2

For a symmetric kernel, $T_K$ is a compact self-adjoint operator on $L^2$ , and it is positive when $K$ is positive.

Proof

The kernel is bounded on the compact square $[a,b]^2$, so $\int\!\!\int K^2<\infty$, which makes $T_K$ a Hilbert-Schmidt operator. Fix an orthonormal basis $(\psi_i)$ of $L^2$. The products $(\psi_i(s)\psi_j(t))_{i,j}$ form an orthonormal basis of $L^2([a,b]^2)$, so $K=\sum_{i,j} c_{ij}\psi_i(s)\psi_j(t)$ with $\sum_{i,j}c_{ij}^2=\int\!\!\int K^2<\infty$. Truncating the expansion to $i,j\le N$ gives a kernel $K^{(N)}$ whose operator $F_N$ has finite-dimensional range, and $\norm{T_K-F_N}^2\le \int\!\!\int(K-K^{(N)})^2=\sum_{i\text{ or }j>N}c_{ij}^2\to 0$, the operator norm being bounded by the Hilbert-Schmidt norm. So $T_K$ is a norm limit of finite-rank operators, hence compact. Self-adjointness is $\ip{T_K f}{g}=\int\!\!\int K(s,t)f(t)g(s)\,dt\,ds=\ip{f}{T_K g}$ by symmetry and Fubini, and positivity of $T_K$ is exactly the hypothesis on $K$.

By the spectral theorem and its positive corollary, a symmetric positive kernel gives an orthonormal sequence of eigenfunctions $\varphi_n\in L^2$ with eigenvalues $\lambda_n\ge 0$ tending to $0$ and $T_K\varphi_n=\lambda_n\varphi_n$ .
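As a numerical sanity check on this eigen-decomposition, here is a minimal sketch that discretizes $T_K$ by the midpoint rule (a Nyström-type approximation). The kernel $\min(s,t)$ on $[0,1]$, the grid size, and the helper name `nystrom_eigs` are assumptions chosen for illustration and do not come from the post. That kernel has the known eigenvalues $1/((n-\tfrac12)^2\pi^2)$, which gives something to compare against.

```python
import numpy as np

def nystrom_eigs(kernel, a, b, m):
    """Approximate eigenpairs of T_K with the midpoint rule on m nodes."""
    h = (b - a) / m
    nodes = a + h * (np.arange(m) + 0.5)           # midpoints of m equal cells
    gram = kernel(nodes[:, None], nodes[None, :])  # matrix of values K(s_i, s_j)
    # h * gram is the symmetric matrix of the discretized operator.
    vals, vecs = np.linalg.eigh(h * gram)
    order = np.argsort(vals)[::-1]                 # largest eigenvalue first
    # Rescale columns so that sum_i phi(s_i)^2 * h = 1 (discrete L^2 norm).
    return nodes, vals[order], vecs[:, order] / np.sqrt(h)

# Illustrative kernel (an assumption of this sketch): K(s, t) = min(s, t) on [0, 1].
nodes, lam, phi = nystrom_eigs(np.minimum, 0.0, 1.0, 400)
exact = 1.0 / ((np.arange(1, 6) - 0.5) ** 2 * np.pi ** 2)
print(lam[:5])   # leading eigenvalues of the discretized operator
print(exact)     # closed-form eigenvalues for comparison
```

The computed eigenvalues are nonnegative up to rounding and decay to $0$, as the spectral theorem predicts for a positive compact operator.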

#Continuity of the eigenfunctions

The eigenfunctions are a priori only $L^2$ classes. For nonzero eigenvalues they have continuous representatives, because the operator smooths.

Lemma 3

The operator $T_K$ maps $L^2$ into the continuous functions, and every eigenfunction with $\lambda_n>0$ has a continuous representative, namely $\varphi_n=\lambda_n^{-1}T_K\varphi_n$ .

Proof

The kernel is uniformly continuous on the compact square, so for $\varepsilon>0$ there is $\delta>0$ with $\abs{K(s,t)-K(s',t)}<\varepsilon$ whenever $\abs{s-s'}<\delta$ , uniformly in $t$ . Then by Cauchy-Schwarz

$$\abs{(T_K f)(s)-(T_K f)(s')}\le\int_a^b\abs{K(s,t)-K(s',t)}\abs{f(t)}\,dt\le\varepsilon\sqrt{b-a}\, \norm f_2, \tag{2}$$

so $T_K f$ is continuous. Since $\lambda_n\varphi_n=T_K\varphi_n$ lies in the range of $T_K$ , the function $\lambda_n^{-1}T_K\varphi_n$ is a continuous representative of $\varphi_n$ when $\lambda_n>0$ .

From here $\varphi_n$ denotes the continuous representative, and $K(s,s)$ , $\varphi_n(s)$ are genuine pointwise values.
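The formula $\varphi_n=\lambda_n^{-1}T_K\varphi_n$ is also a practical recipe: it evaluates an eigenfunction at any point, not only at quadrature nodes. The sketch below does this for the top eigenfunction of the illustrative kernel $\min(s,t)$ on $[0,1]$, whose exact form $\sqrt2\sin(\pi s/2)$ is known. The kernel, the grid size, and the name `phi1_continuous` are assumptions made for the example.

```python
import numpy as np

m = 400
h = 1.0 / m
t = h * (np.arange(m) + 0.5)                   # quadrature nodes on [0, 1]
vals, vecs = np.linalg.eigh(h * np.minimum(t[:, None], t[None, :]))
lam1 = vals[-1]                                # eigh sorts ascending, so the top eigenvalue is last
phi1 = vecs[:, -1] / np.sqrt(h)                # normalize in the discrete L^2 norm
phi1 *= np.sign(phi1.sum())                    # remove the sign ambiguity of the eigenvector

def phi1_continuous(s):
    """Evaluate lambda^{-1} (T_K phi)(s) at arbitrary points s by quadrature."""
    s = np.atleast_1d(np.asarray(s, dtype=float))
    return (np.minimum(s[:, None], t[None, :]) @ phi1) * h / lam1

s_new = np.array([0.0, 0.123, 0.5, 0.9, 1.0])  # off-grid points, endpoints included
print(phi1_continuous(s_new))
print(np.sqrt(2.0) * np.sin(0.5 * np.pi * s_new))  # exact eigenfunction for comparison
```

Because the kernel is continuous in $s$, the function returned here is continuous even though the eigenvector was only computed on a grid.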

#Mercer's theorem

Theorem 4

Let $K$ be a continuous symmetric positive kernel on $[a,b]^2$ . Then

$$K(s,t)=\sum_{n}\lambda_n\varphi_n(s)\varphi_n(t), \tag{3}$$

the series converging absolutely for each $(s,t)$ and uniformly on $[a,b]^2$ .

Proof

Write $S_N(s,t)=\sum_{n\le N}\lambda_n\varphi_n(s)\varphi_n(t)$ for the partial sums.

Convergence in $L^2$. By the spectral theorem the closure of the range of $T_K$ is the closed span of $\{\varphi_n:\lambda_n>0\}$, and self-adjointness gives $(\overline{\operatorname{ran}T_K})^\perp=\ker T_K$, so any function orthogonal to every $\varphi_n$ with $\lambda_n>0$ lies in $\ker T_K$. Extend $\{\varphi_n:\lambda_n>0\}$ by an orthonormal basis of $\ker T_K$ to a complete orthonormal basis $(u_i)$ of $L^2$; then $(u_i(s)u_j(t))$ is a complete orthonormal basis of $L^2([a,b]^2)$. The coefficient $\ip{K}{u_i\otimes u_j}=\ip{T_K u_j}{u_i}$ vanishes unless $u_j=\varphi_n$ with $\lambda_n>0$, in which case it equals $\lambda_n\delta_{ij}$. Hence $K=\sum_{n:\lambda_n>0}\lambda_n\varphi_n\otimes\varphi_n$ in $L^2([a,b]^2)$, so $S_N\to K$ in $L^2$.

The diagonal is bounded. Write $K_N=K-S_N$ for the remainder. Its integral operator is $T_K-\sum_{n\le N}\lambda_n \varphi_n\otimes\varphi_n$, which is positive because $\ip{(T_K-\sum_{n\le N}\lambda_n\varphi_n\otimes \varphi_n)f}{f}=\sum_{n>N}\lambda_n\ip{f}{\varphi_n}^2\ge 0$. A continuous positive kernel is nonnegative on the diagonal: a strictly negative value $K_N(s_0,s_0)<0$ would persist on a small square $I\times I$ with $I=[s_0-\eta,s_0+\eta]\cap[a,b]$ by continuity and, with the indicator $\chi=\mathbf 1_I$, make $\ip{T_{K_N}\chi}{\chi}=\int\!\!\int_{I\times I}K_N(s,t)\,ds\,dt<0$, contradicting positivity. Thus $K_N(s,s)\ge 0$, that is, $\sum_{n\le N}\lambda_n\varphi_n(s)^2\le K(s,s)$ for all $s$ and $N$, and the increasing sums converge pointwise to a limit $g(s)\le K(s,s)$.

Convergence in $t$ for fixed $s$ . For $m>N$ , Cauchy-Schwarz on the increment gives

$$\abs{S_m(s,t)-S_N(s,t)}\le\Big(\sum_{N<n\le m}\lambda_n\varphi_n(s)^2\Big)^{1/2}\Big(\sum_{N<n\le m} \lambda_n\varphi_n(t)^2\Big)^{1/2}\le\big(g(s)-S_N^{\mathrm d}(s)\big)^{1/2}K(t,t)^{1/2}, \tag{4}$$

where $S_N^{\mathrm d}(s)=\sum_{n\le N}\lambda_n\varphi_n(s)^2$ and the second factor uses the diagonal bound at $t$. Since $S_N^{\mathrm d}(s)\to g(s)$ and $K(t,t)$ is bounded on $[a,b]$, the sums $S_N(s,\cdot)$ are uniformly Cauchy in $t$ and converge uniformly to a continuous function $F_s$ with $F_s(t)=\sum_n\lambda_n\varphi_n(s)\varphi_n(t)$.

A pointwise inequality. The remainder $K_N=K-S_N$ is a continuous symmetric kernel with positive operator, so its values obey $K_N(s,t)^2\le K_N(s,s)K_N(t,t)$ for all $s,t$. To see this, evaluate the positivity $\ip{T_{K_N}f}{f}\ge 0$ on $f=\alpha\mathbf 1_I/\abs I+\beta\mathbf 1_J/\abs J$ for intervals $I\ni s$, $J\ni t$ shrinking to the points $s$ and $t$. Continuity sends the quadratic form to $\alpha^2 K_N(s,s)+2\alpha \beta K_N(s,t)+\beta^2 K_N(t,t)\ge 0$ for all real $\alpha,\beta$, and a positive semidefinite binary form has nonpositive discriminant, $4K_N(s,t)^2-4K_N(s,s)K_N(t,t)\le 0$, which is the inequality.

Identification. For each $s$ the uniform limit gives $K_N(s,\cdot)=K(s,\cdot)-S_N(s,\cdot)\to K(s,\cdot)-F_s=:h_s$ uniformly in $t$, with $h_s$ continuous. Separately, $S_N\to K$ in $L^2([a,b]^2)$ forces, along a subsequence, $S_N(s,\cdot)\to K(s,\cdot)$ in $L^2(t)$ for almost every $s$. For such $s$ the continuous function $h_s$ vanishes almost everywhere, hence everywhere, so $h_s=0$ for all $s$ in a set $G$ of full measure, which is in particular dense. For $s'\in G$ the identity $F_{s'}=K(s',\cdot)$ at $t=s'$ reads $g(s')=F_{s'}(s')=K(s',s')$, whence the diagonal remainder satisfies $K_N(s',s')=K(s',s')-S_N^{\mathrm d}(s')\to K(s',s')-g(s')=0$. Now fix any $s_0\in[a,b]$ and any $s'\in G$. The diagonal bound $0\le K_N(s_0,s_0)\le K(s_0,s_0)$ keeps the first factor bounded, so the pointwise inequality gives $K_N(s_0,s')^2\le K_N(s_0,s_0)K_N(s',s')\le K(s_0,s_0)K_N(s',s')\to 0$, and therefore $h_{s_0}(s')=\lim_N K_N(s_0,s')=0$. The continuous function $h_{s_0}$ thus vanishes on the dense set $G$, hence everywhere, giving $K(s_0,\cdot)=F_{s_0}$ for every $s_0$. So $K(s,t)=\sum_n\lambda_n\varphi_n(s)\varphi_n(t)$ for all $(s,t)$, and in particular $g(s)=K(s,s)$.

Uniformity. The continuous increasing sums $S_N^{\mathrm d}(s)$ now converge to the continuous limit $K(s,s)$ on the compact interval $[a,b]$, so by Dini's theorem the convergence is uniform in $s$. In the bound (4) the first factor is then uniformly small and the second is bounded by $\max_t K(t,t)^{1/2}$, making $S_N\to K$ uniformly on $[a,b]^2$. The same Cauchy-Schwarz bound applied to $\sum_n\lambda_n\abs{\varphi_n(s)\varphi_n(t)}$ gives absolute convergence at each point.
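To watch the uniform convergence happen, the following sketch evaluates the partial sums $S_N$ on a grid and records the largest deviation from $K$. It assumes, purely for illustration, the kernel $\min(s,t)$ on $[0,1]$ with its closed-form eigenpairs $\lambda_n=1/((n-\tfrac12)^2\pi^2)$ and $\varphi_n(s)=\sqrt2\sin((n-\tfrac12)\pi s)$; the function name `mercer_partial_sum` is likewise hypothetical.

```python
import numpy as np

def mercer_partial_sum(s, t, N):
    """S_N(s, t) for K = min(s, t) on [0, 1]^2, from the closed-form eigenpairs."""
    n = np.arange(1, N + 1)
    freq = (n - 0.5) * np.pi                          # lambda_n = 1 / freq**2
    phi_s = np.sqrt(2.0) * np.sin(np.outer(s, freq))  # phi_n(s), shape (len(s), N)
    phi_t = np.sqrt(2.0) * np.sin(np.outer(t, freq))  # phi_n(t), shape (len(t), N)
    return (phi_s / freq**2) @ phi_t.T                # sum_n lambda_n phi_n(s) phi_n(t)

grid = np.linspace(0.0, 1.0, 201)
K = np.minimum(grid[:, None], grid[None, :])
# Largest grid error of S_N for increasing truncation levels N.
sup_errors = [np.abs(K - mercer_partial_sum(grid, grid, N)).max() for N in (5, 20, 80)]
print(sup_errors)
```

For this kernel the worst error sits on the diagonal at $s=t=1$, consistent with the proof, where the off-diagonal remainder is controlled by the diagonal one through the bound (4).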

Integrating (3) along the diagonal gives the trace identity.

Corollary 5

$\int_a^b K(s,s)\,ds=\sum_n\lambda_n$ .

Proof

Integrate $K(s,s)=\sum_n\lambda_n\varphi_n(s)^2$ over $[a,b]$ . Uniform convergence permits term-by-term integration, and $\int\varphi_n^2=1$ leaves $\sum_n\lambda_n$ .
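A quick check of the trace identity, again on the illustrative kernel $\min(s,t)$ on $[0,1]$ (an assumption of this sketch, not an example from the post): the diagonal is $K(s,s)=s$, with integral $\tfrac12$, and the eigenvalues $1/((n-\tfrac12)^2\pi^2)$ should sum to the same value.

```python
import numpy as np

n = np.arange(1, 200001)
lam = 1.0 / ((n - 0.5) ** 2 * np.pi ** 2)  # eigenvalues of min(s, t) on [0, 1]
partial = lam.sum()                        # sum of the first 200000 eigenvalues
trace = 0.5                                # integral of K(s, s) = s over [0, 1]
# Integral-comparison estimate for the size of the omitted tail.
tail_estimate = 1.0 / (np.pi ** 2 * n[-1])
print(partial, trace - partial, tail_estimate)
```

The gap between the partial sum and $\tfrac12$ shrinks like $1/N$, so the eigenvalue series converges but only slowly for a kernel with a kink on the diagonal.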

#Reproducing kernels

Mercer's eigen-data assembles a Hilbert space of functions in which $K$ does the evaluating.

Definition 6

The reproducing kernel Hilbert space of a positive kernel $K$ is

$$\mathcal H_K=\Big\{f=\sum_{n:\lambda_n>0} a_n\varphi_n:\ \norm f_{\mathcal H_K}^2=\sum_{n:\lambda_n>0} \frac{a_n^2}{\lambda_n}<\infty\Big\}, \tag{5}$$

with inner product $\ip{f}{g}_{\mathcal H_K}=\sum_{n:\lambda_n>0} a_n b_n/\lambda_n$ . The sum runs over the nonzero-eigenvalue modes, so the norm is definite.

Writing $K_s=K(s,\cdot)=\sum_n\lambda_n\varphi_n(s)\varphi_n$ from Mercer, its coordinates are $a_n=\lambda_n\varphi_n(s)$ , so $\norm{K_s}_{\mathcal H_K}^2=\sum_n\lambda_n\varphi_n(s)^2=K(s,s)<\infty$ , putting every $K_s$ in $\mathcal H_K$ . The defining property follows by matching coordinates.

Proposition 7

For every $f=\sum_{n:\lambda_n>0} a_n\varphi_n\in\mathcal H_K$ and every $s$ , the reproducing property $f(s)=\ip{f}{K_s}_{\mathcal H_K}$ holds.

Proof

The coordinates of $K_s$ are $\lambda_n\varphi_n(s)$ , so $\ip{f}{K_s}_{\mathcal H_K}=\sum_{n:\lambda_n>0} a_n\cdot\lambda_n\varphi_n(s)/\lambda_n=\sum_{n:\lambda_n>0} a_n\varphi_n(s)=f(s)$ , the last equality being the expansion of $f$ evaluated at $s$ , which converges absolutely since $\sum_{n:\lambda_n>0}\abs{a_n \varphi_n(s)}\le\norm f_{\mathcal H_K}K(s,s)^{1/2}$ by Cauchy-Schwarz.
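The coordinate computation in this proof can be replayed numerically. The sketch below truncates to $N$ modes of the illustrative kernel $\min(s,t)$ on $[0,1]$ and picks coefficients $a_n=n^{-3}$; the kernel, the coefficients, the evaluation point, and the helper names `phi` and `rkhs_inner` are all assumptions made for the example.

```python
import numpy as np

N = 50
n = np.arange(1, N + 1)
freq = (n - 0.5) * np.pi
lam = 1.0 / freq**2                       # eigenvalues of min(s, t) on [0, 1]

def phi(s):
    """Vector of eigenfunction values phi_n(s) = sqrt(2) sin((n - 1/2) pi s)."""
    return np.sqrt(2.0) * np.sin(freq * s)

def rkhs_inner(a, b):
    """Inner product of definition (5), acting on coefficient vectors."""
    return np.sum(a * b / lam)

a = 1.0 / n**3                            # hypothetical coefficients with finite RKHS norm
s = 0.37                                  # arbitrary evaluation point
f_at_s = np.sum(a * phi(s))               # f(s) from the eigenfunction expansion
k_s = lam * phi(s)                        # coordinates of K_s
print(f_at_s, rkhs_inner(a, k_s))         # the reproducing property: these agree
# Cauchy-Schwarz bound |f(s)| <= ||f|| K(s, s)^{1/2}, with K(s, s) = s here.
print(abs(f_at_s), np.sqrt(rkhs_inner(a, a) * s))
```

The second line of output illustrates why point evaluation is a bounded functional on $\mathcal H_K$, which is the property that distinguishes it from $L^2$.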

The map $s\mapsto K_s$ is the feature map, embedding the index set into $\mathcal H_K$ so that the kernel is the inner product of features, $\ip{K_s}{K_t}_{\mathcal H_K}=K(s,t)$, the identity at the root of every kernel method. Mercer's theorem is what makes this embedding concrete, turning the abstract spectral decomposition of an integral operator into an explicit basis of continuous features. Applied to a covariance kernel, the eigenfunctions $\varphi_n$ are the principal modes of the process and the eigenvalues $\lambda_n$ the variances of its coordinates along those modes. The uniform series (3) is precisely the covariance side of the Karhunen-Loeve expansion, the representation in which a process becomes a sum of uncorrelated coordinates (independent when the process is Gaussian).
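Both readings of the eigen-data can be tried in a few lines. The sketch below builds the truncated feature vectors $(\sqrt{\lambda_n}\varphi_n(s))_{n\le N}$ for the illustrative kernel $\min(s,t)$ on $[0,1]$, then draws sample paths $\sum_n\sqrt{\lambda_n}\,\xi_n\varphi_n(s)$ and compares their empirical covariance with the kernel. Taking the $\xi_n$ to be independent standard Gaussians is an assumption of the example, as are the kernel, the truncation level, the sample size, and the evaluation points.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n_paths = 100, 20000
freq = (np.arange(1, N + 1) - 0.5) * np.pi
points = np.array([0.1, 0.3, 0.7, 1.0])

# Feature vectors sqrt(lambda_n) * phi_n(s), one row per point (lambda_n = 1 / freq**2).
features = np.sqrt(2.0) * np.sin(np.outer(points, freq)) / freq

xi = rng.standard_normal((n_paths, N))     # independent N(0, 1) coordinates
paths = xi @ features.T                    # truncated expansion evaluated at the points
empirical_cov = paths.T @ paths / n_paths  # sample covariance (the mean is zero)
print(np.round(empirical_cov, 3))
print(np.minimum(points[:, None], points[None, :]))  # the kernel itself
```

Up to truncation and sampling error the two matrices agree, and `features @ features.T` reproduces the same kernel values deterministically, which is the feature-map identity in finite dimensions.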

cite

@misc{mercer-and-rkhs,
  author = {Zac Kienzle},
  title  = {Mercer's Theorem and Reproducing Kernels},
  year   = {2026},
  month  = {06},
  url    = {https://zackienzle.com/blog/mercer-and-rkhs}
}
