02 June 2026 · 9 min read · updated 09 June 2026

Mercer's Theorem and Reproducing Kernels

A continuous positive-definite kernel defines a compact positive integral operator, and the spectral theorem decomposes it. We prove the integral operator is Hilbert-Schmidt and positive, prove its eigenfunctions are continuous, and prove Mercer's theorem: the kernel is the absolutely and uniformly convergent sum of products of its eigenfunctions, weighted by the eigenvalues. The same data builds the reproducing kernel Hilbert space, the function space in which pointwise evaluation is an inner product against the kernel. This grounds the Karhunen-Loeve expansion and the kernel methods of learning theory.

  • functional-analysis
  • hilbert-space
  • kernels
On this page
  • The integral operator
  • Continuity of the eigenfunctions
  • Mercer's theorem
  • Reproducing kernels

The spectral theorem decomposes a compact self-adjoint operator into eigenvalues and eigenvectors, but the eigenvectors are abstract elements of $L^2$. When the operator is the integral operator of a continuous kernel, the decomposition becomes a concrete and uniformly convergent series of continuous functions, which is the content of Mercer's theorem. This post makes the spectral theorem explicit for kernels and uses the same eigen-data to build the reproducing kernel Hilbert space, the setting that grounds the Karhunen-Loeve expansion and the kernel methods of learning theory [@reedSimon1980; @conway1990]. The domain is a compact interval $[a,b]$ with Lebesgue measure, and $L^2=L^2([a,b])$.

#The integral operator

A kernel is a continuous function $K:[a,b]^2\to\mathbb{R}$. It acts on $L^2$ by

$$(T_K f)(s)=\int_a^b K(s,t)f(t)\,dt, \tag{1}$$

and the structure of $K$ transfers directly to $T_K$.
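To make Equation (1) concrete, here is a minimal numerical sketch. It assumes a standard example that does not appear in this post: the kernel $K(s,t)=\min(s,t)$ on $[0,1]$, whose first eigenpair is $\lambda_1=4/\pi^2$, $\varphi_1(t)=\sqrt2\sin(\pi t/2)$. The helper name `apply_TK` and the midpoint rule are illustrative choices only.

```python
import math

def K(s, t):
    # Assumed example kernel (not from the post): min(s, t) on [0, 1].
    return min(s, t)

def apply_TK(f, s, m=2000):
    """Approximate (T_K f)(s) = int_0^1 K(s,t) f(t) dt with the midpoint rule on m cells."""
    h = 1.0 / m
    return sum(K(s, (j + 0.5) * h) * f((j + 0.5) * h) for j in range(m)) * h

# Known first eigenpair of this kernel: lambda_1 = 4/pi^2, phi_1(t) = sqrt(2) sin(pi t / 2).
lam1 = 4.0 / math.pi ** 2

def phi1(t):
    return math.sqrt(2.0) * math.sin(math.pi * t / 2.0)

# Largest deviation between T_K phi_1 and lambda_1 phi_1 over a few sample points.
residual = max(abs(apply_TK(phi1, s) - lam1 * phi1(s)) for s in (0.1, 0.3, 0.5, 0.9))
print(residual)
```

The residual should be at the level of the quadrature error, which is consistent with $T_K\varphi_1=\lambda_1\varphi_1$ for this kernel.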

Definition 1

A kernel $K$ is symmetric when $K(s,t)=K(t,s)$ and positive when $\langle T_K f, f\rangle=\iint K(s,t)f(s)f(t)\,ds\,dt\ge 0$ for all $f\in L^2$.

Proposition 2

For a symmetric kernel, $T_K$ is a compact self-adjoint operator on $L^2$, and it is positive when $K$ is positive.

Proof

The kernel is bounded on the compact square $[a,b]^2$, so $\iint K^2<\infty$, which makes $T_K$ a Hilbert-Schmidt operator. Fix an orthonormal basis $(\psi_i)$ of $L^2$. The products $(\psi_i(s)\psi_j(t))_{i,j}$ form an orthonormal basis of $L^2([a,b]^2)$, so $K=\sum_{i,j} c_{ij}\psi_i(s)\psi_j(t)$ in $L^2([a,b]^2)$ with $\sum_{i,j}c_{ij}^2=\iint K^2<\infty$. Truncating the expansion to $i,j\le N$ gives a kernel $K^{(N)}$ whose operator $F_N$ has finite-dimensional range, and $\lVert T_K-F_N\rVert^2\le \iint(K-K^{(N)})^2=\sum_{i\text{ or }j>N}c_{ij}^2\to 0$, the operator norm being bounded by the Hilbert-Schmidt norm. So $T_K$ is a norm limit of finite-rank operators, hence compact. Self-adjointness is $\langle T_K f, g\rangle=\iint K(s,t)f(t)g(s)\,dt\,ds=\langle f, T_K g\rangle$ by symmetry and Fubini, and positivity holds by hypothesis on $K$.
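The inequality "operator norm at most Hilbert-Schmidt norm" can be checked numerically in a special case. The sketch below assumes the example kernel $\min(s,t)$ on $[0,1]$ (not from this post) with eigenvalues $\lambda_n=((n-\tfrac12)\pi)^{-2}$, and it takes $(\psi_i)$ to be the eigenbasis, so that the rank-$N$ remainder has operator norm $\lambda_{N+1}$ and squared Hilbert-Schmidt norm $\sum_{n>N}\lambda_n^2$. The function names are hypothetical.

```python
import math

def lam(n):
    # Eigenvalues of the assumed example kernel min(s, t) on [0, 1], n = 1, 2, ...
    return 1.0 / ((n - 0.5) * math.pi) ** 2

def tail_hs_norm(N, terms=200_000):
    """sqrt(sum_{n>N} lam(n)^2), truncated after `terms` modes (the rest is negligible)."""
    return math.sqrt(sum(lam(n) ** 2 for n in range(N + 1, N + 1 + terms)))

# Full squared Hilbert-Schmidt norm: sum lam^2 should match
# the double integral of min(s,t)^2 over the unit square, which is 1/6.
hs_sq = sum(lam(n) ** 2 for n in range(1, 200_001))
print(hs_sq, 1.0 / 6.0)

for N in (1, 5, 25):
    op_err = lam(N + 1)       # operator norm of T_K - F_N in the eigenbasis
    hs_err = tail_hs_norm(N)  # Hilbert-Schmidt norm of the same remainder
    print(N, op_err, hs_err)
```

Both error measures shrink as $N$ grows, with the operator-norm error never exceeding the Hilbert-Schmidt one.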

By the spectral theorem and its positive corollary, a symmetric positive kernel gives an orthonormal sequence of eigenfunctions $\varphi_n\in L^2$ with eigenvalues $\lambda_n\ge 0$ tending to $0$ and $T_K\varphi_n=\lambda_n\varphi_n$.

#Continuity of the eigenfunctions

The eigenfunctions are a priori only $L^2$ classes. For nonzero eigenvalues they have continuous representatives, because the operator smooths.

Lemma 3

The operator $T_K$ maps $L^2$ into the continuous functions, and every eigenfunction with $\lambda_n>0$ has a continuous representative, namely $\varphi_n=\lambda_n^{-1}T_K\varphi_n$.

Proof

The kernel is uniformly continuous on the compact square, so for $\varepsilon>0$ there is $\delta>0$ with $\lvert K(s,t)-K(s',t)\rvert<\varepsilon$ whenever $\lvert s-s'\rvert<\delta$, uniformly in $t$. Then by Cauchy-Schwarz

$$\lvert (T_K f)(s)-(T_K f)(s')\rvert\le\int_a^b\lvert K(s,t)-K(s',t)\rvert\,\lvert f(t)\rvert\,dt\le\varepsilon\sqrt{b-a}\,\lVert f\rVert_2, \tag{2}$$

so $T_K f$ is continuous. Since $\lambda_n\varphi_n=T_K\varphi_n$ lies in the range of $T_K$, the function $\lambda_n^{-1}T_K\varphi_n$ is a continuous representative of $\varphi_n$ when $\lambda_n>0$.
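The smoothing bound (2) can be observed numerically on a discontinuous input. This sketch assumes the example kernel $\min(s,t)$ on $[0,1]$ (not from this post), which is 1-Lipschitz in $s$, so $\varepsilon=\lvert s-s'\rvert$ is admissible in the non-strict form of the bound. The step function `f` and the helper `apply_TK` are hypothetical illustrations.

```python
import math

def K(s, t):
    # Assumed example kernel; |K(s,t) - K(s',t)| <= |s - s'| for all t.
    return min(s, t)

def f(t):
    # A discontinuous L^2 function on [0, 1] with ||f||_2 = 1.
    return 1.0 if t > 0.5 else -1.0

def apply_TK(g, s, m=4000):
    """Midpoint-rule approximation of int_0^1 K(s,t) g(t) dt."""
    h = 1.0 / m
    return sum(K(s, (j + 0.5) * h) * g((j + 0.5) * h) for j in range(m)) * h

a, b, f_norm = 0.0, 1.0, 1.0
pairs = [(0.50, 0.51), (0.499, 0.501), (0.2, 0.2001)]
checks = []
for s, s2 in pairs:
    lhs = abs(apply_TK(f, s) - apply_TK(f, s2))       # change in T_K f
    rhs = abs(s - s2) * math.sqrt(b - a) * f_norm     # right side of (2) with eps = |s - s'|
    checks.append((lhs, rhs))
    print(s, s2, lhs, rhs)
```

Even though `f` jumps at $t=1/2$, the values of $T_K f$ at nearby points differ by no more than the bound allows.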

From here $\varphi_n$ denotes the continuous representative, and $K(s,s)$, $\varphi_n(s)$ are genuine pointwise values.

#Mercer's theorem

Theorem 4

Let $K$ be a continuous symmetric positive kernel on $[a,b]^2$. Then

$$K(s,t)=\sum_{n}\lambda_n\varphi_n(s)\varphi_n(t), \tag{3}$$

the series converging absolutely for each $(s,t)$ and uniformly on $[a,b]^2$.

Proof

Write $S_N(s,t)=\sum_{n\le N}\lambda_n\varphi_n(s)\varphi_n(t)$ for the partial sums.

Convergence in $L^2$. By the spectral theorem the closure of the range of $T_K$ is the closed span of $\{\varphi_n:\lambda_n>0\}$, and self-adjointness gives $(\overline{\operatorname{ran}T_K})^\perp=\ker T_K$, so any direction orthogonal to all of these $\varphi_n$ lies in $\ker T_K$. Extend $\{\varphi_n:\lambda_n>0\}$ by an orthonormal basis of $\ker T_K$ to a complete orthonormal basis $(u_i)$ of $L^2$; then $(u_i(s)u_j(t))$ is a complete orthonormal basis of $L^2([a,b]^2)$. The coefficient $\langle K, u_i\otimes u_j\rangle=\langle T_K u_j, u_i\rangle$ vanishes unless $u_j=\varphi_n$ with $\lambda_n>0$, in which case it equals $\lambda_n\delta_{ij}$. Hence $K=\sum_{n:\lambda_n>0}\lambda_n\varphi_n\otimes\varphi_n$ in $L^2([a,b]^2)$, so $S_N\to K$ in $L^2$.

The diagonal is bounded. Write $K_N=K-S_N$ for the remainder, which is a continuous kernel. Its integral operator is $T_K-\sum_{n\le N}\lambda_n \varphi_n\otimes\varphi_n$, which is positive because $\langle (T_K-\sum_{n\le N}\lambda_n\varphi_n\otimes \varphi_n)f, f\rangle=\sum_{n>N}\lambda_n\langle f,\varphi_n\rangle^2\ge 0$. A continuous positive kernel is nonnegative on the diagonal: a strictly negative value $K_N(s_0,s_0)<0$ would persist on a square $[s_0-\eta,s_0+\eta]^2$ (intersected with $[a,b]^2$) by continuity and, with $\chi=\mathbf 1_{[s_0-\eta,s_0+\eta]}$, make $\langle T_{K_N}\chi, \chi\rangle=\iint_{[s_0-\eta,s_0+\eta]^2}K_N(s,t)\,ds\,dt<0$. Thus $\sum_{n\le N}\lambda_n\varphi_n(s)^2\le K(s,s)$ for all $s$ and $N$, and the increasing sums converge pointwise to a limit $g(s)\le K(s,s)$.

Convergence in $t$ for fixed $s$. For $m>N$, Cauchy-Schwarz on the increment gives

$$\lvert S_m(s,t)-S_N(s,t)\rvert\le\Big(\sum_{N<n\le m}\lambda_n\varphi_n(s)^2\Big)^{1/2}\Big(\sum_{N<n\le m} \lambda_n\varphi_n(t)^2\Big)^{1/2}\le\big(g(s)-S_N^{\mathrm d}(s)\big)^{1/2}K(t,t)^{1/2}, \tag{4}$$

where $S_N^{\mathrm d}(s)=\sum_{n\le N}\lambda_n\varphi_n(s)^2$ and the second factor uses the diagonal bound at $t$. Since $S_N^{\mathrm d}(s)\to g(s)$ and $K(t,t)$ is bounded on $[a,b]$, the sums $S_N(s,\cdot)$ are uniformly Cauchy in $t$, converging uniformly to a continuous function $F_s$ with $F_s(t)=\sum_n\lambda_n\varphi_n(s)\varphi_n(t)$.

A pointwise inequality. The remainder $K_N=K-S_N$ is a continuous kernel with positive operator, so its values obey $K_N(s,t)^2\le K_N(s,s)K_N(t,t)$ for all $s,t$. To see this, evaluate the positivity $\langle T_{K_N}f, f\rangle\ge 0$ on $f=\alpha\mathbf 1_I/\lvert I\rvert+\beta\mathbf 1_J/\lvert J\rvert$ for intervals $I\ni s$, $J\ni t$ shrinking to their centres. Continuity sends the quadratic form to $\alpha^2 K_N(s,s)+2\alpha \beta K_N(s,t)+\beta^2 K_N(t,t)\ge 0$ for all real $\alpha,\beta$, and a positive semidefinite binary form has nonpositive discriminant, $4K_N(s,t)^2-4K_N(s,s)K_N(t,t)\le 0$, which is the inequality.
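For readers who want the last step spelled out, here is one elementary way to pass from the nonnegative quadratic form to the pointwise inequality. The shorthand $A$, $B$, $C$, $q$ is introduced only for this sketch and is not notation used elsewhere in the post.

```latex
% Shorthand (introduced here): A = K_N(s,s), B = K_N(s,t), C = K_N(t,t).
% Hypothesis: q(\alpha,\beta) = \alpha^2 A + 2\alpha\beta B + \beta^2 C \ge 0
% for all real \alpha, \beta. Taking (\alpha,\beta) = (1,0) gives A \ge 0.
\begin{align*}
\text{Case } A>0:\quad & q(-B,\,A) = AB^2 - 2AB^2 + A^2C = A\,(AC - B^2) \ge 0
  \;\Longrightarrow\; B^2 \le AC. \\
\text{Case } A=0:\quad & q(\alpha,\,1) = 2\alpha B + C \ge 0 \ \text{for all real } \alpha
  \;\Longrightarrow\; B = 0, \ \text{so } B^2 = 0 \le AC.
\end{align*}
```

In both cases $K_N(s,t)^2\le K_N(s,s)K_N(t,t)$, which is the Cauchy-Schwarz inequality for the positive semidefinite $2\times 2$ matrix with entries $A$, $B$, $B$, $C$.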

Identification. For each $s$ the uniform limit gives $K_N(s,\cdot)=K(s,\cdot)-S_N(s,\cdot)\to K(s,\cdot) -F_s=:h_s$ uniformly in $t$, with $h_s$ continuous. Separately, $S_N\to K$ in $L^2([a,b]^2)$ forces, along a subsequence, $S_N(s,\cdot)\to K(s,\cdot)$ in $L^2(dt)$ for almost every $s$, so $h_s=0$ almost everywhere in $t$ for every $s$ in a set $G$ of full measure, and such a set is dense in $[a,b]$. For $s'\in G$ the a.e.-$t$ equality $F_{s'}=K(s',\cdot)$ between continuous functions holds everywhere, so at $t=s'$ we get $g(s')=F_{s'}(s')=K(s',s')$, whence the diagonal $K_N(s',s')=K(s',s')-S_N^{\mathrm d}(s')\to K(s',s')-g(s')=0$. Now fix any $s_0\in[a,b]$ and any $s'\in G$. The diagonal bound $0\le K_N(s_0,s_0)\le K(s_0,s_0)$ keeps the first factor bounded, so the pointwise inequality gives $K_N(s_0,s')^2\le K_N(s_0,s_0)K_N(s',s')\le K(s_0,s_0)K_N(s',s')\to 0$, so $h_{s_0}(s')=0$. The continuous $h_{s_0}$ thus vanishes on the dense set $G$, hence everywhere, giving $K(s_0,\cdot)=F_{s_0}$ for every $s_0$. So $K(s,t)=\sum_n\lambda_n\varphi_n(s)\varphi_n(t)$ for all $(s,t)$, and in particular $g(s)=K(s,s)$.

Uniformity. The continuous sums $S_N^{\mathrm d}(s)$ now increase to the continuous limit $K(s,s)$ on the compact interval $[a,b]$, so by Dini's theorem the convergence is uniform in $s$. The bound in Equation (4), with $g(s)=K(s,s)$, then has its first factor uniformly small and its second bounded by $\max_t K(t,t)^{1/2}$, making $S_N\to K$ uniformly on $[a,b]^2$. The same bound with absolute values term by term gives absolute convergence at each point.
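Uniform convergence of the partial sums can be watched on a concrete case. The sketch below assumes the example kernel $\min(s,t)$ on $[0,1]$ with eigenpairs $\lambda_n=((n-\tfrac12)\pi)^{-2}$, $\varphi_n(t)=\sqrt2\sin((n-\tfrac12)\pi t)$. That kernel is a standard textbook case and is not treated in this post, and the grid size and function names are arbitrary illustrative choices.

```python
import math

def lam(n):
    # Eigenvalues of the assumed example kernel K(s,t) = min(s,t) on [0, 1].
    return 1.0 / ((n - 0.5) * math.pi) ** 2

def phi(n, t):
    # Matching orthonormal eigenfunctions.
    return math.sqrt(2.0) * math.sin((n - 0.5) * math.pi * t)

def sup_error(N, grid=41):
    """Max over a grid of |K(s,t) - S_N(s,t)| for the N-term Mercer partial sum."""
    pts = [i / (grid - 1) for i in range(grid)]
    # Precompute sqrt(lam_n) * phi_n at every grid point.
    feats = [[math.sqrt(lam(n)) * phi(n, p) for n in range(1, N + 1)] for p in pts]
    worst = 0.0
    for i, s in enumerate(pts):
        for j, t in enumerate(pts):
            S = sum(x * y for x, y in zip(feats[i], feats[j]))
            worst = max(worst, abs(min(s, t) - S))
    return worst

errors = {N: sup_error(N) for N in (5, 20, 80)}
print(errors)
```

For this kernel the grid error is governed by the diagonal tail $\sum_{n>N}\lambda_n\varphi_n(s)^2\le 2\sum_{n>N}\lambda_n$, so it decays roughly like $1/N$, matching the role the diagonal plays in the proof.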

The diagonal of Equation (3) integrates to the trace identity.

Corollary 5

$\int_a^b K(s,s)\,ds=\sum_n\lambda_n$.

Proof

Integrate $K(s,s)=\sum_n\lambda_n\varphi_n(s)^2$ over $[a,b]$. Uniform convergence permits term-by-term integration, and $\int\varphi_n^2=1$ leaves $\sum_n\lambda_n$.
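As a quick sanity check of the trace identity, take the example kernel $\min(s,t)$ on $[0,1]$ (an assumption for illustration, not a kernel discussed in this post). Its diagonal integrates to $\int_0^1 s\,ds=\tfrac12$, and its eigenvalues are $\lambda_n=((n-\tfrac12)\pi)^{-2}$.

```python
import math

def lam(n):
    # Eigenvalues of the assumed example kernel min(s, t) on [0, 1].
    return 1.0 / ((n - 0.5) * math.pi) ** 2

trace_integral = 0.5  # int_0^1 min(s, s) ds = int_0^1 s ds
partial = sum(lam(n) for n in range(1, 100_001))

# The gap is the eigenvalue tail beyond n = 100000, roughly 1 / (pi^2 * 1e5).
gap = trace_integral - partial
print(partial, gap)
```

The partial sums approach $\tfrac12$ from below, and the remaining gap is the tail of the eigenvalue series.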

#Reproducing kernels

Mercer's eigen-data assembles a Hilbert space of functions in which $K$ does the evaluating.

Definition 6

The reproducing kernel Hilbert space of a positive kernel $K$ is

$$\mathcal H_K=\Big\{f=\sum_{n:\lambda_n>0} a_n\varphi_n:\ \lVert f\rVert_{\mathcal H_K}^2=\sum_{n:\lambda_n>0} \frac{a_n^2}{\lambda_n}<\infty\Big\}, \tag{5}$$

with inner product $\langle f, g\rangle_{\mathcal H_K}=\sum_{n:\lambda_n>0} a_n b_n/\lambda_n$. The sum runs over the nonzero-eigenvalue modes, so the norm is definite.

Writing $K_s=K(s,\cdot)=\sum_n\lambda_n\varphi_n(s)\varphi_n$ from Mercer, its coordinates are $a_n=\lambda_n\varphi_n(s)$, so $\lVert K_s\rVert_{\mathcal H_K}^2=\sum_n\lambda_n\varphi_n(s)^2=K(s,s)<\infty$, putting every $K_s$ in $\mathcal H_K$. The defining property follows by matching coordinates.

Proposition 7

For every $f=\sum_{n:\lambda_n>0} a_n\varphi_n\in\mathcal H_K$ and every $s$, the reproducing property $f(s)=\langle f, K_s\rangle_{\mathcal H_K}$ holds.

Proof

The coordinates of $K_s$ are $\lambda_n\varphi_n(s)$, so $\langle f, K_s\rangle_{\mathcal H_K}=\sum_{n:\lambda_n>0} a_n\cdot\lambda_n\varphi_n(s)/\lambda_n=\sum_{n:\lambda_n>0} a_n\varphi_n(s)=f(s)$, the last equality being the expansion of $f$ evaluated at $s$, which converges absolutely since $\sum_{n:\lambda_n>0}\lvert a_n \varphi_n(s)\rvert\le\lVert f\rVert_{\mathcal H_K}K(s,s)^{1/2}$ by Cauchy-Schwarz.
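The reproducing property can be tested numerically without assuming the coordinates of $K_s$ in advance. In this sketch the example kernel $\min(s,t)$ on $[0,1]$ and its eigenpairs are assumptions brought in for illustration, the coefficients `a` define a hypothetical element of $\mathcal H_K$, and the coordinates of $K_s$ are computed by quadrature as $\langle K_s,\varphi_n\rangle_{L^2}$ rather than taken from the formula $\lambda_n\varphi_n(s)$.

```python
import math

def lam(n):
    # Eigenvalues of the assumed example kernel min(s, t) on [0, 1].
    return 1.0 / ((n - 0.5) * math.pi) ** 2

def phi(n, t):
    # Matching orthonormal eigenfunctions.
    return math.sqrt(2.0) * math.sin((n - 0.5) * math.pi * t)

def K(s, t):
    return min(s, t)

def coord_of_Ks(n, s, m=4000):
    """L^2 coefficient <K_s, phi_n> by the midpoint rule; theory predicts lam_n * phi_n(s)."""
    h = 1.0 / m
    return sum(K(s, (j + 0.5) * h) * phi(n, (j + 0.5) * h) for j in range(m)) * h

# Hypothetical element of H_K with finitely many nonzero coordinates a_1..a_4.
a = {1: 0.7, 2: -0.3, 3: 0.2, 4: 0.05}

def f(s):
    # Pointwise value of f = sum_n a_n phi_n.
    return sum(a_n * phi(n, s) for n, a_n in a.items())

def rkhs_inner_with_Ks(s):
    # <f, K_s>_{H_K} = sum_n a_n * b_n / lam_n, with b_n the coordinates of K_s.
    return sum(a_n * coord_of_Ks(n, s) / lam(n) for n, a_n in a.items())

s0 = 0.37
print(f(s0), rkhs_inner_with_Ks(s0))
```

The two printed numbers should agree up to quadrature error, which is the statement $f(s_0)=\langle f,K_{s_0}\rangle_{\mathcal H_K}$ for this particular $f$.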

The map $s\mapsto K_s$ is the feature map, embedding the index set into $\mathcal H_K$ so that the kernel is the inner product of features, $\langle K_s, K_t\rangle_{\mathcal H_K}=K(s,t)$, the identity at the root of every kernel method. Mercer's theorem is what makes this embedding concrete, turning the abstract spectral decomposition of an integral operator into an explicit basis of continuous features. Applied to a covariance kernel, the eigenfunctions $\varphi_n$ are the principal modes of the process and the eigenvalues $\lambda_n$ their variances, and the uniform series in Equation (3) is precisely the covariance side of the Karhunen-Loeve expansion, the representation in which a process becomes a sum of uncorrelated coordinates (independent when the process is Gaussian).
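The covariance reading of Equation (3) can be illustrated by simulation. The following sketch rests on several assumptions not made in this post: the covariance kernel is $\min(s,t)$ on $[0,1]$, the process is Gaussian so the coordinates $Z_n$ can be drawn as independent standard normals, the series is truncated at 50 modes, and the seed and sample size are arbitrary.

```python
import math
import random

def lam(n):
    # Eigenvalues of the assumed covariance kernel min(s, t) on [0, 1].
    return 1.0 / ((n - 0.5) * math.pi) ** 2

def phi(n, t):
    # Matching orthonormal eigenfunctions.
    return math.sqrt(2.0) * math.sin((n - 0.5) * math.pi * t)

def sample_path_values(points, n_modes, rng):
    """One draw of X(t) = sum_n sqrt(lam_n) Z_n phi_n(t) at the given points, Z_n iid N(0,1)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(n_modes)]
    return [sum(math.sqrt(lam(n)) * z[n - 1] * phi(n, t) for n in range(1, n_modes + 1))
            for t in points]

rng = random.Random(0)  # fixed seed so the run is repeatable
s, t = 0.3, 0.7
draws = [sample_path_values((s, t), 50, rng) for _ in range(2000)]

# The process has mean zero, so the empirical covariance is the mean of products.
emp_cov = sum(x * y for x, y in draws) / len(draws)
print(emp_cov, min(s, t))
```

The empirical covariance should land near $K(0.3,0.7)=0.3$, up to Monte Carlo noise and the small truncation error of the series.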

Part 7 of 7 in Hilbert Spaces and Operators

← previous: Compact Operators and the Spectral Theorem

related

  • The Karhunen-Loeve Expansion
  • Orthonormal Bases
  • Inner Product Spaces

referenced by (2)

  • Gaussian Vectors and Processes
  • Second-Order Processes and Mean-Square Calculus
cite
@misc{mercer-and-rkhs,
  author = {Zac Kienzle},
  title  = {Mercer's Theorem and Reproducing Kernels},
  year   = {2026},
  month  = {06},
  url    = {https://zackienzle.com/blog/mercer-and-rkhs}
}