06 June 2026 · 5 min read · updated 09 June 2026

Convergence and Limit Theorems

We order the modes of convergence of random variables, prove the Markov and Chebyshev inequalities and the first Borel-Cantelli lemma, then prove a strong law of large numbers under a fourth-moment hypothesis by summability of fourth moments. The central limit theorem follows from a characteristic-function expansion together with the Lévy continuity theorem.

  • probability
  • measure-theory
On this page
  • Modes of convergence
  • Tail bounds
  • The strong law under a fourth moment
  • The central limit theorem
  • Illustration

Sample averages obey two laws of a different character. One is almost sure and fixes the limit at the mean, and one is in distribution and fixes the fluctuations as Gaussian. Both rest on tail bounds that convert a moment into a probability.

#Modes of convergence

Definition 1

A sequence of random variables $X_n$ converges to $X$

$$
\begin{aligned}
&\text{(i)} &&\text{almost surely, if } \mathbb{P}\big(X_n\to X\big)=1; \\
&\text{(ii)} &&\text{in probability, if } \mathbb{P}\big(\lvert X_n-X\rvert>\varepsilon\big)\to 0\ \text{for each }\varepsilon>0; \\
&\text{(iii)} &&\text{in } L^p, \text{ if } \mathbb{E}\big[\lvert X_n-X\rvert^p\big]\to 0; \\
&\text{(iv)} &&\text{in distribution, if } \mathbb{E}[g(X_n)]\to\mathbb{E}[g(X)]\ \text{for every bounded continuous }g.
\end{aligned} \tag{1}
$$

Convergence almost surely and convergence in $L^p$ each imply convergence in probability, which in turn implies convergence in distribution; no other implication holds in general [1].
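
One way to see that the implications do not reverse is a standard counterexample, which is an addition here and not part of the argument above: take $U$ uniform on $(0,1)$ and $X_n = n\,\mathbf 1\{U\le 1/n\}$. Then $X_n\to 0$ almost surely and in probability, yet $\mathbb{E}\lvert X_n\rvert = 1$ for every $n$, so there is no convergence in $L^1$. The sketch below computes both quantities exactly; the function names are hypothetical.

```python
from fractions import Fraction


def tail_probability(n: int, eps: Fraction) -> Fraction:
    """P(|X_n| > eps) for X_n = n * 1{U <= 1/n}, with U uniform on (0, 1)."""
    # X_n equals n with probability 1/n and 0 otherwise,
    # so the tail event is {U <= 1/n} whenever n > eps.
    return Fraction(1, n) if n > eps else Fraction(0)


def first_absolute_moment(n: int) -> Fraction:
    """E|X_n| = n * P(U <= 1/n)."""
    return n * Fraction(1, n)


for n in (1, 10, 100, 1000):
    # The tail probability shrinks like 1/n while the first moment stays at 1.
    print(n, tail_probability(n, Fraction(1, 2)), first_absolute_moment(n))
```

The tail probability tends to zero, which is convergence in probability, while the first absolute moment never moves.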

#Tail bounds

Proposition 2

(Markov and Chebyshev.) For a nonnegative random variable $Z$ and $a>0$, $\mathbb{P}(Z\ge a)\le\mathbb{E}[Z]/a$. Consequently, for any $X\in L^2$ and $\varepsilon>0$,

$$\mathbb{P}\big(\lvert X-\mathbb{E}X\rvert\ge\varepsilon\big)\le\frac{\operatorname{Var}(X)}{\varepsilon^2}. \tag{2}$$
Proof

Since $a\,\mathbf 1\{Z\ge a\}\le Z$ pointwise, taking expectations gives $a\,\mathbb{P}(Z\ge a)\le\mathbb{E}[Z]$. Applying this to $Z=(X-\mathbb{E}X)^2$ and $a=\varepsilon^2$ gives $\mathbb{P}(\lvert X-\mathbb{E}X\rvert\ge\varepsilon)=\mathbb{P}((X-\mathbb{E}X)^2\ge\varepsilon^2)\le\mathbb{E}[(X-\mathbb{E}X)^2]/\varepsilon^2$, which is Equation (2).
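
The Chebyshev bound can be checked numerically. The sketch below is an illustration only: the Exponential(1) sample, the seed, and the helper name `chebyshev_check` are assumptions made here. It compares the empirical tail frequency with the variance bound. Because the bound is applied to the empirical distribution of the sample, using its own mean and variance, the inequality holds exactly and not just up to sampling error.

```python
import numpy as np


def chebyshev_check(sample: np.ndarray, eps: float) -> tuple[float, float]:
    """Return (empirical tail frequency, Chebyshev bound) for one sample."""
    # Distance of each observation from the sample mean.
    deviation = np.abs(sample - sample.mean())
    empirical = float((deviation >= eps).mean())
    # Var / eps^2, using the population-style variance (ddof=0) of the sample.
    bound = float(sample.var() / eps**2)
    return empirical, bound


rng = np.random.default_rng(1)
x = rng.exponential(1.0, size=100_000)  # Exponential(1): mean 1, variance 1

for eps in (0.5, 1.0, 2.0, 3.0):
    empirical, bound = chebyshev_check(x, eps)
    print(f"eps={eps}: tail {empirical:.4f} <= bound {bound:.4f}")
```

For small $\varepsilon$ the bound exceeds 1 and says nothing; it becomes informative only once $\varepsilon$ is larger than the standard deviation.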

Lemma 3

(First Borel-Cantelli.) If $\sum_n\mathbb{P}(A_n)<\infty$, then $\mathbb{P}(A_n\text{ infinitely often})=0$.

Proof

Let $N=\sum_n\mathbf 1_{A_n}$. Monotone convergence gives $\mathbb{E}[N]=\sum_n\mathbb{P}(A_n)<\infty$, so $N<\infty$ almost surely, which is exactly that only finitely many $A_n$ occur.
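
The counting variable $N$ in the proof can be simulated. The sketch below assumes independent events $A_n=\{U_n<1/n^2\}$ with $U_n$ uniform on $[0,1)$. This choice is hypothetical, as is the truncation at a finite horizon, and independence is not needed for the lemma. The average number of events that occur should sit near $\sum_n 1/n^2$, truncated at the horizon.

```python
import numpy as np

rng = np.random.default_rng(2)
paths, horizon = 10_000, 400

n = np.arange(1, horizon + 1)
probs = 1.0 / n**2  # P(A_n) = 1 / n^2, a summable sequence

# Row k records which of A_1, ..., A_horizon occur on path k.
occurs = rng.random((paths, horizon)) < probs
counts = occurs.sum(axis=1)  # N truncated at the horizon, one value per path

print("mean count:", counts.mean())
print("sum of probabilities:", probs.sum())  # approaches pi^2 / 6 as horizon grows
print("largest count on any path:", counts.max())
```

Every path has a small finite count, which is the finite-horizon shadow of $N<\infty$ almost surely.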

#The strong law under a fourth moment

Theorem 4

Let $X_1,X_2,\dots$ be i.i.d. with $\mathbb{E}[X_1]=\mu$ and $\mathbb{E}[X_1^4]<\infty$. Then $S_n/n\to\mu$ almost surely, where $S_n=\sum_{i=1}^n X_i$.

Proof

Since $\mathbb{E}[X_1^4]<\infty$, power-mean monotonicity gives $\mathbb{E}[\lvert X_1\rvert^k]<\infty$ for all $k\le 4$; in particular $\mathbb{E}[X_1^2]<\infty$, so $\big(\mathbb{E}[X_1^2]\big)^2<\infty$. Replacing $X_i$ by $X_i-\mu$, assume $\mu=0$; the centered fourth moment stays finite, since $\mathbb{E}\lvert X_1-\mu\rvert^4\le 8\big(\mathbb{E}\lvert X_1\rvert^4+\mu^4\big)<\infty$ by the $c_r$-inequality. Expanding $S_n^4=\sum_{i,j,k,l}X_iX_jX_kX_l$ and taking expectations, independence and $\mathbb{E}[X_i]=0$ annihilate every term with an index appearing exactly once. Only the $n$ terms $\mathbb{E}[X_i^4]$ and the $3n(n-1)$ terms $\mathbb{E}[X_i^2]\mathbb{E}[X_j^2]$ with $i\ne j$ survive, so

$$\mathbb{E}[S_n^4]=n\,\mathbb{E}[X_1^4]+3n(n-1)\,\big(\mathbb{E}[X_1^2]\big)^2\le C n^2 \tag{3}$$

for a constant $C$. Hence $\mathbb{E}[(S_n/n)^4]\le C/n^2$. Let $T_N=\sum_{n\le N}(S_n/n)^4$; the $T_N$ are nonnegative and increase to $T=\sum_n(S_n/n)^4$, so monotone convergence gives $\mathbb{E}[T]=\lim_N\sum_{n\le N}\mathbb{E}[(S_n/n)^4]=\sum_n\mathbb{E}[(S_n/n)^4]\le C\sum_n n^{-2}<\infty$. A random variable with finite expectation is finite almost surely, hence $T=\sum_n(S_n/n)^4<\infty$ almost surely. The terms of a convergent series vanish, giving $S_n/n\to 0$ almost surely for the centered variables, which is $S_n/n\to\mu$ for the original ones.
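
The count behind Equation (3) can be verified exactly in a small case. The sketch below assumes Rademacher summands ($\pm 1$ with probability $1/2$ each), chosen here only for illustration, for which $\mathbb{E}[X_1^2]=\mathbb{E}[X_1^4]=1$ and the formula reduces to $n+3n(n-1)$. It enumerates all $2^n$ sign patterns and compares; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def fourth_moment_by_enumeration(n: int) -> Fraction:
    """E[S_n^4] for Rademacher summands, averaging over all 2^n sign patterns."""
    total = sum(sum(signs) ** 4 for signs in product((-1, 1), repeat=n))
    return Fraction(total, 2**n)


def fourth_moment_by_formula(n: int) -> int:
    """Right-hand side of Equation (3) with E[X^4] = E[X^2] = 1."""
    return n + 3 * n * (n - 1)


for n in range(1, 9):
    exact = fourth_moment_by_enumeration(n)
    # The two columns agree, and both are at most 3 * n^2.
    print(n, exact, fourth_moment_by_formula(n))
```

In this case the constant in Equation (3) can be taken as $C=3$, since $n+3n(n-1)=3n^2-2n$.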

#The central limit theorem

Write $\varphi_X(t)=\mathbb{E}[e^{itX}]$ for the characteristic function. It determines the law, is uniformly continuous, and factorizes over independent sums, with $\varphi_{X+Y}=\varphi_X\varphi_Y$ for independent $X,Y$.

Theorem 5

Let $X_1,X_2,\dots$ be i.i.d. with $\mathbb{E}[X_1]=0$ and $\operatorname{Var}(X_1)=\sigma^2\in(0,\infty)$. Then $S_n/(\sigma\sqrt n)$ converges in distribution to the standard normal.

Proof

Normalize to $\sigma=1$ and set $\varphi=\varphi_{X_1}$. From the bound $\lvert e^{isx}-(1+isx-\tfrac12 s^2x^2)\rvert\le\min(\lvert sx\rvert^3,\lvert sx\rvert^2)$, taking expectations gives $\lvert\varphi(s)-(1+is\,\mathbb{E}X_1-\tfrac12 s^2\,\mathbb{E}X_1^2)\rvert\le\mathbb{E}[\min(\lvert sX_1\rvert^3,\lvert sX_1\rvert^2)]=o(s^2)$, the last step by dominated convergence (the integrand divided by $s^2$ is bounded by $X_1^2\in L^1$ and tends to $0$ pointwise). With $\mathbb{E}X_1=0$ and $\mathbb{E}X_1^2=1$ this is the expansion $\varphi(s)=1-\tfrac12 s^2+o(s^2)$ as $s\to 0$, requiring only $\lvert X_1\rvert^2\in L^1$. Independence and identical distribution give

$$\varphi_{S_n/\sqrt n}(t)=\varphi\big(t/\sqrt n\big)^n=\Big(1-\frac{t^2}{2n}+o\big(n^{-1}\big)\Big)^n, \tag{4}$$

and for complex $z_n\to z$ one has $(1+z_n/n)^n\to e^{z}$, since $n\log(1+z_n/n)=z_n+O(\lvert z_n\rvert^2/n)\to z$ on the principal branch once $\lvert z_n\rvert/n<1$. With $z_n=-t^2/2+o(1)\to -t^2/2$ this yields $\varphi_{S_n/\sqrt n}(t)\to e^{-t^2/2}$ for every $t$. The limit is the characteristic function of the standard normal and is continuous at the origin, so the Lévy continuity theorem upgrades pointwise convergence of characteristic functions to convergence in distribution [1], [2].
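
The convergence in Equation (4) can be watched directly for one concrete law. The sketch below assumes centered Exponential(1) summands, a choice made here for illustration, whose characteristic function is $\varphi(s)=e^{-is}/(1-is)$. It evaluates $\varphi(t/\sqrt n)^n$ against $e^{-t^2/2}$; the function names are hypothetical.

```python
import cmath
import math


def phi_centered_exponential(s: float) -> complex:
    """Characteristic function of X = E - 1 with E ~ Exponential(1)."""
    return cmath.exp(-1j * s) / (1 - 1j * s)


def phi_standardized_sum(t: float, n: int) -> complex:
    """Characteristic function of S_n / sqrt(n), as in Equation (4)."""
    return phi_centered_exponential(t / math.sqrt(n)) ** n


def gap_to_normal(t: float, n: int) -> float:
    """Distance between Equation (4) and the normal limit exp(-t^2 / 2)."""
    return abs(phi_standardized_sum(t, n) - math.exp(-t * t / 2))


for n in (1, 10, 100, 10_000):
    print(n, gap_to_normal(1.0, n))
```

For this skewed law the gap shrinks at the rate $n^{-1/2}$, driven by the third cumulant; the theorem itself needs only the second moment.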

#Illustration

The theorem is indifferent to the summand law. Standardized means of skewed $\mathrm{Exponential}(1)$ increments approach the standard normal, and the code below measures the residual excess kurtosis as a finite-$n$ diagnostic.

import numpy as np
from numpy.random import Generator


def standardized_means(n: int, paths: int, rng: Generator) -> np.ndarray:
    """Standardized sample means of i.i.d. Exponential(1) increments.

    Args:
        n: Number of summands per trajectory.
        paths: Number of independent trajectories.
        rng: Seeded generator for reproducibility.

    Returns:
        The array sqrt(n) * (mean - 1) per trajectory, converging in
        distribution to the standard normal as n grows.
    """
    samples = rng.exponential(1.0, size=(paths, n))
    return np.sqrt(n) * (samples.mean(axis=1) - 1.0)


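rng = np.random.default_rng(0)
# The (paths, n) float64 array drawn inside the call holds 4e8 values,
# about 3.2 GB; lower paths or n if memory is tight.
z = standardized_means(n=2_000, paths=200_000, rng=rng)
# Sample excess kurtosis: fourth central moment over squared variance,
# minus 3, the value for a Gaussian.
excess_kurtosis = float(((z - z.mean()) ** 4).mean() / z.var() ** 2 - 3.0)
print(f"excess kurtosis: {excess_kurtosis:.4f}")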
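
For a reference value to read the diagnostic against: a sum of $n$ independent Exponential(1) variables has the Gamma$(n,1)$ law, so the standardized mean has excess kurtosis $6/n$. The sketch below is an addition, not part of the listing above, and the function name and parameter choices are hypothetical. It draws the sum in one step from the Gamma law, which avoids the large intermediate array, and uses a small $n$ so that the signal is visible above Monte Carlo noise.

```python
import numpy as np


def excess_kurtosis_of_mean(n: int, paths: int, rng: np.random.Generator) -> float:
    """Sample excess kurtosis of the standardized mean of n Exponential(1) draws."""
    # Assumption: the sum is drawn directly from Gamma(n, 1) instead of
    # adding n exponentials; the two have the same law.
    sums = rng.gamma(shape=n, scale=1.0, size=paths)
    z = (sums - n) / np.sqrt(n)  # the sum has mean n and variance n
    return float(((z - z.mean()) ** 4).mean() / z.var() ** 2 - 3.0)


rng = np.random.default_rng(3)
n = 20
estimate = excess_kurtosis_of_mean(n=n, paths=400_000, rng=rng)
print(f"n={n}: estimate {estimate:.3f}, theory {6 / n:.3f}")
```

At $n=2000$ the theoretical value is $0.003$, which is below the rough Monte Carlo standard error $\sqrt{24/\text{paths}}\approx 0.011$ for 200,000 paths, so an estimate at that scale is mostly noise around zero.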

The Chebyshev bound (Equation (2)) and the strong law (Theorem 4) pin the average to its mean, while Theorem 5 resolves the residual fluctuation at the scale $\sqrt n$.

[1] R. Durrett, Probability: Theory and Examples, 5th ed. Cambridge University Press, 2019.
[2] D. Williams, Probability with Martingales. Cambridge University Press, 1991.

Part 5 of 9 in Probability

cite
@misc{convergence-and-limit-theorems,
  author = {Zac Kienzle},
  title  = {Convergence and Limit Theorems},
  year   = {2026},
  month  = {06},
  url    = {https://zackienzle.com/blog/convergence-and-limit-theorems}
}