Convergence and Limit Theorems

Sample averages obey two laws of different character. The strong law holds almost surely and fixes the limit at the mean; the central limit theorem holds in distribution and identifies the fluctuations as Gaussian. Both rest on tail bounds that convert a moment into a probability.

#Modes of convergence

Definition 1

A sequence of random variables $X_n$ converges to $X$

\begin{aligned} &\text{(i)} &&\text{almost surely, if } \P\!\big(X_n\to X\big)=1; \\[2pt] &\text{(ii)} &&\text{in probability, if } \P\!\big(\abs{X_n-X}>\varepsilon\big)\to 0\ \text{for each }\varepsilon>0; \\[2pt] &\text{(iii)} &&\text{in } L^p, \text{ if } \E\big[\abs{X_n-X}^p\big]\to 0; \\[2pt] &\text{(iv)} &&\text{in distribution, if } \E[g(X_n)]\to\E[g(X)]\ \text{for every bounded continuous }g. \end{aligned} \tag{1}

Convergence almost surely and convergence in $L^p$ each imply convergence in probability, which in turn implies convergence in distribution; no other implication holds in general [1].
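To see why convergence in probability does not imply almost sure convergence, a standard counterexample (not part of the original text) is the "typewriter" sequence on $[0,1)$ with Lebesgue measure: for $n=2^k+j$ with $0\le j<2^k$, let $X_n$ be the indicator of $[j2^{-k},(j+1)2^{-k})$. The sketch below, with hypothetical helper names, checks that $\P(X_n\ne 0)=2^{-k}\to 0$ while a fixed sample point is hit once in every dyadic block, so $X_n(\omega)$ has no limit.

```python
def typewriter_interval(n: int) -> tuple[float, float]:
    """Interval on which the n-th typewriter indicator equals 1 (n >= 1)."""
    k = n.bit_length() - 1       # block index: 2**k <= n < 2**(k + 1)
    j = n - 2**k                 # position inside block k
    width = 2.0 ** -k            # P(X_n = 1) under Lebesgue measure
    return j * width, (j + 1) * width


def typewriter(n: int, omega: float) -> int:
    """Value X_n(omega) for a sample point omega in [0, 1)."""
    lo, hi = typewriter_interval(n)
    return int(lo <= omega < hi)


omega = 0.3  # an arbitrary sample point, chosen for illustration
# Indices n < 2**10 at which X_n(omega) = 1: one per block k = 0, ..., 9.
hits = [n for n in range(1, 2**10) if typewriter(n, omega) == 1]
# P(X_n = 1) at the start of the first few blocks: it halves each block.
probs = [typewriter_interval(n)[1] - typewriter_interval(n)[0] for n in (1, 2, 4, 8, 16)]
```

Because the intervals in each block partition $[0,1)$, every $\omega$ is hit infinitely often, yet the probability of a hit tends to zero; the same sequence converges to $0$ in every $L^p$.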

#Tail bounds

Proposition 2

(Markov and Chebyshev.) For a nonnegative random variable $Z$ and $a>0$, $\P(Z\ge a)\le\E[Z]/a$. Consequently, for any $X\in L^2$ and $\varepsilon>0$,

\P\!\big(\abs{X-\E X}\ge\varepsilon\big)\le\frac{\Var(X)}{\varepsilon^2}. \tag{2}

Proof

Since $a\,\mathbf 1\{Z\ge a\}\le Z$ pointwise, taking expectations gives $a\,\P(Z\ge a)\le\E[Z]$. Applying this to $Z=(X-\E X)^2$ and $a=\varepsilon^2$ gives $\P(\abs{X-\E X}\ge\varepsilon)=\P((X-\E X)^2\ge\varepsilon^2)\le\E[(X-\E X)^2]/\varepsilon^2$, which is Equation (2) because $\E[(X-\E X)^2]=\Var(X)$.
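As a sanity check on Equation (2), the following sketch compares the Chebyshev bound with an exact two-sided tail. The choice of $X\sim\mathrm{Exponential}(1)$ (mean $1$, variance $1$) and the function names are assumptions made for illustration only.

```python
import math


def exp1_two_sided_tail(eps: float) -> float:
    """Exact P(|X - 1| >= eps) for X ~ Exponential(1), whose mean is 1."""
    upper = math.exp(-(1.0 + eps))                                # P(X >= 1 + eps)
    lower = 1.0 - math.exp(-(1.0 - eps)) if eps < 1.0 else 0.0    # P(X <= 1 - eps)
    return upper + lower


def chebyshev_bound(variance: float, eps: float) -> float:
    """Right-hand side of Equation (2)."""
    return variance / eps**2


# Each row is (eps, exact tail, Chebyshev bound) with Var(X) = 1.
rows = [(eps, exp1_two_sided_tail(eps), chebyshev_bound(1.0, eps))
        for eps in (0.5, 1.0, 2.0, 3.0)]
for eps, tail, bound in rows:
    print(f"eps={eps:.1f}  exact={tail:.4f}  bound={bound:.4f}")
```

The bound holds in every row but is far from tight, which is expected: it uses only the second moment of $X$.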

Lemma 3

(First Borel-Cantelli.) If $\sum_n\P(A_n)<\infty$, then $\P(A_n\text{ infinitely often})=0$.

Proof

Let $N=\sum_n\mathbf 1_{A_n}$ count the events that occur. Monotone convergence gives $\E[N]=\sum_n\P(A_n)<\infty$, so $N<\infty$ almost surely, which says exactly that only finitely many $A_n$ occur.
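The counting argument can be illustrated numerically. The sketch below assumes, purely for illustration, independent events with $\P(A_n)=n^{-2}$, truncated at a finite `n_max`; it checks that the average number of events that occur matches the partial sum of $\sum_n\P(A_n)$. Independence is not needed for the lemma itself and is only a convenience for sampling.

```python
import numpy as np


def count_occurrences(n_max: int, paths: int, rng: np.random.Generator) -> np.ndarray:
    """Number of events A_1, ..., A_{n_max} that occur on each simulated path.

    Illustrative assumption: the A_n are independent with P(A_n) = 1 / n**2.
    """
    probs = 1.0 / np.arange(1, n_max + 1) ** 2
    # Row p, column n-1 is True when A_n occurs on path p (probs broadcasts over rows).
    occurred = rng.random((paths, n_max)) < probs
    return occurred.sum(axis=1)


rng = np.random.default_rng(0)
counts = count_occurrences(n_max=500, paths=5_000, rng=rng)
# Partial sum of P(A_n) over the simulated range; the full series sums to pi**2 / 6.
expected = float((1.0 / np.arange(1, 501) ** 2).sum())
print(counts.mean(), expected, counts.max())
```

The largest count stays small on every path, consistent with $N<\infty$ almost surely.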

#The strong law under a fourth moment

Theorem 4

Let $X_1,X_2,\dots$ be i.i.d. with $\E[X_1]=\mu$ and $\E[X_1^4]<\infty$. Then $S_n/n\to\mu$ almost surely, where $S_n=\sum_{i=1}^n X_i$.

Proof

Since $\E[X_1^4]<\infty$, power-mean monotonicity gives $\E[\abs{X_1}^k]<\infty$ for all $0<k\le 4$; in particular $\E[X_1^2]<\infty$, so $\big(\E[X_1^2]\big)^2<\infty$. Replacing $X_i$ by $X_i-\mu$, assume $\mu=0$; the centered fourth moment stays finite, since $\E\abs{X_1-\mu}^4\le 8\big(\E\abs{X_1}^4+\mu^4\big)<\infty$ by the $c_r$-inequality. Expanding $S_n^4=\sum_{i,j,k,l}X_iX_jX_kX_l$ and taking expectations, independence and $\E[X_i]=0$ annihilate every term with an index appearing exactly once. Only the $n$ terms $\E[X_i^4]$ and the $3n(n-1)$ terms $\E[X_i^2]\E[X_j^2]$ with $i\ne j$ survive, so

\E[S_n^4]=n\,\E[X_1^4]+3n(n-1)\,\big(\E[X_1^2]\big)^2\le C n^2 \tag{3}

for a constant $C$, for instance $C=\E[X_1^4]+3\big(\E[X_1^2]\big)^2$. Hence $\E[(S_n/n)^4]\le C/n^2$. Let $T_N=\sum_{n\le N}(S_n/n)^4$; the $T_N$ are nonnegative and increase to $T=\sum_n(S_n/n)^4$, so monotone convergence gives $\E[T]=\lim_N\sum_{n\le N}\E[(S_n/n)^4]=\sum_n\E[(S_n/n)^4]\le C\sum_n n^{-2}<\infty$. A random variable with finite expectation is finite almost surely, hence $T=\sum_n(S_n/n)^4<\infty$ almost surely. The terms of a convergent series vanish, giving $(S_n/n)^4\to 0$ and hence $S_n/n\to 0$ almost surely.
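The count behind Equation (3) can be verified exactly in a small case. The sketch below assumes Rademacher summands ($X_i=\pm 1$ with probability $1/2$ each, so $\E[X_1^2]=\E[X_1^4]=1$), a choice made here for illustration, and enumerates all $2^n$ sign patterns; the function names are hypothetical.

```python
from itertools import product


def fourth_moment_of_sum(n: int) -> float:
    """Exact E[S_n^4] for i.i.d. Rademacher signs, by enumerating all 2**n outcomes."""
    total = sum(sum(signs) ** 4 for signs in product((-1, 1), repeat=n))
    return total / 2**n


def formula_3(n: int, m4: float = 1.0, m2: float = 1.0) -> float:
    """Right-hand side of Equation (3): n * E[X^4] + 3 n (n - 1) * (E[X^2])^2."""
    return n * m4 + 3 * n * (n - 1) * m2**2


for n in range(1, 11):
    exact = fourth_moment_of_sum(n)
    # In this case Equation (3) reads 3 n^2 - 2 n, which is at most C n^2 with C = 3.
    print(n, exact, formula_3(n), exact <= 3 * n**2)
```

For $n=3$, both sides equal $21$: the sums $\pm 3$ occur once each and $\pm 1$ three times each, giving $(81+81+3+3)/8$.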

#The central limit theorem

Write $\varphi_X(t)=\E[e^{itX}]$ for the characteristic function. It determines the law, is uniformly continuous, and factorizes over independent sums, with $\varphi_{X+Y}=\varphi_X\varphi_Y$ for independent $X,Y$.

Theorem 5

Let $X_1,X_2,\dots$ be i.i.d. with $\E[X_1]=0$ and $\Var(X_1)=\sigma^2\in(0,\infty)$. Then $S_n/(\sigma\sqrt n)$ converges in distribution to the standard normal.

Proof

Normalize to $\sigma=1$ and set $\varphi=\varphi_{X_1}$. From the bound $\abs{e^{isx}-(1+isx-\tfrac12 s^2x^2)}\le\min(\abs{sx}^3,\abs{sx}^2)$, taking expectations gives $\abs{\varphi(s)-(1+is\E X_1-\tfrac12 s^2\E X_1^2)}\le\E[\min(\abs{sX_1}^3,\abs{sX_1}^2)]=o(s^2)$, the last step by dominated convergence (the integrand divided by $s^2$ is bounded by $X_1^2\in L^1$ and tends to $0$ pointwise). With $\E X_1=0$ and $\E X_1^2=1$ this is the expansion $\varphi(s)=1-\tfrac12 s^2+o(s^2)$ as $s\to 0$, requiring only $\abs{X_1}^2\in L^1$. Independence and identical distribution give

\varphi_{S_n/\sqrt n}(t)=\varphi\!\big(t/\sqrt n\big)^n=\Big(1-\frac{t^2}{2n}+o\!\big(n^{-1}\big)\Big)^n, \tag{4}

and for complex $z_n\to z$ one has $(1+z_n/n)^n\to e^{z}$, since $n\log(1+z_n/n)=z_n+O(\abs{z_n}^2/n)\to z$ on the principal branch once $\abs{z_n}/n<1$. With $z_n=-t^2/2+o(1)\to -t^2/2$ this yields $\varphi_{S_n/\sqrt n}(t)\to e^{-t^2/2}$ for every $t$. The limit is the characteristic function of the standard normal and is continuous at the origin, so the Lévy continuity theorem upgrades pointwise convergence of characteristic functions to convergence in distribution [1], [2].
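The convergence in Equation (4) can be watched directly when $\varphi$ is available in closed form. The sketch below assumes, for illustration, centered exponential summands $X=E-1$ with $E\sim\mathrm{Exponential}(1)$, for which $\varphi(s)=e^{-is}/(1-is)$, mean $0$ and variance $1$; the function names and the test point $t=1.5$ are hypothetical choices.

```python
import cmath
import math


def phi_centered_exp(s: float) -> complex:
    """Characteristic function of E - 1 with E ~ Exponential(1)."""
    return cmath.exp(-1j * s) / (1.0 - 1j * s)


def phi_normalized_sum(t: float, n: int) -> complex:
    """Characteristic function of S_n / sqrt(n), as in Equation (4)."""
    return phi_centered_exp(t / math.sqrt(n)) ** n


t = 1.5
target = math.exp(-t**2 / 2)  # standard normal characteristic function at t
ns = (10, 100, 1_000, 10_000)
errors = [abs(phi_normalized_sum(t, n) - target) for n in ns]
for n, err in zip(ns, errors):
    print(f"n={n:>6}  |phi_n(t) - exp(-t^2/2)| = {err:.5f}")
```

For this skewed law the error shrinks roughly like $n^{-1/2}$, driven by the third cumulant; the theorem itself only asserts that it tends to zero.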

#Illustration

The theorem is indifferent to the summand law beyond its mean and variance. Standardized means of skewed $\mathrm{Exponential}(1)$ increments approach the standard normal, and the code below measures the residual excess kurtosis as a finite-$n$ diagnostic.

import numpy as np
from numpy.random import Generator


def standardized_means(n: int, paths: int, rng: Generator) -> np.ndarray:
    """Standardized sample means of i.i.d. Exponential(1) increments.

    Args:
        n: Number of summands per trajectory.
        paths: Number of independent trajectories.
        rng: Seeded generator for reproducibility.

    Returns:
        The array sqrt(n) * (mean - 1) per trajectory, converging in
        distribution to the standard normal as n grows.
    """
    samples = rng.exponential(1.0, size=(paths, n))
    return np.sqrt(n) * (samples.mean(axis=1) - 1.0)


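# Seed the generator so the run is reproducible.
rng = np.random.default_rng(0)
# The (paths, n) sample array holds 4e8 float64 values, about 3.2 GB; lower paths if memory is tight.
z = standardized_means(n=2_000, paths=200_000, rng=rng)
# Excess kurtosis: fourth central moment over squared variance, minus the Gaussian value 3.
excess_kurtosis = float(((z - z.mean()) ** 4).mean() / z.var() ** 2 - 3.0)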
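A lighter variant of the same experiment is sketched below. It relies on the fact that a sum of $n$ i.i.d. $\mathrm{Exponential}(1)$ variables has the $\mathrm{Gamma}(n,1)$ law, so one draw per trajectory suffices; the function names are hypothetical and the values of $n$ are chosen only to make the decay visible. The excess kurtosis of the standardized mean is $6/n$ in this case, since $\mathrm{Exponential}(1)$ has excess kurtosis $6$.

```python
import numpy as np


def standardized_means_gamma(n: int, paths: int, rng: np.random.Generator) -> np.ndarray:
    """Same law as sqrt(n) * (mean - 1), sampled from the Gamma(n, 1) sum directly."""
    sums = rng.gamma(shape=n, scale=1.0, size=paths)
    return (sums - n) / np.sqrt(n)


def sample_excess_kurtosis(z: np.ndarray) -> float:
    """Fourth central moment over squared variance, minus the Gaussian value 3."""
    centered = z - z.mean()
    return float((centered**4).mean() / z.var() ** 2 - 3.0)


rng = np.random.default_rng(0)
kurt_by_n = {
    n: sample_excess_kurtosis(standardized_means_gamma(n, 200_000, rng))
    for n in (5, 50, 2_000)
}
for n, kurt in kurt_by_n.items():
    print(f"n={n:>5}  sample excess kurtosis={kurt:+.4f}  theory={6 / n:.4f}")
```

At $n=2{,}000$ the theoretical value $0.003$ is smaller than the sampling error of the estimate at $200{,}000$ trajectories, so the diagnostic is most informative at moderate $n$.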

The Chebyshev bound (2) and the strong law (Theorem 4) pin the average $S_n/n$ to its mean, while Theorem 5 resolves the residual fluctuation: $S_n-n\mu$ is of order $\sqrt n$, so the average deviates from $\mu$ at the scale $n^{-1/2}$.

[1] R. Durrett, Probability: Theory and Examples, 5th ed. Cambridge University Press, 2019.

[2] D. Williams, Probability with Martingales. Cambridge University Press, 1991.


cite

@misc{convergence-and-limit-theorems,
  author = {Zac Kienzle},
  title  = {Convergence and Limit Theorems},
  year   = {2026},
  month  = {06},
  url    = {https://zackienzle.com/blog/convergence-and-limit-theorems}
}
