03 June 2026 · 6 min read · updated 09 June 2026

Probability Spaces and Random Variables

Modern probability is the measure theory of a space with total mass one. We define the probability space, random variables as measurable functions, and their laws as pushforward measures on the line, characterised by the distribution function. Expectation is the integral, and the change of variables computes it from the law alone. We prove the Markov, Chebyshev, and Jensen inequalities, the elementary bounds that control tails and convex transformations and that the law of large numbers and the concentration results build on.

Probability is the measure theory of a space whose total mass is one. Every notion of the subject (a random variable, its distribution, its expectation, its variance) is a measure-theoretic object read through that normalisation, and the payoff is that the Lebesgue integral and the construction of measures carry over wholesale. This post fixes the language, the probability space and the random variable, identifies the law of a random variable as a measure on the line, and proves the elementary inequalities that the limit theorems run on [1], [2].

#The probability space

Definition 1

A probability space is a measure space $(\Omega,\mathcal F,\mathbb P)$ with $\mathbb P(\Omega)=1$. The set $\Omega$ is the sample space, the sigma-algebra $\mathcal F$ is the collection of events, and the measure $\mathbb P$ is the probability.

Everything proved for general measures applies. Monotonicity gives $\mathbb P(A)\le\mathbb P(B)$ for $A\subseteq B$, finite additivity and $\mathbb P(\Omega)=1$ give the complement rule $\mathbb P(A^c)=1-\mathbb P(A)$, and continuity of measures gives $\mathbb P(A_n)\to\mathbb P(A)$ along monotone sequences of events, which justifies passing probabilities through monotone limits of events. Countable subadditivity, $\mathbb P(\bigcup_n A_n)\le\sum_n\mathbb P(A_n)$, is the union bound.
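These rules can be checked by hand on a finite space. The sketch below is an illustration under assumptions that are not in the text: a sample space of two fair dice with the uniform measure, every subset counted as an event, and the hypothetical names `omega`, `prob`, `A`, `B`, `C`. Exact rational arithmetic keeps the identities exact rather than approximate.

```python
from fractions import Fraction
from itertools import product

# Assumed example space: ordered outcomes of two fair dice, all subsets are events.
omega = frozenset(product(range(1, 7), repeat=2))

def prob(event):
    """Uniform probability of an event given as a subset of omega."""
    return Fraction(len(event & omega), len(omega))

A = frozenset(w for w in omega if w[0] == 6)   # first die shows 6
B = frozenset(w for w in omega if w[0] >= 5)   # first die shows 5 or 6, so A is a subset of B
C = frozenset(w for w in omega if w[1] == 6)   # second die shows 6

total_mass = prob(omega)             # normalisation: equals 1
monotone = prob(A) <= prob(B)        # monotonicity for A inside B
complement = prob(omega - A)         # complement rule: equals 1 - P(A)
union = prob(A | C)                  # probability that at least one die shows 6
union_bound = prob(A) + prob(C)      # subadditivity bounds the union from above

print(total_mass, monotone, complement, union, union_bound)
```

The union bound is strict here because `A` and `C` overlap in the outcome (6, 6), which the sum counts twice.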

#Random variables and their laws

Definition 2

A random variable is a measurable function $X:\Omega\to\mathbb R$, meaning $\{X\le x\}\in\mathcal F$ for every $x$. Its law or distribution is the pushforward measure $\mathbb P_X(B)=\mathbb P(X^{-1}(B))=\mathbb P(X\in B)$ on the Borel sets of $\mathbb R$.

Proposition 3

The law $\mathbb P_X$ is a probability measure on $(\mathbb R,\mathcal B)$.

Proof

Preimage commutes with all set operations, so $X^{-1}(\emptyset)=\emptyset$ gives $\mathbb P_X(\emptyset)=0$, and for disjoint Borel sets $B_n$ the preimages $X^{-1}(B_n)$ are disjoint events, so countable additivity of $\mathbb P$ transfers: $\mathbb P_X(\bigcup_n B_n)=\mathbb P(\bigcup_n X^{-1}(B_n))=\sum_n\mathbb P(X^{-1}(B_n))=\sum_n\mathbb P_X(B_n)$. Finally $\mathbb P_X(\mathbb R)=\mathbb P(\Omega)=1$.
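On a finite space the pushforward is a short computation: collect the mass of each outcome under the value it maps to. The following sketch assumes, purely for illustration, two fair dice and the random variable "sum of the faces"; `pushforward` and `law_of` are hypothetical helper names.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Assumed example: uniform measure on the 36 outcomes of two dice.
P = {w: Fraction(1, 36) for w in product(range(1, 7), repeat=2)}

def X(w):
    """Random variable: the sum of the two faces."""
    return w[0] + w[1]

def pushforward(P, X):
    """Law of X: the mass of each value x is P(X^{-1}({x}))."""
    law = defaultdict(Fraction)
    for w, p in P.items():
        law[X(w)] += p
    return dict(law)

law = pushforward(P, X)

def law_of(B):
    """P_X(B) for a finite set B of real values."""
    return sum((p for x, p in law.items() if x in B), Fraction(0))

print(law[7])                                   # mass of the value 7
print(law_of(set()), law_of(set(range(2, 13)))) # empty set and the whole range
```

Additivity on disjoint sets is inherited from `P`, for example `law_of({2, 3} | {11, 12}) == law_of({2, 3}) + law_of({11, 12})`.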

The law lives on the line, independent of $\Omega$, and is determined by its values on the half-lines.

Definition 4

The distribution function of $X$ is $F(x)=\mathbb P(X\le x)=\mathbb P_X((-\infty,x])$.

Proposition 5

A distribution function is nondecreasing and right-continuous with $\lim_{x\to-\infty}F(x)=0$ and $\lim_{x\to\infty}F(x)=1$. Conversely every such $F$ is the distribution function of a unique law.

Proof

Monotonicity of $\mathbb P_X$ on the nested half-lines gives $F$ nondecreasing. Right-continuity is continuity of $\mathbb P_X$ from above: $F(x+1/n)=\mathbb P_X((-\infty,x+1/n])\to\mathbb P_X((-\infty,x])=F(x)$ as the sets decrease to $(-\infty,x]$. The two limits are $\mathbb P_X(\emptyset)=0$ and $\mathbb P_X(\mathbb R)=1$ by continuity along $(-\infty,-n]\downarrow\emptyset$ and $(-\infty,n]\uparrow\mathbb R$, where finiteness of the measure (every set has mass at most $1$) permits continuity from above.

For the converse, $F$ assigns each half-open interval the mass $\mu((a,b])=F(b)-F(a)$, a nonnegative set function on the half-open intervals. Finite additivity is immediate by telescoping, and right-continuity of $F$ upgrades this to countable additivity, so $\mu$ is a premeasure on the semiring of half-open intervals. Concretely, if $(a,b]=\bigsqcup_n(a_n,b_n]$, fix $\epsilon>0$ and choose $\delta_n>0$ by right-continuity at $b_n$ so that $F(b_n+\delta_n)-F(b_n)<\epsilon 2^{-n}$. For any $\delta\in(0,b-a)$ the open intervals $(a_n,b_n+\delta_n)$ cover the compact $[a+\delta,b]$, so a finite subcover together with finite additivity and monotonicity gives $F(b)-F(a+\delta)\le\sum_n(F(b_n)-F(a_n))+\epsilon$. Letting $\delta\to0$ (right-continuity at $a$) and $\epsilon\to0$ yields one inequality, while finite additivity and monotonicity, applied to each finite subfamily, give the reverse. The Carathéodory extension then carries the premeasure to a Borel measure, unique because the intervals are an intersection-closed generating system and the measures are finite (total mass $1$), so the pi-system uniqueness theorem applies, exactly as for Lebesgue measure.

So a random variable can be specified by a distribution function alone, without naming the probability space, and two random variables with the same law are interchangeable for any question about their values.
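One concrete way to realise a given distribution function as a random variable is the quantile transform, a different route from the Carathéodory argument in the proof: on $\Omega=(0,1)$ with Lebesgue measure, set $X(u)=\inf\{x : F(x)\ge u\}$. The sketch below assumes a hypothetical step function with three atoms; `atoms`, `cumulative`, `F`, and `quantile` are illustrative names, and the sampling at the end is only a sanity check with a fixed seed.

```python
import bisect
import random
from fractions import Fraction

# Assumed example: atoms at -1, 0, 2 with masses 1/4, 1/2, 1/4.
atoms = [-1, 0, 2]
cumulative = [Fraction(1, 4), Fraction(3, 4), Fraction(1)]

def F(x):
    """Right-continuous step distribution function: total mass of atoms <= x."""
    i = bisect.bisect_right(atoms, x)          # number of atoms <= x
    return cumulative[i - 1] if i > 0 else Fraction(0)

def quantile(u):
    """Generalised inverse inf{x : F(x) >= u}, intended for 0 < u <= 1."""
    return atoms[bisect.bisect_left(cumulative, u)]

# X(u) = quantile(u) with u uniform on (0, 1) should have distribution function F.
rng = random.Random(0)
samples = [quantile(rng.random()) for _ in range(100_000)]
empirical = {x: sum(s <= x for s in samples) / len(samples) for x in atoms}

for x in atoms:
    print(x, float(F(x)), empirical[x])
```

The construction works because `quantile(u) <= x` holds exactly when `u <= F(x)`, so the event $\{X\le x\}$ is the interval $(0,F(x)]$, of length $F(x)$.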

#Expectation

The expectation is the integral against $\mathbb P$, and it inherits linearity, monotonicity, and the convergence theorems from integration.

Definition 6

The expectation of an integrable random variable is $\mathbb E[X]=\int_\Omega X\,d\mathbb P$. Its variance, when $X$ is square-integrable, is $\operatorname{Var}(X)=\mathbb E[(X-\mathbb E X)^2]=\mathbb E[X^2]-(\mathbb E X)^2$.

Computing $\mathbb E[X]$ seems to require the space $\Omega$, but the law suffices, because integrating a pushforward reduces to integrating against the pushed measure.

Theorem 7

For any Borel function $g$ with $g(X)$ integrable, $\mathbb E[g(X)]=\int_{\mathbb R} g\,d\mathbb P_X$. In particular $\mathbb E[X]=\int_{\mathbb R} x\,d\mathbb P_X(x)$ and $\mathbb E[X]=\int_{\mathbb R} x\,dF(x)$ as a Stieltjes integral.

Proof

The identity $\int_\Omega\mathbf 1_B(X)\,d\mathbb P=\mathbb P(X\in B)=\int_{\mathbb R}\mathbf 1_B\,d\mathbb P_X$ is the definition of the law, so the claim holds for indicators $g=\mathbf 1_B$. Linearity extends it to nonnegative simple functions, the monotone convergence theorem extends it to nonnegative measurable $g$ through an increasing approximation, and splitting $g=g^+-g^-$ extends it to integrable $g$.

This is why a distribution alone determines every moment and every expectation of a function of $X$.
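In the finite case the change of variables is a regrouping of one sum: integrating $g(X)$ outcome by outcome, or integrating $g$ value by value against the law. The sketch below assumes the same illustrative setting of two fair dice with $X$ the sum of the faces and the hypothetical choice $g(x)=(x-7)^2$.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Assumed example: uniform measure on the outcomes of two dice.
P = {w: Fraction(1, 36) for w in product(range(1, 7), repeat=2)}

def X(w):
    """Random variable: the sum of the two faces."""
    return w[0] + w[1]

def g(x):
    """Illustrative Borel function: squared distance from 7."""
    return (x - 7) ** 2

# Left side: integrate g(X) over the sample space against P.
lhs = sum(g(X(w)) * p for w, p in P.items())

# Right side: build the law of X, then integrate g over the line against it.
law = defaultdict(Fraction)
for w, p in P.items():
    law[X(w)] += p
rhs = sum(g(x) * p for x, p in law.items())

print(lhs, rhs)   # the two integrals agree
```

Since 7 is the mean of the sum, both sides compute its variance, 35/6, and the second computation never looks at the 36 outcomes again once the law is known.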

#The elementary inequalities

Three inequalities underpin the limit theorems that follow. The first bounds the tail of a nonnegative variable by its mean.

Theorem 8

For a nonnegative random variable $X$ and $a>0$, $\mathbb P(X\ge a)\le\mathbb E[X]/a$.

Proof

The pointwise bound $a\,\mathbf 1_{\{X\ge a\}}\le X$ holds because the indicator is supported where $X\ge a$ and $X\ge0$ elsewhere. Taking expectations and using monotonicity, $a\,\mathbb P(X\ge a)\le\mathbb E[X]$, and dividing by $a$ gives the claim.
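A small exact check makes the bound concrete and shows that it cannot be improved in general. The laws below are hypothetical examples given as value-to-probability tables; `expectation` and `tail` are illustrative helpers.

```python
from fractions import Fraction

def expectation(law):
    """E[X] for a finite law given as {value: probability}."""
    return sum(x * p for x, p in law.items())

def tail(law, a):
    """P(X >= a) for a finite law."""
    return sum((p for x, p in law.items() if x >= a), Fraction(0))

# Assumed nonnegative law with mean 0*(1/2) + 1*(1/4) + 4*(1/4) = 5/4.
law = {0: Fraction(1, 2), 1: Fraction(1, 4), 4: Fraction(1, 4)}
mean = expectation(law)

# Slack in Markov's inequality, E[X]/a - P(X >= a), at levels a = 1/2, 1, ..., 5.
levels = [Fraction(k, 2) for k in range(1, 11)]
slack = {a: mean / a - tail(law, a) for a in levels}

# A two-point law on {0, a} attains equality at level a.
two_point = {0: Fraction(2, 3), 3: Fraction(1, 3)}
equality = tail(two_point, 3) == expectation(two_point) / 3

print(min(slack.values()) >= 0, equality)
```

Equality in the two-point case reflects the proof: the pointwise bound is an identity exactly when $X$ takes only the values $0$ and $a$.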

Corollary 9

For square-integrable $X$ with mean $\mu$ and any $a>0$, $\mathbb P(|X-\mu|\ge a)\le\operatorname{Var}(X)/a^2$.

Proof

Apply Theorem 8 to the nonnegative variable $(X-\mu)^2$ at level $a^2$, giving $\mathbb P(|X-\mu|\ge a)=\mathbb P((X-\mu)^2\ge a^2)\le\mathbb E[(X-\mu)^2]/a^2=\operatorname{Var}(X)/a^2$.
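The same kind of exact check works for the variance bound. The three-point law below is a hypothetical example chosen so that the inequality is an equality at level $a=1$; the helper names are illustrative.

```python
from fractions import Fraction

def mean(law):
    """E[X] for a finite law given as {value: probability}."""
    return sum(x * p for x, p in law.items())

def variance(law):
    """Var(X) = E[(X - EX)^2]."""
    m = mean(law)
    return sum((x - m) ** 2 * p for x, p in law.items())

def deviation_prob(law, a):
    """P(|X - EX| >= a)."""
    m = mean(law)
    return sum((p for x, p in law.items() if abs(x - m) >= a), Fraction(0))

# Assumed symmetric law: mean 0, variance 1/8 + 1/8 = 1/4.
law = {-1: Fraction(1, 8), 0: Fraction(3, 4), 1: Fraction(1, 8)}

for a in [Fraction(1, 2), Fraction(1), Fraction(2)]:
    bound = variance(law) / a ** 2
    print(a, deviation_prob(law, a), bound)   # the probability never exceeds the bound
```

At $a=1$ both sides equal $1/4$, because $(X-\mu)^2$ takes only the values $0$ and $a^2$, the equality case of Theorem 8.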

Chebyshev is the engine of the weak law of large numbers, where it sends the probability of deviation to zero as variances average down. The third inequality controls convex transformations.

Theorem 10

If $\varphi:\mathbb R\to\mathbb R$ is convex and $X$ and $\varphi(X)$ are integrable, then $\varphi(\mathbb E X)\le\mathbb E[\varphi(X)]$.

Proof

A convex function has a supporting line at $m=\mathbb E[X]$, a slope $c$ with $\varphi(x)\ge\varphi(m)+c(x-m)$ for all $x$, given by any subgradient at $m$. Substituting $X$ and taking expectations, the right side is $\mathbb E[\varphi(m)+c(X-m)]=\varphi(m)+c(\mathbb E[X]-m)=\varphi(m)=\varphi(\mathbb E X)$, since $\mathbb E[X]-m=0$, while the left side is $\mathbb E[\varphi(X)]$, giving $\varphi(\mathbb E X)\le\mathbb E[\varphi(X)]$.
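The two steps of the proof, the supporting line and the expectation of both sides, can be traced on an example. The sketch assumes a fair die and the convex function $\varphi(x)=x^2$, for which the derivative $2m$ serves as the subgradient at $m$; all names are illustrative.

```python
from fractions import Fraction

# Assumed example law: a fair six-sided die.
law = {x: Fraction(1, 6) for x in range(1, 7)}

def expect(f, law):
    """E[f(X)] for a finite law given as {value: probability}."""
    return sum(f(x) * p for x, p in law.items())

def phi(x):
    """Convex function used for the illustration."""
    return x * x

m = expect(lambda x: x, law)   # mean of the die
c = 2 * m                      # slope of the supporting line at m (derivative of x^2)

# Step 1: the supporting line lies below phi at every value of X.
supporting = all(phi(x) >= phi(m) + c * (x - m) for x in law)

# Step 2: taking expectations gives phi(E X) <= E[phi(X)].
lhs = phi(m)
rhs = expect(phi, law)
print(supporting, lhs, rhs, rhs - lhs)
```

For this choice of $\varphi$ the gap `rhs - lhs` is $\mathbb E[X^2]-(\mathbb E X)^2$, which is the variance of the die.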

Jensen's inequality is why $|\mathbb E X|\le\mathbb E|X|$, why variance is nonnegative, and why the $L^p$ norms $\mathbb E[|X|^p]^{1/p}$ increase in $p$, the monotonicity that orders the spaces of random variables. These tools assemble the basic picture. A random variable is a measurable function whose averages are integrals, and the rest of probability studies how those integrals behave under independence, limits, and conditioning. The square-integrable random variables in particular form the Hilbert space $L^2(\Omega,\mathcal F,\mathbb P)$ with inner product $\mathbb E[XY]$, the geometry in which covariance is an angle and conditional expectation is a projection.
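Each of the three consequences of Jensen's inequality named above comes from one choice of convex function in Theorem 10. The derivation below is a sketch: the exponents $0<p<q$ are notation introduced here, and the third line applies the theorem to the variable $|X|^p$ (assuming $|X|^q$ is integrable, otherwise the bound is trivial).

```latex
\begin{align*}
\varphi(x)=|x| &: \quad |\mathbb{E}X| \le \mathbb{E}|X|,\\[2pt]
\varphi(x)=x^2 &: \quad (\mathbb{E}X)^2 \le \mathbb{E}[X^2]
  \;\Longrightarrow\;
  \operatorname{Var}(X)=\mathbb{E}[X^2]-(\mathbb{E}X)^2 \ge 0,\\[2pt]
\varphi(y)=|y|^{q/p},\ 0<p<q &: \quad
  \bigl(\mathbb{E}|X|^p\bigr)^{q/p}
  \le \mathbb{E}\bigl[(|X|^p)^{q/p}\bigr]
  = \mathbb{E}|X|^q
  \;\Longrightarrow\;
  \bigl(\mathbb{E}|X|^p\bigr)^{1/p} \le \bigl(\mathbb{E}|X|^q\bigr)^{1/q}.
\end{align*}
```

The last implication takes $q$-th roots of both sides, and $|y|^{q/p}$ is convex because $q/p\ge1$.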

[1] R. Durrett, Probability: Theory and Examples, 5th ed., Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2019.
[2] D. Williams, Probability with Martingales. Cambridge University Press, 1991.

Part 1 of 9 in Probability

cite
@misc{probability-spaces,
  author = {Zac Kienzle},
  title  = {Probability Spaces and Random Variables},
  year   = {2026},
  month  = {06},
  url    = {https://zackienzle.com/blog/probability-spaces}
}