Measure Theory
Sigma-Algebra
A collection of subsets closed under complement, countable union, and countable intersection
A sigma-algebra (σ-algebra) F on a set Ω is a collection of subsets of Ω satisfying: (1) Ω ∈ F, (2) if A ∈ F then Aᶜ ∈ F (closed under complement), (3) if A₁, A₂, ... ∈ F then ⋃ᵢ Aᵢ ∈ F (closed under countable union). It is the natural domain on which a measure (and hence a probability) can be consistently defined. The Borel σ-algebra B(ℝ), generated by the open intervals, contains every set that can be built from intervals by countable operations, but not every subset of ℝ (Vitali set, 1905). The concept underpins Lebesgue measure (1904) and Kolmogorov's foundation of probability (1933).
- Closed under: complement, countable union, countable intersection
- Smallest: {∅, Ω}
- Borel σ-algebra: generated by open sets
- Not all subsets: Vitali set (1905)
- Foundational monograph: Kolmogorov 1933
- Measurable function: f⁻¹(B) ∈ F for every Borel set B
Why sigma-algebras matter
- Probability theory. Kolmogorov's 1933 axiomatization defines a probability space as (Ω, F, P) where F is a σ-algebra of events. Without it, the standard machinery of conditional probability, independence, and convergence theorems collapses.
- Lebesgue integration. The Lebesgue integral is defined first for indicator functions of measurable sets, then extended to nonnegative measurable functions, then to integrable functions. The σ-algebra structure makes this construction work.
- Stochastic processes. A filtration (F_t) — an increasing family of σ-algebras indexed by time — encodes the information available at each instant. Martingales, Brownian motion, Itô calculus, and stochastic differential equations are all defined relative to filtrations.
- Mathematical finance. Pricing of options and other derivatives uses risk-neutral measures, equivalent martingale measures, and Radon-Nikodym derivatives, all of which require a precise specification of which sets are observable. That specification is exactly what σ-algebras provide.
- Information theory. Conditional entropy H(X | Y) is defined via conditional expectation relative to the σ-algebra generated by Y. Sufficient statistics, complete statistics, and the Rao-Blackwell theorem all live in this language.
- Functional analysis. The duality of L^p and L^q for 1 < p, q < ∞ with 1/p + 1/q = 1, and the Riesz representation theorem identifying continuous linear functionals on spaces of continuous functions with measures, both rest on σ-algebra foundations.
- Ergodic theory. The Birkhoff ergodic theorem, mixing properties, and the entropy of dynamical systems all require σ-algebras to formulate.
The three axioms
Let Ω be a nonempty set. A collection F of subsets of Ω is a σ-algebra if:
- Ω ∈ F. The whole space is in the collection.
- Closure under complement. If A ∈ F then Ω \ A ∈ F.
- Closure under countable union. If A₁, A₂, A₃, ... ∈ F is a sequence (countably many), then ⋃ᵢ Aᵢ ∈ F.
From these three follow several derived properties: ∅ = Ωᶜ ∈ F; closure under countable intersection because ⋂ᵢ Aᵢ = (⋃ᵢ Aᵢᶜ)ᶜ; closure under finite operations as special cases; and closure under set differences A \ B = A ∩ Bᶜ. The "σ" in σ-algebra emphasizes the countable closure — without it we have just an algebra of sets.
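On a finite Ω every countable union involves only finitely many distinct sets, so the three axioms can be checked by brute force. The following is a minimal Python sketch under that finiteness assumption; the name `is_sigma_algebra` is hypothetical, chosen for illustration.

```python
from itertools import combinations

def is_sigma_algebra(omega, family):
    """Check the σ-algebra axioms for a family of subsets of a FINITE set omega.

    On a finite set, countable unions reduce to finite unions, and closure
    under finite unions follows from closure under pairwise unions.
    """
    omega = frozenset(omega)
    fam = {frozenset(a) for a in family}
    if any(not a <= omega for a in fam):
        return False  # every member must be a subset of omega
    if omega not in fam:
        return False  # axiom 1: the whole space is in the collection
    if any(omega - a not in fam for a in fam):
        return False  # axiom 2: closed under complement
    # axiom 3 (finite case): closed under pairwise union
    return all(a | b in fam for a, b in combinations(fam, 2))

omega = {1, 2, 3, 4}
print(is_sigma_algebra(omega, [set(), omega]))                  # True (trivial)
print(is_sigma_algebra(omega, [set(), {1, 2}, {3, 4}, omega]))  # True
print(is_sigma_algebra(omega, [set(), {1}, omega]))             # False: {2, 3, 4} missing
```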
Canonical examples
Trivial. The smallest σ-algebra on Ω is {∅, Ω}. The largest is the power set 2^Ω. Every σ-algebra lies between these.
Discrete. If Ω is countable, the natural choice is F = 2^Ω (the power set). Every subset is measurable, and probability is determined by point masses P({ω}).
Borel σ-algebra B(ℝ). The smallest σ-algebra containing every open interval (a, b). Equivalently, it is generated by the closed intervals, by the half-open intervals, or by the open sets. (Single points do not suffice: the σ-algebra they generate contains only the countable sets and their complements.) By transfinite iteration of complement and countable union, one builds the Borel hierarchy: Σ⁰₁ (open), Π⁰₁ (closed), Σ⁰₂ (Fσ), Π⁰₂ (Gδ), and so on through ω₁ levels. Every Borel set appears at some level below ω₁, and B(ℝ) has the cardinality of the continuum.
Lebesgue σ-algebra L. The completion of B(ℝ) with respect to Lebesgue measure: add every subset of a Borel set with measure zero. L is strictly larger than B(ℝ) and strictly smaller than the power set 2^ℝ. The cardinality of L is 2^(2^ℵ₀): the Cantor set is a Borel null set of cardinality 2^ℵ₀, and every one of its subsets is Lebesgue-measurable.
Cylinder σ-algebra on Ω = {0, 1}^ℕ. Generated by sets that fix finitely many coordinates and let the rest vary freely. This is the natural σ-algebra for an infinite sequence of coin tosses; it gives 2^ℕ a measurable structure suitable for probability.
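To make the cylinder construction concrete, the sketch below assumes independent tosses with P(heads) = p (a fair coin by default; this distribution is an illustrative choice, not part of the definition). A cylinder fixes finitely many coordinates, so its probability is a finite product, and for a fair coin it can be cross-checked by counting prefixes of length n. The helper names `cylinder_prob` and `cylinder_prob_by_counting` are hypothetical, and coordinates are indexed from 0.

```python
from fractions import Fraction
from itertools import product

def cylinder_prob(fixed, p=Fraction(1, 2)):
    """P(cylinder) for independent tosses with P(1) = p.

    `fixed` maps a coordinate index to its required value (0 or 1);
    all other coordinates vary freely.
    """
    prob = Fraction(1)
    for value in fixed.values():
        prob *= p if value == 1 else 1 - p
    return prob

def cylinder_prob_by_counting(fixed, n):
    """Fair-coin cross-check: fraction of length-n prefixes (n > max index)
    that agree with `fixed`. Coordinates beyond n do not affect the event."""
    hits = sum(all(seq[i] == v for i, v in fixed.items())
               for seq in product((0, 1), repeat=n))
    return Fraction(hits, 2 ** n)

event = {0: 1, 3: 0, 4: 1}   # toss 0 is 1, toss 3 is 0, toss 4 is 1
print(cylinder_prob(event))                 # 1/8
print(cylinder_prob_by_counting(event, 6))  # 1/8
```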
σ-algebra generated by a class
For any collection C of subsets of Ω, there is a unique smallest σ-algebra σ(C) containing C — the intersection of all σ-algebras containing C. (This intersection is itself a σ-algebra because the three axioms are preserved by arbitrary intersection.) The Borel σ-algebra is σ(open sets); the cylinder σ-algebra on {0, 1}^ℕ is σ(cylinder sets).
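On a finite Ω, σ(C) can be computed by repeatedly adding complements and unions until nothing new appears. Every set added this way must belong to any σ-algebra containing C, so the fixed point is the smallest one. A minimal sketch, assuming a finite Ω; `generate_sigma_algebra` is a hypothetical name.

```python
def generate_sigma_algebra(omega, generators):
    """Smallest σ-algebra on a FINITE set omega containing `generators`.

    Repeatedly adds complements and pairwise unions until nothing new
    appears; on a finite set this fixed point is σ(C).
    """
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(g) for g in generators}
    while True:
        new = {omega - a for a in family}
        new |= {a | b for a in family for b in family}
        if new <= family:
            return family
        family |= new

sigma = generate_sigma_algebra({1, 2, 3, 4}, [{1}])
print(sorted(sorted(s) for s in sigma))  # [[], [1], [1, 2, 3, 4], [2, 3, 4]]
```

Adding {2} as a second generator yields 8 sets: every union of the atoms {1}, {2}, {3, 4}.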
Generated σ-algebras can be hard to describe explicitly. For example, the Borel σ-algebra is built up in transfinitely many stages, and determining the stage at which a given set first appears (its Borel rank) can be difficult. Descriptive set theory is the systematic study of complexity for definable subsets of Polish spaces.
Vitali's non-measurable set
Vitali's 1905 construction shows the power set 2^ℝ contains pathologies that no Lebesgue-style measure can handle. Define an equivalence relation on [0, 1) by x ∼ y iff x − y ∈ ℚ. Each equivalence class is countable. Use the Axiom of Choice to pick one representative from each class; let V be the set of representatives.
The countable translates V_q = {v + q (mod 1) : v ∈ V} for q ∈ ℚ ∩ [0, 1) are pairwise disjoint and their union is [0, 1). Suppose V were Lebesgue-measurable. Lebesgue measure is invariant under translation mod 1, so every V_q has the same measure μ(V). By countable additivity, μ([0, 1)) = Σ_q μ(V_q), a sum of countably many copies of μ(V). If μ(V) = 0 the sum is 0, and if μ(V) > 0 it is ∞; either way it contradicts μ([0, 1)) = 1. Hence V cannot be Lebesgue-measurable.
Solovay (1970) showed that ZF + Dependent Choice + "every subset of ℝ is Lebesgue-measurable" is consistent, assuming an inaccessible cardinal is consistent. So the existence of non-measurable sets cannot be proved from ZF + DC alone: some choice principle beyond Dependent Choice, such as full AC, is required.
Measurable functions
A function f : Ω → ℝ is measurable with respect to a σ-algebra F if f⁻¹(B) ∈ F for every Borel set B in ℝ. Equivalently, it suffices to check that f⁻¹((−∞, a]) ∈ F for every a ∈ ℝ: the half-lines (−∞, a] generate B(ℝ), and the sets B with f⁻¹(B) ∈ F form a σ-algebra, because preimages commute with complements and countable unions.
Measurable functions form a vector space, are closed under pointwise limits of sequences (a key feature absent from the Riemann theory), and admit composition with Borel functions. Random variables in probability theory are exactly real-valued measurable functions on the probability space.
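On a finite Ω with an explicitly listed σ-algebra, the half-line criterion can be checked directly. Because f takes finitely many values, f⁻¹((−∞, a]) only changes when a crosses one of those values, so it suffices to test a ∈ f(Ω). A minimal sketch, assuming a finite sample space; `is_measurable` and the die example are illustrative.

```python
def is_measurable(f, family):
    """Check measurability of f: omega -> R on a FINITE space.

    `f` is a dict from outcomes to real values and `family` is a σ-algebra
    on its keys. It checks that f^{-1}((-inf, a]) is in the family; for a
    finite range only the values a = f(w) matter (below min f the preimage
    is the empty set, which every σ-algebra contains).
    """
    fam = {frozenset(s) for s in family}
    return all(frozenset(w for w in f if f[w] <= a) in fam
               for a in set(f.values()))

omega = {1, 2, 3, 4, 5, 6}
evens = {2, 4, 6}
F = [set(), evens, omega - evens, omega]   # σ-algebra generated by "even"
parity = {w: w % 2 for w in omega}         # 1 on odd faces, 0 on even faces
face = {w: w for w in omega}               # the face value itself
print(is_measurable(parity, F))  # True
print(is_measurable(face, F))    # False: {face <= 1} = {1} is not in F
```

In probabilistic terms, parity is a random variable with respect to this coarse F, while the face value needs a finer σ-algebra such as the power set.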
Kolmogorov's axiomatization
Andrey Kolmogorov's 1933 monograph Foundations of the Theory of Probability defined a probability space as a triple (Ω, F, P) where Ω is a nonempty set, F is a σ-algebra on Ω, and P : F → [0, 1] is a function with P(Ω) = 1 that is countably additive: P(⋃ᵢ Aᵢ) = Σᵢ P(Aᵢ) for every sequence of pairwise disjoint A₁, A₂, ... ∈ F. This three-line definition unified the discrete combinatorial probability of Pascal-Bernoulli, the geometric probability of Buffon-Bertrand, and the limit theorems of Lévy-Khinchin under one foundation.
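Countable additivity is what ties the point masses of a countably infinite sample space to P(Ω). As an illustrative assumption, take Ω = {1, 2, 3, ...} with P({n}) = 2^(-n). Finite additivity alone only determines P({1, ..., N}) = 1 − 2^(-N); countable additivity, applied to Ω as the disjoint union of its singletons, gives P(Ω) = Σₙ 2^(-n) = 1. The sketch below computes the finite partial sums exactly with `fractions.Fraction`; the helper names are hypothetical.

```python
from fractions import Fraction

def p_point(n):
    """Point mass P({n}) = 2^(-n) on the sample space {1, 2, 3, ...}."""
    return Fraction(1, 2 ** n)

def p_first(N):
    """P({1, ..., N}) via finite additivity: the sum of the first N point masses."""
    return sum((p_point(n) for n in range(1, N + 1)), Fraction(0))

for N in (1, 5, 20):
    # The gap to P(Ω) = 1 is exactly 2^(-N), which tends to 0
    print(N, 1 - p_first(N))
```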
Every modern result in probability — strong law of large numbers, central limit theorem, ergodic theorem, martingale convergence, Brownian motion construction, Itô calculus — uses the σ-algebra F as the home of events and depends on its three axioms.
Common misconceptions
- All subsets work. The power set 2^Ω is a σ-algebra, but it is not always usable. On ℝ, no translation-invariant measure that assigns [0, 1] measure 1 can be defined on the entire power set (Vitali). Restricting to a smaller σ-algebra is mandatory for Lebesgue measure and for most probability measures on uncountable spaces.
- Discrete probability needs σ-algebras. Conceptually yes, practically no. For finite or countable Ω, F is taken to be the power set, and the σ-algebra structure is invisible. The machinery only earns its keep on uncountable spaces such as ℝ, ℝ^∞, function spaces, and stochastic process spaces.
- Always the power set. Outside of countable spaces, F is usually a strict subset of 2^Ω. The Borel and Lebesgue σ-algebras on ℝ are strictly between the trivial one and the power set, and most "interesting" σ-algebras in stochastic processes are far from the power set.
- Borel and Lebesgue are the same. They differ. L (Lebesgue) is the completion of B (Borel) with respect to Lebesgue measure: it adds all subsets of measure-zero Borel sets. There exist subsets of the Cantor set that are Lebesgue-measurable (because the Cantor set is null) but not Borel.
- Countable additivity is the same as finite additivity. Finite additivity admits "finitely additive measures" that lack many useful theorems (no dominated convergence, no Radon-Nikodym). Countable additivity is strictly stronger and is what makes measure theory powerful. A finitely additive but not countably additive measure exists on the power set of ℕ (Banach limits).
- σ-algebras are obscure foundations of no working interest. Filtrations in finance, conditional expectation in statistics, Itô integration in physics — these are σ-algebras worn lightly. Anyone writing real applied stochastic code deals with σ-algebras whether they call them that or not.
Modern role
Kolmogorov's σ-algebra-based axiomatization of probability approaches its centennial in 2033. Despite many proposals to replace it (constructive probability, free probability, quantum probability, fuzzy measures), the σ-algebra remains the working language for stochastic analysis, mathematical finance, statistics, dynamical systems, and ergodic theory. The reason is its fit: rich enough to support countable limits, restrictive enough to avoid pathologies, and abstract enough to apply to any domain, from coin tosses to Brownian paths to Hilbert-space-valued random variables.
Frequently asked questions
Why countable not finite operations?
An algebra of sets requires closure under finite unions only. A sigma-algebra requires the strictly stronger condition of closure under countable unions. The reason is that limits, suprema, and infima of sequences need countably many sets to be expressed. In probability, a tail event such as 'A_n occurs infinitely often' is ⋂_{N=1}^∞ ⋃_{n≥N} A_n, a countable intersection of countable unions. Without countable closure, none of the convergence theorems of measure theory work. Requiring closure under arbitrary (uncountable) unions overshoots: on ℝ, such a family containing all single points is the entire power set, where Vitali's set already rules out Lebesgue measure, and additivity over uncountable unions would force μ([0, 1]) = Σ_{x ∈ [0, 1]} μ({x}) = 0. Countable is the right level.
What is the Borel σ-algebra?
The Borel σ-algebra B(ℝ) is the smallest σ-algebra on ℝ containing every open interval. Equivalently, it is the intersection of all σ-algebras that contain the open sets. By countable union and complement, B(ℝ) contains every open set, every closed set, every Fσ (countable union of closed sets), every Gδ (countable intersection of open sets), and so on through the ω₁ levels of the Borel hierarchy. Practically, every set you can describe by an explicit recipe involving sequences of open or closed sets is Borel. The cardinality of B(ℝ) is 2^ℵ₀, the same as ℝ, while the power set of ℝ has the strictly larger cardinality 2^(2^ℵ₀).
What is a measurable set vs not (Vitali)?
Vitali (1905) constructed a subset V of [0, 1) that cannot be assigned a Lebesgue measure consistent with translation invariance and countable additivity. The construction picks one representative in [0, 1) from each coset of the rational subgroup ℚ in ℝ, using the Axiom of Choice. The translates of V by rationals, taken mod 1, partition [0, 1), so by countable additivity the measure of [0, 1) would be 0 if μ(V) = 0 or ∞ if μ(V) > 0, contradicting μ([0, 1)) = 1. Hence V is non-measurable. The Lebesgue σ-algebra L lies strictly between the Borel σ-algebra and the power set. Proving that non-measurable sets exist requires some choice beyond ZF + Dependent Choice: Solovay (1970) showed that, assuming an inaccessible cardinal is consistent, ZF + DC is consistent with every subset of ℝ being Lebesgue-measurable.
How does σ-algebra define probability events?
A probability space is a triple (Ω, F, P) where Ω is the sample space (set of outcomes), F is a σ-algebra on Ω (the events), and P is a probability measure on F. Members of F are exactly the events to which probabilities can be assigned. For a fair die, Ω = {1, 2, 3, 4, 5, 6} and F is the power set; every subset is an event. For a coin tossed infinitely often, Ω = {0, 1}^ℕ and F is the σ-algebra generated by cylinders; not every subset of Ω is an event. The σ-algebra guarantees that a countable union ⋃_n A_n of events is again an event, so the countable additivity rule P(⋃_n A_n) = Σ_n P(A_n) for pairwise disjoint A_n is well-defined.
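The fair-die example can be written out explicitly. A minimal sketch, assuming uniform point masses of 1/6; the names `omega`, `F`, and `P` simply mirror the notation (Ω, F, P).

```python
from fractions import Fraction
from itertools import chain, combinations

omega = frozenset(range(1, 7))

# F = power set of omega: on a finite sample space every subset is an event
F = [frozenset(c) for c in chain.from_iterable(
        combinations(sorted(omega), r) for r in range(len(omega) + 1))]

def P(event):
    """Fair-die probability measure: each face carries point mass 1/6."""
    return Fraction(len(event), 6)

assert len(F) == 2 ** 6   # 64 events
assert P(omega) == 1      # normalization P(Ω) = 1
# On a finite Ω, countable additivity reduces to additivity on disjoint pairs
assert all(P(a | b) == P(a) + P(b) for a in F for b in F if not a & b)
print(P(frozenset({2, 4, 6})))  # 1/2
```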
What is a filtration in stochastic processes?
A filtration is an increasing family of σ-algebras (F_t)_{t ≥ 0} indexed by time. Each F_t represents the information available at time t — the events whose occurrence can be determined by observing the process up to time t. For a stochastic process X_t, the natural filtration is F_t = σ(X_s : s ≤ t), the σ-algebra generated by the process history. A martingale, submartingale, or supermartingale is defined via conditional expectations relative to a filtration: E[X_t | F_s] = X_s for martingales when s ≤ t. Filtrations are central to mathematical finance, where F_t is the information accessible to a trader, and trading strategies must be F_t-measurable.
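For a discrete-time illustration, consider a simple symmetric random walk S_t, the sum of t independent steps of ±1 with probability 1/2 each (an illustrative choice). With the natural filtration, conditioning on F_s amounts to fixing the first s steps, so E[S_t | F_s] can be computed exactly by averaging over all equally likely continuations. The sketch below checks the martingale property on every history of length s; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def conditional_expectation_given_prefix(prefix, t):
    """E[S_t | first len(prefix) steps] for a simple symmetric random walk.

    S_t is the sum of t independent ±1 steps, each with probability 1/2.
    Conditioning on the natural filtration F_s means fixing the first s steps
    and averaging over all equally likely continuations.
    """
    s = len(prefix)
    continuations = list(product((-1, 1), repeat=t - s))
    total = sum(sum(prefix) + sum(rest) for rest in continuations)
    return Fraction(total, len(continuations))

s, t = 3, 6
# Martingale property E[S_t | F_s] = S_s, checked on every possible history
assert all(conditional_expectation_given_prefix(h, t) == sum(h)
           for h in product((-1, 1), repeat=s))
```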
Why is the σ-algebra needed for Lebesgue but not Riemann integration?
Riemann integration approximates the area under a curve by partitioning the domain into intervals. Intervals are simple, fixed objects, and one does not need a σ-algebra — just open or closed intervals. But the Riemann integral fails for highly oscillatory or wildly discontinuous functions, and convergence theorems are weak. Lebesgue's idea was to partition the range, then ask which subsets of the domain map into each piece of the range. Those subsets need not be intervals — they can be arbitrary measurable sets. The σ-algebra is then required to define what 'arbitrary measurable' means and to guarantee additivity. The payoff is the dominated convergence theorem and a much larger class of integrable functions.
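Lebesgue's "partition the range" idea can be sketched numerically, assuming f(x) = x² on [0, 1] as an illustrative choice because its level sets are intervals with closed-form measure: {x ∈ [0, 1] : y_k ≤ f(x) < y_{k+1}} = [√y_k, √y_{k+1}), of length √y_{k+1} − √y_k (the single point x = 1 is a null set and does not matter). The lower and upper sums bracket ∫₀¹ x² dx = 1/3, and their gap is exactly 1/n because the level-set measures add up to 1. This relies on knowing the level sets in closed form; it is not a general-purpose integrator, and `lebesgue_sums` is a hypothetical name.

```python
import math

def lebesgue_sums(n):
    """Lower and upper Lebesgue sums for f(x) = x**2 on [0, 1].

    The RANGE [0, 1] is cut into n levels y_k = k/n. The set of x with
    y_k <= f(x) < y_{k+1} is the interval [sqrt(y_k), sqrt(y_{k+1})),
    whose Lebesgue measure is its length.
    """
    lower = upper = 0.0
    for k in range(n):
        y0, y1 = k / n, (k + 1) / n
        measure = math.sqrt(y1) - math.sqrt(y0)  # μ({y0 <= f < y1})
        lower += y0 * measure
        upper += y1 * measure
    return lower, upper

for n in (10, 100, 1000):
    print(n, lebesgue_sums(n))  # both sums squeeze toward 1/3
```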