Probability

Martingale

E[Xₙ₊₁ | F_n] = Xₙ — your expected future given the past equals the present

A martingale is a stochastic process (Xₙ) on a filtered probability space (Ω, F, (Fₙ), P) satisfying E[Xₙ₊₁ | F_n] = Xₙ — given the entire history up to time n, the expected value of the next observation equals the current value. The "fair game" condition: no betting strategy can yield positive expected profit. Named by Jean Ville (1939); modern theory by Joseph Doob (1940s, Stochastic Processes 1953). Famous results: Doob's optional stopping theorem (under regularity, E[X_τ] = E[X₀] for stopping times τ), martingale convergence theorem (a martingale bounded in L¹ converges a.s.), and Doob's maximal inequality (P(max Xₙ ≥ λ) ≤ E[X_T]/λ). Used in: stochastic calculus (Itô integral is a martingale), mathematical finance (Black-Scholes is a martingale measure), and gambling theory.

DefinitionE[Xₙ₊₁ | F_n] = Xₙ
NamedVille 1939, Doob 1953
Optional stoppingE[X_τ] = E[X₀]
Convergence theorembounded in L¹ ⇒ a.s. limit
Maximal inequalityDoob's bound on running max
Black-Scholesmartingale measure

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

Why martingale matters

Financial models. Risk-neutral pricing reduces an option's value to a martingale expectation; Black-Scholes is a martingale measure on geometric Brownian motion.
Stochastic calculus. Every Itô integral against Brownian motion is a martingale; the Itô isometry, Girsanov's theorem, and martingale representation underpin all of continuous-time finance and stochastic control.
Gambling theory. The optional stopping theorem proves no clever strategy beats a fair coin — including the historical "martingale" doubling strategy, which technically loses in expectation when bankroll or time is bounded.
Hypothesis testing. Sequential tests use likelihood-ratio martingales; Ville's inequality gives anytime-valid p-values that don't suffer from peeking.
Branching processes. Galton–Watson trees: dividing population by E[offspring]^n yields a martingale. Convergence theorem gives the extinction probability.
Pólya urns and Bayesian posteriors. Posterior probabilities form a martingale under the prior — convergence is automatic.
Concentration inequalities. Azuma–Hoeffding bounds for martingales with bounded increments are the foundation of probabilistic combinatorics and online learning.

Formal setup

Fix a filtered probability space (Ω, F, (Fₙ)_{n≥0}, P). A sequence (Xₙ) of random variables is a martingale with respect to (Fₙ) if:

(1) Xₙ is Fₙ-measurable           (adapted)
(2) E[|Xₙ|] < ∞                    (integrable)
(3) E[Xₙ₊₁ | Fₙ] = Xₙ a.s.        (martingale property)

The natural filtration is Fₙ = σ(X₀, X₁, ..., Xₙ) — the σ-algebra generated by the process up to time n. Replacing equality with ≥ gives a submartingale; with ≤, a supermartingale.

Canonical examples

Simple random walk. Sₙ = X₁ + ... + Xₙ where Xᵢ are i.i.d. with E[Xᵢ] = 0. Then Sₙ is a martingale; Sₙ² − n is also a martingale.
Wealth in a fair game. Bet $1 on a fair coin; your wealth Xₙ is a martingale.
Pólya urn. Start with one red and one blue ball; draw, return with another of the same color. The fraction of red balls is a bounded martingale — converges a.s. to a Beta(1,1) = Uniform(0,1) limit.
Likelihood ratio. Lₙ = ∏ q(Xᵢ)/p(Xᵢ) under P-distribution p is a martingale — basis for Wald's sequential probability ratio test.
Conditional expectation tower. For any L¹ random variable Y, Mₙ = E[Y | Fₙ] is automatically a martingale (Doob's martingale).

Optional stopping theorem

A stopping time τ is a random time such that the event {τ ≤ n} is Fₙ-measurable — you can decide whether to stop using only information available so far, no peeking ahead. Doob's optional stopping theorem says that under any of these regularity conditions:

τ is bounded (τ ≤ N a.s. for some constant N), or
Xₙ is uniformly bounded and τ < ∞ a.s., or
E[τ] < ∞ and the increments |Xₙ₊₁ − Xₙ| are bounded,

then E[X_τ] = E[X₀]. The simple-random-walk-hits-+1 example violates the third condition (E[τ] = ∞) — and the conclusion fails: E[X_τ] = 1, not 0.

Martingale convergence theorem

If Xₙ is a martingale with sup_n E[|Xₙ|] < ∞, then Xₙ → X_∞ almost surely for some X_∞ with E[|X_∞|] < ∞. The proof uses Doob's upcrossing inequality: the expected number of times the path crosses any interval [a, b] is bounded — so it can only oscillate finitely many times, hence must converge.

Caveat: a.s. convergence does not imply L¹ convergence. The doubling strategy on a fair coin is a martingale that converges a.s. to a finite limit but has E[Xₙ] = E[X₀] for all n — yet E[X_∞] can be different. L¹ convergence requires uniform integrability.

Doob's maximal inequality

For a non-negative submartingale Xₙ and λ > 0:

P(max_{0 ≤ n ≤ N} Xₙ ≥ λ) ≤ E[X_N] / λ

This is the martingale analog of Markov's inequality — but it bounds the running maximum, not just a single time. The L^p version (Doob's L^p inequality) gives ‖max Xₙ‖_p ≤ (p/(p−1)) ‖X_N‖_p for p > 1. These bounds underpin Burkholder–Davis–Gundy inequalities and most of modern probabilistic analysis.

Connection to mathematical finance

The fundamental theorem of asset pricing (Harrison–Kreps 1979, Delbaen–Schachermayer 1994): a discrete-time market is arbitrage-free if and only if there exists an equivalent martingale measure Q under which discounted asset prices are martingales. Pricing a derivative reduces to computing E_Q[discounted payoff].

For geometric Brownian motion dS_t = μ S_t dt + σ S_t dW_t, a Girsanov change of measure absorbs the drift μ into a new Brownian motion, leaving the discounted price as a martingale. The expectation E_Q[e^(-rT)(S_T − K)⁺] then evaluates to the Black–Scholes formula — and that's where the closed form comes from.

Common misconceptions

"A martingale always converges." Only if it's bounded in L¹. The simple random walk is a martingale that does not converge.
"No strategy beats a martingale." True for stopping times satisfying optional-stopping regularity. False if you allow unbounded τ — random walks hit any level eventually.
"Fair game means equal probability of winning." Fair game means conditional expectation of next state equals current — no edge in expectation, regardless of variance or skew.
"Martingales are about gambling." Originally yes, but the modern theory is the language of stochastic processes — Itô calculus, filtering theory, sequential analysis, and machine learning concentration bounds all rest on it.
"Submartingales drift down." Reverse: submartingales drift up. The naming follows convex-function analogy: super = above the diagonal, sub = below.
"The doubling strategy works." It's a supermartingale once you account for finite bankroll; expected loss is bounded but eventual loss is certain.

Azuma–Hoeffding inequality

If (Xₙ) is a martingale with bounded increments |Xₙ − Xₙ₋₁| ≤ cₙ, then for any t > 0:

P(|X_N − X₀| ≥ t) ≤ 2 exp(−t² / (2 ∑ cₙ²))

This Gaussian-tail bound is the workhorse of probabilistic combinatorics — random graph properties, randomized algorithms, online learning regret bounds — wherever you can express a quantity as a martingale with bounded differences.

Frequently asked questions

What is a filtration in probability?

A filtration (Fₙ) is an increasing sequence of σ-algebras on the sample space Ω, with Fₙ ⊆ Fₙ₊₁ ⊆ F. Intuitively, Fₙ is the information available at time n — every event observable by time n is measurable with respect to Fₙ. The conditional expectation E[Xₙ₊₁ | Fₙ] uses this information set to predict the next value. As time advances, no information is lost, only added — that's why filtrations are increasing.

Why is a fair game mathematically a martingale?

If Xₙ is your wealth after n bets in a fair game, the expected wealth after the next bet, given everything you know now, equals your current wealth: E[Xₙ₊₁ | Fₙ] = Xₙ. There is no edge — the conditional mean change is zero. Submartingales (E[Xₙ₊₁ | Fₙ] ≥ Xₙ) model favorable games; supermartingales (E[Xₙ₊₁ | Fₙ] ≤ Xₙ) model unfavorable ones. The discrete Lebesgue decomposition splits any adapted process uniquely into a martingale plus a predictable drift — the Doob decomposition.

What is Doob's optional stopping theorem?

Under regularity conditions on a stopping time τ — typically τ bounded, or Xₙ uniformly bounded, or E[τ] finite with bounded increments — the expected stopped value equals the starting value: E[X_τ] = E[X₀]. Why it matters: no clever stopping rule beats a fair game on average. Without regularity it can fail; the classic counterexample is a simple random walk stopped when it first hits +1 — E[X_τ] = 1 ≠ 0 = E[X₀], because E[τ] = ∞.

Why is the martingale convergence theorem useful?

If Xₙ is a martingale (or sub/supermartingale) bounded in L¹ — sup_n E[|Xₙ|] less than infinity — then Xₙ converges almost surely to a random variable X_∞ with E[|X_∞|] less than infinity. This is one of the most powerful convergence results in probability — it generalizes the law of large numbers, gives existence of the Radon–Nikodym derivative, and underlies the Lévy upward and downward theorems. Branching processes, Bayesian posteriors, and Pólya urns all converge by this single theorem.

How does Black-Scholes use martingales?

Under the risk-neutral measure Q, the discounted asset price e^(-rt)S_t is a martingale. The fundamental theorem of asset pricing says: a market is arbitrage-free if and only if such a measure exists. Pricing a European option becomes computing the Q-expectation E_Q[e^(-rT) · payoff(S_T)]. For geometric Brownian motion, this expectation has a closed form — the Black-Scholes formula. The 1997 Nobel Prize was awarded to Merton and Scholes (Black had died) for this framework.

What's the difference between submartingale and supermartingale?

A submartingale satisfies E[Xₙ₊₁ | Fₙ] ≥ Xₙ — it tends to drift upward (favorable game). A supermartingale satisfies E[Xₙ₊₁ | Fₙ] ≤ Xₙ — it tends to drift downward (unfavorable game). A martingale is both. The naming is counterintuitive: a supermartingale is super-bad for the gambler, sub is super-good. By Jensen's inequality, |Xₙ| and Xₙ² of a martingale are submartingales when integrable, which lets Doob's maximal inequality bound their running maxima.