Probability
Random Walk
Take a step left or right, fair coin each time — simple, but the math behind diffusion, finance, and Google's PageRank
A random walk is a path built from independent random steps. The simplest version moves +1 or −1 each second on a fair coin flip, but the same idea drives Brownian motion, stock-price models, gambler's ruin, the convergence of MCMC samplers, and the algorithm behind PageRank. Almost every "things spreading" model in physics and finance is a random walk in disguise.
- MeanE[S_n] = 0
- VarianceVar[S_n] = n
- Typical distance√n
- Pólya, 1921recurrent in 1D, 2D
- 3D return prob.≈ 0.34
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
The simple symmetric walk on ℤ
Sit at position 0 on the integer number line. Flip a fair coin. Move +1 on heads, −1 on tails. Repeat. Define S_n as your position after n flips:
S_n = X_1 + X_2 + ... + X_n, where each X_i ∈ {+1, −1} with prob 1/2 each, independent.
The X_i are i.i.d. Each has mean 0 and variance 1. Linearity gives E[S_n] = 0 — the walker is on average at the origin — and independence gives Var[S_n] = n, so the typical absolute distance from the origin scales like √n. This is the diffusive scaling that characterises every random-walk-like process: the spread grows as the square root of the time, so doubling the spread takes four times as long.
The position S_n is a sum of n bounded i.i.d. random variables, so by the central limit theorem, S_n/√n converges in distribution to a standard Normal as n → ∞. After many steps, the walk's position looks Gaussian with mean 0 and variance n. The exact PMF for finite n is also Binomial: if k is the number of +1 steps among n, then S_n = 2k − n is a shifted Binomial(n, 1/2).
Pólya's theorem: when does the walk come home?
Will the walk eventually return to the origin? In 1D and 2D, yes — with probability 1. In 3D and higher, no — there is a positive probability the walk wanders off forever and never returns. This is George Pólya's theorem of 1921, and it is one of the cleanest dimension-dependent results in probability.
The proof is a counting argument. Let p_n be the probability the walk is at the origin at step n. The expected number of returns to the origin is Σ p_n. In one dimension p_(2n) ≈ 1/√(πn), so Σ p_(2n) ≈ Σ 1/√n diverges — the walk visits the origin infinitely often. In two dimensions p_(2n) ≈ 1/(πn), so Σ p_(2n) ≈ Σ 1/n is the harmonic series, which also diverges. In three dimensions p_(2n) ≈ const · n^(−3/2), and Σ n^(−3/2) converges — the expected number of returns is finite, which forces the return probability to be strictly less than 1.
The numerical 3D return probability is approximately 0.3405. As one terse summary has it: "a drunkard finds his way home, but a drunken bird does not."
Worked example: drunken walker on 0–10
A walker starts at position 3 on the integer line {0, 1, 2, ..., 10}. At each step she moves ±1 with equal probability. The walk stops when she hits 0 or 10. What is the probability she reaches 10 before 0?
This is the gambler's ruin problem with k = 3 dollars facing a cap of N = 10 in a fair game. The classical result for a fair walk is
P(reach N before 0 | start k) = k / N
So P = 3/10 = 0.3. The expected time to absorption is k(N − k) = 3 · 7 = 21 steps.
For the biased version where each up-step has probability p and each down-step has probability q = 1 − p ≠ 1/2, the formula generalises to
P(reach N before 0) = (1 − (q/p)^k) / (1 − (q/p)^N)
Suppose the walker has a slight bias toward going up: p = 0.55, q = 0.45, so q/p = 0.818. Then
P(reach 10 before 0 | start 3) = (1 − 0.818^3) / (1 − 0.818^10)
= (1 − 0.547) / (1 − 0.137)
= 0.453 / 0.863
≈ 0.525
A 5% per-step edge moves the success probability from 0.30 to 0.52 — a 73% relative improvement. The same formula in reverse explains why even a small house edge makes long sessions at a casino devastating: with p = 0.49, k = 100 dollars, N = 1000, the probability of doubling is well under 1%.
From discrete steps to Brownian motion
Take a random walk with step size Δx every Δt seconds, and squeeze both Δx and Δt to zero. If the squeeze is balanced so that Δx² / Δt = D (the diffusion coefficient) is held constant, the trajectories converge to Brownian motion B(t) — the canonical continuous-time limit of a random walk.
Brownian motion has these characteristic properties:
- B(0) = 0.
- For 0 ≤ s < t, the increment B(t) − B(s) is Normal with mean 0 and variance t − s.
- Increments over disjoint time intervals are independent.
- Paths are almost surely continuous.
- Paths are almost surely nowhere differentiable — they are continuous curves of infinite length on every interval.
The Δx² / Δt = D scaling is the same scaling that appears in the heat equation: u_t = D u_xx. The Laplacian on the right is the spatial second derivative; if you discretise it on a grid, you get the random-walk transition matrix. That is the deep reason diffusion processes and random walks share the same √t spread.
Random walk variants at a glance
| Simple symmetric | Biased | Lazy | Self-avoiding | Lévy flight | |
|---|---|---|---|---|---|
| Step distribution | ±1 with prob 1/2 | +1 with prob p, −1 with q | ±1 or stay, prob 1/3 each | Step ±1 but only into unvisited sites | Heavy-tailed step lengths |
| Mean drift | 0 | p − q | 0 | 0 | 0 if symmetric |
| Variance per step | 1 | 1 − (p−q)² | 2/3 | Bounded | Infinite |
| 1D recurrence | Yes (Pólya) | No (transient) | Yes | Open in d=3, conjectured | Yes if α ≥ 1 |
| Continuum limit | Brownian motion | BM with drift | Slowed BM | SLE in 2D conjectures | α-stable Lévy process |
| Used in | Diffusion, gambler's ruin | Biased polymer, finance | MCMC mixing | Polymers, lattice models | Foraging, animal motion |
The "simple symmetric random walk" is the prototype, but every variant changes one ingredient — adding drift, allowing pauses, forbidding revisits, or making the steps heavy-tailed — and yields a different qualitative behaviour. The same conceptual machinery (Markov chains, generating functions, scaling limits) handles all of them.
Where random walks show up
- Brownian motion of pollen. Robert Brown's 1827 microscope observations of pollen grains in water predate the math by 80 years. Einstein's 1905 paper showed the displacement variance is 2Dt where D = kT/(6πηr) — making each particle a tracer of molecular bombardment and giving a way to measure Avogadro's number.
- Black-Scholes option pricing. The 1973 Black-Scholes model assumes the log of a stock price follows a Brownian motion with drift — geometric Brownian motion. The PDE that the option price satisfies is a heat equation in disguise. The √t scaling shows up directly: at-the-money option prices are proportional to σ√T.
- Google's PageRank. The web is modelled as a directed graph; a random surfer clicks a random outlink with probability 0.85 and teleports with probability 0.15. The PageRank of a page is the surfer's stationary visit frequency, computed as the dominant eigenvector of the transition matrix. The original Brin-Page paper iterated the matrix-vector product to convergence on early Google's billions-of-pages graph.
- Markov chain Monte Carlo (MCMC). Metropolis-Hastings constructs a random walk on the parameter space whose stationary distribution is the posterior. Burn-in equals "wait for the walk to mix"; effective sample size equals "how independent are the walk's snapshots." Modern Bayesian inference (Stan, PyMC, NumPyro) is a stack of clever random walks.
- Polymer chain configurations. A polymer in solution is a chain of monomers; the simple Gaussian chain model treats successive bond orientations as a 3D random walk and predicts that the radius of gyration scales as √N. Self-avoiding walks (no two monomers in the same place) modify the exponent to ν ≈ 0.588 in 3D and √(3/4) in 2D — Flory's celebrated mean-field result.
Random walks as Markov chains
A random walk is a Markov chain: the next position depends only on the current position, not the path that led there. The transition matrix P has entries P(x, y) equal to the probability of stepping from x to y. For the simple symmetric walk on ℤ, P(x, x ± 1) = 1/2 and P(x, y) = 0 otherwise.
This perspective gives access to the full Markov-chain toolkit. Stationary distributions tell you the long-run visit frequency. Mixing times tell you how fast the chain forgets its starting point. Spectral gaps quantify the rate of approach to stationarity. The cover time tells you how long until every site has been visited at least once — for the simple walk on a graph G with n vertices and m edges, the cover time is at most O(mn).
For finite chains the Perron-Frobenius theorem guarantees a unique stationary distribution under irreducibility and aperiodicity. PageRank's "teleportation" ε is precisely the regularisation needed to ensure both: it makes every page reachable from every other page in one step, and breaks any periodicity in the underlying link graph.
Variants and extensions
- Random walk with drift. Add a deterministic μ each step: S_n = nμ + (mean-zero random walk). E[S_n] = nμ; the walk has a definite direction. Used in financial returns where μ is the expected drift; the discrete model converges to Brownian motion with drift in the scaling limit.
- Self-avoiding walk. Step only to unvisited sites. Models polymer chains. Famously hard to analyse exactly; the connective constant for SAW on ℤ² is approximately 2.638..., known only numerically. Conjectured to converge to SLE_(8/3) in 2D — one of Schramm-Loewner evolution's flagship correspondences.
- Lévy flight. Replace bounded steps with heavy-tailed steps having infinite variance. Foraging animals (albatrosses, sharks) have movement statistics consistent with Lévy flights of exponent ≈ 2. The continuous-time limit is an α-stable Lévy process rather than Brownian motion.
- Random walk on a graph. At each step, step to a random neighbour. Generalises ℤ; the stationary distribution is proportional to vertex degree. Cover times, hitting times, and spectral gaps are all studied here. PageRank is a random walk on the directed web graph with teleportation.
- Continuous-time random walk (CTRW). Wait an exponentially-distributed (or heavy-tailed) random time between steps. Models trap models in glassy systems and anomalous diffusion in disordered media. Scaling limits are subdiffusive when the wait-time tail is heavy.
Common pitfalls
- Confusing "drifts back to origin" with "spends most time near origin". The 1D walk visits 0 infinitely often, but it also spends infinite time arbitrarily far away — the proportion of time near the origin tends to zero. Recurrent does not mean "stays near zero".
- Believing the walker has a memory. The "law of averages" — heads are due after a tail run — is wrong. Each step is independent. The reason long-run frequencies converge to 1/2 is the law of large numbers, not any compensating mechanism after the fact.
- Misreading the drift. A walk with even a tiny bias toward one side is transient — it drifts forever. The fair walk is a knife-edge; perturbing p by 0.001 changes asymptotic behaviour qualitatively. Real-world data with even a small persistent edge does not behave like a recurrent walk.
- Diffusion-like vs ballistic confusion. A walk with drift has |S_n| ~ n (ballistic). The fair walk has |S_n| ~ √n (diffusive). The walk in a potential well has |S_n| bounded (localised). All three look similar in short time but diverge in the long-time scaling.
- Hitting-time vs return-time confusion. P(walk ever returns to 0) = 1 in 1D, but the expected return time is infinite — so "return is certain but the wait is unboundedly long". Using "expected return time" as a typical scale produces nonsense.
Frequently asked questions
What is a simple symmetric random walk?
Start at the origin of the integer line. At every step, flip a fair coin: heads move +1, tails move −1. The position after n steps is S_n = X_1 + X_2 + ... + X_n where each X_i takes values ±1 with probability 1/2. The walk has mean E[S_n] = 0 and variance Var[S_n] = n, so the typical distance from the origin grows like √n. This is the canonical model of a discrete diffusion.
What does Pólya's theorem say?
George Pólya proved in 1921 that the simple symmetric random walk returns to its starting point with probability 1 in dimensions 1 and 2, but with probability strictly less than 1 in dimensions 3 and above. In 3D the return probability is about 0.3405, so 3D walks "escape" with probability roughly 0.66. The drunkard finds his way home; the bird does not. The result is sharp and follows from comparing the divergent harmonic series 1 + 1/2 + 1/3 + ... in 1D-2D to a convergent series in 3D+.
How is random walk related to Brownian motion?
Brownian motion is the continuous-time, continuous-space limit of a random walk. If you take a simple random walk with step size Δx and time per step Δt, and rescale so that Δx² / Δt is held constant as both go to zero, the rescaled walk converges (in distribution) to a Brownian motion B(t). This is the functional central limit theorem. Brownian motion has Gaussian increments with variance proportional to time, almost surely continuous paths, and almost surely nowhere-differentiable paths.
What is the gambler's ruin problem?
Start with k dollars, play a game where you win or lose 1 dollar each round, and stop when you hit either 0 (ruin) or N (the cap). For a fair game (probability 1/2 each side), the probability of reaching N before 0 is k/N. For a biased game with win probability p ≠ 1/2, the probability is (1 − (q/p)^k) / (1 − (q/p)^N) where q = 1 − p. The takeaway: any small house edge compounds into near-certain ruin if you play long enough.
How does Google's PageRank use a random walk?
PageRank models a web surfer who clicks a random outlink with probability 1 − d (typically d = 0.15) and teleports to a random page with probability d. The PageRank of a page is the long-run fraction of time this surfer spends there — the stationary distribution of a Markov chain on the web graph. Computing it reduces to finding the dominant eigenvector of a stochastic matrix, traditionally via power iteration. The teleportation term ensures the chain is irreducible and aperiodic so the limit exists.
What does "typical distance √n after n steps" actually mean?
The expected position E[S_n] = 0 by symmetry, but the typical magnitude is given by the standard deviation √(Var[S_n]) = √n. After 10,000 fair coin flips a one-step-per-flip walker is on average about 100 steps from the start — not 0, and not 10,000. This √n scaling is the signature of diffusion and is why diffusion times scale as length squared rather than length: a particle takes a million times longer to diffuse a thousand times further.