Statistical Mechanics

Fokker-Planck Equation

The equation that turns a single unpredictable jiggle into a forecastable cloud of probability

The PDE for how a probability distribution evolves under drift plus diffusion — governing Brownian motion, diffusion, and stochastic finance.

Equation (1D): dP/dt = -d(A·P)/dx + d²(B·P)/dx²
Two forces: Drift A pulls, diffusion B spreads
Steady state: P_ss ~ e^(-U/D) (Boltzmann profile)
Free diffusion: Variance grows linearly: σ² = 2Dt
Conservation: Continuity form keeps ∫P dx = 1 exactly
Also known as: Kolmogorov forward / Smoluchowski equation

Definition

The Fokker-Planck equation tracks the entire ensemble of a random process at once. Instead of following one noisy trajectory, it follows the probability distribution P(x, t) — the odds of finding the system at position x at time t. In one dimension:

∂P/∂t = -∂/∂x [ A(x)·P ]  +  ∂²/∂x² [ B(x)·P ]

Two terms, two competing physical processes:

Drift — the first-derivative term with the drift coefficient A(x). It is deterministic: it slides the whole distribution toward lower potential, like a ball rolling downhill.
Diffusion — the second-derivative term with the diffusion coefficient B(x). It is random: it flattens and widens the distribution, like a drop of ink spreading in water.

The hard, beautiful fact is that these two effects do not cancel; they reach a truce. In a confining potential, the drift pulls inward, the diffusion pushes outward, and the distribution settles into a fixed, unchanging steady-state shape. For constant diffusion, that shape is the Boltzmann distribution.

How it works — drift versus diffusion

Picture a single dust mote in a fluid, bombarded by molecules. Each collision nudges it randomly. If we watched a thousand identical motes released from the same point, after a moment they would form a spreading bell curve. The Fokker-Planck equation is the rule that bell curve obeys.

Pure diffusion. Set the drift to zero and hold B = D constant. The equation collapses to the heat equation:

∂P/∂t = D·∂²P/∂x²

Start with all probability at a single point (a Dirac delta). The solution is a Gaussian that keeps its area but widens forever, with variance

σ²(t) = 2·D·t

That linear-in-time spread (not the position, but the variance growing as t) is the precise fingerprint of Brownian motion. Einstein derived it in 1905, and Perrin's measurements of it a few years later became decisive evidence that atoms exist.
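
The spreading Gaussian can be checked numerically. The sketch below evaluates the analytic solution on a grid and compares its area and variance with 1 and 2·D·t. The helper names (freeDiffusionDensity, moments), the grid size, and the value D = 0.5 are illustrative assumptions, not part of the original text.

```javascript
// Analytic solution of dP/dt = D * d²P/dx² started from a delta spike at x = 0:
// a Gaussian whose variance is 2*D*t.
function freeDiffusionDensity(x, t, D) {
  return Math.exp(-(x * x) / (4 * D * t)) / Math.sqrt(4 * Math.PI * D * t);
}

// Riemann-sum area and variance of that density on a uniform grid.
// The mean is zero by symmetry, so the second moment is the variance.
function moments(t, D, halfWidth = 20, n = 4001) {
  const dx = (2 * halfWidth) / (n - 1);
  let area = 0;
  let second = 0;
  for (let i = 0; i < n; i++) {
    const x = -halfWidth + i * dx;
    const p = freeDiffusionDensity(x, t, D);
    area += p * dx;
    second += x * x * p * dx;
  }
  return { area, variance: second / area };
}

const D = 0.5; // illustrative diffusion constant
for (const t of [0.5, 1, 2, 4]) {
  const { area, variance } = moments(t, D);
  console.log(
    `t=${t}  area=${area.toFixed(4)}  variance=${variance.toFixed(4)}  2Dt=${(2 * D * t).toFixed(4)}`
  );
}
```

The area column should stay at 1 while the variance column tracks 2·D·t, which is the "keeps its area but widens forever" behavior described above.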

Add drift. Now let A(x) = -dU/dx, a force from a potential well U(x) shaped like a bowl. The peak of the distribution slides downhill toward the bottom of the bowl while it spreads. Drift recenters; diffusion widens. The race between them sets the final width.

The truce — steady state. Eventually the inward pull of drift exactly balances the outward push of diffusion. The probability current J = A·P − ∂(B·P)/∂x drops to zero everywhere, ∂P/∂t = 0, and the distribution freezes into:

P_ss(x) ∝ e^(-U(x)/D)

Steeper well, narrower peak. Hotter system (larger D), broader peak. This is the Boltzmann distribution, and the Fokker-Planck equation is the dynamics that gets you there from any starting point.
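
The zero-current condition is easy to test numerically. The sketch below assumes an illustrative double-well potential U(x) = x⁴/4 − x²/2 and D = 0.3 (neither appears in the original) and checks that J = A·P − D·dP/dx vanishes for P ∝ e^(-U/D), using a central difference for the derivative.

```javascript
// Illustrative double-well potential and its slope (assumed for this sketch).
const U = x => x ** 4 / 4 - x ** 2 / 2;
const dUdx = x => x ** 3 - x;
const D = 0.3; // illustrative constant diffusion coefficient

const drift = x => -dUdx(x);          // A(x) = -dU/dx
const Pss = x => Math.exp(-U(x) / D); // unnormalized candidate steady state

// Probability current J = A*P - D*dP/dx, with dP/dx from a central difference.
function current(x, h = 1e-5) {
  const dPdx = (Pss(x + h) - Pss(x - h)) / (2 * h);
  return drift(x) * Pss(x) - D * dPdx;
}

for (const x of [-1.5, -1, 0, 0.5, 1, 2]) {
  console.log(`x=${x}  P_ss=${Pss(x).toExponential(3)}  J=${current(x).toExponential(2)}`);
}
```

The printed currents should be zero up to finite-difference and rounding error, even though P_ss itself varies by orders of magnitude across the two wells.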

A worked example — relaxing into a harmonic well

Take the most important solvable case: the Ornstein-Uhlenbeck process. The potential is a parabola U(x) = ½k·x², so the drift is the linear restoring force A(x) = -k·x, with constant diffusion D. This models a bead on an optical-tweezers spring, a damped voltage in an RC circuit, or a mean-reverting interest rate.

The equation is

∂P/∂t = ∂/∂x [ k·x·P ] + D·∂²P/∂x²

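Start the bead off-center at x₀ = 4 with zero spread. Plug in concrete numbers: restoring rate k = 1 s⁻¹ (the spring stiffness divided by the friction coefficient, which is why it carries units of inverse time) and diffusion constant D = 0.5 in units of length² per second. The distribution stays Gaussian for all time, with a mean that decays exponentially and a variance that saturates: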

mean(t)     = x₀·e^(-k·t)          = 4·e^(-t)
variance(t) = (D/k)·(1 - e^(-2k·t)) → D/k = 0.5

Read that off as a timeline:

t = 0: a spike at x = 4, variance 0.
t = 1: mean has slid to 4·e⁻¹ ≈ 1.47; variance has grown to 0.5·(1 − e⁻²) ≈ 0.43.
t = 3: mean ≈ 0.20, variance ≈ 0.499 — essentially settled.
t → ∞: mean = 0, variance = D/k = 0.5. The steady state is a Gaussian P_ss ∝ e^(-x²/(2·0.5)) = e^(-k·x²/(2D)), exactly the Boltzmann form e^(-U/D).

The relaxation time is τ = 1/k = 1 second; after a few τ the system has forgotten where it started. The whole approach to thermal equilibrium, quantified, in two closed-form lines.
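
The timeline above can be reproduced directly from the two closed-form lines. A minimal sketch, where the function name ouMoments is a hypothetical helper and the parameters are the ones from the worked example:

```javascript
// Closed-form Ornstein-Uhlenbeck moments for a start at x0 with zero spread.
function ouMoments(t, x0, k, D) {
  return {
    mean: x0 * Math.exp(-k * t),
    variance: (D / k) * (1 - Math.exp(-2 * k * t)),
  };
}

const x0 = 4, k = 1, D = 0.5; // values from the worked example
for (const t of [0, 1, 3, 10]) {
  const { mean, variance } = ouMoments(t, x0, k, D);
  console.log(`t=${t}  mean=${mean.toFixed(3)}  variance=${variance.toFixed(3)}`);
}
// t=0  mean=4.000  variance=0.000
// t=1  mean=1.472  variance=0.432
// t=3  mean=0.199  variance=0.499
// t=10  mean=0.000  variance=0.500
```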

Variants and regimes

Form / name	What it adds	When to use
Heat / diffusion equation	Drift = 0, B = D constant	Free Brownian motion, ink in water, heat spread
Smoluchowski equation	Overdamped: position only, friction dominates inertia	Colloids, polymers, protein folding (micron scale and below)
Kramers equation	Keeps both position and velocity (phase space)	Underdamped escape over barriers, reaction-rate theory
Ornstein-Uhlenbeck	Linear drift A = -k·x, constant D	Optical traps, mean-reverting finance, RC noise
Kolmogorov forward equation	Identical math, probabilist's language	Option pricing, queueing, population genetics
Nonlinear / McKean-Vlasov	Drift depends on P itself	Interacting particles, swarms, plasma, mean-field games

From a single trajectory to the distribution

Where does the equation come from? Start with the Langevin equation for one particle — a deterministic drift plus a random kick:

dx = A(x)·dt + √(2B(x))·dW

Here dW is a Wiener increment: Gaussian noise with zero mean and variance dt. Run that stochastic differential equation millions of times and histogram the endpoints. The Fokker-Planck equation is the exact PDE that histogram obeys — derived by a Kramers-Moyal expansion of the underlying Markov process, truncated at second order (mean = drift, variance = diffusion, higher moments dropped).

This is the crucial conceptual move: one equation for the cloud replaces a billion equations for the droplets. The Langevin picture is microscopic and noisy; the Fokker-Planck picture is macroscopic and smooth. They describe the same physics.
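
To see the two pictures agree, one can simulate many Langevin trajectories and compare their sample statistics with the Fokker-Planck prediction. The sketch below is one way to do that, under stated assumptions: it uses the Euler-Maruyama discretization and Box-Muller sampling (the text above does not prescribe a scheme), the helper names gaussian and simulateLangevin are hypothetical, and the harmonic-well parameters are borrowed from the worked example. Results vary slightly from run to run because it draws from Math.random.

```javascript
// Standard normal sample via the Box-Muller transform.
function gaussian() {
  const u = 1 - Math.random(); // in (0, 1], avoids log(0)
  const v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// Euler-Maruyama integration of dx = A(x) dt + sqrt(2 B(x)) dW
// for an ensemble of independent particles released from x0.
function simulateLangevin(drift, diffusion, x0, tEnd, dt, particles) {
  const steps = Math.round(tEnd / dt);
  const xs = new Float64Array(particles).fill(x0);
  for (let s = 0; s < steps; s++) {
    for (let p = 0; p < particles; p++) {
      const x = xs[p];
      // dW has variance dt, so the kick is sqrt(2 B dt) times a unit normal.
      xs[p] = x + drift(x) * dt + Math.sqrt(2 * diffusion(x) * dt) * gaussian();
    }
  }
  return xs;
}

// Harmonic well: A(x) = -k x, constant B = D.
const k = 1, D = 0.5, x0 = 4, tEnd = 1;
const xs = simulateLangevin(x => -k * x, () => D, x0, tEnd, 0.005, 20000);

const mean = xs.reduce((s, v) => s + v, 0) / xs.length;
const variance = xs.reduce((s, v) => s + (v - mean) ** 2, 0) / xs.length;

// Fokker-Planck (closed-form) prediction for comparison.
const meanExact = x0 * Math.exp(-k * tEnd);
const varExact = (D / k) * (1 - Math.exp(-2 * k * tEnd));
console.log(`sampled:  mean=${mean.toFixed(3)}  variance=${variance.toFixed(3)}`);
console.log(`predicted: mean=${meanExact.toFixed(3)}  variance=${varExact.toFixed(3)}`);
```

With 20,000 particles the sampled moments should agree with the prediction to roughly two decimal places. The remaining gap is statistical noise plus a small time-step bias.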

JavaScript — solving Fokker-Planck on a grid

// 1D Fokker-Planck: dP/dt = -d(A·P)/dx + D·d²P/dx²
// Conservative finite-volume scheme (keeps total probability ≈ 1).
function fokkerPlanckStep(P, x, dx, dt, drift, D) {
  const n = P.length;
  const flux = new Array(n + 1).fill(0); // flux at cell edges
  // Probability current J = A·P - D·dP/dx, evaluated on edges
  for (let i = 1; i < n; i++) {
    const xEdge = 0.5 * (x[i - 1] + x[i]);
    const A = drift(xEdge);
    const pUpwind = A > 0 ? P[i - 1] : P[i];   // upwind drift
    flux[i] = A * pUpwind - D * (P[i] - P[i - 1]) / dx;
  }
  // flux[0] = flux[n] = 0  → reflecting (no-flux) walls conserve probability
  const Pnew = new Array(n);
  for (let i = 0; i < n; i++) {
    Pnew[i] = P[i] - (dt / dx) * (flux[i + 1] - flux[i]);
  }
  return Pnew;
}

// Harmonic well: A(x) = -k·x  → steady state ∝ e^(-k x² / 2D) (Boltzmann)
const k = 1, D = 0.5;
const N = 201, L = 6, dx = (2 * L) / (N - 1), dt = 0.0005;
const x = Array.from({ length: N }, (_, i) => -L + i * dx);

// Start as a narrow packet off-center at x ≈ 4
let P = x.map(xi => Math.exp(-Math.pow(xi - 4, 2) / 0.1));
const norm = a => { const s = a.reduce((t, v) => t + v, 0) * dx; return a.map(v => v / s); };
P = norm(P);

const mean = a => a.reduce((t, v, i) => t + x[i] * v, 0) * dx;
const variance = a => { const m = mean(a); return a.reduce((t, v, i) => t + (x[i]-m)**2 * v, 0) * dx; };

for (let step = 0; step <= 6000; step++) {
  if (step % 2000 === 0) {
    const t = (step * dt).toFixed(2);
    console.log(`t=${t}  mean=${mean(P).toFixed(3)}  var=${variance(P).toFixed(3)}`);
  }
  P = fokkerPlanckStep(P, x, dx, dt, xi => -k * xi, D);
}
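// Analytic reference values (the printed numbers should land close to these;
// first-order upwinding adds a little numerical diffusion, so the simulated
// variance tends to run slightly high):
// t=0.00  mean≈4.00   var≈0.05
// t=1.00  mean≈1.47   var≈0.44
// t=2.00  mean≈0.54   var≈0.49
// t=3.00  mean≈0.20   var≈0.50   → settles near D/k = 0.5 (Boltzmann variance)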

Performance and stability analysis

The naive explicit scheme above is cheap — O(N) per time step, O(N·T/dt) total — but it is only conditionally stable. The diffusion term imposes the classic parabolic CFL limit:

dt ≤ dx² / (2D)

Halving the grid spacing dx to double spatial resolution forces a fourfold cut in dt, the curse of explicit diffusion solvers. With N = 201 points on [-6, 6] (dx = 0.06) and D = 0.5 the bound is dt ≲ 3.6×10⁻³, so the code's dt = 5×10⁻⁴ sits comfortably inside it, with margin left for the drift term. Push past the bound and the solution explodes into checkerboard oscillations.
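
A small helper makes the bound concrete for the grid used in the solver (N = 201 on [-6, 6], D = 0.5, k = 1). The function names are hypothetical, and the combined drift-plus-diffusion bound is a commonly used sufficient condition for explicit upwind schemes, stated here as an assumption rather than something derived in this article.

```javascript
// Largest stable step for explicit diffusion alone: dt <= dx^2 / (2D).
const diffusionLimit = (dx, D) => (dx * dx) / (2 * D);

// Assumed sufficient condition once upwind drift is included:
// dt * (2D/dx^2 + max|A|/dx) <= 1.
const combinedLimit = (dx, D, maxDrift) => 1 / ((2 * D) / (dx * dx) + maxDrift / dx);

const N = 201, L = 6, D = 0.5, k = 1;
const dx = (2 * L) / (N - 1);
const maxDrift = k * L; // |A| = k|x| is largest at the walls

console.log(`dx = ${dx.toFixed(3)}`);
console.log(`diffusion limit: dt <= ${diffusionLimit(dx, D).toExponential(2)}`);
console.log(`combined limit:  dt <= ${combinedLimit(dx, D, maxDrift).toExponential(2)}`);
// dx = 0.060
// diffusion limit: dt <= 3.60e-3
// combined limit:  dt <= 2.65e-3
```

Either way, a step of dt = 5×10⁻⁴ is several times smaller than the limit.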

Method	Cost / step	Stability	Best for
Explicit finite-volume	O(N)	dt ≤ dx²/(2D)	Quick demos, short times
Crank-Nicolson (implicit)	O(N) tridiagonal solve	Unconditionally stable	Long-time relaxation, stiff D
Spectral (FFT)	O(N log N)	Excellent for smooth P	Periodic or constant-coefficient
Monte Carlo (Langevin SDE)	O(M) particles	No grid; statistical noise ~1/√M	High dimensions (d > 3)
Analytic (OU, free)	O(1)	Exact	Linear drift, Gaussian P
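
As a sketch of the implicit row in the table, here is a Crank-Nicolson step for the diffusion term only, with no-flux walls and a tridiagonal (Thomas) solve. Drift is left out to keep the example short, and the function names, grid, and the deliberately large time step are illustrative assumptions.

```javascript
// Thomas algorithm: solves a tridiagonal system with sub-diagonal a,
// main diagonal b, super-diagonal c and right-hand side d.
function solveTridiagonal(a, b, c, d) {
  const n = d.length;
  const cp = new Array(n), dp = new Array(n), out = new Array(n);
  cp[0] = c[0] / b[0];
  dp[0] = d[0] / b[0];
  for (let i = 1; i < n; i++) {
    const m = b[i] - a[i] * cp[i - 1];
    cp[i] = c[i] / m;
    dp[i] = (d[i] - a[i] * dp[i - 1]) / m;
  }
  out[n - 1] = dp[n - 1];
  for (let i = n - 2; i >= 0; i--) out[i] = dp[i] - cp[i] * out[i + 1];
  return out;
}

// One Crank-Nicolson step of dP/dt = D * d²P/dx² with reflecting walls:
// half of the discrete Laplacian is applied explicitly, half implicitly.
function crankNicolsonStep(P, dx, dt, D) {
  const n = P.length;
  const r = (D * dt) / (dx * dx);
  const a = new Array(n), b = new Array(n), c = new Array(n), rhs = new Array(n);
  for (let i = 0; i < n; i++) {
    const hasLeft = i > 0, hasRight = i < n - 1;
    const left = hasLeft ? P[i - 1] - P[i] : 0;    // no flux through the left wall
    const right = hasRight ? P[i + 1] - P[i] : 0;  // no flux through the right wall
    a[i] = hasLeft ? -r / 2 : 0;
    c[i] = hasRight ? -r / 2 : 0;
    b[i] = 1 + (r / 2) * ((hasLeft ? 1 : 0) + (hasRight ? 1 : 0));
    rhs[i] = P[i] + (r / 2) * (left + right);
  }
  return solveTridiagonal(a, b, c, rhs);
}

const N = 201, L = 6, D = 0.5;
const dx = (2 * L) / (N - 1);
const dt = 0.05; // roughly 14x the explicit limit dx^2/(2D) = 0.0036
const x = Array.from({ length: N }, (_, i) => -L + i * dx);

let P = x.map(xi => Math.exp(-(xi * xi) / 0.1)); // narrow packet, variance about 0.05
const total0 = P.reduce((s, v) => s + v, 0) * dx;
P = P.map(v => v / total0);

for (let step = 0; step < 20; step++) P = crankNicolsonStep(P, dx, dt, D); // advance to t = 1

const total = P.reduce((s, v) => s + v, 0) * dx;
const mean = P.reduce((s, v, i) => s + x[i] * v, 0) * dx;
const variance = P.reduce((s, v, i) => s + (x[i] - mean) ** 2 * v, 0) * dx;
console.log(`total=${total.toFixed(6)}  variance=${variance.toFixed(3)}  (expect about 0.05 + 2*D*t = 1.05)`);
```

The step size here would blow up the explicit scheme, yet the implicit update stays bounded and conserves probability. The trade-off is a linear solve per step, which the Thomas algorithm keeps at O(N).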

The deep reason high-dimensional problems abandon the grid: a 3D position-velocity Fokker-Planck lives in a 6D phase space, and a grid with n points per axis needs n⁶ cells. At n = 100 that is a trillion cells. This is the curse of dimensionality, and it is exactly why finance and chemistry simulate the Langevin SDE with sampled particles instead: Monte Carlo error falls only as 1/√M in the number of particles M, but that rate is blissfully dimension-independent.

Where the Fokker-Planck equation shows up

Brownian motion and colloids. Pollen grains, nanoparticles, optical-tweezers beads — anything jiggling in a fluid. The Smoluchowski form predicts their position statistics.
Stochastic finance. The Kolmogorov forward equation gives the probability density of future asset prices; Black-Scholes option pricing is its backward twin.
Laser and electronic noise. Phase diffusion in lasers, Johnson-Nyquist noise in circuits, and the linewidth of oscillators.
Chemical reaction rates. Kramers' escape-over-a-barrier theory — the rate a molecule hops out of a potential well — is a Fokker-Planck calculation.
Population genetics. The Wright-Fisher diffusion of allele frequencies obeys a Fokker-Planck (Kolmogorov) equation.
Neuroscience and machine learning. Integrate-and-fire neuron populations and the noise dynamics of stochastic gradient descent are analyzed with Fokker-Planck.
Plasma and astrophysics. Velocity-space relaxation of charged particles and star clusters via the Fokker-Planck collision term.

Common mistakes and misconceptions

Pulling B outside the derivative. The diffusion term is ∂²(B·P)/∂x², not B·∂²P/∂x². They agree only when B is constant. For space-dependent diffusion, the placement matters and there is an Itô-vs-Stratonovich ambiguity that changes the answer.
Thinking drift always wins. Drift does not collapse the distribution to a point. Diffusion fights back; in the harmonic well the steady state has finite variance D/k (width √(D/k)), not zero. A real thermal system is never perfectly localized.
Confusing the variable with the spread. In free diffusion the position wanders unpredictably, but the variance grows deterministically as 2Dt. The randomness is in the trajectory, not in the distribution's evolution.
Forgetting it's a continuity equation. Probability is conserved: ∂P/∂t + ∂J/∂x = 0. If your numerical scheme loses normalization, it is wrong — use a conservative flux form with no-flux walls.
Applying it to large or rare jumps. Fokker-Planck assumes small, frequent, near-Gaussian kicks. For shot noise, Lévy flights, or chemistry with few molecules, the Kramers-Moyal truncation breaks and you need the full master equation.
Ignoring the stability bound. The explicit solver needs dt ≤ dx²/(2D). Skip it and your beautiful bell curve dissolves into numerical garbage within a few steps.
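
The first and fourth mistakes in the list above are connected, and a short numerical check shows how. The sketch assumes an illustrative space-dependent diffusion B(x) = 1 + x² and a unit Gaussian P(x) (both chosen for this example only) and integrates the two candidate diffusion terms over x. The correct form integrates to about zero, so it conserves probability. The form with B pulled outside integrates to about 2, meaning it would create probability out of nothing.

```javascript
// Illustrative space-dependent diffusion coefficient and a unit Gaussian density.
const B = x => 1 + x * x;
const P = x => Math.exp(-(x * x) / 2) / Math.sqrt(2 * Math.PI);

// Central second difference of any function f at x.
const secondDerivative = (f, x, h = 1e-3) =>
  (f(x + h) - 2 * f(x) + f(x - h)) / (h * h);

// Integrate both candidate diffusion terms with a Riemann sum on [-10, 10].
const n = 4001, halfWidth = 10;
const dx = (2 * halfWidth) / (n - 1);
let correctTotal = 0; // integral of d²(B·P)/dx²
let wrongTotal = 0;   // integral of B·d²P/dx²
for (let i = 0; i < n; i++) {
  const x = -halfWidth + i * dx;
  correctTotal += secondDerivative(y => B(y) * P(y), x) * dx;
  wrongTotal += B(x) * secondDerivative(P, x) * dx;
}

console.log(`integral of d²(B·P)/dx²: ${correctTotal.toExponential(2)}  (rate of change of total probability)`);
console.log(`integral of B·d²P/dx²:   ${wrongTotal.toFixed(4)}`);
```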

Frequently asked questions

What does the Fokker-Planck equation describe?

It describes how the probability distribution P(x,t) of a random variable evolves in time. Two effects compete: drift, the deterministic A(x) term that pushes probability toward lower potential, and diffusion, the random B(x) term that spreads it out. The compact one-dimensional form is dP/dt = -d(A·P)/dx + d²(B·P)/dx². It is the deterministic "sister" equation to a stochastic differential equation — instead of tracking one noisy trajectory, it tracks the whole ensemble of trajectories at once.

What is the steady-state solution of the Fokker-Planck equation?

When drift and diffusion balance, the time derivative vanishes and the probability current is zero everywhere. For constant diffusion D and a drift A = -dU/dx derived from a potential U(x), the stationary solution is P_ss(x) ~ e^(-U(x)/D). This is exactly the Boltzmann distribution from statistical mechanics, with the diffusion constant playing the role of temperature: restoring the friction coefficient γ, the drift is -(1/γ)·dU/dx, the Einstein relation gives D = k_B·T/γ, and the exponent becomes -U/(k_B·T). So the Fokker-Planck equation is the dynamical bridge that explains how a system relaxes toward thermal equilibrium.

How is the Fokker-Planck equation related to Brownian motion?

A single Brownian particle follows a Langevin stochastic differential equation: dx = A(x)dt + sqrt(2B)·dW, where dW is a Wiener (white-noise) increment. The Fokker-Planck equation is the corresponding equation for the probability density of that particle's position. For pure diffusion with no drift, it reduces to the heat equation dP/dt = D·d²P/dx², whose solution is a Gaussian whose variance grows linearly in time: σ² = 2Dt. That linear spread is the mathematical signature of Brownian motion that Einstein derived in 1905.

What is the difference between the Fokker-Planck and the master equation?

The master equation describes a general Markov process with arbitrary jump sizes. The Fokker-Planck equation is the continuous, small-jump limit obtained by a Kramers-Moyal expansion truncated at second order — keeping only the drift (first moment) and diffusion (second moment) terms. If jumps are large or rare (shot noise, chemical reactions with few molecules), the truncation fails and you must keep the full master equation. Fokker-Planck assumes the noise is effectively Gaussian and continuous.

Why is the Fokker-Planck equation used in finance?

Asset prices are modeled as stochastic processes — geometric Brownian motion in the Black-Scholes model. The Fokker-Planck equation gives the time-evolving probability density of future prices, letting you compute the odds an option finishes in the money. The closely related Kolmogorov forward equation (its exact mathematical twin) underpins option pricing, the Black-Scholes PDE arising as the backward companion. Fokker-Planck turns a single unpredictable price path into a forecastable probability cloud.

What is the Smoluchowski equation?

The Smoluchowski equation is the overdamped limit of the Fokker-Planck equation, valid when friction dominates inertia (a small particle in a viscous fluid, whose velocity relaxes far faster than its position changes). It evolves position alone: dP/dt = (1/γ)·d/dx[(dU/dx)·P] + D·d²P/dx². The full Kramers equation keeps both position and velocity. Most colloidal, polymer, and protein-folding problems live in the overdamped Smoluchowski regime, because at micron scales and below inertia is utterly negligible compared to drag.

Does the Fokker-Planck equation conserve probability?

Yes. It can be written as a continuity equation dP/dt + dJ/dx = 0, where J is the probability current J = A·P - d(B·P)/dx. Because the whole right-hand side is a spatial derivative, integrating over all x makes it telescope to zero (with no-flux or vanishing boundaries), so the total probability stays exactly 1 for all time. This is the formal guarantee that the distribution stays a valid distribution — it never gains or loses normalization, it only reshapes.