Partial Differential Equations

Heat Equation

∂u/∂t = α∇²u — the canonical parabolic PDE behind every diffusion process

The heat equation ∂u/∂t = α∇²u is the canonical linear parabolic PDE. It governs heat diffusion, Brownian motion, image blur, and chemical mixing. Its fundamental solution is a Gaussian whose width grows in proportion to √(αt), the universal diffusion scaling.

Equation: ∂u/∂t = α∇²u
Fundamental solution: Gaussian (4παt)^(−n/2) exp(−|x|²/(4αt))
Diffusivity: α = k/(ρ c_p), units m²/s
Copper α: ≈ 1.1 × 10⁻⁴ m²/s
Scaling: width grows as √(αt)
Classification: linear parabolic; irreversible, dissipative


The equation

Joseph Fourier introduced the heat equation in his 1822 treatise Théorie analytique de la chaleur:

∂u/∂t = α ∇²u

Here u(x, t) is the temperature at point x and time t, ∇² is the Laplacian in space, and α (units: m²/s) is the thermal diffusivity of the material. The equation says that the rate at which temperature changes in time is proportional to the Laplacian, the local curvature of the temperature field. Where ∇²u > 0 (the point is colder than the average of its neighbours), the point warms up. Where ∇²u < 0 (the point is a local hot spot), it cools.

The equation is linear (superposition holds), first-order in time, second-order in space, and parabolic. It is the prototype of all diffusion: anything that smooths out by exchanging with its neighbours obeys this equation in its simplest form.

Where the equation comes from

Two physical inputs combine. First, Fourier's law of heat conduction: heat flux is proportional to the negative gradient of temperature, q = −k ∇u, where k is the thermal conductivity (W/(m·K)). Heat flows from hot to cold, with rate proportional to the temperature gradient.

Second, energy conservation: the rate of change of thermal energy in a region equals the net inflow of heat through its boundary. By the divergence theorem, this is

ρ c_p ∂u/∂t = − ∇·q = ∇·(k ∇u) = k ∇²u   (if k constant)

where ρ is density (kg/m³) and c_p is specific heat (J/(kg·K)). Dividing by ρc_p:

∂u/∂t = (k / ρ c_p) ∇²u = α ∇²u

That is the heat equation. The diffusivity α = k/(ρc_p) is the only material property that appears in it (k on its own reappears in boundary conditions that prescribe a heat flux). Materials with high conductivity and low volumetric heat capacity ρc_p diffuse heat fastest.

The fundamental solution — Gaussian heat kernel

The most important explicit solution is the response to a unit-mass spike at the origin at time zero. In n spatial dimensions, it is the heat kernel:

K(x, t) = (4π α t)^(−n/2) exp( −|x|² / (4 α t) )    for t > 0

K(x, t) is a Gaussian centred at x = 0 with standard deviation σ(t) = √(2αt). As t → 0⁺ it tends to the Dirac delta (a localised spike); as t increases, it widens like √t, the universal diffusion scaling.

Two key properties:

Mass conservation. ∫ K(x, t) dⁿx = 1 for all t > 0. Heat does not appear or disappear, just spreads.
Self-similarity. K(x, t) = t^(−n/2) K(x/√t, 1): the shape is the same at all times, only the scale changes. This reflects the invariance of the heat equation under the parabolic scaling x → λx, t → λ²t.
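Both properties can be checked numerically. The sketch below is a minimal one-dimensional check in Python using only the standard library; the helper names `heat_kernel_1d` and `moments` are illustrative, and the integrals are approximated with a midpoint rule on an interval wide enough that the Gaussian tails are negligible. It also confirms that the variance equals 2αt, i.e. σ = √(2αt).

```python
import math


def heat_kernel_1d(x: float, t: float, alpha: float) -> float:
    """1D heat kernel K(x, t) = (4*pi*alpha*t)**(-1/2) * exp(-x**2 / (4*alpha*t)), for t > 0."""
    return math.exp(-x * x / (4.0 * alpha * t)) / math.sqrt(4.0 * math.pi * alpha * t)


def moments(t: float, alpha: float, half_width: float = 50.0, n: int = 40_000):
    """Midpoint-rule estimates of the total mass and variance of K(., t) on [-half_width, half_width]."""
    dx = 2.0 * half_width / n
    mass = 0.0
    second_moment = 0.0
    for i in range(n):
        x = -half_width + (i + 0.5) * dx
        k = heat_kernel_1d(x, t, alpha)
        mass += k * dx
        second_moment += x * x * k * dx
    # K is symmetric about x = 0, so its mean is zero and the second moment is the variance.
    return mass, second_moment


alpha = 1.0
for t in (0.5, 2.0, 8.0):
    mass, variance = moments(t, alpha)
    print(f"t = {t}: mass = {mass:.8f}, variance = {variance:.6f}, 2*alpha*t = {2 * alpha * t}")

# Self-similarity in 1D: K(x, t) == t**(-1/2) * K(x / sqrt(t), 1)
x, t = 0.7, 3.0
print(heat_kernel_1d(x, t, alpha), t ** -0.5 * heat_kernel_1d(x / math.sqrt(t), 1.0, alpha))
```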

By linearity, the solution for any initial condition u(x, 0) = u₀(x) is the convolution

u(x, t) = ∫ K(x − y, t) u₀(y) dⁿy

Every heat diffusion problem on infinite space is a Gaussian convolution. Applying the heat kernel to an image is Gaussian blur. Applying it to a probability density gives a smoother density. The heat kernel is the "blur operator" of mathematics.
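As an illustration of the convolution formula, the Python sketch below (an example, not a production solver) evaluates u(x, t) = ∫ K(x − y, t) u₀(y) dy numerically for a box initial condition, u₀ = 1 on [−1, 1] and 0 elsewhere. Integrating the Gaussian over the box gives the closed form u(x, t) = ½[erf((1 − x)/(2√(αt))) + erf((1 + x)/(2√(αt)))], which the quadrature should reproduce. The function names and quadrature settings are assumptions chosen for the example.

```python
import math


def heat_kernel_1d(x: float, t: float, alpha: float) -> float:
    """1D heat kernel K(x, t) for t > 0."""
    return math.exp(-x * x / (4.0 * alpha * t)) / math.sqrt(4.0 * math.pi * alpha * t)


def box(y: float) -> float:
    """Initial temperature: 1 on [-1, 1], 0 elsewhere."""
    return 1.0 if abs(y) <= 1.0 else 0.0


def convolve(u0, x: float, t: float, alpha: float,
             a: float = -10.0, b: float = 10.0, n: int = 20_000) -> float:
    """u(x, t) = integral of K(x - y, t) * u0(y) dy by the midpoint rule on [a, b] (u0 zero outside)."""
    dy = (b - a) / n
    total = 0.0
    for i in range(n):
        y = a + (i + 0.5) * dy
        total += heat_kernel_1d(x - y, t, alpha) * u0(y) * dy
    return total


def box_exact(x: float, t: float, alpha: float) -> float:
    """Closed-form solution for the box initial condition."""
    s = 2.0 * math.sqrt(alpha * t)
    return 0.5 * (math.erf((1.0 - x) / s) + math.erf((1.0 + x) / s))


alpha, t = 1.0, 0.25
for x in (0.0, 0.5, 1.0, 2.0):
    print(f"x = {x}: numerical {convolve(box, x, t, alpha):.6f}, exact {box_exact(x, t, alpha):.6f}")
```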

Worked example — heat penetration in copper

Copper has thermal conductivity k ≈ 401 W/(m·K), density ρ ≈ 8960 kg/m³, and specific heat c_p ≈ 385 J/(kg·K). Compute its thermal diffusivity:

α = k / (ρ c_p) = 401 / (8960 · 385) ≈ 1.16 × 10⁻⁴ m²/s

That is roughly 1.16 cm²/s. Suppose a long copper rod is held at temperature 0 °C, and at t = 0 you suddenly heat one end (x = 0) to 100 °C. The temperature inside the rod evolves according to the 1D heat equation. The well-known solution is

u(x, t) = 100 · erfc( x / (2 √(α t)) )

where erfc is the complementary error function. The temperature reaches half of its boundary value (50 °C) at the depth where x/(2√(αt)) ≈ 0.477 (since erfc(0.477) ≈ 0.5):

x_half ≈ 0.954 · √(α t)

At t = 60 s: x_half ≈ 0.954 · √(1.16 × 10⁻⁴ · 60) ≈ 0.954 · 0.0834 ≈ 0.080 m ≈ 8 cm.

After 60 seconds, the 50 °C front has penetrated about 8 cm into the copper. The √t scaling means doubling the depth takes four times the time, not twice — diffusion is patient.
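The worked example can be reproduced in a few lines of Python. This is a sketch under the same assumptions as above (a long rod treated as semi-infinite, initially at 0 °C, with the end switched to 100 °C at t = 0); the helper names are illustrative, and `math.erfc` is the standard-library complementary error function. Instead of quoting 0.477, the sketch solves erfc(z) = 1/2 by bisection.

```python
import math


def diffusivity(k: float, rho: float, cp: float) -> float:
    """Thermal diffusivity alpha = k / (rho * c_p) in m^2/s."""
    return k / (rho * cp)


def rod_temperature(x: float, t: float, alpha: float, u_end: float = 100.0) -> float:
    """Semi-infinite rod initially at 0 C whose end x = 0 is held at u_end from t = 0."""
    return u_end * math.erfc(x / (2.0 * math.sqrt(alpha * t)))


def erfc_inverse(y: float, lo: float = 0.0, hi: float = 6.0, iters: int = 100) -> float:
    """Solve erfc(z) = y for z in [lo, hi] by bisection; erfc is decreasing, 0 < y <= 1 assumed."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if math.erfc(mid) > y:
            lo = mid  # erfc(mid) still too large: the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


alpha_cu = diffusivity(401.0, 8960.0, 385.0)  # about 1.16e-4 m^2/s
z_half = erfc_inverse(0.5)                     # about 0.477
t = 60.0
x_half = 2.0 * z_half * math.sqrt(alpha_cu * t)
print(f"alpha = {alpha_cu:.3e} m^2/s, z_half = {z_half:.4f}, x_half = {100 * x_half:.1f} cm")
for x_cm in (1, 4, 8, 12, 16):
    print(f"x = {x_cm:2d} cm: u = {rod_temperature(x_cm / 100, t, alpha_cu):5.1f} C")
```

Rerunning with t = 240 s instead of 60 s should move the 50 °C front to twice the depth, in line with the √t scaling.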

Diffusivity of common materials

Material	α (m²/s)	50 % depth at t = 60 s (x_half ≈ 0.954 √(αt))
Copper	1.16 × 10⁻⁴	≈ 8 cm
Aluminium	9.7 × 10⁻⁵	≈ 7.3 cm
Iron (pure)	2.3 × 10⁻⁵	≈ 3.5 cm
Stainless steel (316)	4.0 × 10⁻⁶	≈ 1.5 cm
Glass	3.4 × 10⁻⁷	≈ 4.3 mm
Water	1.4 × 10⁻⁷	≈ 2.8 mm
Air (20°C, 1 atm)	2.2 × 10⁻⁵	≈ 3.5 cm
Wood (oak)	1.0 × 10⁻⁷	≈ 2.3 mm

The roughly thousandfold spread from copper to wood (about three orders of magnitude) is what makes copper saucepans heat evenly and wooden spoons stay cool to the touch: same equation, very different α.
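The penetration column of the table can be regenerated from the α column with x_half = 2 z½ √(αt), where z½ ≈ 0.4769 solves erfc(z) = 1/2. The Python sketch below does this and also inverts the formula to estimate how long the 50 % front takes to reach 1 cm in each material (1 cm is an illustrative depth, not from the table). The diffusivities are copied from the table; the function names are assumptions of the sketch.

```python
import math

Z_HALF = 0.4769362762  # erfc(Z_HALF) = 1/2

# Thermal diffusivities in m^2/s, copied from the table above.
DIFFUSIVITY = {
    "Copper": 1.16e-4,
    "Aluminium": 9.7e-5,
    "Iron": 2.3e-5,
    "Stainless steel (316)": 4.0e-6,
    "Glass": 3.4e-7,
    "Water": 1.4e-7,
    "Air (20 C, 1 atm)": 2.2e-5,
    "Wood (oak)": 1.0e-7,
}


def half_depth(alpha: float, t: float) -> float:
    """Depth (m) at which u reaches half the surface temperature after time t (s)."""
    return 2.0 * Z_HALF * math.sqrt(alpha * t)


def time_to_half_depth(alpha: float, depth: float) -> float:
    """Inverse of half_depth: time (s) for the 50 % front to reach `depth` (m)."""
    return (depth / (2.0 * Z_HALF)) ** 2 / alpha


for name, alpha in DIFFUSIVITY.items():
    d_mm = 1000.0 * half_depth(alpha, 60.0)
    t_1cm = time_to_half_depth(alpha, 0.01)
    print(f"{name:22s} x_half(60 s) = {d_mm:6.1f} mm   time to 1 cm = {t_1cm:8.1f} s")

spread = DIFFUSIVITY["Copper"] / DIFFUSIVITY["Wood (oak)"]
print(f"copper/wood ratio = {spread:.0f} (about 10^{math.log10(spread):.1f})")
```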

The same equation, many physical incarnations

The heat equation is named for thermal conduction but governs many diffusive processes:

Brownian motion. Einstein (1905) showed that random walk in the limit of small steps gives ∂p/∂t = D ∇²p for the probability density of a particle's location. The Wiener process has variance E[W(t)²] = t — the √t scaling of diffusion. This linked thermodynamics, statistical mechanics, and the existence of atoms.
Fick's law of diffusion. The concentration C of a solute satisfies ∂C/∂t = D ∇²C, where D is the molecular diffusivity. Drug release, dye spreading, gas mixing — all heat equations.
Image blur. Apply the heat kernel to image brightness for time t to get Gaussian blur with σ = √(2αt). Computer vision's "scale-space" theory builds on the heat equation as the canonical scale parameter.
Population spread. Reaction-diffusion equations of the form ∂u/∂t = D ∇²u + f(u) describe spreading populations (Fisher–KPP equation), chemical reactions (Belousov–Zhabotinsky), and pattern formation (Turing).
Finance. The Black–Scholes equation for option pricing is a heat equation in disguise. Written in time to expiry τ = T − t, with volatility σ and risk-free rate r, it reads ∂V/∂τ = (σ²S²/2) ∂²V/∂S² + rS ∂V/∂S − rV, and the substitution S = eˣ followed by an exponential rescaling of V maps it to the standard heat equation. Diffusion underlies modern financial mathematics.
Optimal transport and machine learning. The heat equation is the gradient flow of the entropy functional ∫ ρ log ρ with respect to the Wasserstein distance on probability measures (Jordan–Kinderlehrer–Otto 1998). Diffusion models for image generation (DDPM 2020) add noise through a forward diffusion process and train neural networks to run it in reverse.

Maximum principle and irreversibility

The heat equation has a maximum principle: on a space-time cylinder Ω × [0, T] with Ω bounded, the maximum of u is attained on the parabolic boundary (the side walls ∂Ω × [0, T] plus the initial face t = 0), not at later interior times. As a corollary, diffusion alone cannot create a hot spot inside a region: only the boundary or the initial condition can introduce maxima. Heat smooths.

Closely related is irreversibility: the heat equation is well-posed forwards in time (it smooths and decays gracefully) but ill-posed backwards. Diffusion damps fine-scale (high-frequency) features fastest, so recovering the initial state from a later one means amplifying exactly those features, and any small error in the later data grows without bound. This mirrors the second law of thermodynamics: heat flows from hot to cold, and the inverse process needs more information than the later state provides.
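To put numbers on the backward ill-posedness: on a domain where sin(kx) is a solution mode, forward heat flow multiplies its amplitude by exp(−αk²t), so undoing time t multiplies it by exp(+αk²t). The Python sketch below tabulates both factors for α = 1 and t = 0.1, with a hypothetical measurement error of 10⁻⁶ per mode; at k = 20 that error would be amplified to about 2 × 10¹¹.

```python
import math


def forward_factor(k: float, alpha: float, t: float) -> float:
    """Amplitude multiplier of the mode sin(k x) after forward heat flow for time t."""
    return math.exp(-alpha * k * k * t)


def backward_factor(k: float, alpha: float, t: float) -> float:
    """Multiplier needed to undo that decay, i.e. to run the mode backwards for time t."""
    return math.exp(alpha * k * k * t)


alpha, t = 1.0, 0.1
noise = 1e-6  # hypothetical measurement error per mode at the later time
for k in (1, 5, 10, 20, 40):
    print(f"k = {k:3d}: forward {forward_factor(k, alpha, t):.3e}, "
          f"amplified noise after backward step {noise * backward_factor(k, alpha, t):.3e}")
```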

Contrast with the wave equation, which is reversible: running u_tt = c²∇²u backwards is the same equation, and energy is conserved. Hyperbolic equations are time-symmetric; parabolic equations are not.

Numerical solution

The simplest explicit scheme (forward Euler in time, central differences in space):

u_{i, n+1} = u_{i, n} + (α Δt / Δx²) (u_{i+1, n} − 2 u_{i, n} + u_{i−1, n})

Stability requires α Δt / Δx² ≤ 1/2 in 1D (von Neumann analysis; in d dimensions with equal spacing the bound is 1/(2d)). This is the parabolic analogue of the CFL condition, and it is far more restrictive on fine grids than the hyperbolic CFL because it forces Δt ∝ Δx² rather than Δt ∝ Δx. Halve the grid spacing and the time step shrinks by 4×, so the total cost grows by 8× in 1D and 32× in 3D.
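A minimal Python sketch of the explicit scheme (standard library only, zero Dirichlet boundaries, grid and step counts chosen only for illustration) shows the threshold in action. With r = αΔt/Δx² = 0.4 the solution decays smoothly and never exceeds its initial maximum; with r = 0.6 the highest-frequency grid mode is multiplied by roughly −1.4 each step until it swamps everything. For r ≤ 1/2 each update is a weighted average of three neighbours with non-negative weights, which is a discrete maximum principle.

```python
def ftcs(u0, r, steps):
    """Forward Euler in time, central differences in space, for u_t = alpha * u_xx.
    r = alpha * dt / dx**2; the first and last entries are boundary values held fixed."""
    u = list(u0)
    for _ in range(steps):
        new = u[:]  # copies the boundary values unchanged
        for i in range(1, len(u) - 1):
            new[i] = u[i] + r * (u[i + 1] - 2.0 * u[i] + u[i - 1])
        u = new
    return u


n_points = 51
x = [i / (n_points - 1) for i in range(n_points)]
u0 = [min(xi, 1.0 - xi) for xi in x]  # hat function: zero at both ends, maximum 0.5

stable = ftcs(u0, r=0.4, steps=200)
unstable = ftcs(u0, r=0.6, steps=200)
print("r = 0.4: max |u| =", max(abs(v) for v in stable))
print("r = 0.6: max |u| =", max(abs(v) for v in unstable))
```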

Implicit schemes (backward Euler, Crank–Nicolson) avoid this restriction at the cost of a linear solve at each step. Crank–Nicolson is second-order accurate in time and unconditionally stable — the standard choice for many practical diffusion solvers.
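Crank–Nicolson averages the spatial difference between the old and new time levels, which with r = αΔt/Δx² gives a tridiagonal system at each step: −(r/2) u_{i−1, n+1} + (1 + r) u_{i, n+1} − (r/2) u_{i+1, n+1} = (r/2) u_{i−1, n} + (1 − r) u_{i, n} + (r/2) u_{i+1, n}. The Python sketch below is illustrative only (zero Dirichlet boundaries, the Thomas algorithm for the tridiagonal solve, hypothetical function names). It takes r = 5, ten times the explicit limit, and should still track the exact solution e^(−απ²t) sin(πx) closely.

```python
import math


def solve_tridiagonal(sub, diag, sup, rhs):
    """Thomas algorithm for a tridiagonal system; sub[0] and sup[-1] are ignored."""
    m = len(rhs)
    c = [0.0] * m
    d = [0.0] * m
    c[0] = sup[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, m):
        denom = diag[i] - sub[i] * c[i - 1]
        c[i] = sup[i] / denom if i < m - 1 else 0.0
        d[i] = (rhs[i] - sub[i] * d[i - 1]) / denom
    x = [0.0] * m
    x[-1] = d[-1]
    for i in range(m - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def crank_nicolson(u, r, steps):
    """Advance interior values u (zero Dirichlet boundaries) by `steps` Crank-Nicolson steps."""
    m = len(u)
    sub = [-0.5 * r] * m
    diag = [1.0 + r] * m
    sup = [-0.5 * r] * m
    for _ in range(steps):
        rhs = []
        for i in range(m):
            left = u[i - 1] if i > 0 else 0.0       # boundary value is 0
            right = u[i + 1] if i < m - 1 else 0.0  # boundary value is 0
            rhs.append(0.5 * r * left + (1.0 - r) * u[i] + 0.5 * r * right)
        u = solve_tridiagonal(sub, diag, sup, rhs)
    return u


alpha, n_intervals = 1.0, 50
dx = 1.0 / n_intervals
r = 5.0                    # ten times the explicit stability limit of 1/2
dt = r * dx * dx / alpha
steps = 50                 # final time t = steps * dt = 0.1
x = [i * dx for i in range(1, n_intervals)]  # interior points only
u = crank_nicolson([math.sin(math.pi * xi) for xi in x], r, steps)
exact = [math.exp(-alpha * math.pi ** 2 * steps * dt) * math.sin(math.pi * xi) for xi in x]
print("max error:", max(abs(a - b) for a, b in zip(u, exact)))
```

One design caveat: Crank–Nicolson is not strongly damping. As r grows, its amplification factor for the stiffest modes approaches −1, so sharp initial data can leave slowly decaying oscillations; backward Euler damps them at the cost of first-order accuracy in time.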

	Heat (parabolic)	Wave (hyperbolic)	Laplace (elliptic)
Equation	u_t = α∇²u	u_tt = c²∇²u	∇²u = 0
Time order	First	Second	None (static)
Initial conditions	u(x, 0)	u(x, 0) and u_t(x, 0)	None (boundary only)
Propagation speed	Infinite (Gaussian tail)	Finite (= c)	Instantaneous (static)
Reversibility	Irreversible	Reversible	—
Energy	Dissipated	Conserved	—
Fundamental solution	Gaussian (4παt)^(−n/2) exp(−r²/(4αt))	Delta on light cone (3D)	1/(4πr) (3D)
Steady state	Laplace ∇²u = 0	Laplace ∇²u = 0	(is its own steady state)

The three equations are the prototypes of the three classes of second-order linear PDE: hyperbolic (wave-like, reversible), parabolic (diffusive, irreversible), and elliptic (equilibrium, time-independent). Different physics, but a unified mathematical taxonomy.

Common mistakes

Confusing heat with wave. u_t = ∇²u is fundamentally different from u_tt = ∇²u. The first time derivative versus the second changes irreversibility, propagation speed, conservation, characteristics — every important property. Picking the wrong PDE for your physical problem is a common modelling error.
Forgetting the diffusivity α. Setting α = 1 is common in textbook formulations, but in physical problems α matters: different materials diffuse at vastly different rates. Using the wrong material value, or the wrong units (1 cm²/s = 10⁻⁴ m²/s), can throw answers off by factors of a thousand or more.
Violating the parabolic CFL. α Δt / Δx² ≤ 1/2 (in 1D) is non-negotiable for the explicit scheme. The Δx² makes the restriction severe: halving Δx cuts the allowed Δt by a factor of four.
Trying to solve backwards in time. The heat equation is ill-posed in reverse. Naïvely running an explicit scheme with negative time step blows up exponentially. Inverse heat problems require regularisation (Tikhonov, Bayesian priors) and stable algorithms.
Missing the √t scaling. Diffusion penetrates as √t, not linearly. Doubling the depth takes four times the time, not twice. This catches people designing thermal protection or estimating reaction completion times.
Confusing diffusivity α with conductivity k. α = k/(ρc_p), so two materials with the same k can have very different α if their volumetric heat capacities ρc_p differ, and vice versa. Water and stainless steel, for example, have almost the same ρc_p (about 4 × 10⁶ J/(m³·K)), so their roughly 30-fold difference in α comes almost entirely from k (about 0.6 versus 16 W/(m·K)).
Using free-space heat kernel on bounded domains. The Gaussian K(x, t) = (4παt)^(−n/2) exp(−|x|²/(4αt)) is the response on infinite space. On bounded domains with insulated walls (Neumann) or fixed-temperature walls (Dirichlet), the kernel changes — use the method of images or eigenfunction expansion.
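For a half-line x > 0 with a single wall at x = 0, the method of images gives the bounded-domain kernel directly: add a mirror source at −y for an insulated (Neumann) wall, subtract one for a wall held at zero temperature (Dirichlet). The Python sketch below, built around a hypothetical helper `green_half_line`, checks the two behaviours numerically: the insulated half-line keeps all its heat, while the Dirichlet wall leaks heat so that the amount remaining is erf(y/(2√(αt))).

```python
import math


def heat_kernel_1d(x: float, t: float, alpha: float) -> float:
    """Free-space 1D heat kernel K(x, t), t > 0."""
    return math.exp(-x * x / (4.0 * alpha * t)) / math.sqrt(4.0 * math.pi * alpha * t)


def green_half_line(x: float, y: float, t: float, alpha: float, wall: str) -> float:
    """Hypothetical helper: response at x > 0 to a unit heat source placed at y > 0 at t = 0.
    wall = "neumann":   insulated wall, u_x(0, t) = 0 -> add the mirror image at -y.
    wall = "dirichlet": wall held at 0,  u(0, t) = 0  -> subtract the mirror image at -y."""
    sign = 1.0 if wall == "neumann" else -1.0
    return heat_kernel_1d(x - y, t, alpha) + sign * heat_kernel_1d(x + y, t, alpha)


def heat_remaining(y: float, t: float, alpha: float, wall: str,
                   length: float = 50.0, n: int = 50_000) -> float:
    """Midpoint-rule integral of the half-line Green's function over 0 < x < length."""
    dx = length / n
    return sum(green_half_line((i + 0.5) * dx, y, t, alpha, wall) * dx for i in range(n))


y, t, alpha = 1.0, 1.0, 1.0
print("Dirichlet value at the wall:", green_half_line(0.0, y, t, alpha, "dirichlet"))
print("Neumann heat remaining:    ", heat_remaining(y, t, alpha, "neumann"))
print("Dirichlet heat remaining:  ", heat_remaining(y, t, alpha, "dirichlet"),
      "vs erf:", math.erf(y / (2.0 * math.sqrt(alpha * t))))
```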

Frequently asked questions

What is the fundamental solution (heat kernel) of the heat equation?

In n spatial dimensions, the fundamental solution is the Gaussian K(x, t) = (4παt)^(−n/2) exp(−|x|²/(4αt)) for t > 0. It is the solution with a unit-mass spike at the origin at t = 0: K(x, t) → δⁿ(x) as t → 0⁺. Convolving it against any initial condition u(x, 0) = u₀(x) gives the solution for all later times: u(x, t) = ∫ K(x − y, t) u₀(y) dⁿy. The Gaussian widens as √(αt), the canonical 'diffusion scaling': the root-mean-square displacement of a Brownian particle, the standard deviation of a heat distribution from a point source, and the width of a Gaussian blur all scale as the square root of time.

What does the diffusion coefficient α represent physically?

α = k/(ρ c_p), where k is the thermal conductivity, ρ is the mass density, and c_p is the specific heat capacity at constant pressure. Units are m²/s. Numerical values: copper α ≈ 1.1 × 10⁻⁴ m²/s; aluminium α ≈ 9.7 × 10⁻⁵; stainless steel ≈ 4 × 10⁻⁶; water ≈ 1.4 × 10⁻⁷; air ≈ 2.2 × 10⁻⁵. The √(αt) scaling tells you how far heat penetrates in time t; equivalently, the time to penetrate a depth L is of order L²/α. For a copper rod heated at one end and L = 10 cm, that is about (0.1)²/(1.1 × 10⁻⁴) ≈ 91 seconds. Aluminium is similar; stainless steel takes about 27 times longer for the same depth.

How is the heat equation related to Brownian motion?

Einstein (1905) derived the diffusion equation as the macroscopic limit of random walks. In one dimension, if a particle steps ±Δx with equal probability every Δt, and Δx and Δt shrink with Δx²/Δt held fixed, the probability density p(x, t) of finding the particle at x at time t satisfies ∂p/∂t = D ∂²p/∂x², with diffusion constant D = Δx²/(2Δt). So the heat equation and the Fokker–Planck equation for free Brownian motion are the same PDE, with α replaced by D. The Wiener process W(t), standard Brownian motion, corresponds to D = 1/2 and has E[W(t)²] = t, matching the √t diffusion scaling exactly.
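The random-walk limit can be checked by simulation. In the Python sketch below (standard library only; walker count, step sizes, and seed are arbitrary illustrative choices), each walker steps ±Δx every Δt. With the convention ∂p/∂t = D ∂²p/∂x² used above for Brownian motion, such a walk has D = Δx²/(2Δt); taking Δx = 0.1 and Δt = 0.01 gives D = 0.5, the Wiener-process value, so after t = 1 the sample mean of x² should come out close to 2Dt = 1, up to sampling noise of a few percent.

```python
import random


def mean_square_displacement(n_walkers: int, n_steps: int, dx: float, seed: int = 0) -> float:
    """Average of x**2 over independent symmetric random walks of n_steps steps of size +/- dx."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_walkers):
        x = 0.0
        for _ in range(n_steps):
            x += dx if rng.random() < 0.5 else -dx
        total += x * x
    return total / n_walkers


dx, dt, n_steps = 0.1, 0.01, 100
D = dx * dx / (2.0 * dt)  # diffusion constant of the continuum limit
t = n_steps * dt
msd = mean_square_displacement(n_walkers=4000, n_steps=n_steps, dx=dx)
print(f"D = {D:.3f}, t = {t:.2f}, simulated <x^2> = {msd:.4f}, theory 2*D*t = {2 * D * t:.4f}")
```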

Why is the heat equation called parabolic?

Second-order linear PDEs are classified by the sign of a discriminant built from their second-order (principal) part. For a u_xx + 2b u_xt + c u_tt + (lower-order terms) = 0, the equation is elliptic if b² − ac < 0, parabolic if b² − ac = 0, and hyperbolic if b² − ac > 0, by analogy with conic sections. The heat equation ∂u/∂t − α ∂²u/∂x² = 0 has a = −α and b = c = 0 (u_t is a lower-order term), so b² − ac = 0: it sits on the boundary between elliptic (Laplace) and hyperbolic (wave). Parabolic equations have infinite propagation speed (any change in initial data is felt instantly everywhere, though the magnitude decays Gaussianly), are irreversible in time (smoothing forward, no stable recovery backward), and are dissipative.
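The classification rule is mechanical enough to write down directly. The small Python function below is an illustrative sketch (the name `classify` is hypothetical) that applies the discriminant test b² − ac to the coefficients of a u_xx + 2b u_xt + c u_tt; lower-order terms such as u_t do not affect the result.

```python
def classify(a: float, b: float, c: float) -> str:
    """Type of a*u_xx + 2*b*u_xt + c*u_tt + (lower-order terms) = 0, from the sign of b**2 - a*c."""
    disc = b * b - a * c
    if disc < 0:
        return "elliptic"
    if disc == 0:
        return "parabolic"
    return "hyperbolic"


alpha, wave_speed = 1.0, 2.0
print("heat   :", classify(a=-alpha, b=0.0, c=0.0))          # u_t - alpha*u_xx = 0
print("wave   :", classify(a=-wave_speed**2, b=0.0, c=1.0))  # u_tt - wave_speed**2 * u_xx = 0
print("Laplace:", classify(a=1.0, b=0.0, c=1.0))             # u_xx + u_yy = 0 (y in place of t)
```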

What is the maximum principle for the heat equation?

On a cylinder Ω × [0, T] with parabolic boundary (initial time t = 0 plus the side walls ∂Ω × [0, T]), a solution u of the heat equation attains its maximum on the parabolic boundary — not in the interior or on the top. As a corollary, two solutions with the same initial and boundary data are equal (uniqueness), and the heat equation cannot create hot spots in the interior — only the initial condition or the boundaries can. The principle also gives a comparison theorem: if u₁ ≥ u₂ on the parabolic boundary, then u₁ ≥ u₂ everywhere, a tool used to prove convergence of numerical schemes.

Is the heat equation reversible in time?

No. The heat equation is irreversible: it smooths data forward in time but blows up backward. Trying to recover the initial state from a known later-time state is ill-posed — small perturbations in the data become unboundedly large in the inverse. This makes the heat equation a model of dissipation and the arrow of time, unlike the wave equation which is reversible (running time backward in u_tt = c² ∇²u leaves the equation unchanged). In information-theoretic language: forward heat-flow loses information (entropy of the temperature distribution increases), and that loss cannot be reversed.

How is the heat equation used in image processing?

Applying the heat equation to a 2D image, treating brightness as u(x, y, t), is Gaussian blur with a kernel of standard deviation σ = √(2αt). Larger t means more blur. This is the basic 'low-pass filter' in computer vision and image processing: convolving with a Gaussian smooths out noise and small-scale features. Modern variants include anisotropic diffusion (Perona–Malik 1990, which slows diffusion across strong edges to preserve them), the scale-space theory of computer vision, and connections to deep learning: diffusion models for image generation corrupt images with noise through a forward diffusion process and then learn to reverse it.
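One way to see why the blur width is √(2αt) is through the explicit scheme: each forward-Euler step convolves the signal with the three-point kernel [r, 1 − 2r, r], whose variance is 2rΔx², and variances add under convolution, so n steps give variance 2rnΔx² = 2αt with t = nΔt. The Python sketch below (a one-dimensional 'image' row, zero padding at the edges, illustrative parameters, hypothetical helper name `heat_blur`) checks this on a single bright pixel.

```python
def heat_blur(signal, r, steps):
    """Apply `steps` explicit heat-equation steps (kernel [r, 1 - 2r, r]) to a 1D signal, zero-padded."""
    u = list(signal)
    for _ in range(steps):
        padded = [0.0] + u + [0.0]
        u = [r * padded[i - 1] + (1.0 - 2.0 * r) * padded[i] + r * padded[i + 1]
             for i in range(1, len(padded) - 1)]
    return u


size, center = 301, 150
pixel = [0.0] * size
pixel[center] = 1.0  # single bright pixel (discrete delta)

r, steps, dx, dt = 0.25, 100, 1.0, 1.0  # alpha = r * dx**2 / dt = 0.25, t = steps * dt = 100
blurred = heat_blur(pixel, r, steps)
mass = sum(blurred)
variance = sum(((i - center) * dx) ** 2 * v for i, v in enumerate(blurred)) / mass
alpha, t = r * dx * dx / dt, steps * dt
print(f"mass = {mass:.12f}, variance = {variance:.6f}, 2*alpha*t = {2 * alpha * t}")
```

The array is wide enough that the kernel's support (100 pixels either side after 100 steps) never reaches the zero padding, so no brightness is lost at the edges.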