Numerical Analysis

Finite Difference Method

Approximate derivatives by discrete differences — the foundation of every PDE numerical solver and every numerical derivative your code has computed

The derivative is a limit. The finite difference method replaces the limit with a finite step h on a grid. Forward, backward, and central differences give first- or second-order approximations to f'; symmetric stencils give f''. Plug these into a partial differential equation and the continuous problem becomes a linear system on a grid — the foundation of every numerical PDE solver.

Forward difference(f(x+h) − f(x))/h, O(h)
Central difference(f(x+h) − f(x−h))/(2h), O(h²)
Second derivative(f(x+h) − 2f(x) + f(x−h))/h², O(h²)
5-point stencil for f'O(h⁴)
CFL conditionstability bound on Δt/h
Used inCFD, seismic, weather, finance

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

From limits to differences

The derivative is defined as a limit:

f'(x) = lim_{h → 0} (f(x + h) − f(x)) / h

Computers cannot take limits. They can compute (f(x + h) − f(x))/h for some specific small h. That single substitution — finite h instead of vanishing h — turns calculus into algebra and produces the entire family of finite-difference operators. The price is approximation error proportional to a power of h; the benefit is that derivatives become computable from function values on any grid.

Three first-order one-sided variants and one second-order symmetric variant are the workhorse stencils:

forward:     D⁺ f(x) = (f(x + h) − f(x)) / h            error O(h)
backward:    D⁻ f(x) = (f(x) − f(x − h)) / h            error O(h)
central:     D⁰ f(x) = (f(x + h) − f(x − h)) / (2h)     error O(h²)
2nd-deriv:   D² f(x) = (f(x + h) − 2 f(x) + f(x − h)) / h²    error O(h²)

The central difference uses two function evaluations and reaches twice the order of accuracy of the forward or backward difference, which use the same two evaluations. Symmetry buys an order for free — by canceling the even-order Taylor terms.

Why central differences are O(h²)

Taylor-expand f around x:

f(x + h) = f(x) + h·f'(x) + (h²/2)·f''(x) + (h³/6)·f'''(x) + O(h⁴)
f(x − h) = f(x) − h·f'(x) + (h²/2)·f''(x) − (h³/6)·f'''(x) + O(h⁴)

Subtracting the second from the first:

f(x + h) − f(x − h) = 2h·f'(x) + (h³/3)·f'''(x) + O(h⁵)

Divide by 2h:

(f(x + h) − f(x − h)) / (2h) = f'(x) + (h²/6)·f'''(x) + O(h⁴)

The leading error is h²·f'''(x)/6 — second-order. The same trick applied to f(x + h) + f(x − h) cancels the odd terms, giving the second-derivative stencil with O(h²) error. Higher-order stencils combine more points to cancel more Taylor terms: a 4-point stencil for f' reaches O(h⁴), a 6-point reaches O(h⁶), and so on. The trade is wider stencil width versus more accuracy per grid point.

Worked example: numerical derivative of sin(x)

The true value of f'(1) = cos(1) ≈ 0.5403023058681398. Compute the forward, backward, and central difference approximations at various h:

h        forward                  central                error (fwd)    error (ctr)
─        ──────────────────────   ────────────────────   ────────────   ────────────
0.1      0.4973637525...           0.5394979137...         4.30 × 10⁻²    8.04 × 10⁻⁴
0.01     0.5360860840...           0.5402932061...         4.22 × 10⁻³    9.10 × 10⁻⁶
0.001    0.5398814803...           0.5403022149...         4.21 × 10⁻⁴    9.09 × 10⁻⁸
1e-4     0.5402601419...           0.5403023049...         4.21 × 10⁻⁵    9.09 × 10⁻¹⁰
1e-5     0.5402980738...           0.5403023058...         4.23 × 10⁻⁶    1.4  × 10⁻¹¹
1e-6     0.5403018598...           0.5403023060...         4.5  × 10⁻⁷    1.8  × 10⁻¹⁰  ← round-off
1e-7     0.5403022614...           0.5403023060...         4.4  × 10⁻⁸    1.8  × 10⁻¹⁰
1e-9     0.5403023056...           0.5403023060...         too small      noise floor
1e-12    0.5402850090...           0.5403210731...         too small      catastrophic

Two patterns are visible. First, the truncation error scales exactly as predicted: forward shrinks linearly with h (4.3 × 10⁻² → 4.2 × 10⁻³ → 4.2 × 10⁻⁴ as h drops by 10), central shrinks quadratically (8 × 10⁻⁴ → 9 × 10⁻⁶ → 9 × 10⁻⁸). Second, beyond a certain small h, round-off in computing f(x+h) − f(x) starts to dominate — the difference of nearly equal numbers loses significant digits. Optimal h for the forward difference in double precision is around √ε ≈ 1.5 × 10⁻⁸ where the truncation and round-off errors cross; for central difference the optimum is around ε^{1/3} ≈ 6 × 10⁻⁶.

For high-precision needs, use the complex-step derivative f'(x) ≈ Im(f(x + ih))/h, which avoids the subtraction and gives 14-digit accuracy at h = 10⁻¹⁰ — or use automatic differentiation, which sidesteps the issue entirely.

Order-of-accuracy comparison

Stencil	Approximates	Points	Order	Leading error	Use case
Forward / backward	f'	2	1	h·f''/2	Time-stepping start/end
Central	f'	2	2	h²·f'''/6	Interior nodes, default
5-point centred	f'	4	4	h⁴·f⁽⁵⁾/30	High-accuracy CFD
Central second	f''	3	2	h²·f⁽⁴⁾/12	Laplacian, default
5-point second	f''	5	4	h⁴·f⁽⁶⁾/90	Spectral PDE solvers
Upwind first-order	f'	2 (biased)	1	h·f''/2	Advection (conservative)
Compact (Padé)	f'	3 (implicit)	4	h⁴·f⁽⁵⁾/30	Direct numerical simulation
Spectral / FFT	f' or f''	all	exponential	e^{−cN}	Smooth periodic fields

Each row trades width for accuracy. The central stencil at three points is the default in scientific computing — it covers most use cases at minimum implementation complexity. Compact and spectral schemes appear in production CFD and weather codes where every digit of accuracy reduces the grid count quadratically.

Discretising a PDE

The 1D heat equation u_t = α u_xx on x ∈ [0, L], t ≥ 0, with u(0, t) = u(L, t) = 0 and initial u(x, 0) = u₀(x), is the canonical example. Discretise space with N + 1 nodes x_i = i·h (h = L/N) and time with timestep Δt. Let u_i^n ≈ u(x_i, t_n) where t_n = n·Δt. Forward in time, central in space (FTCS):

u_i^{n+1} − u_i^n        u_{i+1}^n − 2 u_i^n + u_{i−1}^n
───────────────────  =  α ───────────────────────────────
       Δt                              h²

Rearrange:

u_i^{n+1} = u_i^n + r · (u_{i+1}^n − 2 u_i^n + u_{i−1}^n)        where r = α Δt / h²

This is an explicit scheme: each new value is a weighted average of three old values. The cost per step is O(N) per spatial dimension. Stability requires r ≤ 1/2 — the CFL condition for diffusion. Violate it and the numerical solution explodes exponentially, even though the true solution decays. For r = 0.5, Δt = h²/(2α): halving h forces Δt to quarter, so total work scales as h^{−3} per second of simulated time.

To dodge the CFL bottleneck, use an implicit scheme: discretise the right-hand side at time n+1 instead of n. Backward Euler gives

u_i^{n+1} − u_i^n = r · (u_{i+1}^{n+1} − 2 u_i^{n+1} + u_{i−1}^{n+1})

This is a tridiagonal linear system in {u_i^{n+1}} that must be solved at every step (O(N) by the Thomas algorithm). Unconditionally stable: take any Δt you like. The Crank-Nicolson scheme averages explicit and implicit and is second-order in both time and space. ABAQUS, COMSOL, OpenFOAM, ANSYS Fluent, and every PDE-based commercial code in the world implements some flavour of this discretisation.

The CFL condition

The Courant-Friedrichs-Lewy condition (1928) is a necessary stability requirement for explicit finite-difference schemes on PDEs with finite signal propagation. For the advection equation u_t + c·u_x = 0 with c > 0, the upwind scheme

u_i^{n+1} = u_i^n − (c Δt / h) · (u_i^n − u_{i−1}^n)

is stable if and only if 0 ≤ c·Δt/h ≤ 1. Intuitively: the physical wave moves c·Δt in one timestep; the numerical stencil sees only one grid cell upstream. If the wave outruns the stencil (c·Δt > h), the numerical signal can never catch up to the true signal and instability follows. The CFL number ν = c·Δt/h controls the dispersion error of the scheme as well as stability — ν = 1 is exact for upwind, ν close to 0 is heavily diffusive.

For diffusion (heat equation), the CFL-like bound is r = α·Δt/h² ≤ 1/2, which is much more restrictive. For h = 0.001 and α = 1, this forces Δt ≤ 5 × 10⁻⁷ — a million time steps per simulated second. This is why implicit and semi-implicit methods dominate in production diffusion solvers.

Where finite differences show up

Computational fluid dynamics. Direct numerical simulation (DNS) of turbulence uses high-order compact or spectral finite-difference schemes on Cartesian grids. NASA's FUN3D, the Air Force Research Laboratory's Cobalt, and the Japanese FFR all combine finite differences with finite volumes for compressible flow.
Seismic imaging. Oil-and-gas exploration runs reverse-time migration by solving the elastic wave equation with high-order finite differences on grids of billions of nodes. PETROBRAS, ExxonMobil, and Saudi Aramco all use staggered-grid finite-difference codes (often 8th-order in space).
Weather and climate. The Weather Research and Forecasting model (WRF), the European Centre's IFS, and NOAA's GFS all discretise the primitive equations of the atmosphere on latitude-longitude grids using finite differences for vertical advection and finite volumes for horizontal transport.
Financial derivatives pricing. The Black-Scholes PDE for option prices is solved by finite differences (Crank-Nicolson) inside trading systems at Goldman Sachs, J.P. Morgan, and Citadel. American options, barrier options, and basket options all reduce to grids of 100 × 1000 × 1000 backward-time-stepped through finite differences.
Cardiac electrophysiology. The monodomain and bidomain equations of heart muscle electrical activity are solved by finite-difference Poisson and reaction-diffusion solvers on cardiac mesh data. Used in pre-surgical planning for arrhythmia ablation.
Image processing. Total-variation denoising, level-set image segmentation, anisotropic diffusion, and the Perona-Malik filter all solve PDEs on image grids with finite-difference operators — every modern image-restoration pipeline contains finite-difference kernels.

Variants and extensions

Upwind schemes. For advection-dominated PDEs, biased one-sided differences in the direction of flow are more stable than central differences, at the cost of one order of accuracy. Standard in CFD and conservation laws.
Compact (Padé) schemes. Implicit finite-difference formulas where derivatives are coupled through a tridiagonal system. Achieve fourth- or sixth-order accuracy on a 3-point stencil. Used in turbulence DNS for spectral-like accuracy at finite-difference cost.
Staggered grids. Velocity and pressure live on offset grids to avoid checkerboard pressure modes. Standard in incompressible CFD (Marker-and-Cell, MAC scheme) and in time-domain electromagnetics (Yee's grid for FDTD).
WENO and ENO schemes. Weighted Essentially Non-Oscillatory schemes choose stencils adaptively to avoid the Gibbs phenomenon near shocks. Used in compressible flow, gravitational wave simulations, and astrophysics codes (FLASH, AMReX).
Finite-difference time-domain (FDTD). The dominant finite-difference scheme for solving Maxwell's equations. Used in antenna design, ground-penetrating radar, and medical imaging RF coil simulation.
Mimetic finite differences. Discretisations that preserve discrete vector calculus identities (curl·grad = 0, div·curl = 0) on unstructured meshes. Important for magnetohydrodynamics and electromagnetism.

Common pitfalls

Pushing h too small. Truncation error scales as h^p; round-off error in (f(x+h) − f(x)) scales as ε·|f|/h. The optimum sits at h ≈ ε^{1/(p+1)} — about 10⁻⁸ for forward difference, 10⁻⁵ for central. Below that, round-off dominates and the answer gets worse.
Violating the CFL condition. Symptoms: solution grows exponentially, often with high-frequency oscillation. Cause: r = αΔt/h² > ½ for diffusion, or ν = cΔt/h > 1 for advection. Fix: reduce Δt or switch to an implicit scheme.
Gibbs phenomenon near discontinuities. Central differences and other high-order stencils oscillate near sharp gradients (shocks, contact surfaces). Use ENO/WENO, slope limiters, or artificial viscosity to suppress oscillation.
Forgetting boundary conditions. Interior stencils need ghost cells or one-sided stencils near boundaries. Treating boundaries naively breaks the order of accuracy and can create instability.
Naive 2D and 3D scaling. Halving h in 3D multiplies the number of unknowns by 8 and (with the CFL bound on Δt) the total work by 16. Plan accordingly: high-order methods and adaptive mesh refinement scale much better than uniform-grid second-order.

Why finite differences endure

Finite differences are the simplest possible discretisation of differential operators — Taylor expansion plus arithmetic. The price for that simplicity is the requirement of a structured grid: rectangular cells, regular spacing. For problems on simple domains (cubes, slabs, the atmosphere) that is no constraint at all and the implementation is essentially trivial. For complex geometries, finite-element and finite-volume methods take over with their flexible meshes — but every textbook starts with finite differences because the algebra is transparent, the error analysis is exactly Taylor's theorem, and the resulting linear systems have the simplest possible sparsity pattern. Most production weather, ocean, seismic, and electromagnetic codes are finite-difference at their core, with finite elements or finite volumes only where geometry demands it. The method is older than the modern computer, and every wave equation, heat equation, and Poisson equation solved on a Cartesian grid still uses essentially Euler's and Lagrange's 18th-century formulas.

Frequently asked questions

Why is the central difference second-order while the forward difference is first-order?

Taylor expand f around x: f(x+h) = f(x) + h·f'(x) + (h²/2)·f''(x) + (h³/6)·f'''(x) + …. Rearranging gives (f(x+h)−f(x))/h = f'(x) + (h/2)·f''(x) + O(h²) — the first non-vanishing error term is linear in h, so forward difference is O(h). Now consider f(x+h)−f(x−h): the even-order terms cancel by symmetry, leaving (f(x+h)−f(x−h))/(2h) = f'(x) + (h²/6)·f'''(x) + O(h⁴). The first non-vanishing error is quadratic in h. Same number of function evaluations (two), one extra order of accuracy. That is why central differences are the default.

How do you approximate the second derivative?

The central second-difference stencil (f(x+h) − 2f(x) + f(x−h))/h² approximates f''(x) with O(h²) error. The derivation is exactly the Taylor sum f(x+h) + f(x−h) = 2f(x) + h²·f''(x) + (h⁴/12)·f⁽⁴⁾(x) + …. Rearranging gives the bound. Higher-order versions use 5- or 7-point stencils — the 5-point stencil (−f(x−2h) + 16f(x−h) − 30f(x) + 16f(x+h) − f(x+2h))/(12h²) is fourth-order. This second-derivative stencil discretises the Laplacian Δu in 1D; in 2D the 5-point Laplacian stencil [1, 1, −4, 1, 1] is the most-used discretisation in computational physics.

How are finite differences used to solve partial differential equations?

Discretise the domain on a grid with spacing h in space and Δt in time. Replace each derivative operator with its finite-difference equivalent: ∂u/∂t → (u_i^{n+1} − u_i^n)/Δt for forward time; ∂²u/∂x² → (u_{i+1}^n − 2u_i^n + u_{i−1}^n)/h² for centred space. The PDE becomes an algebraic equation linking grid values. For the 1D heat equation u_t = α u_xx, this gives u_i^{n+1} = u_i^n + (αΔt/h²)·(u_{i+1}^n − 2u_i^n + u_{i−1}^n), an explicit time-stepping scheme. Stability requires αΔt/h² ≤ 1/2 (the CFL condition for diffusion). Implicit schemes (backward Euler, Crank-Nicolson) are unconditionally stable but require solving a linear system at each step.

What is the CFL condition and why does it matter?

The Courant-Friedrichs-Lewy (CFL) condition (1928) is a necessary stability condition for explicit finite-difference schemes on hyperbolic and parabolic PDEs. For a wave equation u_t + c·u_x = 0 with grid spacing h and timestep Δt, the upwind scheme is stable only if c·Δt/h ≤ 1 — physically, information must not propagate more than one grid cell per time step. For the heat equation, the condition is α·Δt/h² ≤ 1/2 — twice as restrictive in scaling. Choose h too small or c too large and Δt must shrink quadratically, making explicit methods unusable for fine grids. Implicit methods bypass CFL at the cost of solving a system each step.

When should I use finite differences versus finite elements?

Finite differences are easy to implement and natural on regular Cartesian grids — fluid dynamics, seismic modelling, weather simulation. They struggle with complex geometries (you'd need a curvilinear or unstructured mesh) and with discontinuous coefficients. Finite elements handle arbitrary geometries (triangular or tetrahedral meshes), naturally accommodate variable material properties, and offer clean mathematical theory (Galerkin formulation, error estimates). For a square or cube, finite differences are simpler and faster. For an airplane wing, a turbine blade, or a complex organ, finite elements dominate. Modern hybrid finite-volume methods sit in between and are the standard in CFD codes like OpenFOAM and Ansys Fluent.

What is the trade-off between higher-order stencils and computational cost?

A k-point stencil costs k function evaluations per grid point and gives accuracy O(h^{k-1}) for f', O(h^{k-2}) for f''. The 3-point centred stencil is O(h²) — cheap and the default. The 5-point stencil is O(h⁴) — twice the work per point but 16× better error per halving of h. For very smooth problems where you need many digits, spectral methods (using all grid points in a single FFT-based derivative) achieve exponential convergence. For problems with steep gradients or discontinuities (shock waves, contact surfaces), high-order stencils can oscillate (Gibbs phenomenon); ENO and WENO schemes select stencils adaptively to suppress oscillation.

Why can't you make h arbitrarily small to reduce error?

Two reasons. First, round-off: computing f(x+h) − f(x) at very small h subtracts nearly equal numbers, amplifying floating-point noise — catastrophic cancellation. In double precision, the optimal h for a forward difference is approximately ε^{1/2}·|x| ≈ 10⁻⁸·|x|, balancing truncation error h·|f''|/2 against round-off ε·|f|/h. Pushing h smaller than that increases error. Second, cost: in PDE solvers, halving the grid spacing increases the number of unknowns by 2^d in d dimensions and tightens the CFL constraint on Δt — total work scales as h^{−(d+1)} or worse. Higher-order stencils reach the same accuracy with larger h, which is why they dominate in production codes despite the implementation complexity.