Numerical Methods

Fixed-Point Iteration

Solve x = g(x) by repeatedly applying g — the foundation of nearly every iterative method in numerical analysis

Fixed-point iteration solves x = g(x) by starting from a guess x_0 and applying g over and over: x_{n+1} = g(x_n). It converges to a fixed point x* iff g is a contraction near x* — equivalently |g'(x*)| < 1 for smooth scalar g. Rate is linear with constant |g'(x*)|. This single iteration is the engine inside Newton's method, Picard iteration for ODEs, power iteration for eigenvalues, PageRank, and value iteration for Markov decision processes.

Iterationx_{n+1} = g(x_n)
Fixed pointx* with g(x*) = x*
Converges iff|g'(x*)| < 1 (locally)
Convergence ratelinear with constant |g'(x*)|
Existence theoremBanach fixed-point
Appears inNewton, Picard, power iter, PageRank, value iter

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

The cobweb idea

Take any continuous function g and a starting point x_0. Compute x_1 = g(x_0). Then x_2 = g(x_1). Then x_3 = g(x_2). The question is what the sequence (x_n) does. If it settles to some limit x*, by continuity g(x*) = x*, so x* is a fixed point of g.

Visualise the iteration as a cobweb diagram. Plot y = x (the identity line) and y = g(x) on the same axes. The fixed points are intersections — solutions of g(x) = x. Starting at x_0 on the x-axis:

Move vertically to the curve y = g(x), reaching the point (x_0, g(x_0)) = (x_0, x_1).
Move horizontally to the diagonal y = x, reaching (x_1, x_1).
Move vertically to the curve again: (x_1, g(x_1)) = (x_1, x_2).
Move horizontally to the diagonal: (x_2, x_2). Repeat.

The trajectory traces a stair or spiral converging into the fixed point if it is stable — or running away from it if it is unstable. The criterion for stability is the slope of g at the intersection: if |g'(x*)| < 1, the curve is shallower than the diagonal locally, and successive iterates contract toward x*. If |g'(x*)| > 1, the curve is steeper and iterates fly away.

Why |g'(x*)| < 1 is the convergence criterion

Let e_n = x_n − x* be the error at iteration n. Taylor-expand g around x*:

x_{n+1} = g(x_n) = g(x*) + g'(x*) · e_n + ½ g''(x*) · e_n² + O(e_n³)
       = x* + g'(x*) · e_n + ½ g''(x*) · e_n² + …

Subtract x* from both sides:

e_{n+1} = g'(x*) · e_n + ½ g''(x*) · e_n² + O(e_n³)

The leading term is linear in e_n with constant g'(x*). If |g'(x*)| < 1, the linear term contracts the error: |e_{n+1}| ≈ |g'(x*)| · |e_n|, so the error shrinks geometrically at rate |g'(x*)|. If |g'(x*)| > 1, the linear term expands the error and the iteration diverges. If g'(x*) = 0, the linear term vanishes and the quadratic term dominates: |e_{n+1}| ≈ ½ |g''(x*)| · |e_n|² — quadratic convergence, which is exactly Newton's method's regime by design.

Rate of convergence: reaching error ε from initial e_0 takes about log(ε/e_0) / log|g'(x*)| iterations. For |g'(x*)| = 0.5, error halves each step; for |g'(x*)| = 0.99, error shrinks only 1% per step — 230 iterations per decade of accuracy. Newton's method achieves effective |g'(x*)| → 0 and doubles digits per step instead.

Worked example: finding the Dottie number

Solve x = cos(x). Here g(x) = cos(x) and the unique fixed point in [0, π/2] is the Dottie number x* ≈ 0.7390851332151606. At the fixed point, g'(x*) = −sin(x*) ≈ −0.6736, so |g'(x*)| ≈ 0.67 — the iteration converges, at linear rate 0.67.

Iterate from x_0 = 0.5:

n     x_n            |x_n − x*|
─     ────────────   ─────────────
0     0.5000000000   2.39 × 10⁻¹
1     0.8775825619   1.38 × 10⁻¹
2     0.6390124942   1.00 × 10⁻¹
3     0.8026851007   6.36 × 10⁻²
4     0.6947780268   4.43 × 10⁻²
5     0.7681958313   2.91 × 10⁻²
6     0.7191654459   1.99 × 10⁻²
10    0.7384692949   6.16 × 10⁻⁴
20    0.7390851315   1.7  × 10⁻⁹
30    0.7390851332   2.7  × 10⁻¹⁴
40    0.7390851332   < 10⁻¹⁶  (machine precision)

Each step shrinks the error by roughly 0.67 — the expected linear rate, halving every 1.78 iterations. Reaching machine precision takes about 40 iterations. Compare to Newton's method on f(x) = x − cos(x): the Newton iterate is x_{n+1} = x_n − (x_n − cos x_n)/(1 + sin x_n), with g'(x*) = 0 by Newton's design — and Newton converges to machine precision in about 5 iterations from the same start.

Aitken's Δ² acceleration applied to the fixed-point sequence above converts the linear convergence rate 0.67 into quadratic: a typical accelerated trajectory hits machine precision in 6 iterations, matching Newton's pace without ever using the derivative.

Iterative methods compared

Method	Form	Rate	Order	g'(x*)	Used for
Generic fixed-point	x_{n+1} = g(x_n)	\|g'(x*)\|	1 (linear)	< 1	Iterative schemes generally
Picard for ODEs	y_{n+1}(t) = y_0 + ∫f(s, y_n) ds	L·h	1	Lh < 1	ODE existence (Picard-Lindelöf)
Newton-Raphson	x − f(x)/f'(x)	e_n²	2 (quadratic)	= 0	Root finding, calculator hardware
Power iteration	x_{n+1} = A x_n / ‖A x_n‖	\|λ_2 / λ_1\|	1	varies	Dominant eigenvector, PageRank
Value iteration (MDPs)	V_{n+1} = T V_n	γ (discount)	1	= γ < 1	Reinforcement learning, optimal control
Jacobi for linear systems	x_{n+1} = D⁻¹(b − R x_n)	‖D⁻¹R‖	1	spectral radius < 1	Iterative linear solvers
Anderson acceleration	weighted average	superlinear	≥ 1	variable	SCF iteration, DFT, deep eq nets
Steffensen / Aitken Δ²	extrapolated triple	e_n²	2	= 0 effectively	Acceleration of linear sequences

The pattern: linear convergence (order 1) is the generic case, controlled by |g'(x*)|. Quadratic and higher orders require special structure — either g'(x*) = 0 by design (Newton, Steffensen) or extrapolation across iterates (Aitken, Anderson). For any problem you can pose as a contraction mapping, Banach's fixed-point theorem gives existence, uniqueness, and a constructive convergent iteration with quantifiable rate.

Newton's method is fixed-point iteration in disguise

Solving f(x) = 0 by Newton's method is fixed-point iteration with

g(x) = x − f(x) / f'(x)

Differentiate: g'(x) = 1 − [f'(x)² − f(x) f''(x)] / f'(x)² = f(x) · f''(x) / f'(x)². At a simple root r where f(r) = 0, this evaluates to g'(r) = 0 — Newton's update map has zero derivative at the root. The linear term in the error recurrence drops out and the quadratic term dominates: |e_{n+1}| ≈ ½ |g''(r)| · |e_n|². That is exactly Newton's quadratic convergence rate.

The construction generalises. Given f(x) = 0 with f'(x*) ≠ 0, the family of fixed-point maps g(x) = x − k·f(x)/f'(x) for k ≠ 0 all have fixed points at the roots of f. For k = 1, g'(r) = 0 — Newton. For k = ½, g'(r) = ½ — linear convergence. For k = 2 (double Newton), g'(r) = −1 — instability. The Newton choice is the unique k that linearises perfectly at the root.

Where fixed-point iteration shows up

Newton's method in numerical hardware. Every floating-point division on every Intel, AMD, and ARM CPU is a fixed-point iteration on the Newton reciprocal map x_{n+1} = x_n(2 − a x_n).
PageRank. Google's original ranking algorithm iterates p_{n+1} = (1 − d)/N · 1 + d · M·p_n on the damped link matrix M. The iteration is a contraction with rate d (the damping factor, ~0.85), so PageRank converges geometrically in ~30 iterations to the stationary distribution.
Value iteration in reinforcement learning. Bellman's optimality operator V → T V is a γ-contraction for any discount factor γ < 1 in the sup-norm. Value iteration converges geometrically to the optimal value function; DeepMind's AlphaGo, Atari DQN, and OpenAI's PPO all rest on this contraction at their core.
Self-consistent field iteration in computational chemistry. Density functional theory and Hartree-Fock methods iterate ρ_{n+1} = F(ρ_n) where F is the SCF operator and ρ is the electron density. Damped and Anderson-accelerated variants are the standard convergence accelerators in Gaussian, VASP, and Quantum ESPRESSO.
Picard iteration in ODE theory. The Picard-Lindelöf existence theorem proves ODE solutions exist by showing the integral operator y → y_0 + ∫f(s, y(s))ds is a contraction on a small time interval. Picard's iterated integrals are the constructive sequence.
Optimisation algorithms. Proximal-gradient, ADMM, Douglas-Rachford splitting, and forward-backward methods in convex optimisation are all fixed-point iterations on non-expansive operators in a Hilbert space. CVXPY, Mosek, and Splitting Conic Solver use them.
Iterative linear solvers. Jacobi, Gauss-Seidel, SOR, and multigrid are fixed-point iterations on different splittings of the linear system Ax = b. Convergence depends on the spectral radius of the iteration matrix.
Deep equilibrium networks. DEQ models replace deep stacks of layers with the fixed point of a single layer applied repeatedly: z = f(z, x). Backpropagation through the fixed point uses implicit differentiation. State-of-the-art results in 2019-2024 on certain vision and language tasks at a fraction of standard model depth.

Variants and extensions

Aitken's Δ² acceleration. Given x_n, x_{n+1}, x_{n+2} from a linearly convergent sequence, the formula ̂x = x_n − (Δx_n)² / Δ²x_n produces a much better estimate of the limit, often jumping from linear to quadratic convergence per acceleration step.
Steffensen's method. Apply Aitken's Δ² to the natural fixed-point sequence, restart, repeat. Achieves quadratic convergence without derivatives — competitive with Newton when f' is awkward.
Anderson acceleration. Uses the last m iterates to compute a weighted update, generalising Aitken to multi-dimensional problems. State-of-the-art for SCF, density estimation, and deep equilibrium networks.
Damped fixed-point iteration. Replace x_{n+1} = g(x_n) with x_{n+1} = (1 − α)·x_n + α·g(x_n) for some 0 < α ≤ 1. Adds artificial dissipation; restores stability when |g'(x*)| ≥ 1.
Krasnoselskii-Mann iteration. A specific damped fixed-point for nonexpansive operators (|g'(x*)| = 1 boundary case). Standard in convex optimisation and proximal-point algorithms.
Mann-Halpern iteration. Variants of damped iteration with provable convergence on monotone operators in Hilbert space; used in iterative algorithms for variational inequalities.

Common pitfalls

Diverging when |g'(x*)| ≥ 1. The simplest mistake is iterating g without checking the derivative at the fixed point. If |g'(x*)| > 1, the iteration runs away geometrically. Fix: choose a different g (e.g., solve x = g(x) as x = g⁻¹(x) instead), or add damping.
Slow convergence near g'(x*) = 1. Linear convergence at rate 0.99 means 230 iterations per digit of accuracy. If the natural iteration is slow, accelerate (Aitken, Anderson) or reformulate.
Wrong basin of attraction. If g has multiple fixed points, the iteration converges to the one in the same basin as x_0. Starting in the wrong basin reaches the wrong answer. Bracketing or geometric analysis can predict basins.
Oscillating non-convergence. When g'(x*) = −1 + ε for small ε > 0, the iteration converges but oscillates around x* with period 2. Apply Aitken's Δ² to extract the limit cleanly.
Forgetting completeness. Banach's theorem needs a complete metric space. On rational numbers, on open intervals like (0, 1], or on incomplete function spaces, contractions can fail to have fixed points. Always work in a complete setting — Banach spaces, closed bounded subsets of ℝⁿ.

Why fixed-point iteration is everywhere

Nearly every iterative algorithm in numerical analysis can be cast as a fixed-point iteration on some map. The unified picture is Banach's contraction mapping theorem: given any contraction g on a complete metric space, the iteration x_{n+1} = g(x_n) converges to the unique fixed point at rate equal to the Lipschitz constant of g. Newton's method engineers g so its Lipschitz constant is zero at the root (quadratic convergence). PageRank uses a damping factor to guarantee the Markov chain is a contraction. Picard iteration shows ODE solutions exist by exhibiting a contraction on a function space. Value iteration in RL converges because the Bellman operator is a γ-contraction. The shared mathematical backbone is fifty lines of analysis written down by Banach in 1922 — and it underlies most of what modern numerical software actually computes.

Frequently asked questions

When does fixed-point iteration converge?

Locally near a fixed point x* of a smooth g, the iteration converges iff |g'(x*)| < 1. The proof is a one-line Taylor argument: writing the error e_n = x_n − x*, we have e_{n+1} = g(x_n) − g(x*) ≈ g'(x*)·e_n, so |e_{n+1}| ≈ |g'(x*)|·|e_n|. If |g'(x*)| < 1, the error shrinks geometrically; if |g'(x*)| > 1, the error grows. The boundary case |g'(x*)| = 1 needs higher-order analysis. Globally, Banach's fixed-point theorem says that if g is a contraction with |g'| ≤ k < 1 on a complete metric space, the iteration converges from any starting point to the unique fixed point.

What is a cobweb diagram?

Plot y = x (a diagonal line) and y = g(x) on the same axes. The fixed points of g are the intersections — where g(x) = x. The iteration x_{n+1} = g(x_n) can be drawn graphically: starting from x_n on the x-axis, go vertically to (x_n, g(x_n)) on the curve, then horizontally to (x_{n+1}, x_{n+1}) on the diagonal, then vertically to (x_{n+1}, g(x_{n+1})), and so on. The trajectory traces a cobweb that spirals into the fixed point if it is stable (|g'(x*)| < 1) or away from it if unstable (|g'(x*)| > 1). When g'(x*) < 0 the cobweb forms a true spiral; when 0 < g'(x*) < 1 the trajectory is a staircase.

How is Newton's method a special case of fixed-point iteration?

Newton's iteration x_{n+1} = x_n − f(x_n)/f'(x_n) is fixed-point iteration with g(x) = x − f(x)/f'(x). The fixed points of this g are exactly the roots of f. Differentiating: g'(x) = 1 − [f'(x)² − f(x)·f''(x)] / f'(x)² = f(x)·f''(x) / f'(x)². At a simple root r where f(r) = 0 and f'(r) ≠ 0, we get g'(r) = 0 — and that's the source of Newton's quadratic convergence. With g'(x*) = 0, the local linear-convergence model e_{n+1} = g'(x*)·e_n + ½ g''(x*)·e_n² + … drops its leading term, leaving the quadratic rate as the dominant order.

How fast does fixed-point iteration converge?

Linearly with rate |g'(x*)|. Reaching error ε from initial error e_0 takes about log(ε/e_0) / log|g'(x*)| iterations. With |g'(x*)| = 0.5, error halves every step — 50 iterations to reach 10⁻¹⁵. With |g'(x*)| = 0.9, error multiplied by 0.9 each step — about 330 iterations for the same target. With |g'(x*)| = 0.99 (a near-marginal contraction), 3400 iterations. This linear scaling is why Newton (which gives g'(r) = 0 by construction) is so much faster: in Newton, the asymptotic rate is dominated by quadratic terms, not linear.

How do you turn f(x) = 0 into a fixed-point problem?

Choose any g(x) such that g(x) = x precisely when f(x) = 0. The simplest is g(x) = x + λ·f(x) for some scalar λ. The iteration converges iff |1 + λ·f'(x*)| < 1; choosing λ ≈ −1/f'(x*) gives Newton's method, where the iteration map has zero derivative at the root and convergence becomes quadratic. The trade-off: more careful choices of g (computing f'(x*), or using BFGS-style approximations) speed convergence but cost more per step. The simplest choice — g(x) = x − f(x), corresponding to λ = −1 — gives Picard iteration; convergence requires |1 − f'(x*)| < 1, i.e. 0 < f'(x*) < 2, which is restrictive.

Why does fixed-point iteration appear everywhere in scientific computing?

Because nearly every iterative algorithm is fixed-point iteration on some map. Power iteration for the dominant eigenvalue of a matrix A iterates x_{n+1} = A·x_n / ‖A·x_n‖ — fixed point is the dominant eigenvector. PageRank iterates p_{n+1} = M·p_n on the damped link matrix — fixed point is the stationary distribution. Value iteration for Markov decision processes iterates V_{n+1}(s) = max_a [r(s,a) + γ·E[V_n(s')]] — fixed point is the optimal value function (Bellman operator is a γ-contraction). Picard iteration for ODEs iterates y_{n+1}(t) = y_0 + ∫_0^t f(s, y_n(s)) ds — fixed point is the solution. Jacobi and Gauss-Seidel iteration for linear systems iterate on a splitting of the matrix — fixed point is the solution vector. All of these are Banach contractions on appropriately chosen metric spaces.

What is Aitken's Δ² acceleration?

Aitken's Δ² accelerates linearly-convergent fixed-point iteration. Given three successive iterates x_n, x_{n+1}, x_{n+2} from a linearly convergent sequence, the formula ̂x = x_n − (Δx_n)² / Δ²x_n where Δx_n = x_{n+1} − x_n and Δ²x_n = x_{n+2} − 2x_{n+1} + x_n gives a much better estimate of the limit. Iterating the original sequence twice and then applying Aitken once is equivalent to a single step of Steffensen's method, which gives quadratic convergence for free (no derivatives needed). Aitken is the workhorse of accelerating series convergence in scientific computing and is built into numerical algebra packages.