Iterative Methods

GMRES Method

Generalised Minimal Residual — solve Ax = b for non-symmetric A by minimising the residual over an Arnoldi Krylov subspace

GMRES solves Ax = b for general non-symmetric A by building a Krylov subspace via Arnoldi iteration, then choosing the iterate that minimises ‖b − Ax‖ over that subspace. Restart every m iterations to bound memory. The default sparse non-symmetric solver in PETSc, Trilinos, and SciPy.

  • Minimises‖b − Ax_m‖_2 over x_0 + K_m
  • Builds K_m viaArnoldi iteration → A·Q_m = Q_{m+1}·H̃_m
  • Reduced problemLeast-squares: ‖βe_1 − H̃_m y‖, dim m
  • Per-step cost1 mat-vec + O(mN) Gram–Schmidt + O(m²) least-squares
  • MemoryO(mN); restart every m = 20–50 iterations
  • Used inPETSc, Trilinos, scipy.sparse.linalg.gmres, every FEM code

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

The minimal-residual idea

To solve Ax = b for non-symmetric A you can't use conjugate gradient (which needs symmetric positive definiteness). The general-purpose approach is to build a sequence of trial subspaces of increasing dimension and pick the best vector in each. "Best" means the iterate whose residual r = b − Ax has the smallest 2-norm.

Start with an initial guess x_0 and its residual r_0 = b − A·x_0. The Krylov subspace generated by A and r_0 is

K_m(A, r_0) = span{ r_0, A·r_0, A²·r_0, ..., A^(m-1)·r_0 }

At step m, GMRES looks for the iterate

x_m ∈ x_0 + K_m         (affine subspace)
minimising  ‖b − A·x_m‖_2

This is a finite-dimensional least-squares problem (dimension m of the subspace), even though A acts on N-dimensional vectors. The whole machinery of GMRES is the cleverness of solving that least-squares problem efficiently — exploiting the Arnoldi basis to reduce it to a tiny dense problem of size (m+1)×m that's solved with Givens rotations.

Why Arnoldi + Hessenberg = small least-squares

Run Arnoldi for m steps starting from q_1 = r_0 / ‖r_0‖. You get an orthonormal basis Q_m = [q_1 | ... | q_m] and the Arnoldi identity

A · Q_m = Q_{m+1} · H̃_m

where H̃_m is the (m+1)×m upper-Hessenberg matrix of orthogonalisation coefficients. Write the trial iterate as x_m = x_0 + Q_m·y for some y ∈ ℝ^m. Then the residual is

r_m = b - A·x_m = r_0 - A·Q_m·y = r_0 - Q_{m+1}·H̃_m·y

Note r_0 = ‖r_0‖·q_1 = β·Q_{m+1}·e_1 where β = ‖r_0‖ and e_1 is the first standard basis vector in ℝ^(m+1). So

r_m = Q_{m+1} · (β·e_1 - H̃_m·y)

The columns of Q_{m+1} are orthonormal, so multiplication preserves the 2-norm:

‖r_m‖_2 = ‖β·e_1 - H̃_m·y‖_2

Minimising the residual over x_m reduces to minimising over y ∈ ℝ^m the norm of a small (m+1)-dimensional vector. The right-hand side β·e_1 has just one nonzero entry, and H̃_m has only m diagonals' worth of nonzero entries — the least-squares problem can be solved with m Givens rotations in O(m²) flops. That tiny computation, done after every Arnoldi step, gives GMRES its name.

The algorithm

Input: A, b, x_0, tolerance ε, restart m
r_0 = b - A·x_0
β = ‖r_0‖
q_1 = r_0 / β

for j = 1, 2, ..., m:
    w = A · q_j                              # one matrix-vector product
    for i = 1, ..., j:                       # modified Gram–Schmidt
        h_{ij} = q_iᵀ · w
        w = w - h_{ij} · q_i
    h_{j+1,j} = ‖w‖
    q_{j+1} = w / h_{j+1,j}

    # Update the rotated Hessenberg & RHS via Givens rotations
    # (cheap: O(j) operations per step)
    Apply previous rotations to column j of H̃
    Compute new Givens rotation to zero h_{j+1,j}
    Apply to RHS β·e_1

    # Check residual norm — comes free from the rotated RHS
    if |last entry of rotated RHS| / β < ε: break

# Solve upper-triangular system for y, then x_m
y = backsolve(R_j, rotated_RHS_top)
x_m = x_0 + Q_m · y

if not converged: restart with x_0 := x_m

The key trick is computing the residual norm without actually computing x_m. The Givens rotations applied to β·e_1 produce a vector whose last entry equals ‖r‖ — checked after every Arnoldi step. You only assemble x_m once at termination, saving most of the cost of the final backsolve and update.

Restarting — GMRES(m)

The memory cost of m Arnoldi steps is O(m·N) for the basis vectors plus O(m²) for the Hessenberg matrix. For a million-dimensional problem with m = 50 that's 400 MB of basis vectors — uncomfortable but tractable. For m = 500 it's 4 GB, often impractical. Worse, the orthogonalisation against all previous basis vectors at step j costs O(jN) — so by step 500 a single Arnoldi step costs as much as the matrix-vector product. The iteration becomes more expensive than direct solution.

Restarted GMRES, written GMRES(m), bounds these costs by running m Arnoldi steps, computing x_m, and then restarting from x_m as the new initial guess. The basis is discarded, memory resets, and the next round starts fresh. Common choices are m = 20, 30, or 50. The downside is that GMRES(m) can stagnate — restarting throws away the spectral information accumulated in the Krylov subspace, and convergence can be much slower than full GMRES on hard problems.

Modern variants attempt to retain useful information across restarts. Deflated GMRES (GMRES-DR) keeps a small set of approximate eigenvectors corresponding to the smallest-magnitude eigenvalues and reuses them in the next Krylov subspace. Loose GMRES allows variable preconditioners. Flexible GMRES (FGMRES) permits the preconditioner to change between iterations, useful with adaptive multigrid.

Convergence — and why preconditioners matter

For symmetric positive-definite A, CG converges in roughly √κ iterations where κ = λ_max/λ_min is the condition number. GMRES has no such clean rate — convergence depends on the full spectrum and the eigenvectors. The classical bound:

‖r_m‖ / ‖r_0‖ ≤ κ(V) · min_{p_m, p_m(0)=1} max_{λ ∈ σ(A)} |p_m(λ)|

where V is the eigenvector matrix and σ(A) the spectrum. For normal matrices κ(V) = 1 and the bound is just the smallest residual polynomial. For non-normal matrices κ(V) can be enormous, making the bound useless — and in practice GMRES can stagnate for hundreds of iterations on highly non-normal problems before suddenly making progress.

The practical lever is preconditioning. Replace Ax = b by M⁻¹A·x = M⁻¹b (left preconditioning) where M is chosen so M⁻¹A has clustered eigenvalues away from the origin. Common preconditioners:

  • Incomplete LU (ILU). Approximate LU factorisation that drops small fill-in entries. ILU(0) keeps the sparsity of A; ILU(k) allows more fill-in for better approximation. Standard in commercial PDE codes.
  • Algebraic multigrid (AMG). Builds a hierarchy of coarse approximations algebraically from A's connectivity. Optimal for elliptic problems (Laplacians, diffusion), competitive elsewhere. Trilinos's ML and Hypre's BoomerAMG are widely used.
  • Domain decomposition (additive Schwarz). Partition the domain, solve local problems independently, combine. Parallelisable. Standard in finite-element codes for elasticity and fluid flow.
  • Physics-based. For PDE problems, build a preconditioner from a related simpler PDE (e.g., approximate factorisation of the Helmholtz operator, Schur complement for saddle-point problems). The most powerful when applicable.

A good preconditioner often turns a 10,000-iteration GMRES that wouldn't converge into a 20-iteration GMRES that does. Choosing one is often more art than science — the rule of thumb is to start with ILU(0) for general sparse problems, and specialise from there.

Worked example — a small non-symmetric system

Take the 4×4 system Ax = b with

A = [[4, 1, 0, 0],
     [2, 3, 1, 0],
     [0, 1, 2, 1],
     [0, 0, 1, 1]]
b = [5, 6, 4, 2]ᵀ
x_0 = [0, 0, 0, 0]ᵀ

The true solution is x ≈ [1.0, 1.0, 1.0, 1.0]ᵀ. Run GMRES with no preconditioner:

m   ‖r_m‖ / ‖r_0‖    iterate x_m (approx)
─   ────────────────  ─────────────────────────────
0   1.000             [0.000, 0.000, 0.000, 0.000]
1   0.382             [0.512, 0.615, 0.410, 0.205]
2   0.087             [0.831, 0.913, 0.875, 0.769]
3   0.012             [0.972, 0.988, 0.979, 0.961]
4   ~10⁻¹⁵            [1.000, 1.000, 1.000, 1.000]

GMRES converges in exactly 4 steps because N = 4. This is a theoretical guarantee: in exact arithmetic GMRES is finite, terminating in at most N steps (the Krylov subspace can't have dimension larger than N). For large N this is irrelevant — you never reach N — but it does mean GMRES has no spurious cycles or convergence failures other than stagnation due to round-off.

GMRES vs CG vs Bi-CGSTAB vs QMR

GMRESCGBi-CGSTABQMR
Matrix typeAnySym. pos. def.AnyAny
Per-step memoryO(mN) (grows)O(N) (fixed)O(N) (fixed)O(N) (fixed)
RecurrenceFull ArnoldiThree-termShort recurrenceThree-term (two-sided)
ConvergenceMonotone (full GMRES)Monotone (energy norm)Often erraticMonotone (quasi-resid)
Restart neededYes (memory)NoNoNo
Best whenGeneral sparse, with preconditionerSPD systemsGeneral, memory-tightGeneral, monotone wanted

For symmetric positive definite systems, always prefer CG (or its preconditioned variant PCG). For symmetric indefinite, use MINRES. For non-symmetric, GMRES is the default — it has the best convergence properties among the non-symmetric methods, at the cost of growing memory. Bi-CGSTAB is the lightweight alternative when memory is the bottleneck; it has fixed memory but can show erratic convergence and sometimes break down.

Where GMRES shows up

  • Finite element analysis. Implicit time-stepping of structural mechanics, fluid dynamics, electromagnetics, and heat transfer all reduce to large sparse non-symmetric linear systems. ANSYS, Abaqus, COMSOL, and most academic FEM codes ship GMRES as a primary solver, usually preconditioned with ILU or AMG.
  • Computational fluid dynamics. Implicit discretisations of Navier–Stokes (incompressible or compressible) produce large non-symmetric Jacobians at each time step. GMRES with a domain-decomposition or physics-based preconditioner is the workhorse in SU2, OpenFOAM, Fluent.
  • Newton-Krylov methods. Solving nonlinear PDEs F(x) = 0 with Newton's method requires solving J·Δx = -F at each step. For large problems J is sparse and non-symmetric; GMRES is the inner Krylov solver. PETSc's SNES, Trilinos's NOX, and KINSOL all use this approach.
  • Quantum chemistry coupled-cluster methods. Solving CCSD or CCSD(T) amplitude equations reduces to large nonlinear systems; the Jacobian is non-symmetric (coupled-cluster has no variational principle), and GMRES inside Newton-Krylov is standard. Codes: NWChem, Psi4, CFOUR.
  • Markov chain stationary distributions. Computing the stationary distribution π satisfying πᵀP = πᵀ for a non-reversible Markov chain. The dense problem is QR; the sparse problem is GMRES on (I − Pᵀ)·π = 0 with a normalisation constraint.
  • Boundary integral equations. Electromagnetic scattering, acoustics, and Stokes flow problems often have dense complex non-symmetric coefficient matrices from boundary element discretisation. Fast multipole acceleration plus GMRES is the standard combination — codes like FastBEM and BEM++ use it routinely.

Common pitfalls

  • No preconditioner. Plain GMRES on a real-world problem often won't converge within a reasonable iteration count. Always start with at least ILU(0); upgrade to AMG or a physics-based preconditioner if convergence is slow.
  • Restart parameter too small. GMRES(5) on a hard problem can stagnate forever because the small Krylov subspace doesn't capture the essential spectral information. Start with m = 30 or 50; reduce only if memory forces it.
  • Misinterpreting the residual. GMRES reports ‖b − A·x_m‖, not ‖x_m − x_true‖. For ill-conditioned A, a tiny residual can hide a large solution error. Check the condition number, or use a perturbation analysis.
  • Forgetting Givens rotations are O(m²). A naïve least-squares solve at each step would be O(m³); the incremental QR factorisation with Givens makes it O(m). Don't accidentally write the naïve version when implementing from scratch.
  • Using GMRES on a symmetric positive definite system. CG is more efficient, doesn't need restart, and has tighter convergence bounds. GMRES will work but waste memory.
  • Insufficient orthogonalisation. Classical Gram–Schmidt loses orthogonality, which corrupts the residual estimate from the rotated RHS. Use modified Gram–Schmidt or do a second orthogonalisation pass when the basis norm shrinks suspiciously.

Saad and Schultz, 1986

Yousef Saad and Martin H. Schultz published "GMRES: A generalised minimal residual algorithm for solving nonsymmetric linear systems" in SIAM Journal on Scientific and Statistical Computing in July 1986. The paper built on Saad's 1981 algorithm FOM (Full Orthogonalisation Method), which used Arnoldi but did not enforce residual minimisation. GMRES's contribution was the realisation that the minimisation can be done incrementally via Givens rotations, with the residual norm available essentially for free at each Arnoldi step.

Adoption was rapid. By 1990 GMRES had displaced Bi-CGSTAB, ORTHODIR, and CGS as the default sparse non-symmetric solver in research codes. By 2000 every commercial PDE solver shipped a GMRES routine, almost always with an ILU or AMG preconditioner. Saad's 1996 book "Iterative Methods for Sparse Linear Systems" (second edition 2003) remains the standard reference. As of 2026 nearly every large-scale simulation in engineering and scientific computing runs GMRES somewhere in its inner loop.

Frequently asked questions

What problem does GMRES actually solve?

At step m, GMRES finds the iterate x_m ∈ x_0 + K_m (the affine subspace formed by adding the m-dimensional Krylov subspace to the initial guess) that minimises ‖b − A·x_m‖_2. Because K_m has dimension m, this is a finite-dimensional least-squares problem. The clever part: GMRES doesn't solve a big least-squares problem on N variables — it transforms the problem into a small one on m variables using the Arnoldi basis, where m ≪ N. The reduced problem becomes ‖βe_1 − H̃_m·y‖, an (m+1)×m least-squares system on the Hessenberg matrix from Arnoldi.

Why does GMRES need to restart?

Memory and orthogonalisation cost both grow with iteration count. Each Arnoldi step stores a new orthonormal vector (O(N) memory) and orthogonalises against all previous vectors (O(jN) work at step j). After 50 iterations on a million-dimensional problem you've used 400 MB and a billion flops on Gram–Schmidt alone. Restarted GMRES, GMRES(m), runs m Arnoldi steps, computes x_m, then discards the basis and restarts the Arnoldi from the new residual r_m = b − A·x_m. Memory is bounded by m·N. Common choices are m = 20, 30, 50.

How does GMRES converge?

Convergence depends on the spectrum of A. For a matrix with eigenvalues clustered far from the origin and not too widely spread, GMRES converges in a few dozen iterations regardless of N. For matrices with eigenvalues spanning a huge range (high condition number) or with significant non-normality, convergence can stall. There is no clean a-priori rate like CG's √κ. The standard analysis bounds the residual by the smallest polynomial value over the spectrum: ‖r_m‖ / ‖r_0‖ ≤ min_{p_m, p_m(0)=1} max_{λ ∈ σ(A)} |p_m(λ)|. Preconditioning that clusters eigenvalues away from the origin is the main lever to control rate.

How does GMRES use the Hessenberg matrix from Arnoldi?

After m Arnoldi steps, A·Q_m = Q_{m+1}·H̃_m where H̃_m is (m+1)×m upper Hessenberg. The trial iterate is x_m = x_0 + Q_m·y, and the residual is r_m = b − A·x_m = r_0 − Q_{m+1}·H̃_m·y. Since r_0 = β·q_1 = β·Q_{m+1}·e_1, the residual norm equals ‖β·e_1 − H̃_m·y‖ (orthogonality of Q_{m+1} preserves the norm). So minimising over x_m reduces to minimising over y in ℝ^m — an (m+1)×m least-squares problem on a Hessenberg matrix, solved efficiently by Givens rotations in O(m²) flops.

What's the difference between GMRES and CG?

CG (conjugate gradient) is the specialised method for symmetric positive-definite A. CG also operates on the Krylov subspace and uses a three-term recurrence (equivalent to Lanczos), giving O(N) per step and O(N) memory — no restart needed, ever. GMRES is the general-purpose method for any non-symmetric A. Because the Hessenberg matrix isn't tridiagonal in the non-symmetric case, the Arnoldi orthogonalisation cannot collapse to three terms, and GMRES must store the whole basis. The price of generality is the O(m·N) memory and the restart logic. Symmetric indefinite problems fall between: MINRES (sym indefinite) uses a three-term recurrence like CG; symmetric positive-definite, use CG.

Why do practical GMRES runs always use a preconditioner?

Because plain GMRES on a poorly-conditioned matrix often fails to converge in a reasonable number of restarts. Preconditioning replaces Ax = b by M⁻¹·A·x = M⁻¹·b (left) or A·M⁻¹·y = b, x = M⁻¹·y (right), where M is chosen so M⁻¹·A is closer to the identity and has clustered eigenvalues. Common choices: incomplete LU (ILU) factorisation, algebraic multigrid (AMG), domain decomposition (additive Schwarz), or specialised physics-based preconditioners. A good preconditioner often turns a problem that wouldn't converge in 10,000 iterations into one solved in 20.

Who invented GMRES?

Yousef Saad and Martin H. Schultz published GMRES in 1986 in SIAM Journal on Scientific and Statistical Computing. The algorithm built on Saad's earlier 1981 method (full orthogonalisation, FOM) by adding the residual-minimisation step that gives the 'M' in GMRES. Within ten years it had displaced the older Krylov methods for non-symmetric systems (Bi-CGSTAB, QMR, ORTHODIR) as the default tool, and today it's the underlying linear solver in nearly every large-scale finite element, fluid dynamics, and electromagnetics code. Saad's 1996 book "Iterative Methods for Sparse Linear Systems" is the standard reference.