Iterative Methods

Arnoldi Iteration

Orthonormal basis of the Krylov subspace span{v, Av, A²v, ...} for non-symmetric A

Arnoldi iteration builds an orthonormal basis Q of the Krylov subspace via modified Gram–Schmidt, producing an upper-Hessenberg matrix H with A·Q_k = Q_{k+1}·H̃_k. Foundation of GMRES, sparse eigenvalue solvers, and every Krylov method for non-symmetric matrices.

Builds basis ofK_k(A, v) = span{v, Av, ..., A^(k-1)v}
IdentityA·Q_k = Q_{k+1}·H̃_k (H̃ upper Hessenberg)
Per-step cost1 mat-vec + k inner products + k updates
k iterations costO(k²N + kN²) flops
StorageO(kN) (all basis vectors)
Used inGMRES, ARPACK, implicitly-restarted Arnoldi (eigs)

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

The Krylov subspace

Start with a matrix A and a vector v with ‖v‖ = 1. The k-th Krylov subspace is

K_k(A, v) = span{ v, A·v, A²·v, ..., A^(k-1)·v }

It's the natural set of vectors A reaches from v by repeated multiplication. The dominant eigenvector of A asymptotically lies inside K_k (that's just power iteration); the next-largest eigenvectors emerge as k grows. Most importantly for linear-system solvers, the solution x of Ax = b lives somewhere inside K_n(A, b) — usually approximated well by a vector in K_m for some m ≪ n.

Naïvely you could just stack v, Av, A²v, ... as columns of a matrix. But each Aⁱv aligns more and more with the dominant eigenvector, so these vectors become near-parallel and the basis degenerates. In double precision the columns of [v | Av | ... | A^kv] become numerically rank-deficient after about ten to twenty applications of A. The basis is unusable.

Arnoldi iteration solves this by orthogonalising as you go. After computing each new Aᵏv it subtracts components along every previous basis vector, leaving only the new direction. The result is an orthonormal basis Q = [q₁ | q₂ | ... | q_k] for the same span — numerically well-conditioned, even for k = hundreds.

The algorithm

q_1 = v / ‖v‖
for j = 1, 2, ..., m:
    w = A · q_j                         # one matrix-vector product
    for i = 1, 2, ..., j:               # modified Gram–Schmidt
        h_{ij} = q_iᵀ · w
        w = w - h_{ij} · q_i
    h_{j+1, j} = ‖w‖
    if h_{j+1, j} == 0: break           # invariant subspace reached
    q_{j+1} = w / h_{j+1, j}

At each outer step j: apply A once to the latest basis vector, then orthogonalise the result against all previous basis vectors using modified Gram–Schmidt. The projection coefficients h_{ij} get stored in column j of an upper-Hessenberg matrix H̃ ∈ ℝ^((m+1) × m). The final h_{j+1, j} = ‖w‖ is the norm of the leftover (the sub-diagonal entry that makes H̃ Hessenberg rather than upper triangular).

If h_{j+1, j} = 0 the algorithm has discovered an A-invariant subspace — span{q_1, ..., q_j} is closed under A. This is rare in finite precision but does happen for special starts; in that case the projected eigenvalues are exact eigenvalues of A.

The Arnoldi identity

The whole iteration can be written as one compact identity:

A · Q_m = Q_{m+1} · H̃_m

where Q_m is the N×m matrix with orthonormal columns q_1, ..., q_m, and H̃_m is the (m+1)×m upper-Hessenberg matrix of coefficients. Truncating to the square m×m block H_m gives

Q_mᵀ · A · Q_m = H_m

H_m is the orthogonal projection of A onto the Krylov subspace, expressed in the orthonormal basis Q. Eigenvalues of H_m are called Ritz values and approximate eigenvalues of A — typically converging first to the extremes of A's spectrum (largest and smallest |λ|). Eigenvectors of H_m, when lifted back through Q (y → Q_m · y), are Ritz vectors approximating A's eigenvectors.

Worked example — a 4×4 non-symmetric matrix

Take a small non-symmetric example to see the structure:

A = [[4, 1, 0, 0],
     [2, 3, 1, 0],
     [0, 1, 2, 1],
     [0, 0, 1, 1]]
v = [1, 1, 1, 1]ᵀ / 2

Run Arnoldi for 3 steps starting from q_1 = v:

step j=1:
  w = A·q_1 = [2.5, 3.0, 2.0, 1.0]ᵀ
  h_{11} = q_1ᵀ·w = 2.5·0.5 + 3.0·0.5 + 2.0·0.5 + 1.0·0.5 = 4.25
  w = w - 4.25·q_1 = [0.375, 0.875, -0.125, -1.125]ᵀ
  h_{21} = ‖w‖ ≈ 1.45
  q_2 = w / 1.45 ≈ [0.259, 0.604, -0.086, -0.776]ᵀ

step j=2:
  w = A·q_2 ≈ [1.640, 1.130, 0.346, -0.862]ᵀ
  h_{12} = q_1ᵀ·w ≈ 1.127
  h_{22} = q_2ᵀ·w ≈ 1.610
  w = w - h_{12}·q_1 - h_{22}·q_2 ≈ [-0.342, 0.083, 0.535, -0.034]ᵀ
  h_{32} = ‖w‖ ≈ 0.643
  q_3 = w / 0.643 ≈ [-0.532, 0.129, 0.832, -0.053]ᵀ

step j=3: (similar, populates h_{13}, h_{23}, h_{33}, h_{43})

H_3 = [[4.25 , 1.127, ... ],
       [1.45 , 1.610, ... ],
       [0    , 0.643, ... ]]                ← upper Hessenberg

After 3 steps the eigenvalues of H_3 are the three Ritz values approximating A's spectrum. They converge to the extremes first — usually the largest and smallest |λ| of A — long before approximating the interior of the spectrum. The full A has eigenvalues approximately {4.86, 3.20, 1.46, 0.48}; H_3 will capture 4.86 and 0.48 well first.

Arnoldi vs Lanczos vs QR vs power iteration

	Power iteration	Arnoldi	Lanczos	QR algorithm
Matrix type	General	Non-symmetric	Symmetric / Hermitian	Dense (any)
Eigenvalues extracted	Dominant only	Up to k after k steps	Up to k after k steps	All n
Projected matrix	Scalar (Rayleigh)	Upper Hessenberg	Tridiagonal	N/A (works on A directly)
Cost per step	1 mat-vec	1 mat-vec + O(kN)	1 mat-vec + O(N)	O(N²) per QR step
Storage	O(N)	O(kN)	O(N) (3 vectors)	O(N²)
Best when	One mode, sparse A	Few eigenvalues, sparse non-sym A	Few eigenvalues, sparse sym A	Full spectrum, dense A, small N

Arnoldi is the right tool when A is non-symmetric, sparse, and you want a handful of eigenvalues (typically the extremes) rather than the full spectrum. For dense matrices with N ≲ 1000 the dense QR algorithm is faster overall because it doesn't fight cache misses; for sparse N ≳ 10000 with a few-dozen eigenvalues, Arnoldi wins by orders of magnitude.

Where the cost goes

The per-step cost of Arnoldi has two pieces. First, the matrix-vector product A·q_j: O(N²) for dense A, O(nnz) for sparse A (nnz = number of non-zeros). Second, the orthogonalisation against all j previous basis vectors: j inner products of length N plus j vector updates, totalling O(jN) flops.

After m steps the total cost is

total flops ≈ m · (matvec cost) + sum_{j=1}^{m} O(jN)
            = m · (N² or nnz) + O(m²N / 2)

The m²N/2 term is the orthogonalisation overhead. For very large m it dominates — which is why GMRES restarts the Arnoldi process every m = 20–50 iterations rather than running it for thousands of steps. Storage of all m basis vectors costs O(mN) memory; restarting also bounds that.

Where Arnoldi shows up

GMRES. Generalised minimal residual — solves Ax = b for non-symmetric A by minimising ‖b - Ax‖ over the Krylov subspace built by Arnoldi. The default sparse-linear-solver in PETSc, Trilinos, and SciPy's scipy.sparse.linalg.gmres.
Implicitly Restarted Arnoldi (IRAM). The algorithm behind ARPACK and MATLAB's eigs — computes a few eigenvalues of large sparse non-symmetric matrices. Used in quantum chemistry, structural mechanics, and PageRank-style problems too large for power iteration's single eigenvalue.
Model reduction of dynamical systems. Reduce a large linear system ẋ = Ax + Bu to a small one using a Krylov subspace as the reduced basis. Standard in control theory, MEMS, and circuit simulation (PRIMA, PVL algorithms).
Pseudospectrum computation. Plotting the ε-pseudospectrum of A — the set of z where ‖(zI - A)⁻¹‖ > 1/ε — uses Arnoldi to compute the resolvent norm on a grid. Tool for understanding stability of non-normal matrices in fluid dynamics and ecology.
Quantum chemistry electronic structure. Lanczos for symmetric Hamiltonians, Arnoldi for non-Hermitian effective Hamiltonians (e.g. complex-scaled methods for resonances). Codes: NWChem, Quantum ESPRESSO, BAGEL.

Common pitfalls

Classical Gram–Schmidt loses orthogonality. The textbook formula for projection coefficients computes h_{ij} for all i before subtracting; rounding errors accumulate so the columns of Q drift from orthogonal. Modified Gram–Schmidt subtracts one component at a time and is far more stable. Use it.
Reorthogonalisation needed for long runs. Even modified Gram–Schmidt loses O(machine epsilon) of orthogonality per step. After 50+ steps the basis is no longer numerically orthogonal. Production codes do a second Gram–Schmidt pass when ‖w‖ drops by more than a factor of ~10 during orthogonalisation.
Naïve restart loses information. Stopping and starting fresh from a new q_1 throws away all spectral information learned. Implicit restarting (IRAM) uses a polynomial filter to keep good Ritz directions while discarding bad ones.
Ritz values converge from outside in. Don't expect Arnoldi to give interior eigenvalues — it converges first to the extremes of the spectrum. To target an interior eigenvalue near σ, run Arnoldi on (A - σI)⁻¹ instead (shift-and-invert).
Storage explosion. Holding all m basis vectors costs O(mN) memory; for N = 10⁹ even m = 50 is 50·N doubles = 400 GB. Restart aggressively and use out-of-core storage for very large problems.

A bit of history

Walter Edwin Arnoldi published the iteration in 1951 in his paper "The principle of minimized iterations in the solution of the matrix eigenvalue problem" (Quarterly of Applied Mathematics). Cornelius Lanczos had introduced the symmetric special case in 1950. Both algorithms sat in relative obscurity for two decades — they were unstable in finite precision (the orthogonality loss problem), and computers of the time didn't have the dynamic range to make them practical.

Yousef Saad's work in the 1980s — particularly the 1981 paper that introduced GMRES and the 1992 monograph "Numerical Methods for Large Eigenvalue Problems" — revived Arnoldi as the cornerstone of modern sparse linear algebra. Danny Sorensen's 1992 IRAM algorithm packaged it into ARPACK, the implementation behind MATLAB's eigs, SciPy's scipy.sparse.linalg.eigs, and most quantum chemistry codes. The 75-year-old algorithm now powers nearly every large non-symmetric eigenvalue computation.

Frequently asked questions

What is the Krylov subspace and why is it useful?

The k-th Krylov subspace generated by A and v is K_k(A, v) = span{v, Av, A²v, ..., A^(k-1)v}. It's the natural place to look for approximate eigenvectors or solutions to Ax = b — each vector you add costs only one matrix-vector multiply, and the subspace captures the directions A reaches by repeated application. The dominant eigenvector of A lies asymptotically inside K_k, as do the eigenvectors of the next-largest |λ|. Krylov methods extract good approximations from this subspace without ever forming A explicitly.

Why orthogonalise — can't we just use v, Av, A²v directly?

Because powers of A converge to the dominant eigenvector — so v, Av, A²v become nearly parallel, and the basis becomes numerically rank-deficient. In double precision the basis goes nearly singular after about 10–20 steps for typical matrices. Gram–Schmidt orthogonalisation re-spreads them: each new basis vector is forced perpendicular to all previous ones, preserving numerical rank. Modified Gram–Schmidt (orthogonalising against each vector one at a time) is more stable than classical Gram–Schmidt for this iteration.

What is the Hessenberg matrix H and what does it tell us?

H is the matrix of orthogonalisation coefficients — entry h_{ij} = q_iᵀ · A · q_j is the projection of A·q_j onto q_i. It's upper Hessenberg (zero below the first sub-diagonal) because A·q_j lies in span{q_1, ..., q_{j+1}}. H is the restriction of A to the Krylov subspace, expressed in the orthonormal basis Q. Eigenvalues of H (Ritz values) approximate eigenvalues of A; eigenvectors of H lifted back through Q (Ritz vectors) approximate A's eigenvectors.

What is the cost per iteration?

Step k costs one matrix-vector product (O(N²) for dense A, O(nnz) for sparse) plus k inner products and k vector updates for the Gram–Schmidt orthogonalisation against all previous basis vectors — that's O(kN) extra work. After m steps total cost is O(m·N² + m²·N) flops. The m²·N term is the orthogonalisation overhead: it grows quadratically with subspace size, which is why GMRES is usually restarted every m = 20–50 iterations to bound it.

How does Arnoldi reduce to Lanczos when A is symmetric?

When A is symmetric (or Hermitian), the Hessenberg matrix H becomes symmetric as well — which forces it to be tridiagonal. Most of H's entries are zero, and the orthogonalisation step against k previous vectors collapses to a three-term recurrence: q_{k+1} depends only on q_k, q_{k-1}, and a scalar coefficient. This is the Lanczos algorithm. The full O(kN) Gram–Schmidt cost drops to O(N) per step, and storage drops from O(kN) to O(N) — a huge win for large sparse symmetric problems.

Why does Arnoldi sometimes lose orthogonality?

Finite-precision arithmetic. Even modified Gram–Schmidt loses about O(machine epsilon) of orthogonality per step, and these errors accumulate. After 50–100 steps the basis vectors are no longer orthogonal to machine precision, which corrupts Ritz value estimates and ruins GMRES residual computations. Production codes use full reorthogonalisation (a second Gram–Schmidt pass) or selective reorthogonalisation (only against basis vectors that have lost orthogonality, as in implicitly restarted Arnoldi).

Who first wrote down Arnoldi iteration?

Walter Edwin Arnoldi, an American aerodynamicist at United Aircraft Corporation, in 1951. His paper "The principle of minimized iterations in the solution of the matrix eigenvalue problem" generalised Lanczos's symmetric algorithm (published 1950) to non-symmetric matrices. The method was largely a curiosity until the 1970s, when Yousef Saad and others revived it as a foundation for GMRES and other modern Krylov methods. Today it's the standard sparse-eigenvalue tool in ARPACK, SLEPc, MATLAB's eigs, and SciPy's eigs.