Iterative Methods

Power Iteration

Find the dominant eigenvector by repeated Av/‖Av‖ — the algorithm behind PageRank

Power iteration finds the dominant eigenvector of a matrix A by repeatedly multiplying a vector by A and renormalising. The iterate aligns with the eigenvector of the largest |λ| at rate |λ₂/λ₁|^k — the spectral gap sets the speed. Six lines of code, used inside Google's original PageRank.

  • Iteration rulev ← Av / ‖Av‖
  • Convergence rate|λ₂ / λ₁|^k (linear)
  • Cost per iteration1 matrix-vector multiply
  • StorageO(N) (just one vector)
  • Eigenvalue estimateRayleigh quotient vᵀAv / vᵀv
  • Used inPageRank, ARPACK, dominant-mode analysis

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

The repeated multiplication idea

Take a square matrix A and a random starting vector v₀. Multiply: v₁ = A v₀. Multiply again: v₂ = A v₁ = A² v₀. Keep going. What happens to vₖ = Aᵏ v₀ as k grows?

Most vectors get rotated by A. But there are special directions — the eigenvectors — that A only stretches. Applying A multiplies the length of each eigencomponent by its eigenvalue. The eigencomponent with the biggest eigenvalue grows fastest. After enough multiplications, that component dominates and the others become negligible by comparison. The iterate aligns with the eigenvector of the largest |λ|.

To stop the magnitude from blowing up or vanishing, divide by the norm after each step:

v_{k+1} = A v_k / ‖A v_k‖

That's power iteration. Six lines of code, one matrix-vector multiply per step, O(N) storage. The output is the direction of the dominant eigenvector; the magnitude of A v gives the dominant eigenvalue (more precisely, the Rayleigh quotient does).

Why it works — the eigenbasis decomposition

Assume A is diagonalisable with eigenvalues |λ₁| > |λ₂| ≥ ... ≥ |λₙ| and eigenvectors x₁, ..., xₙ. Decompose the starting vector in the eigenbasis:

v_0 = c_1 x_1 + c_2 x_2 + ... + c_n x_n

Apply A k times. Each application multiplies the i-th coefficient by λᵢ:

A^k v_0 = c_1 λ_1^k x_1 + c_2 λ_2^k x_2 + ... + c_n λ_n^k x_n
        = λ_1^k [ c_1 x_1 + c_2 (λ_2/λ_1)^k x_2 + ... + c_n (λ_n/λ_1)^k x_n ]

Because |λᵢ / λ₁| < 1 for i > 1, every term except the first decays geometrically as k grows. After enough iterations only c₁ x₁ survives — the dominant eigenvector, scaled by an irrelevant constant that renormalisation throws away. The error in the direction estimate shrinks like |λ₂ / λ₁|ᵏ — geometric convergence with rate equal to the spectral gap.

If λ₁ > 0, this is monotone convergence to a single direction. If λ₁ < 0, the iterate flips sign each step but the direction is still correct (modulo sign). If the dominant eigenvalue is complex (a conjugate pair has equal magnitude), the iterate rotates in a 2D invariant subspace and never settles — a failure mode handled by block power iteration or shift-and-invert.

Worked example: A = [[4, 1], [2, 3]]

This 2×2 matrix has eigenvalues λ₁ = 5 (eigenvector [1, 1]) and λ₂ = 2 (eigenvector [1, -2]). The spectral gap ratio is |λ₂ / λ₁| = 2/5 = 0.4, so each iteration shrinks the error by a factor of 0.4 — about 0.4 decimal digits per iteration.

Start from v₀ = [1, 0] (deliberately misaligned). Apply A and renormalise:

k   v_k (unit vector)              Rayleigh quotient v^T A v
─   ─────────────────────────────  ─────────────────────────
0   [1.000, 0.000]                  4.000
1   [0.894, 0.447]                  4.600
2   [0.819, 0.574]                  4.846
3   [0.787, 0.617]                  4.939
4   [0.775, 0.632]                  4.976
5   [0.770, 0.638]                  4.991
...
10  [0.7072, 0.7070]                4.99987
...
20  [0.70711, 0.70711]              5.00000  (machine precision)

The unit vector converges to [√2/2, √2/2] ≈ [0.707, 0.707] — the normalised dominant eigenvector. The Rayleigh quotient converges to λ₁ = 5. The error each step is roughly 0.4× the previous — exactly the predicted (λ₂/λ₁) rate.

For PageRank-sized problems (~10⁹ × 10⁹) the same iteration runs on a sparse transition matrix with a spectral gap of about 0.15, giving 50-100 iterations to four-digit accuracy. The cost is dominated by sparse matrix-vector multiplies, which is exactly what GPUs and distributed systems excel at — one reason power iteration scaled to the web.

Power iteration vs other eigensolvers

Power iterationInverse iterationLanczosQR algorithm
TargetsLargest |λ|Smallest |λ| (or near σ)Extremal λ of symmetric AAll eigenvalues
Cost per iteration1 mat-vec1 linear solve1 mat-vec + reorthoO(N³) dense
StorageO(N)O(N) + factorO(kN)O(N²)
Convergence rate|λ₂/λ₁|^k|λ_min / λ_2nd|^kSuperlinear at edges~k log k
Requires A explicit?NoNo (sparse factor)NoYes (dense)
Best whenOne dominant modeSmallest modeFew extremes, sparse AFull spectrum, small N

For very large sparse matrices, Arnoldi and Lanczos are the modern descendants of power iteration. Both build a Krylov subspace span{v, Av, A²v, ..., A^(k-1)v} and extract several eigenvalues from the small projected matrix. Power iteration is the k = 1 special case — useful when you really only need the top eigenvalue, or as the inner loop of a larger algorithm.

The algorithm

v = random_unit_vector(n)
for k = 1, 2, 3, ...:
    w = A · v                # one matrix-vector multiply
    λ_est = v · w            # Rayleigh quotient (cheap inner product)
    v = w / ‖w‖              # renormalise
    if |λ_est - λ_prev| < tol: stop
    λ_prev = λ_est
return v, λ_est

The Rayleigh quotient vᵀA v converges to λ₁ twice as fast as the vector v itself when A is symmetric (a consequence of vᵀA v being stationary at the eigenvector). For non-symmetric A the quotient converges at the same rate as v but is still a much better estimator than ‖A v‖ alone, because it picks up the sign.

PageRank: power iteration at web scale

Brin and Page's 1998 PageRank algorithm models a "random surfer" who follows hyperlinks. The web has roughly 10⁹ pages; the link transition matrix P has P[i,j] = 1/out-degree(j) if page j links to page i, zero otherwise. Without modification, P is reducible (the web has dead-end pages) and the iteration fails to converge. The fix is to mix in a small uniform teleportation:

M = d · P + (1 - d) · (1/n) · 11^T     where d = 0.85

The damping factor d = 0.85 means the surfer follows a link with probability 0.85 and jumps to a random page with probability 0.15. This makes M strictly positive (Perron–Frobenius applies) with a guaranteed dominant eigenvalue of 1 and a second eigenvalue exactly d = 0.85. Power iteration on M converges at rate 0.85ᵏ — about 50 iterations gives four-digit accuracy.

In practice you don't form the dense M (10⁹ × 10⁹ is unstoreable). You apply it implicitly using the sparse link matrix and a tiny teleportation correction. The per-iteration cost is one sparse matrix-vector multiply: 10¹⁰ floating-point operations on the early web (a handful of edges per page × a billion pages). Each iteration was minutes on Google's data centre in 1998; convergence was overnight. Today the algorithm is updated incrementally rather than recomputed from scratch, but the underlying iteration is still power iteration.

The Rayleigh quotient and shift-and-invert

The Rayleigh quotient ρ(v) = vᵀA v / vᵀv is the value of λ that best fits Av ≈ λv in a least-squares sense. For symmetric A it's the unique critical-point characterisation: eigenvectors are exactly the critical points of ρ. This gives Rayleigh quotient iteration, where you replace the shift σ in shift-and-invert by the current Rayleigh estimate at every step:

solve (A - ρ(v_k) I) w = v_k
v_{k+1} = w / ‖w‖

Rayleigh quotient iteration converges cubically near an eigenvector (each step roughly triples the digit count), at the cost of an expensive linear solve per iteration. It's the locally fastest single-vector eigenmethod known. Production codes like LAPACK's tridiagonal eigenvalue routine use it as the inner accelerator after Lanczos provides good initial guesses.

Where power iteration shows up

  • Google PageRank. The web's link graph eigenvector. Used in ranking until 2009, when it was replaced by more diversified signals — but the underlying iteration is still inside many graph-ranking systems including Twitter's WhoToFollow.
  • Dominant-mode analysis. Fluid dynamics' proper orthogonal decomposition, structural vibration analysis, neural network's largest singular vector (used in spectral normalisation) — all run power iteration as the inner loop.
  • Markov chain stationary distributions. The eigenvector of P with eigenvalue 1. Power iteration converges to the stationary distribution at rate equal to the second-largest eigenvalue — the spectral gap. Used in Monte Carlo mixing-time analysis.
  • Spectral clustering. The k-largest eigenvectors of a graph Laplacian. Lanczos with restarts is the modern choice, but for k = 1 (just the Fiedler vector or its complement) plain power iteration suffices.
  • Inner loop of ARPACK and SLEPc. Production sparse eigensolvers run implicitly-restarted Arnoldi, which builds a Krylov subspace and re-starts when memory fills. Each Arnoldi step is essentially a power iteration with explicit orthogonalisation against previous Krylov vectors.
  • Randomised SVD. Modern matrix-sketching algorithms apply A to a random Gaussian matrix and power-iterate two or three times before QR. The power iterations sharpen the singular value gap, dramatically improving approximation accuracy.

Common pitfalls

  • Starting in the wrong subspace. If v₀ happens to be orthogonal to x₁ (so c₁ = 0), the iterate converges to the second eigenvector instead. Use a random start with float-randomness to make c₁ = 0 a measure-zero event.
  • Complex dominant eigenvalues. Real non-symmetric matrices often have complex conjugate dominant pairs. The iterate rotates in the 2D invariant subspace and never converges. Detect this by tracking the angle change per step; switch to block iteration if it persists.
  • Small spectral gap. If |λ₂ / λ₁| ≈ 1, convergence is glacial. Accelerate with shift-and-invert (target λ₁ via (A - σI)⁻¹) or use Chebyshev acceleration that filters out the unwanted spectrum.
  • Forgetting to renormalise. Without renormalisation, ‖vₖ‖ grows as |λ₁|ᵏ — overflow in a few iterations for |λ₁| > 1, underflow for |λ₁| < 1. Renormalise every step or every few steps.
  • Using ‖Av‖ as the eigenvalue. The Rayleigh quotient vᵀA v is a better estimator — it has the correct sign, converges faster for symmetric A, and gives a meaningful error bound via the residual ‖Av - ρv‖.

Why power iteration endures

Power iteration is the simplest iterative eigenmethod. Six lines, O(N) memory, one matrix-vector multiply per step, and it converges geometrically as long as the matrix has a unique dominant eigenvalue. That combination — sparsity-friendly, parallel-friendly, low-memory — is what made it the engine of PageRank and what keeps it at the bottom of every Krylov solver.

More sophisticated algorithms (Arnoldi, Lanczos, QR) extract more information per matrix-vector multiply, but every one of them either reduces to power iteration at k = 1 or uses it as a building block. The convergence theory generalises (Krylov subspaces, polynomial acceleration), but the core idea — repeated multiplication amplifies the dominant component — is unchanged from von Mises and Pollaczek-Geiringer's 1929 paper that introduced it.

Frequently asked questions

Why does the iterate align with the dominant eigenvector?

Decompose the starting vector v₀ in the eigenbasis: v₀ = c₁ x₁ + c₂ x₂ + ... + cₙ xₙ. Applying A multiplies each eigencomponent by its eigenvalue: Aᵏ v₀ = c₁ λ₁ᵏ x₁ + c₂ λ₂ᵏ x₂ + .... Pull out λ₁ᵏ: Aᵏ v₀ = λ₁ᵏ (c₁ x₁ + c₂ (λ₂/λ₁)ᵏ x₂ + ...). Because |λᵢ/λ₁| < 1 for i > 1, every term except the first decays geometrically. After enough iterations only the c₁ x₁ direction survives — that's the dominant eigenvector, scaled. Renormalising at each step keeps the iterate from overflowing.

How fast is convergence?

Linearly, at rate |λ₂ / λ₁|ᵏ. If the dominant eigenvalue is 1.0 and the next is 0.9, you gain about log₁₀(1/0.9) ≈ 0.046 decimal digits per iteration — roughly 22 iterations per digit. If the next eigenvalue is 0.5, the rate jumps to log₁₀(2) ≈ 0.30 digits per iteration — about 3 iterations per digit. The "spectral gap" λ₁ - λ₂ is the dial that sets the speed: large gap, fast; small gap, slow. PageRank's gap is engineered to ≈0.15 to guarantee fast convergence on the web graph.

What if there are two equal-magnitude top eigenvalues?

Then plain power iteration fails to converge — the iterate oscillates or rotates between the two dominant eigendirections. Complex conjugate dominant eigenvalues (common for real non-symmetric matrices) cause spinning. Workarounds include shift-and-invert iteration (apply (A - σI)⁻¹ instead of A, which makes the eigenvalue closest to σ dominant), or generalised power methods that converge to invariant subspaces rather than single eigenvectors. Block power iteration tracks several Ritz vectors at once.

How does power iteration relate to PageRank?

PageRank's transition matrix P (with teleportation damping) is a column-stochastic matrix whose dominant eigenvalue is 1. The PageRank vector is the corresponding eigenvector — the stationary distribution of a random surfer. Brin and Page's original 1998 algorithm was literally power iteration on P, run until convergence on a precomputed sparse representation of the web graph. The damping factor d = 0.85 sets the second-largest eigenvalue ≈ 0.85, so each iteration multiplies the error by 0.85 — about 50-100 iterations to four-digit accuracy on the early web.

Why renormalise at every step?

Without renormalisation, ‖Aᵏ v₀‖ grows or shrinks as |λ₁|ᵏ — overflowing or underflowing in finite-precision arithmetic. Dividing by the norm at each step keeps the iterate at unit length, so only the direction evolves. The Rayleigh quotient vᵀA v / vᵀv gives the corresponding eigenvalue estimate even when v is not yet exactly aligned — and it converges twice as fast as the vector itself for symmetric matrices.

What about the smallest eigenvalue, or one in the middle?

Apply power iteration to A⁻¹ instead — its dominant eigenvalue is 1/λ_min(A), so the iteration converges to the eigenvector of the smallest eigenvalue of A. This is inverse iteration. To target an interior eigenvalue near a shift σ, use (A - σI)⁻¹ — shift-and-invert iteration. Solve the linear system at each step instead of explicitly inverting, usually via a precomputed LU factorisation. This trick is the inner loop of practical eigensolvers like ARPACK and the QR algorithm.

When should I prefer something else?

Three cases. First, if you need many eigenvalues (not just the dominant), use the QR algorithm (for dense matrices) or Arnoldi/Lanczos (for sparse). Second, if the spectral gap is small, accelerate with Chebyshev iteration or shift-and-invert. Third, if A is not available explicitly but matrix-vector products Av are cheap, use Lanczos for symmetric or Arnoldi for non-symmetric — they extract m eigenvalues from m matrix-vector multiplications, much better information per operation than plain power iteration.