Algorithms
Matrix Exponentiation
Skip a billion steps of a recurrence in about thirty multiplications
Matrix exponentiation raises a linear recurrence's transition matrix to the nth power in O(k³ log n) time, jumping billions of steps ahead in a sequence like Fibonacci with only about 30 matrix multiplications instead of a billion additions.
- Time (k-term recurrence)O(k³ log n)
- Time (Fibonacci, k=2)O(log n)
- Multiplications to reach Mⁿ≤ 2 log₂ n
- SpaceO(k²)
- Fibonacci matrix[[1,1],[1,0]]
Interactive visualization
Press play, or step through manually. The visualization is yours to drive — try it before reading on.
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
The idea: jump, don't step
Computing the billionth Fibonacci number the obvious way means a billion additions — one per term. That's the trap matrix exponentiation springs you from. The insight has two halves. First, a linear recurrence can be rewritten as one matrix that advances the sequence by a single step. Second, advancing n steps is the same as multiplying by that matrix n times — which is just raising it to the nth power — and powers can be reached by repeated squaring instead of repeated multiplication.
Take Fibonacci: F(n+1) = F(n) + F(n−1). Pack the two most recent terms into a column vector and find the matrix that produces the next state:
| 1 1 | | F(n) | | F(n) + F(n-1) | | F(n+1) |
| | · | | = | | = | |
| 1 0 | | F(n-1) | | F(n) | | F(n) |
So one multiply by T = [[1,1],[1,0]] shifts the window forward one term. Apply it n times to the base state and you've walked n steps:
| F(n+1) | | 1 1 |^n | F(1) | | 1 |
| | = | | · | | , with base | | = | |
| F(n) | | 1 0 | | F(0) | | 0 |
It's even cleaner than that: T^n itself has the Fibonacci numbers in its cells — T^n = [[F(n+1), F(n)], [F(n), F(n−1)]]. The whole game is now to compute one matrix power fast.
Repeated squaring: the O(log n) trick
Multiplying T by itself n−1 times is still O(n) — no help. The speedup comes from binary (or "fast") exponentiation, the same idea that powers modular exponentiation in RSA. Write the exponent in binary and notice that you can build any power from a chain of squarings:
T^1, T^2 = (T^1)^2, T^4 = (T^2)^2, T^8 = (T^4)^2, ...
To get T^13, write 13 = 1101₂ = 8 + 4 + 1, then multiply the squarings whose bit is set: T^13 = T^8 · T^4 · T^1. You produce ⌊log₂ n⌋ squarings and, for the set bits, up to another ⌊log₂ n⌋ multiplications — at most 2·log₂ n matrix multiplications total.
For n = 1,000,000,000 that's about 30 squarings and at most 30 more multiplies — roughly 60 matrix operations versus a billion additions. Each operation on a fixed-size 2×2 matrix is constant work, so Fibonacci collapses to O(log n).
From Fibonacci to any linear recurrence
The technique generalizes to any recurrence of the form a(n) = c₁·a(n−1) + c₂·a(n−2) + … + c_k·a(n−k) — a degree-k linear recurrence with constant coefficients. Build a k×k companion matrix: the top row holds the coefficients c₁ … c_k, and a sub-diagonal of 1s shifts the older terms down by one slot.
state = [a(n), a(n-1), ..., a(n-k+1)]ᵀ
| c1 c2 c3 ... ck |
| 1 0 0 ... 0 |
T = | 0 1 0 ... 0 | (k x k)
| ... |
| 0 0 ... 1 0 |
This covers a huge family of problems: counting tilings of a 2×n board, counting strings of length n with no two adjacent equal characters, paths of length n in a graph (raise the adjacency matrix to the nth power), and any "how many ways after n steps" question where the next state depends linearly on a fixed window of previous states. You can even fold in an inhomogeneous constant term by adding a row that carries a 1.
When to reach for it
- Astronomically large
n. When the index is up to10¹⁸and a linear scan would never finish, thelog nbound is the only thing that fits in the time limit. - Small fixed window
k. The recurrence depends on the lastkterms only, andkis small (typically≤ 100) so thek³factor stays cheap. - Counting under a modulus. Most contest problems want the answer mod a prime, which keeps the entries bounded and the multiplies
O(1). - Graph walks. The number of length-
nwalks between two vertices is an entry of the adjacency matrix raised to thenth power.
Skip it when the recurrence is non-linear, when you need every term up to n (a single linear pass wins), or when n is small enough that the k³ constant outweighs the savings.
Matrix exponentiation vs the alternatives
| Matrix exp. | Linear DP scan | Fast doubling | Naïve recursion | Binet's formula | |
|---|---|---|---|---|---|
| Time to get the nth term | O(k³ log n) | O(k·n) | O(log n) | O(φⁿ) exponential | O(1)* |
| Space | O(k²) | O(1) rolling | O(log n) stack | O(n) stack | O(1) |
| Handles general k-term recurrence | Yes | Yes | No (Fibonacci-style only) | Yes | No |
| Works modulo m | Yes | Yes | Yes | Yes | No (irrational √5) |
| Exact for arbitrary precision | Yes (big ints) | Yes | Yes | Yes | No (float rounding fails) |
| Constant factor | Medium (8 mults/2×2) | Tiny | Small | Catastrophic | Tiny but wrong |
| Typical use | Huge n, contest counting | Need all terms | Just Fibonacci, fast | Teaching cautionary tale | Approximate, small n |
*Binet's formula F(n) = (φⁿ − ψⁿ)/√5 is O(1) in theory but uses floating-point powers of an irrational; rounding error makes it wrong beyond about F(70) in IEEE-754 double precision, so it's a curiosity, not a production tool.
What the numbers actually say
- ~30 squarings for
n = 10⁹.log₂(10⁹) ≈ 29.9, so you do 29 squarings plus up to 29 multiplies for the set bits — call it 60 matrix multiplications, versus 999,999,999 additions for the linear scan. That's a ~16-million-fold reduction in matrix operations. - A 2×2 multiply is 8 scalar multiplies and 4 adds. So Fibonacci(10⁹) mod m is roughly 60 × 12 ≈ 720 scalar operations — well under a microsecond on modern hardware.
- The k³ wall. For a
k = 100recurrence withn = 10¹⁸, one multiply is100³ = 10⁶operations, times~120multiplications =1.2×10⁸operations — fast. But pushkto 1000 and a single multiply becomes10⁹; that's where you'd switch to FFT-based polynomial methods (Kitamasa / Cayley–Hamilton) running inO(k log k · log n). - Exact Fibonacci grows fast.
F(n)has aboutn·log₂φ ≈ 0.69nbits, soF(10⁶)is a ~694,000-bit number — big-integer multiplication dominates, but it's still vastly cheaper than a million big-int additions.
JavaScript implementation
A generic modular matrix power, then Fibonacci on top of it. Using BigInt keeps the modular products from overflowing 53-bit floats.
// Multiply two k×k matrices modulo m.
function matMul(A, B, m) {
const k = A.length;
const C = Array.from({ length: k }, () => new Array(k).fill(0n));
for (let i = 0; i < k; i++) {
for (let t = 0; t < k; t++) {
if (A[i][t] === 0n) continue; // skip zero rows — cheap win
const a = A[i][t];
for (let j = 0; j < k; j++) {
C[i][j] = (C[i][j] + a * B[t][j]) % m;
}
}
}
return C;
}
// k×k identity matrix.
function identity(k) {
return Array.from({ length: k }, (_, i) =>
Array.from({ length: k }, (_, j) => (i === j ? 1n : 0n)));
}
// Raise matrix M to the power n (mod m) by binary exponentiation.
function matPow(M, n, m) {
let result = identity(M.length);
let base = M.map(row => row.slice());
while (n > 0n) {
if (n & 1n) result = matMul(result, base, m); // bit set → fold in
base = matMul(base, base, m); // square
n >>= 1n; // next bit
}
return result;
}
// Fibonacci F(n) mod m via the [[1,1],[1,0]] transition matrix.
function fib(n, m = 1000000007n) {
if (n === 0n) return 0n;
const T = [[1n, 1n], [1n, 0n]];
// T^n = [[F(n+1), F(n)], [F(n), F(n-1)]] — F(n) is the top-right cell.
return matPow(T, n, m)[0][1];
}
console.log(fib(10n)); // 55n
console.log(fib(1000000000n)); // 21n (F(1e9) mod 1e9+7)
The if (A[i][t] === 0n) continue; guard is a real optimization for companion matrices, which are mostly zeros — it turns the dense k³ inner loop into something closer to k² per multiply on the sparse rows.
Python implementation
Python's native big integers make the exact (non-modular) version trivial. Here's both the generic power and a famous variant — counting tilings of a 2×n board, which obeys the very same Fibonacci recurrence.
def mat_mul(A, B, m=None):
k = len(A)
C = [[0] * k for _ in range(k)]
for i in range(k):
for t in range(k):
if A[i][t] == 0:
continue
a = A[i][t]
row = C[i]
for j in range(k):
row[j] += a * B[t][j]
if m is not None:
row[j] %= m
return C
def mat_pow(M, n, m=None):
k = len(M)
result = [[1 if i == j else 0 for j in range(k)] for i in range(k)]
base = [row[:] for row in M]
while n > 0:
if n & 1:
result = mat_mul(result, base, m)
base = mat_mul(base, base, m)
n >>= 1
return result
def fib(n, m=None):
if n == 0:
return 0
T = [[1, 1], [1, 0]]
return mat_pow(T, n, m)[0][1]
# 2 x n domino tilings obey the same recurrence: t(n) = t(n-1) + t(n-2)
def tilings(n):
# t(0)=1, t(1)=1 ⇒ t(n) = F(n+1)
return fib(n + 1)
print(fib(100)) # 354224848179261915075 (exact, no overflow)
print(fib(10**9, 10**9 + 7))# modular answer in microseconds
print(tilings(10)) # 89
One subtlety: mat_mul mutates a row reference (row = C[i]) for speed, and mat_pow copies the base with row[:] so squaring never aliases the caller's matrix — a classic source of silent corruption if you skip the copy.
Variants worth knowing
Fast doubling. For Fibonacci specifically, the 2×2 matrix carries redundant information (it's symmetric and the off-diagonals are equal). Fast doubling strips that out using two identities — F(2k) = F(k)·(2·F(k+1) − F(k)) and F(2k+1) = F(k)² + F(k+1)² — computing the same answer in the same O(log n) steps but with roughly half the scalar multiplications. It's the leanest way to get a single Fibonacci number.
Kitamasa's method and Cayley–Hamilton. When k is large, k³ per multiply hurts. Kitamasa reformulates the problem as polynomial exponentiation modulo the recurrence's characteristic polynomial, and with FFT-based polynomial multiplication it runs in O(k log k · log n) — a big win once k is in the hundreds or thousands.
Inhomogeneous recurrences. A recurrence with an added constant, like a(n) = a(n−1) + a(n−2) + 7, becomes linear again if you grow the state vector by one slot that always holds the constant and add a row to the matrix that carries it forward. Same trick handles polynomial-in-n driving terms by stacking extra "counter" rows.
Adjacency-matrix powers. If A is a graph's adjacency matrix, (Aⁿ)[i][j] counts walks of exactly length n from i to j. The exact same matPow code answers "how many length-10¹² paths?" in log time.
Common bugs and edge cases
- Aliasing during squaring. Computing
base = matMul(base, base)in place overwrites entries you still need to read. Always write into a fresh result matrix, never mutate an operand mid-multiply. - Wrong base vector or off-by-one in the index.
T^ngivesF(n)in the top-right cell withF(0)=0, F(1)=1. If your convention isF(1)=F(2)=1, every index shifts by one — verify againstF(10)=55. - Overflow before the modulo. In a fixed-width language (C++/Java),
a*bwith two values near10⁹overflows 32-bit and even signed 64-bit if products accumulate; cast to a wide type and reduce after every multiply, or use__int128/BigInt. - Forgetting
M⁰ = I. The identity matrix is the correct seed for the accumulator; starting fromMitself silently computesM^(n+1). - Negative or non-integer exponents. Repeated squaring only handles non-negative integer powers. Negative
nneeds the inverse matrix (and its modular inverse under a modulus), which may not even exist if the determinant shares a factor withm. - Using it when n is small. For
n < ~50with a smallk, a plain rolling-variable loop is simpler, has no matrix overhead, and is fast enough — the log-n speedup only pays off at scale.
Frequently asked questions
Why is matrix exponentiation O(log n) and not O(n)?
It reuses squarings. To reach the nth power you square the matrix repeatedly — M, M², M⁴, M⁸ — and multiply together only the squares whose exponents correspond to the 1-bits of n in binary. n has about log₂ n bits, so you do at most 2·log₂ n matrix multiplications instead of n−1.
How do you turn a linear recurrence into a matrix?
Stack the last k terms into a state vector, then write the matrix T whose rows reproduce the recurrence: the top row holds the recurrence coefficients and the rows below it shift the older terms down. For Fibonacci the state is [Fₙ, Fₙ₋₁] and T = [[1,1],[1,0]], because [[1,1],[1,0]]·[Fₙ, Fₙ₋₁]ᵀ = [Fₙ+Fₙ₋₁, Fₙ]ᵀ = [Fₙ₊₁, Fₙ]ᵀ.
What is the time complexity for a k-term recurrence?
O(k³ log n) with schoolbook matrix multiplication, since each of the ~log n multiplications costs k³ scalar operations. The k³ factor means matrix exponentiation only beats a linear scan when n is large relative to k³ — for Fibonacci (k = 2) the crossover is tiny.
Is fast doubling faster than matrix exponentiation for Fibonacci?
Yes, by a constant factor. Fast doubling derives two identities — F(2k) = F(k)·(2F(k+1)−F(k)) and F(2k+1) = F(k)²+F(k+1)² — that compute the same answer in the same O(log n) multiplications but with about half the scalar work and no 2×2 matrix bookkeeping. It is the special case of matrix exponentiation with the redundant entries removed.
How do you keep the numbers from overflowing?
Almost every competitive-programming problem asks for the answer modulo a prime like 1,000,000,007, so you reduce every multiply-add modulo m. If you genuinely need the exact value, the entries grow to about n·log₂φ ≈ 0.69n bits, so you need big integers and the cost picks up an extra factor for multiplying those large numbers.
When is matrix exponentiation the wrong tool?
When the recurrence isn't linear (e.g. it multiplies two earlier terms), when n is small enough that the k³ constant outweighs the log-n savings, or when you need every term up to n — then a single O(n) pass is both simpler and faster than recomputing a power for each index.