Algorithms

Fast Fourier Transform

The O(n²) → O(n log n) trick that changed signal processing

The Fast Fourier Transform computes the discrete Fourier transform in O(n log n) time instead of O(n²). At n = 220, that's ~20 million operations instead of ~1 trillion — a 50,000× speed-up. It powers polynomial multiplication, audio analysis, image compression, and large-integer arithmetic.

  • TimeO(n log n)
  • Naive DFTO(n²)
  • Space (in-place)O(n)
  • Best nPower of 2 (radix-2)
  • First publishedCooley & Tukey 1965

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

How the FFT works

The discrete Fourier transform of a sequence x0, …, xn−1 is another sequence X0, …, Xn−1 defined by

X_k = Σ_{j=0..n-1} x_j · ω^{jk},   where ω = e^{-2πi/n}

Computed directly, that's n output entries times n input contributions = n² complex multiplications. The Cooley-Tukey FFT exploits a piece of algebra: ω2 in an n-point DFT equals the primitive root of an n/2-point DFT. So if you split x into its even-indexed and odd-indexed halves, transform each half (recursively), and combine with the right twiddle factors, you get the full transform. The recurrence is

T(n) = 2 T(n/2) + O(n)   →   T(n) = O(n log n)

identical to merge sort. The "combine" step at each level is the famous butterfly: for each k, take E[k] + ωk·O[k] and E[k] − ωk·O[k] as the two outputs. With n = 220 the FFT does ~10·log2(nn ≈ 2·107 complex multiplications versus ~1012 for the naive transform — the constant of 50,000× that justifies an entire chapter of every numerical methods textbook.

Why the FFT is the most important numerical algorithm

The FFT didn't just speed up Fourier analysis — it made entire industries possible. JPEG, MP3, MPEG, HEIC, OFDM, GPS, MRI, radar, oil-and-gas seismology, gravitational-wave detection, integer multiplication of trillion-digit numbers, polynomial multiplication for cryptography, partial-differential-equation solvers in fluid dynamics — every one of these relies on an FFT in the inner loop. Gilbert Strang has called it "the most important numerical algorithm of our lifetime", and the claim is hard to argue with.

FFT vs naive DFT

Naive DFTCooley-Tukey FFT
Time complexityO(n²)O(n log n)
Operations at n = 1024~10⁶~10⁴
Operations at n = 2²⁰~10¹²~2·10⁷
Speed-up at n = 2²⁰~50,000×
Memory (in-place)O(n)O(n)
Floating-point errorO(n · ε)O(log n · ε)
Length restrictionNonePower of 2 (radix-2 variant)

The accuracy advantage is often overlooked: because the FFT does fewer operations, it accumulates less rounding error than the naive DFT despite computing the same transform.

Pseudo-code (recursive Cooley-Tukey)

function fft(x):
    n = length(x)
    if n == 1: return x
    even = fft(x[0::2])
    odd  = fft(x[1::2])
    out = array of length n
    for k in 0..n/2-1:
        t = exp(-2πi · k / n) * odd[k]
        out[k]       = even[k] + t
        out[k + n/2] = even[k] - t
    return out

JavaScript: FFT and polynomial multiplication

// Recursive Cooley-Tukey on a complex array stored as [re, im, re, im, ...]
function fft(re, im) {
  const n = re.length;
  if (n <= 1) return;
  const halfN = n >> 1;

  // Split into even/odd halves.
  const evenRe = new Float64Array(halfN), evenIm = new Float64Array(halfN);
  const oddRe  = new Float64Array(halfN), oddIm  = new Float64Array(halfN);
  for (let i = 0; i < halfN; i++) {
    evenRe[i] = re[2*i];     evenIm[i] = im[2*i];
    oddRe[i]  = re[2*i + 1]; oddIm[i]  = im[2*i + 1];
  }
  fft(evenRe, evenIm);
  fft(oddRe,  oddIm);

  // Butterfly combine.
  for (let k = 0; k < halfN; k++) {
    const angle = -2 * Math.PI * k / n;
    const cos = Math.cos(angle), sin = Math.sin(angle);
    const tRe = cos * oddRe[k] - sin * oddIm[k];
    const tIm = cos * oddIm[k] + sin * oddRe[k];
    re[k]         = evenRe[k] + tRe;  im[k]         = evenIm[k] + tIm;
    re[k + halfN] = evenRe[k] - tRe;  im[k + halfN] = evenIm[k] - tIm;
  }
}

function ifft(re, im) {
  for (let i = 0; i < im.length; i++) im[i] = -im[i]; // conjugate
  fft(re, im);
  const n = re.length;
  for (let i = 0; i < n; i++) { re[i] /= n; im[i] = -im[i] / n; }
}

// Multiply two polynomials given by their coefficient arrays.
// Result length = a.length + b.length - 1.
function polyMul(a, b) {
  let n = 1; while (n < a.length + b.length) n *= 2;
  const re = new Float64Array(n), im = new Float64Array(n);
  const re2 = new Float64Array(n), im2 = new Float64Array(n);
  for (let i = 0; i < a.length; i++) re[i]  = a[i];
  for (let i = 0; i < b.length; i++) re2[i] = b[i];
  fft(re, im); fft(re2, im2);
  for (let i = 0; i < n; i++) {
    const r = re[i]*re2[i] - im[i]*im2[i];
    const j = re[i]*im2[i] + im[i]*re2[i];
    re[i] = r; im[i] = j;
  }
  ifft(re, im);
  const out = new Array(a.length + b.length - 1);
  for (let i = 0; i < out.length; i++) out[i] = Math.round(re[i]);
  return out;
}

Python: same FFT and polynomial multiplication

import cmath

def fft(x):
    n = len(x)
    if n <= 1: return x
    even = fft(x[0::2])
    odd  = fft(x[1::2])
    T = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + T[k] for k in range(n // 2)] + \
           [even[k] - T[k] for k in range(n // 2)]

def ifft(X):
    n = len(X)
    conj = [v.conjugate() for v in X]
    y = fft(conj)
    return [v.conjugate() / n for v in y]

def poly_mul(a, b):
    n = 1
    while n < len(a) + len(b):
        n *= 2
    A = [complex(c) for c in a] + [0j] * (n - len(a))
    B = [complex(c) for c in b] + [0j] * (n - len(b))
    FA, FB = fft(A), fft(B)
    FC = [FA[i] * FB[i] for i in range(n)]
    C = ifft(FC)
    return [round(v.real) for v in C[:len(a) + len(b) - 1]]

# Example: (1 + 2x + 3x²) · (4 + 5x) = 4 + 13x + 22x² + 15x³
print(poly_mul([1, 2, 3], [4, 5]))   # => [4, 13, 22, 15]

Variants and modern implementations

  • Cooley-Tukey radix-2. The classical splitting for power-of-2 lengths. Simple to implement, the textbook version.
  • Cooley-Tukey radix-4 / radix-8 / split-radix. Same idea with a larger fan-out. Split-radix has the lowest known operation count among "simple" FFTs — about 33% fewer multiplies than radix-2 at the cost of a more involved butterfly.
  • Mixed-radix. Factorize n = n1·n2·… and apply Cooley-Tukey along each factor. FFTW uses this with hand-tuned codelets for each small factor.
  • Bluestein's chirp-z. Reduces a DFT of arbitrary length n to a length-2n convolution that you compute with a power-of-2 FFT. Always O(n log n); useful when n is prime or has bad factors.
  • Prime-factor (Good-Thomas). When n = n1·n2 with gcd(n1, n2) = 1, the FFT decomposes without any twiddle factors — slightly faster than Cooley-Tukey on coprime factorizations.
  • Iterative bit-reversal FFT. The recursive form is pedagogically clean but slow due to call overhead. The production version is iterative: bit-reverse the input array in place, then run log2(n) butterfly stages.
  • Number-theoretic transform (NTT). The same algebra over a finite field instead of the complex numbers. Exact integer arithmetic — no floating-point error — at the cost of needing a modulus with the right roots of unity. Used in cryptography (Kyber, Dilithium) and exact polynomial multiplication.

Cost in real terms

At n = 1024 the FFT is roughly 100× faster than naive DFT — already worth the implementation effort. At n = 220 ≈ 1 million it's 50,000× faster: the difference between 16 ms and 13 minutes on commodity hardware. At n = 109 it's roughly 30 million times faster — the gap between an interactive plot and a job that won't finish before the next ice age. FFTW on a modern x86 core sustains about 8 GFLOPS for power-of-2 transforms in the L3-cache regime.

Common bugs and edge cases

  • Bit-reversal off-by-one. The iterative FFT permutes the input by reversing the binary representation of each index. Reversing log2(n) − 1 bits instead of log2(n) bits silently produces wrong output and is the canonical bug; double-check by transforming a known signal (e.g. a single impulse).
  • Sign of the exponent. Convention varies: forward DFT can use ω = e−2πi/n (engineering / NumPy / FFTW) or e+2πi/n (some physics texts). Mismatched sign between FFT and inverse FFT silently flips spectra — usually noticed only when convolution gives a time-reversed answer.
  • Forgetting to divide by n on the inverse. The inverse FFT either divides by n at the end or splits a 1/√n between forward and inverse. Missing the normalization scales your output by n.
  • Non-power-of-2 length. Calling a radix-2 FFT on length 1000 (not a power of 2) either crashes or returns garbage. Zero-pad to 1024 or use Bluestein.
  • Aliasing in convolution. Multiplying spectra of length n computes a circular convolution of length n. To compute a linear convolution of two sequences of length m, zero-pad to length ≥ 2m.
  • Floating-point rounding when expecting integers. Polynomial multiplication produces almost-integer real outputs; round only after verifying the imaginary part is below tolerance, or you'll silently throw away the bug-flagging signal that the computation went off the rails.

Frequently asked questions

Why is FFT log-linear?

Cooley-Tukey splits an n-point DFT into two n/2-point DFTs (even-indexed and odd-indexed inputs), then combines them with n complex multiplications. The recurrence T(n) = 2T(n/2) + O(n) solves to O(n log n) — the same recurrence as merge sort.

How much faster than naive DFT?

For n = 2^20 ≈ 1,048,576, naive DFT does ~10^12 complex multiplications. FFT does ~20·n ≈ 2 × 10^7 — a 50,000× speed-up. The gap widens with n: at n = 10^9 the FFT is roughly 30 million times faster.

Does the input length have to be a power of two?

The classical radix-2 Cooley-Tukey assumes it. For arbitrary n you have three options: zero-pad to the next power of two (simple, slightly less accurate), use a mixed-radix FFT (FFTW does this), or use Bluestein's algorithm (always O(n log n) at the cost of larger constants).

Is FFT exact or approximate?

Mathematically exact. In floating-point arithmetic it accumulates rounding error of about O(log n · ε) where ε is machine epsilon — much better than the O(n · ε) error of naive DFT. For integer convolution (e.g. polynomial multiplication of bounded coefficients), use a number-theoretic transform to get exact results.

What is the butterfly diagram?

The graph of computations in one stage of an FFT: pairs of inputs cross, each multiplied by a twiddle factor and combined by addition and subtraction. Drawing it on graph paper produces an X-shape per pair — hence the name butterfly. The full FFT has log₂(n) stages of n/2 butterflies each.

What is FFT used for in practice?

Signal analysis (spectrograms, audio EQ), image processing (JPEG and HEIC use the related DCT), data compression, polynomial multiplication, large-integer multiplication (Schönhage-Strassen), partial differential equation solvers, MRI reconstruction, and convolutional layer acceleration. It is arguably the most important numerical algorithm of the 20th century.