Fourier Analysis
Convolution Theorem
F(f ∗ g) = F(f) · F(g) — convolution in time becomes multiplication in frequency
The convolution theorem states that the Fourier transform of a convolution equals the pointwise product of Fourier transforms: F(f ∗ g) = F(f) · F(g). Continuous convolution is (f ∗ g)(t) = ∫_{−∞}^∞ f(τ) g(t − τ) dτ; the discrete version replaces the integral by a sum. Equivalently, multiplication in time becomes convolution in frequency: F(f · g) = F(f) ∗ F(g) (the modulation theorem). The deep statement is that every linear time-invariant system is diagonal in the Fourier basis — each frequency is an eigenvector. Computationally, this turns convolution from an O(n²) operation into O(n log n) via FFT: at n = 10⁶ that is a 50,000× speedup. Foundation of signal processing (audio filtering, EQ, reverb), image processing (Gaussian blur, sharpen, edge detection), polynomial and large-integer multiplication (Schönhage-Strassen, Harvey-van der Hoeven), and solutions to linear PDEs via fundamental solutions.
- TheoremF(f ∗ g) = F(f) · F(g)
- Dual formF(f · g) = F(f) ∗ F(g)
- Direct costO(n²)
- FFT costO(n log n)
- Speedup at n = 10⁶~50,000×
- Used infilters, blur, polynomial mult, PDE
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
How it works
Convolution is a "sliding-product integral": at output time t, the value (f ∗ g)(t) is the integral of f(τ) · g(t − τ) over all τ. You flip g (the (t − τ) inside), shift it to position t, multiply pointwise with f, and integrate. As t moves, g slides along, and the integral traces out the convolution as a function of t. Geometrically, convolution measures how much overlap there is between f and a shifted-and-flipped copy of g.
The Fourier transform F(f)(ω) = ∫ f(t) e^(−iωt) dt expresses f as a superposition of complex exponentials e^(iωt). Each exponential is an eigenvector of the time-shift operator — translating an exponential just multiplies it by a phase factor. Convolution is built out of translation and pointwise multiplication, so it preserves the exponential structure. Plugging f ∗ g into the Fourier integral and switching the order of integration:
F(f ∗ g)(ω) = ∫∫ f(τ) g(t − τ) e^(−iωt) dτ dt
= ∫ f(τ) e^(−iωτ) dτ · ∫ g(s) e^(−iωs) ds (substitute s = t − τ)
= F(f)(ω) · F(g)(ω)
The double integral factors completely. Convolution disappears; pointwise multiplication takes its place. This is the most general statement that Fourier diagonalizes translations.
Proof sketch
Assume f, g ∈ L¹(ℝ) (absolutely integrable). Then f ∗ g is also L¹, and (by Fubini) the double integral converges absolutely:
F(f ∗ g)(ω) = ∫_ℝ (f ∗ g)(t) e^(−iωt) dt
= ∫_ℝ [∫_ℝ f(τ) g(t − τ) dτ] e^(−iωt) dt
= ∫∫ f(τ) g(t − τ) e^(−iωt) dτ dt (Fubini: swap order)
= ∫ f(τ) [∫ g(t − τ) e^(−iωt) dt] dτ
= ∫ f(τ) e^(−iωτ) [∫ g(s) e^(−iωs) ds] dτ (s = t − τ; dt = ds)
= F(g)(ω) · ∫ f(τ) e^(−iωτ) dτ
= F(f)(ω) · F(g)(ω)
The exchange of integration order requires absolute integrability, which is why the natural setting is L¹. For tempered distributions in S′(ℝ), more care is needed but the result extends: when convolution is defined, the Fourier theorem holds.
The discrete case is identical with sums replacing integrals. For two length-N sequences x and y, the circular convolution (x ⊛ y)[k] = Σ_n x[n] y[(k − n) mod N] satisfies DFT(x ⊛ y) = DFT(x) · DFT(y) pointwise.
Variants and conventions
- Continuous Fourier transform. F(f)(ω) = ∫ f(t) e^(−iωt) dt; the convolution theorem holds verbatim. Some texts normalize with 1/(2π) or 1/√(2π); this introduces matching scalars in the convolution-multiplication identity.
- Discrete-time Fourier transform (DTFT). X(ω) = Σ x[n] e^(−iωn); convolution theorem holds with continuous frequency ω. Used in signal-processing analysis.
- Discrete Fourier transform (DFT). X[k] = Σ_n x[n] e^(−2πikn/N). DFT computes circular convolution. To use it for linear convolution, zero-pad both signals to length ≥ N + M − 1.
- Fourier series. For 2π-periodic functions, the convolution (f ∗ g)(t) = (1/2π) ∫₀^{2π} f(τ) g(t − τ) dτ has Fourier coefficients ĉₙ(f ∗ g) = ĉₙ(f) · ĉₙ(g).
- Laplace transform. L(f ∗ g)(s) = L(f)(s) · L(g)(s) — the same theorem under Laplace; foundational for solving linear ODEs and circuit analysis.
- z-transform (discrete). Z(x ∗ y)(z) = Z(x)(z) · Z(y)(z) — the discrete analogue; foundational for digital filter analysis and control theory.
- Convolution on groups. For locally compact abelian groups G with Pontryagin dual Ĝ, the same factorization holds: F: L¹(G) → C₀(Ĝ) is an algebra homomorphism from (convolution) to (pointwise product). Specialized to G = ℝ, ℤ, ℝ/2πℤ this recovers the previous cases.
Numerical examples
Example 1 — polynomial multiplication. To multiply p(x) = 1 + 2x + 3x² and q(x) = 4 + 5x + 6x², convolve coefficient arrays [1, 2, 3] and [4, 5, 6]:
(p · q)[0] = 1·4 = 4
(p · q)[1] = 1·5 + 2·4 = 13
(p · q)[2] = 1·6 + 2·5 + 3·4 = 28
(p · q)[3] = 2·6 + 3·5 = 27
(p · q)[4] = 3·6 = 18
p(x) · q(x) = 4 + 13x + 28x² + 27x³ + 18x⁴
For polynomials of degree 10⁶, direct multiplication is 10¹² operations; FFT-based is ~10⁷ — that 50,000× factor is exactly what makes computer algebra systems and number-theoretic libraries practical for large operands.
Example 2 — Gaussian blur. Convolving an image I (N × N pixels) with a Gaussian kernel of radius r costs O(N²r²) directly. Via 2D FFT: O(N² log N), independent of r. For N = 4096 and r = 50, direct convolution is 4 · 10¹⁰ ops; FFT-based is 7 · 10⁸ — about 50× faster. Most graphics and computer-vision libraries use FFT-based blur for r > 20.
Example 3 — solving the heat equation u_t = u_xx with initial data u₀. The fundamental solution is the heat kernel G_t(x) = (1/√(4πt)) exp(−x²/(4t)). The solution is u(t, x) = (G_t ∗ u₀)(x). On the Fourier side: û(t, ξ) = e^(−tξ²) · û₀(ξ). High frequencies are damped exponentially, encoding the smoothing effect of diffusion. The convolution theorem turns a PDE into pointwise multiplication.
Example 4 — audio reverb. A room's acoustic response is modelled by its impulse response h(t). Reverberated audio is y(t) = (x ∗ h)(t). For h with duration 4 seconds at 48 kHz, h has 192,000 samples. Direct convolution per output sample is 192,000 multiplies — at 48 kHz output rate that's 9.2 GFLOPS for real-time. Block-by-block FFT-based convolution (overlap-add) drops this by a factor of ~100, making real-time reverb feasible on modest CPUs.
Common pitfalls
- Forgetting to zero-pad. DFT computes circular convolution. For linear convolution of length-n and length-m sequences, zero-pad both to length n + m − 1 before FFT. Otherwise you get wrap-around artifacts at the boundary.
- Wrong sign / convention. Some libraries define F with e^(+iωt), some with e^(−iωt). The convolution theorem holds for both, but normalization factors (1, 1/√(2π), 1/(2π)) differ. Always check the convention.
- Real-valued FFT but complex output. If the input is real, the FFT has Hermitian symmetry — half the spectrum is redundant. Using a real-FFT (rfft) saves a factor of 2 in storage and ops but requires careful indexing of the symmetric half.
- Numerical precision near zeros. Inverting the convolution theorem (deconvolution) divides by F(g); where F(g) is small (typically high frequencies), noise is amplified. Regularization (Wiener filtering, Tikhonov) is essential.
- Treating 2D as 1D-then-1D. 2D FFT and 2D convolution factor as row-FFT then column-FFT. The convolution theorem still holds 2D-pointwise, but indexing requires care, especially in non-square images.
- FFT crossover point. FFT-based convolution has high constant factors. For very short signals (n < 100), direct convolution is faster despite O(n²) scaling. Production libraries (FFTW, MKL) include automatic algorithm selection.
Where it shows up
- Audio signal processing. Every digital filter (EQ, low-pass, high-pass, notch, comb, all-pass) is a convolution; every FFT-based filtering pipeline applies the convolution theorem. Block-convolution / overlap-add is the standard tool for real-time long-impulse-response convolution (reverb, room simulation).
- Image processing. Gaussian blur, sharpen (unsharp mask), edge detection (Laplacian, Sobel-via-Laplacian-of-Gaussian), bilateral filter — all are 2D convolutions. Large-kernel and frequency-domain operations use the convolution theorem; small kernels stay in time domain for cache locality.
- Computer vision and deep learning. CNN layers are 2D convolutions. Small kernels (3×3) use direct convolution; very large effective kernels (in dilated convolution or large-receptive-field models) sometimes use FFT-based convolution. Wavelet-based architectures use both convolution and the Fourier-multiplication identity.
- Polynomial and large-integer arithmetic. FFT-based multiplication is the asymptotic engine. Schönhage-Strassen (1971), Fürer's algorithm (2007), and Harvey-van der Hoeven (2019) all build on the convolution theorem.
- Linear PDE. Fundamental solutions (heat kernel, wave kernel, Poisson kernel) act by convolution; the convolution theorem turns PDE into multiplication in Fourier space. Same trick solves Schrödinger's equation in free space.
- Probability theory. The PDF of X + Y for independent random variables is the convolution of the individual PDFs. The characteristic function (Fourier transform of the PDF) of a sum is the product of characteristic functions — a direct application of the convolution theorem and the engine behind the proof of the central limit theorem.
- Cryptography and number theory. Modular polynomial multiplication is used in LWE-based cryptography (lattice schemes), and number-theoretic transforms (NTTs — DFT over finite fields) implement the convolution theorem without floating-point error.
- Spectroscopy and physics. Instrumental response is convolved with the true spectrum; deconvolution via division in the frequency domain recovers the underlying signal. Same in seismology, NMR, MRI reconstruction.
Frequently asked questions
Why is convolution in time equivalent to multiplication in frequency?
The Fourier transform decomposes any signal as a sum of sinusoids. A linear time-invariant (LTI) system acts on each sinusoid independently: input frequency ω comes out as the same frequency with amplitude and phase scaled by the system's transfer function H(ω). Convolution in the time domain is exactly the operation that an LTI system performs (impulse-response convolution); on the frequency side this becomes multiplication by H. So F(f ∗ g) = F(f) · F(g) is the statement that an LTI system is diagonal in the Fourier basis — each frequency is an eigenvector. This is the deep reason and explains why nearly everything linear is easier to analyze in frequency.
How does FFT speed up convolution?
Direct convolution of two length-n signals takes O(n²) multiply-adds (n outputs, each a sum of n products). The convolution theorem provides a faster route: FFT(f) costs O(n log n), FFT(g) costs O(n log n), pointwise multiplication of the spectra costs O(n), inverse FFT costs O(n log n). Total: O(n log n) — asymptotically much faster. For n = 10⁶, direct convolution is 10¹² operations; FFT-based is 2 · 10⁷ — a 50,000× speedup. Crossover happens around n = 100; below that direct is faster due to FFT's constant overhead. All modern signal-processing libraries use FFT-based convolution for n ≥ a few hundred.
What's the difference between linear and circular convolution?
Linear convolution of two length-n signals produces output of length 2n − 1 (the full overlap). Circular convolution treats inputs as periodic with period n and outputs length n — wrap-around at the boundary. The DFT computes circular convolution; to get linear convolution you must zero-pad both inputs to length ≥ 2n − 1 before FFT, then truncate. Forgetting to zero-pad is one of the most common bugs in FFT-based convolution: it produces wrap-around artifacts at the start and end of the output. Convolution = multiplication only holds for the matching pair: continuous Fourier ↔ linear convolution; DFT ↔ circular convolution.
How is the convolution theorem used in image processing?
Blurring an image with a Gaussian kernel of radius r is a 2D convolution — O(N²r²) directly for an N×N image. The convolution theorem says: 2D FFT the image, 2D FFT the kernel, multiply pointwise, inverse FFT — total O(N² log N), independent of r. Same trick for sharpening, edge detection (Laplacian), and large-kernel correlations like template matching. The convolution theorem is also why CNNs are implemented as direct convolution for small kernels (3×3, 5×5) and via FFT for large kernels (7×7+) or fully-connected channels. Image deblurring is the inverse: divide by the kernel's Fourier transform (with regularization to avoid amplifying noise at high frequencies).
How does the theorem enable fast polynomial multiplication?
Polynomial multiplication is convolution of coefficient sequences: (a₀ + a₁x + … + a_{n-1}x^{n-1})(b₀ + b₁x + … + b_{m-1}x^{m-1}) has coefficients given by the convolution of (aᵢ) and (bᵢ). Direct multiplication is O(nm); via FFT, O((n + m) log(n + m)). Schönhage-Strassen (1971) uses the same idea recursively for multiplying large integers, achieving O(n log n log log n). Fürer (2007) improved this to O(n log n · 2^{O(log* n)}); Harvey-van der Hoeven (2019) achieved O(n log n) — the conjectured optimal. Every modern library that multiplies polynomials or huge integers uses FFT-based convolution as the engine.
Does the convolution theorem hold for distributions?
Yes, with care. For tempered distributions T ∈ S′(ℝ), the convolution T ∗ φ makes sense when φ is a Schwartz function, and F(T ∗ φ) = F(T) · F(φ). This is how one solves PDEs with the fundamental solution: the heat equation has solution u(t) = G_t ∗ u₀ where G_t is the heat kernel; the convolution theorem in frequency gives F(u) = F(G_t) · F(u₀) = e^{-tξ²} F(u₀). For distributions whose convolution may not be defined (like two delta-trains), one cannot literally convolve, but the Fourier-multiplication side often still makes sense and is taken as the definition. Schwartz's distribution theory works out exactly which convolutions are allowed.
Time-domain vs frequency-domain convolution
| Step | Time domain (direct) | Frequency domain (FFT) | Cost |
|---|---|---|---|
| Setup | arrays length n | zero-pad to ≥ 2n − 1 | O(n) |
| Forward transform | — | FFT(f), FFT(g) | O(n log n) |
| Combine | Σ_τ f(τ) g(t − τ) for each t | pointwise F · G | O(n²) vs O(n) |
| Inverse transform | — | IFFT(F · G) | O(n log n) |
| Total | O(n²) | O(n log n) | — |
| Speedup at n = 10⁶ | 10¹² ops | 2·10⁷ ops | ~50,000× |