Computational Number Theory
Sieve of Eratosthenes
Strike out the composites; survivors are prime — 2200-year-old still the fastest enumerator
The Sieve of Eratosthenes finds every prime in [2, N]. Walk the integers from 2 upward; mark each as prime; cross out all its multiples (2p, 3p, 4p, …) up to N as composite. Stop once you reach √N — any composite > √N already has a marked-prime factor below √N. Time complexity O(N log log N); space O(N). Discovered by Eratosthenes of Cyrene around 240 BCE and still the fastest way to enumerate every prime ≤ 109 on a single CPU.
- AuthorEratosthenes of Cyrene, ~240 BCE
- TimeO(N log log N)
- SpaceO(N) bits (bitset)
- Stop atp ≤ √N
- Output for N=10025 primes — 2, 3, 5, 7, …, 97
- Used inCompetitive programming, primesieve.org, prime tables, Project Euler
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
How the sieve works
Write down the integers 2, 3, 4, …, N. Two operations interleave:
- Find the smallest unmarked number. Call it p. It must be prime — if it had a smaller prime factor q, q would have already marked p as composite when q's multiples were swept.
- Cross out multiples of p. Mark 2p, 3p, 4p, … ≤ N as composite. Move on.
The first iteration picks p = 2, crosses out 4, 6, 8, 10, …. The second picks p = 3, crosses out 6, 9, 12, 15, …. Then p = 5 → 10, 15, 20, …. Then p = 7 → 14, 21, 28, …. After processing p = 7, all remaining unmarked numbers ≤ 100 are prime, because the next unmarked prime is 11 > √100.
Why stop at p ≤ √N?
Suppose n ≤ N is composite. Then n = a · b with both factors ≥ 2. If both were > √N, then a · b > N — contradiction. So at least one factor is ≤ √N. That factor is itself either prime or factors further down to a prime ≤ √N. So every composite n ≤ N has at least one prime factor ≤ √N, and was crossed out when that prime was processed.
Implication: any number still unmarked when the algorithm reaches p > √N must be prime. The main loop runs only for p up to √N — the rest of the integers are read out for free.
The p² optimization
The naive inner loop crosses out 2p, 3p, 4p, … starting at 2p. But 2p was already crossed out when p = 2 swept its multiples. 3p was crossed out when p = 3 swept its multiples. The smallest multiple of p that wasn't already crossed out is p²:
For p = 5:
Naive — cross out 10, 15, 20, 25, 30, 35, 40, 45, 50, ...
(10, 15, 20 already crossed by 2 and 3)
Optimized — cross out 25, 30, 35, 40, 45, 50, ...
This optimization is why the inner loop cost is roughly N/p × (the count of primes ≤ √N), not the (slightly larger) naive bound. The total work becomes Σ over primes p ≤ √N of N/p, which by Mertens' theorem grows like N log log N.
Worked example — primes up to N = 30
Start: 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
p = 2 (prime). Cross out 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30.
2 3 ✗ 5 ✗ 7 ✗ 9 ✗ 11 ✗ 13 ✗ 15 ✗ 17 ✗ 19 ✗ 21 ✗ 23 ✗ 25 ✗ 27 ✗ 29 ✗
p = 3 (next unmarked, prime). Cross out 9, 15, 21, 27.
2 3 ✗ 5 ✗ 7 ✗ ✗ ✗ 11 ✗ 13 ✗ ✗ ✗ 17 ✗ 19 ✗ ✗ ✗ 23 ✗ 25 ✗ ✗ ✗ 29 ✗
p = 5 (next unmarked). p² = 25 ≤ 30; cross out 25.
2 3 ✗ 5 ✗ 7 ✗ ✗ ✗ 11 ✗ 13 ✗ ✗ ✗ 17 ✗ 19 ✗ ✗ ✗ 23 ✗ ✗ ✗ ✗ ✗ 29 ✗
Next p = 7. p² = 49 > 30. Stop.
Surviving primes ≤ 30:
2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ← 10 primes
This matches π(30) = 10. Compare to the Prime Number Theorem estimate — 30 / ln(30) ≈ 8.8.
JavaScript implementation
// Classic sieve — returns array of all primes ≤ N
function sieveOfEratosthenes(N) {
const sieve = new Uint8Array(N + 1); // 0 = unmarked (prime)
sieve[0] = sieve[1] = 1;
for (let p = 2; p * p <= N; p++) {
if (sieve[p] === 0) {
// p is prime; cross out multiples starting at p²
for (let j = p * p; j <= N; j += p) sieve[j] = 1;
}
}
const primes = [];
for (let i = 2; i <= N; i++) if (sieve[i] === 0) primes.push(i);
return primes;
}
sieveOfEratosthenes(30);
// [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
sieveOfEratosthenes(100).length; // 25
sieveOfEratosthenes(1000).length; // 168
sieveOfEratosthenes(1e6).length; // 78498
// Odds-only sieve — half the memory, ~2× faster
function sieveOdds(N) {
// bit i represents the odd number 2i + 1 (so 1, 3, 5, 7, ...)
const halfN = (N - 1) >> 1;
const sieve = new Uint8Array(halfN + 1);
for (let i = 1; (2 * i + 1) * (2 * i + 1) <= N; i++) {
if (sieve[i] === 0) {
const p = 2 * i + 1;
for (let j = (p * p - 1) >> 1; j <= halfN; j += p) sieve[j] = 1;
}
}
const primes = N >= 2 ? [2] : [];
for (let i = 1; i <= halfN; i++) if (sieve[i] === 0) primes.push(2 * i + 1);
return primes;
}
Variants and comparison
| Algorithm | Time | Space | Year | Use case |
|---|---|---|---|---|
| Sieve of Eratosthenes | O(N log log N) | O(N) bits | ~240 BCE | Default — enumerate primes ≤ ~10⁹ |
| Sieve of Sundaram | O(N log N) | O(N) bits | 1934 | Pedagogical — generates odd primes via 2i + 1 |
| Sieve of Atkin | O(N / log log N) | O(N) bits | 2003 | Faster asymptotically; in practice slower up to ~10¹² |
| Segmented Eratosthenes | O(N log log N) | O(√N) bits | Modern | N > 10⁹ — fits cache, used by primesieve.org |
| Linear sieve | O(N) | O(N) | Modern | Each composite crossed out exactly once; gives smallest prime factor table |
| Wheel-30 sieve | O(N log log N) (smaller constant) | O(N) bits | Modern | Skip multiples of 2, 3, 5 — 8 candidates per 30 numbers |
| Miller-Rabin (per-number) | O(log³ n) per test | O(1) | 1980 | Test a single huge n — no enumeration |
Common pitfalls
- Starting the inner loop at 2p instead of p². Correct but wasteful — every composite gets crossed out by all its prime factors. The p² start gives the cleaner complexity analysis and is essentially free to write.
- Forgetting to mark 0 and 1 as composite. They're not prime; if you skip this, the output array starts with 0 and 1.
- Using N + 1 sized array, indexing to N. Easy off-by-one. Allocate
new Uint8Array(N + 1)and index 0..N inclusive. - 32-bit overflow in p * p. For N = 10⁹, √N ≈ 31623; (31623)² ≈ 10⁹ fits in 32-bit. For N near 2³¹, p² can overflow. Use 64-bit indices or check
p <= N / pinstead. - Naive sieve for N > 10⁹. A 10⁹-byte sieve is 1 GB of RAM — doable, but slow. For larger N, use segmented sieve (~64 KB per segment fits L2).
- Confusing sieve with primality testing. The sieve enumerates every prime up to N. To test whether a specific (huge) number is prime, you want Miller-Rabin, not the sieve.
- Believing "every composite ≤ N gets crossed exactly once." In the classical sieve, every composite c gets crossed out once per distinct prime factor. The linear sieve variant arranges to cross each composite exactly once, at the cost of slightly more bookkeeping.
Applications
- Competitive programming. Pre-compute every prime ≤ 106 or 107 in < 100 ms; reuse the table for every primality query in a contest problem.
- Project Euler and number-theory puzzles. Half of the first 100 problems touch prime enumeration; the sieve is the universal first step.
- Pre-filter for Miller-Rabin. RSA-key generation routines pre-screen candidates against the first few thousand primes (sieved offline) before running probabilistic tests.
- primesieve.org / primecount. The standard primality-table software uses a heavily-tuned segmented Eratosthenes; has enumerated π(N) for N = 1027.
- Goldbach-conjecture verification. Verifying Goldbach's conjecture up to 4 · 1018 required sieving every prime in that range — distributed Eratosthenes is the underlying engine.
- Smallest-prime-factor tables. A linear-sieve variant outputs not just primality but the smallest prime factor of every n ≤ N — enabling O(log n) factorization queries.
A bit of history
Eratosthenes of Cyrene (276-194 BCE) was the chief librarian at Alexandria. He calculated Earth's circumference within a few percent of its true value using the angles of the sun in Syene and Alexandria. The sieve is named after him though earlier descriptions hint at the idea; Nicomachus mentions it in Introduction to Arithmetic (~100 CE).
The algorithm sat unchallenged for two millennia. Sundaram's 1934 variant gave a slightly different formulation. Atkin and Bernstein in 2003 produced the first asymptotic improvement — but in practice, a well-tuned Eratosthenes is still the algorithm of choice up to N ≈ 1012.
Frequently asked questions
Why stop crossing out at √N?
Suppose n ≤ N is composite. Then n = a · b with both factors ≥ 2. They can't both exceed √N, else a · b > N. So at least one factor ≤ √N. That factor is itself either prime or has a prime factor ≤ √N. Either way, a prime ≤ √N divides n — so n was already crossed out by the time we processed that prime. Anything still unmarked when p > √N must be prime.
Why start crossing out from p² and not 2p?
Each composite c gets crossed out multiple times in the naive version (once for every prime factor). Starting from p² skips the multiples that have a smaller prime factor — they've already been marked. For p = 7, you start crossing at 49, not at 14 (already × by 2), 21 (× by 3), 35 (× by 5). This is what makes the inner-loop savings amount to the O(N log log N) bound versus O(N log N) for naive.
What's the actual complexity proof?
The total work is Σ over primes p ≤ √N of N/p. By Mertens' theorem (1874), Σ 1/p over primes p ≤ x is asymptotically ln(ln x). So total work ≈ N · ln(ln √N) = O(N log log N). For N = 109, log log N ≈ 3 — so a billion-prime sieve does ~3 billion writes; runs in ~1 second on modern hardware with bit-packed storage.
How does segmented sieve work?
Process N in chunks of size ~√N each, fitting in L1/L2 cache. First sieve [2, √N] normally. Then for each segment [L, R], precompute the starting multiple of every prime p ≤ √N inside [L, R] and mark composites. Reuses the small-prime list; never touches numbers outside the active segment. The standard implementation when N exceeds ~109 — primesieve.org uses this to enumerate primes up to 1022 in hours.
How does this compare to Miller-Rabin?
Sieve enumerates every prime ≤ N — output is huge but each prime is provably prime. Miller-Rabin tests a specific (potentially huge) number probabilistically. Sieve scales to N ≈ 109-1012 on a single machine; Miller-Rabin scales to any size n (millions of digits) but only answers about one n at a time. Use sieve to populate a prime table; use Miller-Rabin to screen RSA candidates.
What's the sieve of Atkin?
An improvement by Atkin and Bernstein (2003) running in O(N / log log N) — asymptotically faster than Eratosthenes. Uses quadratic forms (4x² + y², 3x² + y², 3x² − y²) to flag candidates. The theoretical improvement is real but constants are worse: in practice, a tuned segmented Eratosthenes is faster up to N ≈ 1012 on x86-64. Atkin matters more on architectures where memory writes are very expensive.
What about parity tricks — can you skip even numbers entirely?
Yes — the standard optimization. Allocate a bitset for odd numbers only, halving memory. After identifying 2, only test odd p and cross out only odd multiples of p. A wheel-3 sieve also skips multiples of 3 (every 6th number is candidate); wheel-30 skips 2, 3, 5 (8 of every 30 are candidates). All preserve O(N log log N) but cut constants by 2-4×. primesieve.org wheels up to 210 (skips 2, 3, 5, 7).