Computational Number Theory

Sieve of Eratosthenes

Strike out the composites; survivors are prime — 2200-year-old still the fastest enumerator

The Sieve of Eratosthenes finds every prime in [2, N]. Walk the integers from 2 upward; mark each as prime; cross out all its multiples (2p, 3p, 4p, …) up to N as composite. Stop once you reach √N — any composite > √N already has a marked-prime factor below √N. Time complexity O(N log log N); space O(N). Discovered by Eratosthenes of Cyrene around 240 BCE and still the fastest way to enumerate every prime ≤ 109 on a single CPU.

  • AuthorEratosthenes of Cyrene, ~240 BCE
  • TimeO(N log log N)
  • SpaceO(N) bits (bitset)
  • Stop atp ≤ √N
  • Output for N=10025 primes — 2, 3, 5, 7, …, 97
  • Used inCompetitive programming, primesieve.org, prime tables, Project Euler

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

How the sieve works

Write down the integers 2, 3, 4, …, N. Two operations interleave:

  1. Find the smallest unmarked number. Call it p. It must be prime — if it had a smaller prime factor q, q would have already marked p as composite when q's multiples were swept.
  2. Cross out multiples of p. Mark 2p, 3p, 4p, … ≤ N as composite. Move on.

The first iteration picks p = 2, crosses out 4, 6, 8, 10, …. The second picks p = 3, crosses out 6, 9, 12, 15, …. Then p = 5 → 10, 15, 20, …. Then p = 7 → 14, 21, 28, …. After processing p = 7, all remaining unmarked numbers ≤ 100 are prime, because the next unmarked prime is 11 > √100.

Why stop at p ≤ √N?

Suppose n ≤ N is composite. Then n = a · b with both factors ≥ 2. If both were > √N, then a · b > N — contradiction. So at least one factor is ≤ √N. That factor is itself either prime or factors further down to a prime ≤ √N. So every composite n ≤ N has at least one prime factor ≤ √N, and was crossed out when that prime was processed.

Implication: any number still unmarked when the algorithm reaches p > √N must be prime. The main loop runs only for p up to √N — the rest of the integers are read out for free.

The p² optimization

The naive inner loop crosses out 2p, 3p, 4p, … starting at 2p. But 2p was already crossed out when p = 2 swept its multiples. 3p was crossed out when p = 3 swept its multiples. The smallest multiple of p that wasn't already crossed out is p²:

For p = 5:
  Naive — cross out 10, 15, 20, 25, 30, 35, 40, 45, 50, ...
            (10, 15, 20 already crossed by 2 and 3)
  Optimized — cross out 25, 30, 35, 40, 45, 50, ...

This optimization is why the inner loop cost is roughly N/p × (the count of primes ≤ √N), not the (slightly larger) naive bound. The total work becomes Σ over primes p ≤ √N of N/p, which by Mertens' theorem grows like N log log N.

Worked example — primes up to N = 30

Start:  2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

p = 2 (prime). Cross out 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30.
        2 3 ✗ 5 ✗ 7 ✗ 9 ✗ 11 ✗ 13 ✗ 15 ✗ 17 ✗ 19 ✗ 21 ✗ 23 ✗ 25 ✗ 27 ✗ 29 ✗

p = 3 (next unmarked, prime). Cross out 9, 15, 21, 27.
        2 3 ✗ 5 ✗ 7 ✗ ✗ ✗ 11 ✗ 13 ✗ ✗ ✗ 17 ✗ 19 ✗ ✗ ✗ 23 ✗ 25 ✗ ✗ ✗ 29 ✗

p = 5 (next unmarked). p² = 25 ≤ 30; cross out 25.
        2 3 ✗ 5 ✗ 7 ✗ ✗ ✗ 11 ✗ 13 ✗ ✗ ✗ 17 ✗ 19 ✗ ✗ ✗ 23 ✗ ✗ ✗ ✗ ✗ 29 ✗

Next p = 7. p² = 49 > 30. Stop.

Surviving primes ≤ 30:
        2, 3, 5, 7, 11, 13, 17, 19, 23, 29  ← 10 primes

This matches π(30) = 10. Compare to the Prime Number Theorem estimate — 30 / ln(30) ≈ 8.8.

JavaScript implementation

// Classic sieve — returns array of all primes ≤ N
function sieveOfEratosthenes(N) {
  const sieve = new Uint8Array(N + 1);  // 0 = unmarked (prime)
  sieve[0] = sieve[1] = 1;
  for (let p = 2; p * p <= N; p++) {
    if (sieve[p] === 0) {
      // p is prime; cross out multiples starting at p²
      for (let j = p * p; j <= N; j += p) sieve[j] = 1;
    }
  }
  const primes = [];
  for (let i = 2; i <= N; i++) if (sieve[i] === 0) primes.push(i);
  return primes;
}

sieveOfEratosthenes(30);
// [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

sieveOfEratosthenes(100).length;  // 25
sieveOfEratosthenes(1000).length; // 168
sieveOfEratosthenes(1e6).length;  // 78498

// Odds-only sieve — half the memory, ~2× faster
function sieveOdds(N) {
  // bit i represents the odd number 2i + 1 (so 1, 3, 5, 7, ...)
  const halfN = (N - 1) >> 1;
  const sieve = new Uint8Array(halfN + 1);
  for (let i = 1; (2 * i + 1) * (2 * i + 1) <= N; i++) {
    if (sieve[i] === 0) {
      const p = 2 * i + 1;
      for (let j = (p * p - 1) >> 1; j <= halfN; j += p) sieve[j] = 1;
    }
  }
  const primes = N >= 2 ? [2] : [];
  for (let i = 1; i <= halfN; i++) if (sieve[i] === 0) primes.push(2 * i + 1);
  return primes;
}

Variants and comparison

AlgorithmTimeSpaceYearUse case
Sieve of EratosthenesO(N log log N)O(N) bits~240 BCEDefault — enumerate primes ≤ ~10⁹
Sieve of SundaramO(N log N)O(N) bits1934Pedagogical — generates odd primes via 2i + 1
Sieve of AtkinO(N / log log N)O(N) bits2003Faster asymptotically; in practice slower up to ~10¹²
Segmented EratosthenesO(N log log N)O(√N) bitsModernN > 10⁹ — fits cache, used by primesieve.org
Linear sieveO(N)O(N)ModernEach composite crossed out exactly once; gives smallest prime factor table
Wheel-30 sieveO(N log log N) (smaller constant)O(N) bitsModernSkip multiples of 2, 3, 5 — 8 candidates per 30 numbers
Miller-Rabin (per-number)O(log³ n) per testO(1)1980Test a single huge n — no enumeration

Common pitfalls

  • Starting the inner loop at 2p instead of p². Correct but wasteful — every composite gets crossed out by all its prime factors. The p² start gives the cleaner complexity analysis and is essentially free to write.
  • Forgetting to mark 0 and 1 as composite. They're not prime; if you skip this, the output array starts with 0 and 1.
  • Using N + 1 sized array, indexing to N. Easy off-by-one. Allocate new Uint8Array(N + 1) and index 0..N inclusive.
  • 32-bit overflow in p * p. For N = 10⁹, √N ≈ 31623; (31623)² ≈ 10⁹ fits in 32-bit. For N near 2³¹, p² can overflow. Use 64-bit indices or check p <= N / p instead.
  • Naive sieve for N > 10⁹. A 10⁹-byte sieve is 1 GB of RAM — doable, but slow. For larger N, use segmented sieve (~64 KB per segment fits L2).
  • Confusing sieve with primality testing. The sieve enumerates every prime up to N. To test whether a specific (huge) number is prime, you want Miller-Rabin, not the sieve.
  • Believing "every composite ≤ N gets crossed exactly once." In the classical sieve, every composite c gets crossed out once per distinct prime factor. The linear sieve variant arranges to cross each composite exactly once, at the cost of slightly more bookkeeping.

Applications

  • Competitive programming. Pre-compute every prime ≤ 106 or 107 in < 100 ms; reuse the table for every primality query in a contest problem.
  • Project Euler and number-theory puzzles. Half of the first 100 problems touch prime enumeration; the sieve is the universal first step.
  • Pre-filter for Miller-Rabin. RSA-key generation routines pre-screen candidates against the first few thousand primes (sieved offline) before running probabilistic tests.
  • primesieve.org / primecount. The standard primality-table software uses a heavily-tuned segmented Eratosthenes; has enumerated π(N) for N = 1027.
  • Goldbach-conjecture verification. Verifying Goldbach's conjecture up to 4 · 1018 required sieving every prime in that range — distributed Eratosthenes is the underlying engine.
  • Smallest-prime-factor tables. A linear-sieve variant outputs not just primality but the smallest prime factor of every n ≤ N — enabling O(log n) factorization queries.

A bit of history

Eratosthenes of Cyrene (276-194 BCE) was the chief librarian at Alexandria. He calculated Earth's circumference within a few percent of its true value using the angles of the sun in Syene and Alexandria. The sieve is named after him though earlier descriptions hint at the idea; Nicomachus mentions it in Introduction to Arithmetic (~100 CE).

The algorithm sat unchallenged for two millennia. Sundaram's 1934 variant gave a slightly different formulation. Atkin and Bernstein in 2003 produced the first asymptotic improvement — but in practice, a well-tuned Eratosthenes is still the algorithm of choice up to N ≈ 1012.

Frequently asked questions

Why stop crossing out at √N?

Suppose n ≤ N is composite. Then n = a · b with both factors ≥ 2. They can't both exceed √N, else a · b > N. So at least one factor ≤ √N. That factor is itself either prime or has a prime factor ≤ √N. Either way, a prime ≤ √N divides n — so n was already crossed out by the time we processed that prime. Anything still unmarked when p > √N must be prime.

Why start crossing out from p² and not 2p?

Each composite c gets crossed out multiple times in the naive version (once for every prime factor). Starting from p² skips the multiples that have a smaller prime factor — they've already been marked. For p = 7, you start crossing at 49, not at 14 (already × by 2), 21 (× by 3), 35 (× by 5). This is what makes the inner-loop savings amount to the O(N log log N) bound versus O(N log N) for naive.

What's the actual complexity proof?

The total work is Σ over primes p ≤ √N of N/p. By Mertens' theorem (1874), Σ 1/p over primes p ≤ x is asymptotically ln(ln x). So total work ≈ N · ln(ln √N) = O(N log log N). For N = 109, log log N ≈ 3 — so a billion-prime sieve does ~3 billion writes; runs in ~1 second on modern hardware with bit-packed storage.

How does segmented sieve work?

Process N in chunks of size ~√N each, fitting in L1/L2 cache. First sieve [2, √N] normally. Then for each segment [L, R], precompute the starting multiple of every prime p ≤ √N inside [L, R] and mark composites. Reuses the small-prime list; never touches numbers outside the active segment. The standard implementation when N exceeds ~109 — primesieve.org uses this to enumerate primes up to 1022 in hours.

How does this compare to Miller-Rabin?

Sieve enumerates every prime ≤ N — output is huge but each prime is provably prime. Miller-Rabin tests a specific (potentially huge) number probabilistically. Sieve scales to N ≈ 109-1012 on a single machine; Miller-Rabin scales to any size n (millions of digits) but only answers about one n at a time. Use sieve to populate a prime table; use Miller-Rabin to screen RSA candidates.

What's the sieve of Atkin?

An improvement by Atkin and Bernstein (2003) running in O(N / log log N) — asymptotically faster than Eratosthenes. Uses quadratic forms (4x² + y², 3x² + y², 3x² − y²) to flag candidates. The theoretical improvement is real but constants are worse: in practice, a tuned segmented Eratosthenes is faster up to N ≈ 1012 on x86-64. Atkin matters more on architectures where memory writes are very expensive.

What about parity tricks — can you skip even numbers entirely?

Yes — the standard optimization. Allocate a bitset for odd numbers only, halving memory. After identifying 2, only test odd p and cross out only odd multiples of p. A wheel-3 sieve also skips multiples of 3 (every 6th number is candidate); wheel-30 skips 2, 3, 5 (8 of every 30 are candidates). All preserve O(N log log N) but cut constants by 2-4×. primesieve.org wheels up to 210 (skips 2, 3, 5, 7).