Algorithms

The Sieve of Eratosthenes

Q: Why does the Sieve of Eratosthenes start crossing out at p² instead of 2p?

Every composite below p² has already been crossed out by a smaller prime factor. When you reach prime p, the multiples 2p, 3p, … up to (p−1)p were each marked by their other factor (2, 3, … p−1). The first multiple of p that no smaller prime has touched is p·p, so starting there skips redundant work and is what makes the total cost O(n log log n).

Q: Why can you stop sieving at the square root of N?

Any composite number c ≤ N has a prime factor no larger than √N. So once you have crossed out all multiples of every prime up to √N, every remaining unmarked number is prime — there is no untouched composite left to remove. For N = 100 million you only loop primes up to 10,000.

Q: What is the time complexity of the Sieve of Eratosthenes?

O(n log log n) time and O(n) space. The work is the sum over primes p ≤ n of n/p, and the sum of reciprocals of primes grows like ln ln n (Mertens' theorem), so the total is n·ln ln n — almost linear. log log n grows so slowly it is below 4 for any n you will ever sieve.

Q: Sieve of Eratosthenes vs trial division — which is faster?

To list all primes up to N the sieve wins overwhelmingly: O(n log log n) versus O(n·√n / ln n) for trial-dividing each number. At N = 1,000,000 the sieve does about 2.9 million operations while trial division does roughly 80 million. Trial division only wins when you need a single isolated primality test on a huge number you would never sieve up to.

Q: How do you sieve primes up to 10 billion without running out of RAM?

Use a segmented sieve. Instead of one boolean array of size N, precompute the primes up to √N (√(10 billion) = 100,000, so about 9,600 of them), then sweep the range in cache-sized blocks of, say, 32 KB, re-using the same small primes to cross out each block. Memory drops from O(N) to O(√N) and cache behaviour improves dramatically.

Q: Is the Sieve of Eratosthenes the fastest way to enumerate primes?

For practical ranges, a wheel-factorized segmented sieve of Eratosthenes is the fastest known approach and what record-breaking prime counters use. The linear sieve crosses each composite exactly once in O(n) but its scattered writes are cache-unfriendly, so it is usually slower in wall-clock time despite the better asymptotic. The Sieve of Atkin is asymptotically O(n / log log n) but its constant factor makes it slower in practice below astronomical N.

Don't test for primes — cross out everything that isn't one

The Sieve of Eratosthenes finds every prime up to N by repeatedly crossing out the multiples of each prime it discovers, running in O(n log log n) time and O(n) space — fast enough to list all 5.7 million primes below 100 million in under a second.

Time complexityO(n log log n)
SpaceO(n) bits
Sieve only up to√N
Start crossing atp²
Discoveredc. 240 BCE

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

The trick: don't test, cross out

Almost everyone's first instinct for "find all the primes up to 100" is to check each number one at a time: is 2 prime? Is 3 prime? Is 4 prime? That's trial division, and it does an enormous amount of redundant work — it re-discovers that 4, 6, 8, 10 are all even, that 9, 15, 21 are all divisible by 3, and so on, from scratch for every number.

Eratosthenes of Cyrene — the same Greek polymath who measured the circumference of the Earth — flipped the question around sometime around 240 BCE. Instead of asking "is this number prime?", he asked "which numbers can I rule out?" Write down every integer from 2 to N. The first uncrossed number, 2, must be prime. Now cross out every multiple of 2: 4, 6, 8, … They can't be prime because 2 divides them. The next uncrossed number, 3, must be prime — so cross out 6, 9, 12, … 5 is next: cross out its multiples. Keep going. Every number you never managed to cross out is prime, because nothing smaller divides it.

That's the entire algorithm. There is no division and no primality test anywhere in it — just marking. The genius is that primality falls out as a side effect of repeatedly removing composites, and each composite gets removed by its prime factors rather than discovered by an expensive test.

Two optimizations that make it fast

The naive version above is correct but wasteful, and two observations cut the work down to its famous near-linear bound.

Stop the outer loop at √N. Every composite number c ≤ N has at least one prime factor no larger than √N — if both factors exceeded √N their product would exceed N. So once you have crossed out the multiples of every prime up to √N, every composite has already been hit by its small factor, and every number still standing is prime. For N = 100,000,000 you only iterate the outer loop over primes up to 10,000.

Start crossing out at p², not 2p. When the outer loop reaches a prime p, the multiples 2p, 3p, …, (p−1)p have already been crossed out — each was removed earlier by its smaller factor (2 removed 2p, 3 removed 3p, and so on). The first multiple of p that no smaller prime has touched is p·p. Starting the inner loop at p² and stepping by p avoids all the redundant re-marking. This is the single change that turns an O(n log n) algorithm into O(n log log n).

Why O(n log log n)? The work is the number of cross-out operations, which is the sum over primes p ≤ n of n/p (one mark per multiple). By Mertens' theorem the sum of reciprocals of primes up to n grows like ln ln n, so the total is n · ln ln n. The log log n factor is essentially a small constant in disguise: it stays below 4 for any n you could fit in memory, so in practice the sieve behaves like a linear scan.

When to reach for a sieve

You need all primes in a range, not a one-off primality check. Generating a prime table for cryptographic small-factor trials, number-theory contests, or precomputation.
N is small enough to fit a bit array. A 100-million-bit array is 12.5 MB — trivial. A 10-billion-element array (1.25 GB as bits) is the point where you switch to a segmented sieve.
You want cheap by-products. A tiny modification turns the sieve into a table of smallest prime factors, letting you factorize any number ≤ N in O(log N) afterwards — invaluable for batch factorization.
You're counting or summing primes up to a bound and don't need the gaps between record-sized primes.

Don't use a sieve to test whether one 200-digit number is prime — you can't allocate an array of that size, and a probabilistic test like Miller–Rabin answers in microseconds. The sieve is for density: many primes at once, in a bounded range.

Sieve of Eratosthenes vs the alternatives

	Eratosthenes	Trial division (each n)	Linear sieve	Sieve of Atkin	Segmented sieve
Time to list primes ≤ n	O(n log log n)	O(n √n / ln n)	O(n)	O(n / log log n)	O(n log log n)
Space	O(n) bits	O(1)	O(n) + prime list	O(n) bits	O(√n)
Cache behaviour	Sequential, friendly	n/a	Scattered writes, poor	Modular wheels, mixed	Block-local, excellent
Crosses each composite	Several times	n/a	Exactly once	Via quadratic forms	Several times per block
Gives smallest prime factor	With tweak	No	Yes, natively	No	No
Practical winner up to ~10⁹	Yes (wheel-optimized)	No	Often slower (cache)	No	Yes for huge n
Implementation difficulty	Trivial	Trivial	Moderate	Hard	Moderate

The headline surprise: the linear sieve is asymptotically better (O(n) versus O(n log log n)) yet usually loses in wall-clock time. Because it touches each composite via its smallest prime factor, its memory writes jump all over the array and thrash the cache, while the plain Eratosthenes sieve marks long arithmetic runs sequentially. Asymptotics aren't everything when a single cache miss costs ~100 ns.

What the numbers actually say

≈ 2.9 million operations to sieve to 1,000,000, since 1e6 · ln ln 1e6 ≈ 1e6 · 2.63 ≈ 2.6e6 cross-outs plus the scan. Trial-dividing each number to √n instead costs roughly 80 million divisions — about 27× more work.
5,761,455 primes below 100,000,000, and a tight C sieve finds them all in well under a second on a modern laptop. A naive trial-division loop would take minutes.
12.5 MB for a 100-million bit array. Storing one byte per number instead of one bit costs 100 MB — 8× more memory and 8× more cache misses, which is why serious implementations pack bits.
log log n stays tiny. ln ln(10⁹) ≈ 3.03 and ln ln(10¹⁸) ≈ 3.7. The "log log" factor never realistically exceeds 4, which is why the sieve feels linear.
Skipping even numbers (a 2-wheel) halves the array and removes half the work for free; a 2·3·5 wheel skips about 73% of candidates (keeping just 8 of every 30) before you write a single line of inner-loop code.

JavaScript implementation

function sieve(n) {
  // isComposite[i] === true  ⇒  i is NOT prime
  const isComposite = new Uint8Array(n + 1);
  const primes = [];

  for (let p = 2; p <= n; p++) {
    if (isComposite[p]) continue;   // already crossed out → skip
    primes.push(p);                 // first survivor at p is prime

    // Start at p*p; everything below was marked by a smaller prime.
    // Guard against overflow / staying within the array.
    if (p <= Math.floor(Math.sqrt(n))) {
      for (let m = p * p; m <= n; m += p) {
        isComposite[m] = 1;
      }
    }
  }
  return primes;
}

// sieve(30) → [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

Two details matter. First, Uint8Array beats a plain JS array of booleans by a wide margin — it is contiguous, typed, and zero-initialized, so the engine can keep it in cache. Second, the p <= √n guard around the inner loop isn't just an optimization: for large p, p * p can exceed the safe integer range, so you want to skip it entirely once p passes √n.

Bonus: factorize anything ≤ N in O(log N)

The famous follow-up problem isn't "list the primes" — it's "factorize a thousand numbers quickly." Modify the sieve to record, for each composite, the smallest prime that crossed it out. Then any number can be factorized by repeatedly dividing by its stored smallest prime factor (SPF):

function sieveSPF(n) {
  const spf = new Int32Array(n + 1);   // spf[i] = smallest prime factor of i
  for (let i = 2; i <= n; i++) {
    if (spf[i] === 0) {                // i is prime
      for (let m = i; m <= n; m += i) {
        if (spf[m] === 0) spf[m] = i;  // only the FIRST (smallest) prime claims it
      }
    }
  }
  return spf;
}

function factorize(x, spf) {
  const factors = [];
  while (x > 1) {
    const p = spf[x];
    factors.push(p);
    x = (x / p) | 0;                   // peel off one copy of the smallest factor
  }
  return factors;                      // e.g. factorize(360, spf) → [2,2,2,3,3,5]
}

After one O(n log log n) precomputation, each factorization costs O(log x) divisions — there are at most log₂ x prime factors counted with multiplicity. This is the workhorse behind contest solutions that factorize millions of queries.

Python implementation

def sieve(n):
    """Return all primes <= n via the Sieve of Eratosthenes."""
    is_composite = bytearray(n + 1)        # 0 = prime-so-far, 1 = composite
    primes = []
    for p in range(2, n + 1):
        if is_composite[p]:
            continue
        primes.append(p)
        if p * p <= n:
            # Slice assignment crosses out p*p, p*p+p, ... in C speed.
            step = p
            start = p * p
            is_composite[start : n + 1 : step] = b"\x01" * (((n - start) // step) + 1)
    return primes


def segmented_sieve(lo, hi):
    """Primes in [lo, hi] using only O(sqrt(hi)) memory."""
    import math
    limit = int(math.isqrt(hi))
    base = sieve(limit)                    # primes up to sqrt(hi)
    seg = bytearray(hi - lo + 1)           # seg[i] marks lo+i as composite
    for p in base:
        # First multiple of p that is >= lo and >= p*p.
        start = max(p * p, ((lo + p - 1) // p) * p)
        for m in range(start, hi + 1, p):
            seg[m - lo] = 1
    return [lo + i for i in range(hi - lo + 1)
            if not seg[i] and (lo + i) >= 2]


# sieve(30)              -> [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
# segmented_sieve(10**9, 10**9 + 50)  -> primes in that high window

The Python slice-assignment trick — is_composite[start:n+1:step] = b"\x01" * count — pushes the inner loop into the C layer, making pure-Python sieves surprisingly competitive. The segmented version is how you'd find primes near 10⁹ without allocating a billion-element array.

Variants worth knowing

Segmented sieve. Precompute the primes up to √N, then sweep [2, N] in cache-sized blocks (often 32 KB to fit L1), re-using the small primes to mark each block. Memory falls from O(N) to O(√N) and the sequential, cache-resident writes make it the fastest practical approach for very large N.

Wheel factorization. Pre-exclude multiples of the first few primes by only storing residues coprime to, say, 2·3·5 = 30. A 30-wheel keeps just 8 of every 30 numbers, cutting both memory and inner-loop iterations by a constant factor of ~3.75 before you do any real work.

Linear (Euler) sieve. Crosses out each composite exactly once by marking i · prime only for primes ≤ the smallest prime factor of i, achieving true O(n) and yielding the smallest-prime-factor table as a by-product. Beautiful in theory; its scattered writes often make it slower than the plain sieve in practice.

Sieve of Atkin. A 2003 algorithm by Atkin and Bernstein that marks numbers using solutions to quadratic forms (like 4x² + y² = n) instead of multiples. Asymptotically O(n / log log n), but its large constant factor and complexity mean it only overtakes a well-tuned Eratosthenes sieve at astronomically large N.

Bitset sieve. Pack the flags into machine words and operate on bits directly. Combined with skipping evens, this gets you to 1 bit per odd number — about 6 MB for primes up to 100 million — and lets the CPU touch fewer cache lines.

Common bugs and edge cases

Starting the inner loop at 2p instead of p². The code is still correct, just slower — it re-crosses numbers already marked, dragging the complexity toward O(n log n). It's a missed optimization, not a wrong answer.
Off-by-one on the upper bound. If you want primes up to and including N, the array must be size N + 1 and loops must use <= n. Sieving to N−1 silently drops N when N itself is prime.
Integer overflow at p². In fixed-width languages, p * p can wrap around for large p. Guard with p <= √n before computing p * p, or use a 64-bit accumulator.
Treating 0 and 1 as prime. Start the outer loop at 2 and never report 0 or 1. By definition 1 is not prime (it has only one divisor), and the sieve must not leave it "uncrossed and counted."
One byte per flag when memory is tight. A boolean array uses 8× the memory of a bitset and 8× the cache pressure. For N in the hundreds of millions, that's the difference between fitting in cache and constant misses.
Segmented sieve's start offset. The first multiple of p inside a block [lo, hi] is max(p², ⌈lo/p⌉·p), not lo. Forgetting the ceiling alignment marks the wrong cells and reports composites as prime.

Frequently asked questions

Why does the Sieve of Eratosthenes start crossing out at p² instead of 2p?