Concurrency Patterns

Thread Pool

Fixed workers, shared queue, no per-task spawn cost

A thread pool keeps a fixed set of worker threads alive and feeds them tasks from a shared queue, which avoids paying the roughly 1 ms thread-creation cost on every task. Size the pool at cores + 1 for CPU-bound work and much higher for I/O-bound work.

  • Thread create cost (Linux): ~1 ms + 1-8 MB stack
  • CPU-bound pool size: cores + 1
  • I/O-bound pool size: cores × target utilization × (1 + wait/compute)
  • Task dispatch overhead: ~1 µs (queue + wakeup)
  • Common queue: bounded ArrayBlockingQueue
  • Famous implementations: ThreadPoolExecutor, libuv, ForkJoinPool

How a thread pool works

A thread pool is two things glued together: a fixed-size set of worker threads and a thread-safe queue. At startup, the pool spawns its workers. Each worker runs an identical loop — pull a task from the queue, run it, repeat. When the queue is empty, the worker blocks on the queue's take() operation until something arrives. When you submit a task, you push it onto the queue and one waiting worker wakes up to handle it.

That tiny piece of machinery solves two problems. First, it amortizes the cost of creating threads. On Linux, clone() takes around 1 ms and reserves 1-8 MB of virtual memory for the new thread's stack — a web server handling 10,000 requests per second can't afford to spend 10 full CPU-seconds per wall-second on thread setup. Second, the pool bounds concurrency. If 100 requests arrive simultaneously but the pool has 8 workers, only 8 will run at once. The other 92 wait. That's a feature, not a bug — without backpressure, a flood of concurrent work can drive the system into thrashing.
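The concurrency bound described above can be observed directly. The following is a minimal sketch using Python's standard ThreadPoolExecutor; the handle_request function, the 10 ms sleep, and the stats dictionary are illustrative assumptions, not part of any real server.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 8
lock = threading.Lock()
stats = {"active": 0, "peak": 0}  # tasks running now, and the highest count seen

def handle_request(i):
    # Hypothetical request handler: record concurrency, then simulate work.
    with lock:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
    time.sleep(0.01)  # stand-in for real work
    with lock:
        stats["active"] -= 1
    return i

# 100 requests arrive at once, but only POOL_SIZE of them run at any moment.
with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
    results = list(pool.map(handle_request, range(100)))

print(f"completed={len(results)} peak_concurrency={stats['peak']}")
```

The peak never exceeds the pool size, however many tasks are queued; the remaining tasks simply wait their turn.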

The pool's worker loop is comically simple:

while running:
    task = queue.take()    # blocks if empty
    try:
        task.run()
    except Exception as e:
        log(e)

Everything else — sizing, rejection policy, scheduling, monitoring — is a parameter on top of this loop.

Sizing the pool

The single most-asked thread pool question: how many workers? The answer depends entirely on whether your tasks burn CPU or wait on I/O.

CPU-bound work. Tasks that compute non-stop (image processing, hashing, parsing). Brian Goetz's rule from Java Concurrency in Practice: pool size = number of cores + 1. The +1 keeps a core busy when a worker briefly stalls, for example on a page fault. Going higher just adds context-switch overhead, because you can't run more CPU work in parallel than you have CPUs.

I/O-bound work. Tasks that mostly wait on the network or disk. The classic formula:

pool_size = cores × target_utilization × (1 + wait_time / compute_time)

If a task spends 90 ms waiting on a database round-trip and 10 ms processing, the ratio is 9. With 8 cores at 80% target utilization, that's 8 × 0.8 × 10 = 64 workers. For very long-tailed I/O — calls to slow external services — you might land at 200-500 workers, with most of them blocked at any moment. Counter-intuitive, but correct.
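The two sizing rules are easy to encode as helpers. This is a small sketch of the arithmetic only; the function names and the choice to round to the nearest whole worker are assumptions made for illustration.

```python
def cpu_bound_pool_size(cores):
    """Goetz's rule of thumb for CPU-bound work: one worker per core, plus one."""
    return cores + 1

def io_bound_pool_size(cores, target_utilization, wait_ms, compute_ms):
    """cores x target_utilization x (1 + wait/compute), rounded to a whole worker."""
    raw = cores * target_utilization * (1 + wait_ms / compute_ms)
    return max(1, round(raw))

# The worked example from the text: 90 ms waiting, 10 ms computing,
# 8 cores at 80% target utilization.
print(cpu_bound_pool_size(8))              # 9
print(io_bound_pool_size(8, 0.8, 90, 10))  # 64
```

Treat the result as a starting point for load testing rather than a final answer.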

Measure, don't extrapolate. The formulas are starting points. Watch P99 task latency and queue depth under realistic load. Increase the pool while latency drops; stop when it stops improving. Going further only adds overhead.
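A measurement loop can be as simple as sweeping the pool size and timing a fixed batch. The sketch below uses total batch time as a crude proxy for the latency and queue-depth metrics mentioned above; simulated_io_task and its 10 ms sleep are hypothetical stand-ins for a real workload.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def simulated_io_task():
    time.sleep(0.01)  # hypothetical 10 ms I/O wait

def time_batch(pool_size, n_tasks=32):
    """Run n_tasks on a pool of the given size and return the wall-clock seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        for _ in range(n_tasks):
            pool.submit(simulated_io_task)
    # Leaving the with-block waits for every submitted task to finish.
    return time.perf_counter() - start

timings = {size: time_batch(size) for size in (1, 2, 4, 8, 16)}
for size, seconds in timings.items():
    print(f"pool_size={size:2d} batch_time={seconds * 1000:.0f} ms")
```

With a real workload, the improvement flattens out at some pool size; that plateau is where to stop growing the pool.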

Variants and queue choices

The pool's behavior is dominated by its queue type and rejection policy. Java's ThreadPoolExecutor exposes four primary knobs: corePoolSize, maximumPoolSize, the work queue, and the rejection handler.

  • Fixed pool with bounded queue. The safe production default. corePoolSize = maximumPoolSize. ArrayBlockingQueue with a finite capacity (often 100-1000). When full, the rejection policy fires. Predictable memory, predictable behavior.
  • Fixed pool with unbounded queue. LinkedBlockingQueue with no capacity bound. Submission never rejects, but if consumers can't keep up, the queue grows until you run out of memory. Anti-pattern unless you know throughput is bounded upstream.
  • Cached pool. corePoolSize = 0, maximumPoolSize = Integer.MAX_VALUE, SynchronousQueue. Every submission either reuses an idle worker or spawns a new one. Excellent for short bursty work; catastrophic under sustained load because it can spawn tens of thousands of threads.
  • Work-stealing pool. Each worker owns a deque; idle workers steal from busy workers' deques. Java's ForkJoinPool, Go's scheduler, Tokio's runtime. Best when tasks fan out into sub-tasks; avoids contention on a single shared queue.
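To make the work-stealing idea concrete, here is a deliberately simplified per-worker deque. It is a sketch under stated assumptions: the WorkStealingDeque class is hypothetical, and it uses a plain lock where production implementations such as ForkJoinPool rely on lock-free structures.

```python
import threading
from collections import deque

class WorkStealingDeque:
    """One worker's local task deque (illustrative, lock-based)."""

    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()

    def push(self, task):
        # The owning worker adds new tasks at its own end.
        with self._lock:
            self._items.append(task)

    def pop(self):
        # The owner takes the newest task (LIFO), which tends to be cache-warm.
        with self._lock:
            return self._items.pop() if self._items else None

    def steal(self):
        # An idle worker takes the oldest task from the opposite end.
        with self._lock:
            return self._items.popleft() if self._items else None

local = WorkStealingDeque()
for name in ("a", "b", "c"):
    local.push(name)

owner_got = local.pop()    # "c": the owner works on the most recent task
thief_got = local.steal()  # "a": a thief takes the oldest one
print(owner_got, thief_got)
```

Because owner and thief operate on opposite ends, they rarely compete for the same task, which is what removes the single-queue bottleneck.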

Rejection policies

When the queue is full and all threads are busy, the pool must decide what to do with new submissions. The choice has serious consequences.

| Policy | Behavior | Use when |
| --- | --- | --- |
| AbortPolicy | Throws RejectedExecutionException | Default. Caller decides how to recover. |
| CallerRunsPolicy | Calling thread runs the task itself | Natural backpressure: the submitter is throttled. |
| DiscardPolicy | Silently drops the new task | Lossy metrics, telemetry. Rarely correct. |
| DiscardOldestPolicy | Drops the oldest queued task, queues new one | Newer data matters more than older (live feeds). |
| Custom | You implement RejectedExecutionHandler | Spill to disk, redirect to slower pool, etc. |

CallerRunsPolicy is underrated. By making the submitter run rejected tasks, it forces backpressure to propagate upstream — exactly what you want when downstream is overwhelmed.
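The four built-in policies can be mimicked in a few lines each. The sketch below is not Java's implementation; it models the queue with Python's queue.Queue and runs without worker threads so the full-queue case is easy to see. All names (submit, abort, caller_runs, discard, discard_oldest, RejectedExecution) are hypothetical, and discard_oldest assumes a single submitting thread.

```python
import queue

class RejectedExecution(Exception):
    """Raised by the abort policy when the queue is full."""

def abort(task, q):
    raise RejectedExecution("queue full")

def caller_runs(task, q):
    task()  # the submitting thread does the work itself, which slows it down

def discard(task, q):
    return None  # silently drop the new task

def discard_oldest(task, q):
    try:
        q.get_nowait()  # evict the task at the head of the queue
    except queue.Empty:
        pass
    q.put_nowait(task)  # assumes no other thread refilled the slot meanwhile

def submit(q, task, on_reject):
    try:
        q.put_nowait(task)
    except queue.Full:
        on_reject(task, q)

ran = []
q = queue.Queue(maxsize=2)  # tiny bounded queue, no workers draining it
submit(q, lambda: ran.append("first"), abort)
submit(q, lambda: ran.append("second"), abort)
submit(q, lambda: ran.append("third"), caller_runs)  # queue is full: runs inline
print(ran, q.qsize())
```

With caller_runs, the third task executes on the submitting thread while the first two stay queued, so the submitter cannot produce more work until it finishes.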

When to use a thread pool

  • Web servers and RPC servers. Spawning a thread per request pays the creation cost on every request and puts no cap on thread count; a pool handles thousands of requests on dozens of threads.
  • Parallel data processing. Map-style work over a collection where each item processes independently. Match pool size to cores.
  • Asynchronous executors. Java's CompletableFuture, .NET's TaskScheduler, Python's ThreadPoolExecutor — all use pools under the hood.
  • Background work in mobile/desktop apps. Keep the UI thread free; offload disk and network work to a small pool.

Avoid a single shared thread pool when tasks block indefinitely (use coroutines or async/await), when tasks have wildly different priorities (give each priority class its own pool), or when latency requirements are sub-microsecond (the queue overhead alone is ~1 µs).

Thread pool vs alternatives

| Property | Thread pool | New thread per task | Single-thread event loop | Goroutines / fibers |
| --- | --- | --- | --- | --- |
| Setup cost per task | ~1 µs queue + wakeup | ~1 ms + stack alloc | ~100 ns callback | ~2 µs spawn |
| Max concurrency | Bounded by pool size | Until OOM | 1 at a time | ~millions |
| True parallelism | Yes, up to pool size | Yes | No | Yes, across M:N scheduler |
| Blocking I/O OK | Yes | Yes | No | Yes |
| Memory per worker | 1-8 MB stack | 1-8 MB stack | None | ~2 KB initial |
| Best for | CPU-bound + blocking I/O | Almost never | Pure async I/O | Massive concurrency |

Pseudo-code

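import logging
import queue
import threading

POISON_PILL = object()  # sentinel that tells a worker to exit


class RejectedExecutionException(Exception):
    """Raised when the bounded queue is full."""


class ThreadPool:
    def __init__(self, size, queue_capacity):
        self.queue = queue.Queue(maxsize=queue_capacity)
        self.workers = [threading.Thread(target=self._worker_loop) for _ in range(size)]
        for w in self.workers:
            w.start()

    def submit(self, task):
        try:
            self.queue.put_nowait(task)  # bounded, so this may fail
        except queue.Full:
            raise RejectedExecutionException() from None

    def _worker_loop(self):
        while True:
            task = self.queue.get()  # blocks while the queue is empty
            if task is POISON_PILL:
                break  # shutdown requested: leave the loop
            try:
                task()
            except Exception:
                logging.exception("task failed")  # keep the worker alive

    def shutdown(self):
        # Pills queue up behind pending tasks, so queued work drains first.
        # Each worker consumes exactly one pill and exits.
        for _ in self.workers:
            self.queue.put(POISON_PILL)
        for w in self.workers:
            w.join()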

Java implementation

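import java.util.concurrent.*;
// ThreadFactoryBuilder comes from Guava, not the JDK.
import com.google.common.util.concurrent.ThreadFactoryBuilder;

// Production-safe configuration.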
int cores = Runtime.getRuntime().availableProcessors();
ThreadPoolExecutor pool = new ThreadPoolExecutor(
    cores,                                    // corePoolSize
    cores,                                    // maximumPoolSize (fixed)
    0L, TimeUnit.MILLISECONDS,                // keep-alive (unused for fixed)
    new ArrayBlockingQueue<>(1000),           // bounded queue
    new ThreadFactoryBuilder()
        .setNameFormat("worker-%d")
        .setDaemon(false)
        .build(),
    new ThreadPoolExecutor.CallerRunsPolicy() // backpressure
);

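// Submit a task and get a future for its result.
// expensiveCompute() is a placeholder for your own method.
Future<Integer> result = pool.submit(() -> expensiveCompute());
// get() throws InterruptedException, ExecutionException, or TimeoutException.
Integer answer = result.get(5, TimeUnit.SECONDS);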

// Graceful shutdown.
pool.shutdown();
if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
    pool.shutdownNow();
}

Python implementation

from concurrent.futures import ThreadPoolExecutor

# fetch_url and process are placeholders for your own functions.

# I/O-bound: many workers fine because most are waiting.
pool = ThreadPoolExecutor(max_workers=64, thread_name_prefix='io-worker')

# Submit returns a Future immediately.
future = pool.submit(fetch_url, "https://example.com")
result = future.result(timeout=10)
pool.shutdown()  # a pool created outside a with-block must be shut down explicitly

# Map for batch parallelism.
urls = [...]  # 10,000 URLs
with ThreadPoolExecutor(max_workers=200) as p:
    # The timeout is measured from the map() call and covers the whole batch, not each URL.
    for body in p.map(fetch_url, urls, timeout=30):
        process(body)
# Pool is shut down when the with-block exits.

# CPU-bound: GIL forces you to ProcessPoolExecutor instead.
# A Python thread pool gives you concurrency, not parallelism, for pure-Python compute.

Python's GIL is the catch. For CPU-bound work in pure Python, threads share a single interpreter lock and only one runs Python bytecode at a time. Use ProcessPoolExecutor instead, or release the GIL by calling into C extensions (NumPy, Pillow, native cryptography).

Common pitfalls

  • Unbounded queues. The most common production outage: an unbounded LinkedBlockingQueue swallows millions of tasks during a downstream slowdown, then OOMs the JVM. Always cap the queue.
  • Submitting to your own pool and blocking on the future. If a pool has N workers and N running tasks each block on a sub-task submitted to the same pool, every worker is waiting and no thread is left to run the sub-tasks: deadlock. Use separate pools for nested work or use ForkJoinPool.
  • Forgetting daemon flag. Non-daemon worker threads keep the JVM alive even after main exits. Use a daemon thread factory unless you specifically need workers to outlive main.
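  • Swallowing exceptions. In Java, an exception that escapes a task passed to execute() kills the worker thread (the pool replaces it), and one thrown by a task passed to submit() is stored in the Future and lost unless someone calls get(). Catch and log inside the task, check the Future, or install an UncaughtExceptionHandler.
  • Sharing mutable state without locking. Two pool tasks that touch the same data concurrently will race. Use immutable values, locks, or thread-safe collections.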
  • No bounded timeout on shutdown. shutdown() stops accepting tasks but lets queued ones finish. If a task hangs forever, your process won't exit. Always pair with awaitTermination() and a fallback shutdownNow().
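The exception pitfall has a direct analogue in Python, where an exception raised inside a submitted task is stored on its Future and never printed unless something asks for it. A minimal sketch, assuming a hypothetical flaky_task and a record_failure callback that collects and logs errors:

```python
import logging
from concurrent.futures import ThreadPoolExecutor

failures = []

def flaky_task():
    raise ValueError("boom")  # hypothetical failing task

def record_failure(future):
    # Runs once the future completes; surfaces errors nobody awaited.
    exc = future.exception()
    if exc is not None:
        failures.append(exc)
        logging.error("task failed: %r", exc)

with ThreadPoolExecutor(max_workers=2) as pool:
    future = pool.submit(flaky_task)
    # Without this callback (or a call to future.result()), the error is silent.
    future.add_done_callback(record_failure)

print(f"recorded {len(failures)} failure(s)")
```

Attaching the callback at submission time means a failure is logged even when the caller never inspects the Future.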

Performance analysis

For a typical Java ThreadPoolExecutor on modern Linux, dispatch latency — the time from submit() to the task starting on a worker — is around 1-3 µs when the pool is warm and the queue is shallow. That breaks down roughly as: queue enqueue (~200 ns lock + put), signal a waiting worker (~500 ns futex wake), context switch to the worker (~1 µs), then the worker dequeues (~200 ns). Compared to the 1 ms cost of creating a fresh thread, the pool wins by roughly 1000×.
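Dispatch latency is straightforward to measure on your own stack. The sketch below does so for Python's ThreadPoolExecutor, where the figures will typically be far higher than the Java numbers quoted above because of interpreter overhead; measure_dispatch_latency and the sample count are assumptions for illustration.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def measure_dispatch_latency(samples=200):
    """Return submit-to-start latencies in microseconds on a warm, idle pool."""
    latencies_us = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pool.submit(lambda: None).result()  # warm up: the worker thread now exists
        for _ in range(samples):
            submitted = time.perf_counter_ns()
            # The task is the clock itself: it returns the moment it starts running.
            started = pool.submit(time.perf_counter_ns).result()
            latencies_us.append((started - submitted) / 1000)
    return latencies_us

latencies = measure_dispatch_latency()
print(f"median dispatch latency: {statistics.median(latencies):.1f} µs")
```

Reporting the median rather than the mean keeps one slow scheduling hiccup from distorting the result.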

Throughput scales roughly linearly with pool size up to the contention limit on the queue. A single ArrayBlockingQueue guards both ends with one lock, so that limit is typically in the low millions of tasks per second on a modern server, in line with the ~200 ns enqueue and dequeue costs above. Past that, the queue's lock becomes the bottleneck, and switching to a work-stealing pool (per-worker deques) lets you scale further.

Memory cost: each worker reserves 1-2 MB for its stack and thread-local storage. Most of that is virtual address space that becomes resident only as pages are touched, but a pool of 200 workers can still claim up to around 400 MB before doing any actual work. This is the real reason cached pools are dangerous: they will happily allocate gigabytes if a load spike hits.

Frequently asked questions

Why use a thread pool instead of creating threads on demand?

Creating a thread costs roughly 1 ms on Linux and allocates 1-8 MB of stack space, plus a syscall into the kernel. A web server handling 10,000 requests per second can't afford 10,000 thread-creates per second — it would spend most of its CPU on thread bookkeeping. A pool pays the cost once at startup, then reuses workers for thousands of tasks each.

How do I pick the pool size?

For CPU-bound work, use cores + 1, Brian Goetz's classic rule from Java Concurrency in Practice. The +1 covers a worker that briefly stalls, for example on a page fault. For I/O-bound work, the formula is cores × target_utilization × (1 + wait_time / compute_time), which often produces hundreds of threads for high-latency network work. Measure under realistic load; do not guess.

What happens when the queue fills up?

That depends on the rejection policy. Java's ThreadPoolExecutor lets you choose: AbortPolicy throws RejectedExecutionException, CallerRunsPolicy makes the submitting thread run the task itself (applying back-pressure), DiscardPolicy silently drops, and DiscardOldestPolicy evicts the oldest queued task. Picking the wrong policy is a leading cause of mysterious production failures.

Should I use a fixed or a cached thread pool?

Almost always fixed. Cached pools (newCachedThreadPool in Java) hand tasks off through a zero-capacity SynchronousQueue and create new threads on demand up to Integer.MAX_VALUE, so under load they can spawn tens of thousands of threads and crash the JVM with OutOfMemoryError. Use a fixed-size pool with a bounded queue and a sensible rejection policy.

What's the difference between a thread pool and an event loop?

A thread pool runs tasks in parallel on multiple OS threads, which gives true parallelism and suits CPU work. An event loop runs callbacks one at a time on a single thread, switching between them at I/O suspension points, which suits I/O work where a thread would otherwise sit idle. Many modern runtimes combine both: an event loop for I/O multiplexing and a thread pool for blocking or CPU-heavy work (Node's libuv, Tokio's blocking pool).
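Python's asyncio shows the combined pattern in a few lines: the event loop stays responsive while blocking calls run on a thread pool. This is a sketch only; blocking_work and its 50 ms sleep are hypothetical stand-ins for a blocking library call.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_work(n):
    time.sleep(0.05)  # stands in for blocking I/O or a native call
    return n * n

async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Each call runs on a pool thread; the event loop is free while it waits.
        futures = [loop.run_in_executor(pool, blocking_work, n) for n in range(4)]
        return await asyncio.gather(*futures)

squares = asyncio.run(main())
print(squares)  # [0, 1, 4, 9]
```

Calling blocking_work directly inside a coroutine would stall every other callback on the loop for the duration of the call.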

How do work-stealing pools differ from regular pools?

A standard pool has one shared queue that every worker pulls from, which becomes a contention bottleneck at high task rates. A work-stealing pool gives each worker its own local deque: the owner pushes and pops at one end, and idle workers steal the oldest tasks from the opposite end of busy workers' deques. Java's ForkJoinPool and Go's scheduler use this. Best when tasks fan out (one task spawns many sub-tasks).

What is thread starvation and how does it relate to pool sizing?

Starvation happens when all pool threads are blocked waiting for results from other tasks that need a thread to run on. Classic deadlock: a task submits a sub-task and awaits its result, but the pool is full and the sub-task cannot get scheduled. Avoid by never blocking on tasks submitted to the same pool, or by sizing the pool larger than the deepest dependency chain plus a margin.
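The starvation scenario can be reproduced safely by giving the blocking call a timeout. In this sketch the pool has a single worker, so the outer task occupies the only thread that could run the sub-task it is waiting for; the outer and inner names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

pool = ThreadPoolExecutor(max_workers=1)

def inner():
    return "inner done"

def outer():
    # Submits to the same pool, then blocks the only worker waiting on the result.
    child = pool.submit(inner)
    try:
        return child.result(timeout=0.2)
    except FutureTimeout:
        return "starved"  # without the timeout this would hang forever

outcome = pool.submit(outer).result()
pool.shutdown(wait=True)  # inner finally runs once outer has released the worker
print(outcome)  # starved
```

Submitting inner to a second pool, or returning its Future instead of blocking on it, removes the dependency cycle.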