Resilience Patterns
Bulkhead Pattern
Partition resources per dependency — one compartment floods, the others stay dry
A bulkhead gives each dependency its own bounded resource pool, so a slow downstream fills only its own pool. Hystrix defaults to 10 threads per dependency. The name comes from the watertight compartments in a ship's hull.
- Named after: ship watertight compartments
- Hystrix default: 10 threads · queue 5
- Two flavors: thread-pool · semaphore
- Fail-fast latency: microseconds (vs. 30 s hang)
- Pair with: circuit breaker + timeout + retry
- Famous implementations: Hystrix, Resilience4j, Polly, Envoy
Without bulkheads, one slow dependency exhausts all threads. With bulkheads, only that compartment fills — everything else keeps running.
How the bulkhead pattern works
The Titanic had bulkheads. They just weren't tall enough. The hull was divided into watertight compartments by vertical steel walls — a leak in compartment four should have stayed in compartment four. The fatal mistake was that the bulkheads only rose to E deck, so when the bow sank low enough, water spilled over the tops into the next compartment, and the next, and the next. The lesson held: divide the hull, make the compartments truly watertight, and a single breach is survivable.
The software equivalent applies the same principle to shared resources. The most common implementation is a thread pool per downstream dependency. If your service talks to three downstreams (Payment, Inventory, Recommendation), you don't pull threads from a shared pool of 200. You give each one its own pool: say 50 + 30 + 20. Now when Recommendation slows down and its 20 threads block on 30-second timeouts, the 50 Payment threads and 30 Inventory threads keep flowing. Two of your three dependencies stay fully available.
Without bulkheads, a single slow dependency can drag down the entire service. The classic incident: a third-party API that normally responds in 50 ms starts taking 30 seconds. Each call holds a thread for 600x longer than usual. At 100 requests-per-second, you accumulate 3,000 in-flight calls in the first 30 seconds — a service with a 200-thread pool is fully saturated in 2 seconds. Now every call to every other downstream also can't get a thread, and the cascade has spread far beyond the original slow API.
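The saturation arithmetic in this incident can be checked with a few lines of Python. This is a minimal sketch; the function name `seconds_to_saturation` is made up for illustration, and it assumes a constant arrival rate and a fixed hold time per call.

```python
def seconds_to_saturation(pool_size: int, arrival_rps: float, hold_seconds: float) -> float:
    """Time until every thread is busy, assuming each call holds a thread for hold_seconds.

    Returns infinity when steady-state demand (arrival rate x hold time) fits in the pool.
    """
    steady_state_in_flight = arrival_rps * hold_seconds
    if steady_state_in_flight <= pool_size:
        return float("inf")
    # Until the first call finishes, threads are consumed at the arrival rate.
    return pool_size / arrival_rps


healthy = seconds_to_saturation(pool_size=200, arrival_rps=100, hold_seconds=0.05)
degraded = seconds_to_saturation(pool_size=200, arrival_rps=100, hold_seconds=30)
print(healthy, degraded)  # inf 2.0
```

At 50 ms per call the pool only ever needs about 5 threads; at 30 s per call, demand is 3,000 threads and the 200-thread pool is gone after 2 seconds.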
Two implementations: thread-pool and semaphore
Hystrix and Resilience4j both offer two bulkhead flavors:
- Thread-pool bulkhead. Run each dependency's calls on a dedicated thread pool. The caller submits a task to downstreamA.executor and gets a Future. The submission can be rejected immediately if the pool is full. The task runs on a worker thread that's independent of the caller's thread.
- Semaphore bulkhead. Just a counting semaphore that bounds concurrent calls. The caller acquires a permit; if none is available, fail fast. The call still runs on the caller's thread, with no context switch.
The right choice depends on the call:
- Synchronous, blocking I/O (JDBC, blocking HTTP, file I/O) → thread-pool bulkhead. The dedicated pool can be sized independently of the request-handling threads, and the timeout can be enforced by interrupting the worker.
- Async or reactive calls (CompletableFuture, RxJava, Project Reactor) → semaphore bulkhead. The call is already non-blocking, so wrapping it on another thread is pure overhead. The semaphore alone is enough to limit concurrency.
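For the async case, a semaphore-style bulkhead needs no extra threads at all. The sketch below uses Python's asyncio rather than the Java libraries named above; `AsyncSemaphoreBulkhead`, `BulkheadFull`, and `slow_call` are hypothetical names, and the semaphore is reduced to a plain counter because all coroutines share one event-loop thread.

```python
import asyncio


class BulkheadFull(Exception):
    """Raised when no permit is free (hypothetical name for this sketch)."""


class AsyncSemaphoreBulkhead:
    def __init__(self, max_concurrent: int):
        self._max = max_concurrent
        self._in_flight = 0  # only touched from the event-loop thread, so no lock is needed

    async def run(self, coro_fn, *args):
        if self._in_flight >= self._max:
            raise BulkheadFull("no permits available")  # fail fast, never wait
        self._in_flight += 1
        try:
            return await coro_fn(*args)
        finally:
            self._in_flight -= 1


async def demo():
    bulkhead = AsyncSemaphoreBulkhead(max_concurrent=2)

    async def slow_call(i):
        await asyncio.sleep(0.05)  # stands in for a non-blocking remote call
        return i

    # Three concurrent calls against two permits: the third is rejected.
    return await asyncio.gather(
        *(bulkhead.run(slow_call, i) for i in range(3)),
        return_exceptions=True,
    )


results = asyncio.run(demo())
print(results[:2], type(results[2]).__name__)  # [0, 1] BulkheadFull
```

The call runs on the caller's own task, so the only cost is the counter check.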
Thread-pool overhead is real but small — typically 1-10 microseconds per submission for the queue and thread handoff. Semaphore acquisition is sub-microsecond. For a remote call that costs milliseconds, both are noise; for an in-process call that costs nanoseconds, both are unjustified.
Worked example — three dependencies, one goes slow
A microservice processes orders. Each order calls three downstreams: Inventory (check stock), Payment (charge card), Email (send confirmation). Production sizes: 50 threads to inventory, 30 to payment, 20 to email. Total service threads: 100.
    Normal load: 200 orders/sec, each taking 100 ms across the three calls
      Inventory pool: in-use ≈ 200 × 0.05 = 10 threads / 50 cap
      Payment pool:   in-use ≈ 200 × 0.03 = 6 threads / 30 cap
      Email pool:     in-use ≈ 200 × 0.02 = 4 threads / 20 cap
      Service runs healthy at 20% pool utilization overall.
    Event: Email provider degrades to 30-second timeouts.
    Without bulkhead (single pool of 100 threads):
      Email demand reaches 200 × 30 = 6000 in-flight calls after 30 s, far more than 100 threads
      The ~80 idle threads are consumed at 200 calls/sec, so the pool saturates in about 0.4 s; new orders queue or fail
      Inventory and Payment calls can't get threads either
      Service is functionally down even though only Email is broken.
    With bulkhead (separated pools 50 / 30 / 20):
      Email pool fills to 20 / 20 in ~0.1 s; subsequent email submissions are rejected
      Order code catches the rejection, marks email as "send later"
      Inventory and Payment pools stay at normal utilization (10/50, 6/30)
      Orders still ship and get charged; only the confirmation email is delayed.
The bulkhead doesn't make the email problem go away. It contains it. The cost is real — every order now gets a degraded experience (no immediate email). But the alternative — every order failing — is much worse.
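The same containment effect can be reproduced in miniature with real threads. This is a scaled-down sketch, not the production setup described above: the pools have 4 or 2 threads instead of 100, the degraded email provider is a 0.3 s sleep, and the helper names are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def slow_email():
    time.sleep(0.3)          # stands in for the degraded email provider
    return "sent"


def check_inventory():
    return "in stock"        # healthy dependency, returns immediately


def inventory_answered(email_pool, inventory_pool, deadline=0.1):
    """Flood the email pool, then report whether an inventory call completes in time."""
    for _ in range(4):
        email_pool.submit(slow_email)
    future = inventory_pool.submit(check_inventory)
    try:
        future.result(timeout=deadline)
        return True
    except TimeoutError:
        return False


# Shared pool: both dependencies draw from the same 4 threads.
shared = ThreadPoolExecutor(max_workers=4)
shared_ok = inventory_answered(shared, shared)

# Bulkheads: each dependency has its own small pool.
email_pool = ThreadPoolExecutor(max_workers=2)
inventory_pool = ThreadPoolExecutor(max_workers=2)
isolated_ok = inventory_answered(email_pool, inventory_pool)

print(shared_ok, isolated_ok)  # False True

for pool in (shared, email_pool, inventory_pool):
    pool.shutdown(wait=True)
```

With the shared pool, the inventory call sits behind four sleeping email calls and misses its deadline. With separate pools it returns immediately, even though the email pool is still backed up.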
Hystrix / Resilience4j implementation
The Hystrix defaults (10 threads per dependency, with a queue rejection threshold of 5 that only takes effect once a queue size is configured) are the textbook starting point. They came from Netflix's internal experience: 10 is small enough to surface backpressure quickly, large enough that normal traffic flows freely.
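    import io.github.resilience4j.bulkhead.Bulkhead;
    import io.github.resilience4j.bulkhead.BulkheadConfig;
    import io.github.resilience4j.bulkhead.ThreadPoolBulkhead;
    import io.github.resilience4j.bulkhead.ThreadPoolBulkheadConfig;
    import java.time.Duration;
    import java.util.concurrent.CompletionStage;
    import java.util.function.Supplier;

    // Fragment: place the statements below inside a method. Stock, Charge,
    // inventoryClient, paymentClient, itemId and invoice belong to the application.

    // Thread-pool bulkhead for blocking I/O
    ThreadPoolBulkheadConfig poolConfig = ThreadPoolBulkheadConfig.custom()
        .maxThreadPoolSize(50)
        .coreThreadPoolSize(50)
        .queueCapacity(10)
        .keepAliveDuration(Duration.ofMillis(20))
        .build();
    ThreadPoolBulkhead inventoryBulkhead =
        ThreadPoolBulkhead.of("inventoryService", poolConfig);

    // Semaphore bulkhead for async/reactive calls
    BulkheadConfig semConfig = BulkheadConfig.custom()
        .maxConcurrentCalls(30)
        .maxWaitDuration(Duration.ofMillis(50))
        .build();
    Bulkhead paymentBulkhead =
        Bulkhead.of("paymentService", semConfig);

    // Use: the supplier runs on the bulkhead's own pool, not the caller's thread.
    CompletionStage<Stock> stockFuture = inventoryBulkhead.executeSupplier(
        () -> inventoryClient.check(itemId));

    // Semaphore: decorating only wraps the call. The permit is taken when
    // charge.get() is invoked, and the call runs on the caller's thread.
    Supplier<Charge> charge = Bulkhead.decorateSupplier(
        paymentBulkhead, () -> paymentClient.charge(invoice));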
Note the maxWaitDuration on the semaphore — that controls how long the caller waits for a permit before failing. Zero means strict fail-fast; small values (50-100 ms) smooth over brief contention bursts.
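The difference between a zero and a non-zero wait can be shown with Python's standard semaphore. This is a rough analogue of the setting, not Resilience4j's implementation; `WaitingBulkhead` and its method names are hypothetical.

```python
import threading


class WaitingBulkhead:
    """Semaphore bulkhead with a bounded wait, loosely mirroring maxWaitDuration."""

    def __init__(self, max_concurrent: int, max_wait_seconds: float = 0.0):
        self._permits = threading.BoundedSemaphore(max_concurrent)
        self._max_wait = max_wait_seconds

    def try_enter(self) -> bool:
        if self._max_wait <= 0:
            return self._permits.acquire(blocking=False)    # strict fail-fast
        return self._permits.acquire(timeout=self._max_wait)

    def leave(self) -> None:
        self._permits.release()


strict = WaitingBulkhead(max_concurrent=1)                  # max wait 0: fail fast
first, second = strict.try_enter(), strict.try_enter()

patient = WaitingBulkhead(max_concurrent=1, max_wait_seconds=0.5)
patient.try_enter()
threading.Timer(0.01, patient.leave).start()                # permit comes back after 10 ms
third = patient.try_enter()                                 # waits briefly, then succeeds

print(first, second, third)  # True False True
```

The patient variant absorbs a short burst at the price of holding the caller for up to the wait duration, which is why the value should stay small.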
Sizing the bulkhead
The size of each pool is the main tuning knob. Too small and you reject calls during normal traffic peaks. Too large and the bulkhead provides little isolation: a slow dependency can still tie up enough threads, memory, and connections to hurt the rest of the service. The formula:
    pool_size = (peak_RPS × P99_latency_seconds) × headroom
    where:
      peak_RPS            = peak requests/second to this dependency
      P99_latency_seconds = 99th percentile observed latency, in seconds
      headroom            = 1.5 to 2.0 (handles bursts)
    Example:
      Inventory: 200 RPS, P99 = 80 ms, headroom 1.5
      pool_size = 200 × 0.08 × 1.5 = 24 threads
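The sizing formula translates directly into a helper. A small sketch; the function name is hypothetical, and rounding up to a whole thread is an assumption the formula leaves implicit.

```python
import math


def bulkhead_pool_size(peak_rps: float, p99_latency_seconds: float, headroom: float = 1.5) -> int:
    """Estimate concurrent calls at peak (rate x latency), pad for bursts, round up."""
    raw = peak_rps * p99_latency_seconds * headroom
    # Round away floating-point noise before taking the ceiling.
    return math.ceil(round(raw, 6))


print(bulkhead_pool_size(200, 0.08, 1.5))  # 24
```

The result is a starting point for load testing, not a substitute for it.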
Combine with three safeguards: a queue (typically 5-20 entries) for smoothing brief overruns; a maxWaitDuration of 50-200 ms to limit caller-side waiting; an explicit timeout per call so a stuck call eventually frees its thread. The pool size, queue, and timeout together define the "compartment ceiling" — the maximum impact a sick dependency can have.
Variants and broader applications
- Thread-pool bulkhead. The standard. One pool per dependency. Hystrix and Resilience4j both implement this directly.
- Semaphore bulkhead. No dedicated threads; just a count of concurrent permits. Best for non-blocking calls.
- Connection-pool bulkhead. Separate JDBC / HTTP connection pools per database / API. HikariCP makes this trivial — just instantiate multiple HikariDataSource beans.
- Process / container bulkhead. Kubernetes resource limits per container, separate pods per workload type. CPU and memory budgets enforced by the orchestrator.
- Cell-based architecture. AWS's term for the coarsest bulkhead: full stacks duplicated per cell, customers routed to one cell. A bad deploy or runaway tenant in cell A can't affect cell B. Used to keep blast radius small at scale.
- VM-per-tenant. The original SaaS bulkhead. Each customer gets a dedicated VM. Maximum isolation, lowest density — appropriate only for high-value customers.
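For the cell-based variant, the routing layer has to pin each customer to exactly one cell. One simple way to sketch that is a stable hash; the cell names and the `cell_for` function are hypothetical, and this is not a description of how AWS routes traffic.

```python
import hashlib

CELLS = ["cell-a", "cell-b", "cell-c"]   # hypothetical cell names


def cell_for(customer_id: str, cells=CELLS) -> str:
    """Pin a customer to one cell using a hash that is stable across processes.

    Python's built-in hash() is salted per process for strings, so it is not used here.
    """
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return cells[int.from_bytes(digest[:8], "big") % len(cells)]


assigned = cell_for("customer-42")
print(assigned in CELLS, assigned == cell_for("customer-42"))  # True True
```

Modulo hashing reshuffles customers whenever the number of cells changes, so a real deployment would more likely keep an explicit customer-to-cell mapping.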
Bulkhead vs related resilience patterns
| Pattern | What it does | When to use |
|---|---|---|
| Bulkhead | Pre-allocates resources per dependency | Prevent cross-dependency starvation |
| Circuit Breaker | Stops calls when failure rate is high | Avoid hammering a sick downstream |
| Timeout | Caps per-call wait time | Always — pair with everything |
| Retry + Backoff | Retries with exponential delay | Transient failures |
| Rate Limiting | Caps requests per second | Protect downstream proactively |
| Throttling | Delays excess requests instead of dropping | Smooth bursty traffic |
| Fallback | Returns a default when primary fails | Degraded but useful response |
The defensive stack for a remote call in production: bulkhead → circuit breaker → timeout → retry. Bulkhead defines the maximum resource impact. Circuit breaker stops calling a known-sick downstream. Timeout bounds per-call wait. Retry handles transient blips. Hystrix bundled isolation, circuit breaking, timeouts, and fallbacks into a single command and left retries to the caller; Resilience4j keeps each pattern as a separate, composable module.
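To make the layering concrete, here is a compact sketch that combines all four guards around one dependency. It is an illustration under simplifying assumptions, not how Hystrix or Resilience4j implement them: `GuardedDependency`, `CallRejected`, and every threshold are hypothetical, the breaker has no half-open state, and the order of the checks is one choice among several.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


class CallRejected(Exception):
    """Raised without touching the downstream (bulkhead full or circuit open)."""


class GuardedDependency:
    def __init__(self, max_concurrent=2, failure_threshold=3, open_seconds=5.0,
                 timeout_seconds=0.2, max_attempts=2):
        self._pool = ThreadPoolExecutor(max_workers=max_concurrent)
        self._permits = threading.BoundedSemaphore(max_concurrent)    # bulkhead
        self._lock = threading.Lock()
        self._failures = 0                                            # breaker state
        self._failure_threshold = failure_threshold
        self._open_until = 0.0
        self._open_seconds = open_seconds
        self._timeout = timeout_seconds
        self._max_attempts = max_attempts

    def _attempt(self, func, *args):
        if not self._permits.acquire(blocking=False):
            raise CallRejected("bulkhead full")
        if time.monotonic() < self._open_until:
            self._permits.release()
            raise CallRejected("circuit open")
        try:
            future = self._pool.submit(func, *args)
        except BaseException:
            self._permits.release()
            raise
        # The permit is returned only when the worker really finishes.
        future.add_done_callback(lambda _: self._permits.release())
        try:
            result = future.result(timeout=self._timeout)             # timeout
        except Exception:
            with self._lock:
                self._failures += 1
                if self._failures >= self._failure_threshold:
                    self._open_until = time.monotonic() + self._open_seconds
                    self._failures = 0
            raise
        with self._lock:
            self._failures = 0
        return result

    def call(self, func, *args):
        for attempt in range(1, self._max_attempts + 1):              # retry
            try:
                return self._attempt(func, *args)
            except CallRejected:
                raise                       # never retry a fast rejection
            except Exception:
                if attempt == self._max_attempts:
                    raise
                time.sleep(0.01 * 2 ** attempt)                       # exponential backoff


def flaky():
    raise ConnectionError("downstream unavailable")


guard = GuardedDependency(failure_threshold=3, max_attempts=2)
outcomes = []
for _ in range(3):
    try:
        guard.call(flaky)
    except CallRejected as exc:
        outcomes.append(str(exc))
    except ConnectionError:
        outcomes.append("failed")

print(outcomes)  # ['failed', 'circuit open', 'circuit open']
```

The first call uses both attempts and fails. The third recorded failure opens the circuit during the second call, and from then on calls are rejected without reaching the downstream.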
Python implementation
    import threading
    from concurrent.futures import ThreadPoolExecutor, TimeoutError
    from contextlib import contextmanager


    class BulkheadFullError(Exception):
        """Raised when a bulkhead has no free capacity."""


    class ThreadPoolBulkhead:
        def __init__(self, name, max_workers=10, queue_size=5):
            self.name = name
            self.executor = ThreadPoolExecutor(max_workers=max_workers)
            # One permit per running or queued task.
            self.semaphore = threading.BoundedSemaphore(max_workers + queue_size)

        def call(self, func, *args, timeout=2.0, **kwargs):
            if not self.semaphore.acquire(blocking=False):
                raise BulkheadFullError(f"{self.name}: bulkhead saturated")
            try:
                future = self.executor.submit(func, *args, **kwargs)
            except BaseException:
                self.semaphore.release()
                raise
            # Release the permit only when the task really finishes. Python cannot
            # interrupt a running worker, so a timed-out call keeps its slot.
            future.add_done_callback(lambda _: self.semaphore.release())
            return future.result(timeout=timeout)  # raises TimeoutError if too slow


    class SemaphoreBulkhead:
        def __init__(self, name, max_concurrent=30):
            self.name = name
            self.semaphore = threading.BoundedSemaphore(max_concurrent)

        @contextmanager
        def __call__(self):
            if not self.semaphore.acquire(blocking=False):
                raise BulkheadFullError(f"{self.name}: bulkhead saturated")
            try:
                yield
            finally:
                self.semaphore.release()


    # Usage (check_stock and charge_card are the application's own functions):
    inventory_bh = ThreadPoolBulkhead("inventory", max_workers=50)
    payment_bh = SemaphoreBulkhead("payment", max_concurrent=30)


    def process_order(order):
        stock = inventory_bh.call(check_stock, order.item_id, timeout=0.5)
        with payment_bh():
            return charge_card(order)
Common pitfalls
- Sharing a single pool across all dependencies. Defeats the entire purpose. If you have N downstreams, you have N bulkheads — not one super-pool.
- Pool too large. A "bulkhead" of 500 threads to a service that needs 20 at peak is no bulkhead — the slow dependency can still drag down the JVM heap and the CPU. Right-size aggressively.
- No timeout per call. The bulkhead bounds concurrency but not duration. If a worker thread blocks forever, the pool slot is held forever. Always combine bulkhead with a per-call timeout.
- Queue too large. A queue of 1000 entries means a stuck dependency can hold thousands of pending requests in memory for minutes. Keep queues small (5-20) so backpressure is visible quickly.
- Not monitoring rejection rate. If a bulkhead is silently rejecting 10% of calls, you have a problem. Emit metrics for active threads, queue depth, rejection count, and alert when any climb.
- Bulkhead without a fallback strategy. Rejecting calls is necessary but not sufficient. The application needs to do something useful with the rejection — fall back to cache, return a degraded response, mark the work for retry — not just propagate an exception to the user.
- Bulkhead but no circuit breaker. When the downstream stays sick, the bulkhead pool stays full. Adding a circuit breaker stops sending new traffic into the dead pool, freeing it to recover and saving the rejection cost.
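Two of these pitfalls, unmonitored rejections and rejections without a fallback, can be addressed in the bulkhead wrapper itself. A minimal sketch; `MeteredBulkhead` and its metric names are hypothetical, and a real service would export the counters to its metrics system rather than keep them in memory.

```python
import threading
from collections import Counter


class MeteredBulkhead:
    def __init__(self, name, max_concurrent):
        self.name = name
        self._permits = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()
        self.metrics = Counter()              # accepted / rejected counts

    def call(self, func, fallback):
        if not self._permits.acquire(blocking=False):
            with self._lock:
                self.metrics["rejected"] += 1
            return fallback()                 # degrade instead of surfacing an error
        with self._lock:
            self.metrics["accepted"] += 1
        try:
            return func()
        finally:
            self._permits.release()

    def rejection_rate(self):
        with self._lock:
            total = self.metrics["accepted"] + self.metrics["rejected"]
            return self.metrics["rejected"] / total if total else 0.0


email_bh = MeteredBulkhead("email", max_concurrent=1)


def send_while_busy():
    # The only permit is held by this call, so the nested call is rejected.
    return email_bh.call(lambda: "sent", fallback=lambda: "queued for later")


outcome = email_bh.call(send_while_busy, fallback=lambda: "queued for later")
print(outcome, email_bh.rejection_rate())  # queued for later 0.5
```

An alert on `rejection_rate()` turns a silent rejection problem into a visible one, and the fallback keeps the rejection from reaching the user as a raw exception.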
Performance and impact
The per-call overhead of a thread-pool bulkhead is dominated by the executor submission: typically 1-10 microseconds for a fork-join or ThreadPoolExecutor on a JVM, slightly more on Node or .NET. A semaphore bulkhead is sub-microsecond — a single atomic compare-and-swap. For a remote call that costs 10-100 ms, both overheads are invisible.
The benefit, however, is the difference between a degraded service and a dead one. Netflix's outage post-mortems from the early 2010s repeatedly cited bulkhead isolation as the reason a single failing downstream did not take down the entire site. The Hystrix dashboard — color-coded boxes showing each bulkhead's utilization, rejection rate, and circuit state — was an operational breakthrough; you could see at a glance which dependency was sick.
Today the pattern is so taken for granted that service meshes (Envoy, Istio, Linkerd) implement bulkheads transparently at the proxy layer. Your application doesn't need to know — the proxy enforces per-upstream connection pools and concurrent-request limits, with metrics and admin endpoints to inspect them. The principle hasn't changed in two decades: partition your resources along failure boundaries, and one breach won't sink the ship.
Frequently asked questions
What is the bulkhead pattern in one sentence?
The bulkhead pattern partitions a service's resources — most commonly its thread pool — so that calls to one dependency can never consume threads needed by calls to another dependency. If service B becomes slow, only the thread pool reserved for B is exhausted; calls to services A and C continue using their own pools.
Why is it called a bulkhead?
Ships have vertical interior walls called bulkheads that divide the hull into watertight compartments. A breach in one compartment floods only that compartment; the others stay dry and the ship keeps floating. The software equivalent: isolate the "flooding" (slow calls, blocked threads, exhausted connections) to one dependency-specific pool so the rest of the system stays operational. Michael Nygard popularized the term in his 2007 book Release It!
Thread-pool bulkhead vs semaphore bulkhead — which to use?
Thread-pool bulkhead runs the dependent call on a separate thread pool. Pro: timeouts work even for blocking I/O (you can interrupt the worker). Con: thread context-switch cost (~1-10 μs) and the complexity of managing N pools. Semaphore bulkhead just bounds concurrent permits with a counting semaphore — the call runs on the caller's thread. Pro: zero context-switch overhead. Con: cannot interrupt a stuck call. Rule of thumb: thread-pool for synchronous, blocking I/O calls; semaphore for non-blocking async/reactive calls.
How do I size each bulkhead?
Start with Hystrix's defaults: 10 threads per dependency with a queue of 5. The size should be high enough that normal traffic never queues but low enough that a stuck dependency can't drag the whole service down. Practical sizing formula: pool_size = (peak_RPS_per_dependency × P99_latency_seconds) × headroom_factor (1.5-2.0). For 100 RPS at 100 ms latency: 100 × 0.1 × 2 = 20 threads. Always measure under production load and tune; defaults are starting points only.
What happens when a bulkhead is full?
When all threads (or semaphore permits) for a dependency are in use, new calls are rejected immediately, typically by throwing an exception such as Java's RejectedExecutionException or Resilience4j's BulkheadFullException. The fail-fast behavior is the whole point: better to reject 5% of calls quickly than have 100% of calls hang for 30 seconds. The rejecting code can fall back to a default value, return a cached response, or propagate the error to the caller. These are the same options available to a circuit breaker in the OPEN state.
Bulkhead vs circuit breaker — what's the difference?
Bulkhead is structural: it allocates separate resources upfront so one dependency cannot starve the others. Circuit breaker is behavioral: it observes failures and stops calling a sick dependency. They compose: the bulkhead limits how much damage a sick downstream can do (only its pool fills up); the circuit breaker then trips OPEN to stop sending traffic at all. Hystrix bundles both; Resilience4j keeps them composable. Without a bulkhead, a hung call can monopolize threads before the circuit breaker has time to trip.
Does the bulkhead pattern apply outside of thread pools?
Yes — anywhere you have a finite shared resource. Connection pools per database; memory budgets per request type; container CPU quotas per workload; even separate VMs per tenant. AWS uses cell-based architecture as a coarse-grained bulkhead — entire stacks duplicated per cell so a deploy gone wrong in one cell can't affect customers in another. The principle is universal: partition resources along failure boundaries.