Concurrency Patterns
Backpressure
When the consumer can't keep up, tell the producer to slow down
Backpressure is the flow-control signal slow consumers send upstream to slow producers down. Bounded queue + block/drop/spill modes. Used in Reactive Streams, gRPC, TCP, Kafka.
- Without itOOM in seconds to hours
- Block strategy costProducer stalls (latency↑)
- Drop strategy costData loss (lossy systems only)
- Spill strategy costDisk latency + ops complexity
- Buffer sizing ruleLittle's Law: throughput × latency
- Famous implementationsTCP sliding window, Reactive Streams, gRPC
Interactive visualization
Three modes side by side. Watch the producer slow, drop, or spill when the consumer can't keep up.
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
How backpressure works
The setup: a fast producer pushes events into a buffer; a slow consumer drains them. If left unchecked, the buffer grows linearly while the producer outruns the consumer. Memory fills, garbage collection thrashes, latency stretches from milliseconds to minutes, and eventually the process dies with an OutOfMemoryError. This failure mode has destroyed more production systems than almost any other concurrency bug.
Backpressure prevents it by making the buffer's capacity a hard limit and making the producer aware of that limit. The simplest mechanism: a bounded queue with a blocking put(). When the queue is full, the producer's put() waits until the consumer has consumed something. The producer naturally slows to the consumer's rate — feedback control with no explicit messaging.
More sophisticated systems use explicit demand signals. In Reactive Streams (the JVM standard, also adopted by RxJava and Project Reactor), the consumer calls subscription.request(n) to tell the producer "I am ready for n more items." The producer is forbidden from sending more than the requested count. No buffer ever overflows because the producer simply doesn't emit until requested.
TCP famously implements backpressure at the transport layer with its sliding window. Every ACK from the receiver advertises a receive-window size — the maximum bytes the sender may have in flight. As the receiver's application falls behind, its OS receive buffer fills, the window shrinks, and at zero-window the sender pauses. This is what saves you when a slow client meets a fast server: the network itself throttles the writer.
Three strategies for overflow
When the buffer hits its capacity limit, you have three honest choices. Most production systems pick one per stream and document the choice loudly.
Block (the simplest)
The producer's send call blocks until space is available. This is the natural backpressure model — a thread pool's ArrayBlockingQueue.put(), TCP's blocked write, Java's OutputStream.write. Easy to implement, easy to reason about. The cost is that the producer is now coupled to the consumer's speed — if you have a real-time camera feed and the storage layer stalls, the camera stalls too.
Drop (sometimes the only sane choice)
When new data invalidates old data, dropping is correct. Two flavors: drop newest (refuse new submissions when full) or drop oldest (evict the head of the queue to make room). For live video, drop-oldest is right — a frame from two seconds ago is useless. For "at-most-once" telemetry, drop-newest is right because you don't want to corrupt your historical data with sudden spikes of more-recent values.
Some systems offer sampled drop — keep every Nth message, drop the rest. Useful for adaptive sampling under load.
Spill to disk (bounded memory, unbounded queue)
The buffer is backed by a file (or by Kafka's commit log). When in-memory capacity is exceeded, new messages go to disk. Memory is bounded; total queue capacity is bounded only by available disk. Trade-offs: disk reads are 100-1000× slower than memory, you now have a storage layer to operate (replication, retention, compaction), and ordering across the in-memory and on-disk segments must be preserved. Kafka, Pulsar, and RabbitMQ's persistent queues all use this model.
When to use which strategy
- Block when every message matters and producers can tolerate latency — financial transactions, ordered event sourcing, anywhere "drop a message" is a P0 bug.
- Drop-newest when bounded latency matters more than completeness — server access logs, metrics samples, anywhere a steady-state spike is preferable to a queue blowing out memory.
- Drop-oldest when only the freshest data has value — live video, real-time dashboards, sensor readings.
- Spill when you cannot drop and cannot afford to slow producers — durable message brokers, event sourcing systems, replay-required pipelines.
One subtle rule: never silently drop. Production failures always trace back to a system that dropped messages without telling anyone. At minimum, emit a metric on every drop. Better: log the drop rate and trigger an alert above a threshold.
Strategy comparison
| Block | Drop newest | Drop oldest | Spill to disk | |
|---|---|---|---|---|
| Memory bound | Yes | Yes | Yes | Yes |
| Data loss | None | New data lost | Old data lost | None (within retention) |
| Producer latency | Stalls until space | None | None | None (writes are async) |
| End-to-end latency | Unbounded (waits) | Bounded | Bounded | Disk-bounded |
| Implementation cost | Trivial | Easy | Easy | Operational burden |
| Typical use | Financial events, must-be-ordered | Telemetry, logs | Live video, sensors | Durable brokers (Kafka) |
Reactive Streams and demand-based backpressure
The 2015 Reactive Streams specification (and the resulting java.util.concurrent.Flow in Java 9) standardized pull-based backpressure for the JVM. The model is four interfaces — Publisher, Subscriber, Subscription, Processor — with one rule: the Publisher must never emit more than the Subscriber has requested via Subscription.request(n).
subscriber.onSubscribe(subscription)
subscription.request(10) // "send me up to 10"
publisher emits items 1..10
subscriber.onNext(1) ... onNext(10)
subscription.request(5) // "send me 5 more"
publisher emits items 11..15
...
This eliminates buffer overflow entirely — the producer cannot emit what the consumer has not asked for. The cost is a tiny protocol overhead per message and the conceptual difficulty of "demand accounting" everywhere in your pipeline. RxJava, Project Reactor, Akka Streams, and Mutiny all implement this contract.
Pseudo-code: bounded blocking queue
class BoundedQueue:
def __init__(self, capacity):
self.q = []
self.cap = capacity
self.not_full = Condition()
self.not_empty = Condition()
def put(self, item): # producer side
with self.not_full:
while len(self.q) >= self.cap:
self.not_full.wait() # backpressure!
self.q.append(item)
self.not_empty.notify()
def take(self): # consumer side
with self.not_empty:
while not self.q:
self.not_empty.wait()
item = self.q.pop(0)
self.not_full.notify()
return item
# Drop-newest variant:
def put_drop_newest(self, item):
with self.lock:
if len(self.q) >= self.cap:
return False # drop, optionally increment metric
self.q.append(item)
return True
# Drop-oldest variant:
def put_drop_oldest(self, item):
with self.lock:
if len(self.q) >= self.cap:
self.q.pop(0)
self.q.append(item)
Java implementation (Reactor)
import reactor.core.publisher.Flux;
import reactor.core.publisher.FluxSink;
// Producer emits faster than consumer can handle.
Flux<Integer> fastSource = Flux.create(sink -> {
for (int i = 0; i < 1_000_000; i++) sink.next(i);
sink.complete();
}, FluxSink.OverflowStrategy.BUFFER); // BUFFER, DROP, LATEST, ERROR, IGNORE
// Slow consumer — 100 ms per item.
fastSource
.publishOn(Schedulers.parallel(), 64) // bounded internal buffer
.onBackpressureBuffer(1000, item -> metric.increment("dropped"))
.subscribe(item -> {
Thread.sleep(100); // slow consumer
System.out.println(item);
});
// Demand-driven — explicit request().
fastSource.subscribe(new BaseSubscriber<Integer>() {
protected void hookOnSubscribe(Subscription s) {
s.request(10); // initial demand
}
protected void hookOnNext(Integer i) {
System.out.println(i);
// ask for one more after each item processed.
request(1);
}
});
Node.js streams
const { pipeline } = require('stream/promises');
const fs = require('fs');
// Stream-based pipelines apply backpressure automatically.
// When the writable's internal buffer fills, .write() returns false;
// the readable's .pause() is called automatically. When the writable
// drains, the readable is resumed.
await pipeline(
fs.createReadStream('huge.bin'),
someTransform,
fs.createWriteStream('out.bin'),
);
// Manual handling:
const readable = fs.createReadStream('huge.bin');
const writable = fs.createWriteStream('out.bin');
readable.on('data', chunk => {
if (!writable.write(chunk)) {
readable.pause();
writable.once('drain', () => readable.resume());
}
});
Common pitfalls
- Unbounded queues anywhere. An unbounded
LinkedBlockingQueue, an in-memory list as a buffer, a Kafka topic with no retention — anywhere queues can grow without bound is a backpressure failure waiting to happen. - Silent drops. A drop-policy queue without a counter looks fine until customers complain about missing data. Every drop should increment a metric; sustained drop rates should alert.
- Backpressure that doesn't propagate. If service A applies backpressure to service B, but B doesn't propagate it to its upstream C, then C still saturates B. Backpressure has to flow all the way back to the ultimate source of demand (the user, the ingest API, the message broker).
- Buffers sized for steady state, not bursts. Little's Law gives you average queue depth; bursts of 10× the average rate need 10× the buffer or a drop policy. Either oversize the buffer or accept bounded drops.
- Blocking inside async code. Calling a blocking
put()from an async coroutine stalls the event loop. Use the async-friendly variant (Python'sasyncio.Queue.put(), Tokio'smpsc::Sender::send().await). - Mixing demand-based and rate-based. Reactive Streams expects strict demand; an upstream that emits regardless of demand violates the contract and will eventually crash.
Performance and sizing
Little's Law gives the canonical sizing rule: average queue length = throughput × average latency. If your consumer processes 1000 messages per second and average end-to-end latency is 50 ms, average queue depth is 50 messages. Make the buffer 2-4× that — say 128-256 messages — to absorb bursts without stalling.
For a typical bounded ArrayBlockingQueue on the JVM, each put()/take() costs around 200-500 nanoseconds of mutex overhead. Beyond ~5 million ops/second, the lock becomes a bottleneck and you should consider lock-free MPMC queues (LMAX Disruptor, JCTools' MpmcArrayQueue) which can sustain 100-500 million ops/second.
The disk-spill model adds 50-200 microseconds per write to durable storage (NVMe with fsync). That overhead is invisible against the 100 ms it might take to process a downstream call — but it forces you to operate the storage layer, including retention, replication, and corruption recovery.
Frequently asked questions
What happens if you don't implement backpressure?
An unbounded queue between a fast producer and a slow consumer grows linearly until memory is exhausted, then the process dies with OutOfMemoryError or OOM-killer. Before death, the queue stretches latency end-to-end — a message that took 1 ms to produce takes minutes to consume. Backpressure is what stops the queue from growing forever and converts a latency disaster into either throttling, controlled dropping, or disk-based buffering.
What are the three primary backpressure strategies?
Block: when the buffer fills, the producer's send call blocks until space is available — applies natural backpressure but stalls real-time work. Drop: discard either newest or oldest messages — bounded latency at the cost of data loss; correct for telemetry, wrong for financial events. Spill to disk: write overflow to a file-backed queue (Kafka's commit log) — bounds memory but adds disk latency and operational complexity.
How does TCP implement backpressure?
TCP uses a sliding window. The receiver advertises a receive-window size in every ACK — the maximum bytes the sender may have in flight. If the receiver's application is slow to read, the OS receive buffer fills, the advertised window shrinks toward zero, and the sender stops sending. This is end-to-end backpressure built into the protocol; the application never has to think about it.
What is the difference between push-based and pull-based flow control?
Push: producer emits as fast as it can; the channel must buffer or drop. Pull: consumer explicitly requests N items; producer can only emit when requested. Reactive Streams uses pull-based — Subscriber.request(n) lets the consumer dictate demand. gRPC streams use it. Kafka consumers do too. Pull-based is more complex but eliminates buffer overflow entirely.
When should you drop versus block?
Drop when newer data invalidates older data — live video frames, real-time sensor readings, telemetry samples. A consumer that's two seconds behind doesn't help anyone. Block when every message matters — financial transactions, inventory updates, ordered event streams. Blocking forces the producer to slow down, which usually means propagating backpressure further upstream (a request returns 429 Too Many Requests).
How big should the bounded buffer be?
Little's Law: average queue depth = throughput × latency. If your consumer processes 1000 messages per second and average end-to-end latency is 50 ms, you need a buffer of about 50 messages to keep both sides busy. Larger buffers smooth bursts but add latency and memory; smaller buffers throttle producers earlier but limit burst tolerance. Most production systems start at 32-1024 and tune from there.
What is the relationship between backpressure and rate limiting?
Rate limiting is external — a producer is told 'no more than 100 requests per second' regardless of consumer state. Backpressure is dynamic — the consumer tells the producer 'I can take more now' or 'I'm full, slow down.' Rate limiting protects downstream from overload preemptively; backpressure reacts to observed downstream pressure. Production systems often use both — a rate limiter for ingress plus backpressure within the system.