Resilience Patterns

Retry with Exponential Backoff

Wait longer between attempts, randomize the wait — survive transient failures without a herd

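Retry failed calls with delays of 1 s, 2 s, 4 s, and 8 s plus random jitter; four retries add up to 1+2+4+8 = 15 s of waiting. Jitter keeps clients from retrying in synchronized storms. The pattern is standard in the AWS SDKs, gRPC, and common HTTP client libraries.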

  • Formula: delay = base × 2^attempt + jitter
  • Standard delays: 1, 2, 4, 8, 16 s
  • 4-retry total: 1+2+4+8 = 15 s
  • Cap: typically 3-5 retries
  • Jitter type: full · equal · decorrelated
  • Used in: AWS SDK, gRPC, HTTP libraries

Many clients fail at the same instant. Without jitter they all retry together and slam the recovering service. With jitter, retries spread across the window.

How exponential backoff works

Networks fail. A packet drops, a TCP connection resets, a server returns 503 because it just restarted. Sometimes the failure is permanent — a wrong URL, a deleted resource. But often it's transient — wait a moment and the next attempt succeeds. Retrying is the right move. The question is: how long to wait?

The naive answer — retry immediately — is the worst possible answer. If a downstream is overloaded and returning 503s, immediate retries amplify the load and prolong the outage. If the downstream just restarted and is still warming up, immediate retries arrive while it's not ready.

The standard answer, exponential backoff, solves both. Before retry n (counting from n = 0), wait base × 2^n seconds. Doubling pushes each successive retry much later than the last, so the retry rate that each client imposes on a struggling downstream halves with every round.

attempt | delay before this attempt | cumulative wait
--------|---------------------------|----------------
   0    |  0 (initial call)         |   0 s
   1    |  base × 2^0 = 1 s         |   1 s
   2    |  base × 2^1 = 2 s         |   3 s
   3    |  base × 2^2 = 4 s         |   7 s
   4    |  base × 2^3 = 8 s         |  15 s
   5    |  base × 2^4 = 16 s        |  31 s

The 1+2+4+8 sequence is the canonical four-retry budget: 15 seconds of total waiting (not counting the time each attempt itself takes), after which you give up and propagate the failure. Cap your retries at 3-5; beyond that, you're spending more time waiting than the operation likely deserves.
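
The schedule in the table can be reproduced in a few lines of Python. This is a minimal sketch without jitter; the helper name backoff_schedule and its parameters are illustrative and do not come from any library.

```python
def backoff_schedule(base=1.0, retries=4):
    """Return (retry_number, delay, cumulative_wait) rows for un-jittered backoff."""
    rows = []
    total = 0.0
    for n in range(retries):
        delay = base * (2 ** n)  # base × 2^n, with n counted from 0
        total += delay
        rows.append((n + 1, delay, total))
    return rows


for retry, delay, total in backoff_schedule():
    print(f"retry {retry}: wait {delay:g} s, cumulative {total:g} s")
```

With the defaults, the last row reports a cumulative wait of 15 s, matching the four-retry budget.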

The thundering herd problem

Pure exponential backoff has a subtle but devastating flaw at scale. Picture ten thousand clients all calling the same service. The service goes briefly down at t=0. All ten thousand calls fail simultaneously. They all wait exactly 1 second. They all retry exactly at t=1, slamming the recovering service with ten thousand simultaneous requests. The service falls again. They all wait exactly 2 seconds. They all retry exactly at t=3. Slam, fall, repeat.

This is the thundering herd. The service is held in a perpetual half-recovery state because every retry round is synchronized. The longer the outage, the more clients accumulate in synchronized retry buckets.

The fix: add random jitter. Instead of delay = base × 2^n, use delay = random(0, base × 2^n). Each client picks an independent random delay in the range, so the retries spread out across the window. For the first retry, no burst arrives at t=1: the 10,000 retries land uniformly between t=0 and t=1, about 5,000 in the first half second and the rest in the second. The recovering service handles a more even load and stays up.

AWS's 2015 architecture blog post "Exponential Backoff and Jitter" by Marc Brooker codified three jitter variants:

  • Full jitter. sleep = random(0, base × 2^n). Most aggressive spreading. Each client's delay is uniformly distributed.
  • Equal jitter. sleep = base × 2^(n-1) + random(0, base × 2^(n-1)). At least half the computed delay, plus random for the rest. Guarantees clients wait at least some time.
  • Decorrelated jitter. sleep = min(cap, random(base, previous_sleep × 3)). Uses the previous delay as input rather than the attempt number. Empirically smoother throughput in AWS's load tests.

The AWS recommendation: use full jitter for most workloads — it's the simplest and spreads the herd most aggressively. The marginal improvement of decorrelated over full is small in practice.
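
The three formulas translate directly into code. The sketch below is one possible rendering, not the AWS implementation: the function names are made up for illustration, attempt is counted from 0, and applying a cap to the full and equal variants is an added assumption (the formulas above only cap the decorrelated variant).

```python
import random


def full_jitter(base, attempt, cap):
    """sleep = random(0, base × 2^attempt), limited to cap."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def equal_jitter(base, attempt, cap):
    """Half of the computed delay is fixed, the other half is random."""
    half = min(cap, base * 2 ** attempt) / 2
    return half + random.uniform(0, half)


def decorrelated_jitter(base, previous_sleep, cap):
    """sleep = min(cap, random(base, previous_sleep × 3))."""
    return min(cap, random.uniform(base, previous_sleep * 3))


# Decorrelated jitter carries state between attempts; start from base.
sleep = 1.0
for attempt in range(4):
    sleep = decorrelated_jitter(1.0, sleep, cap=20.0)
    print(f"attempt {attempt}: full={full_jitter(1.0, attempt, 20.0):.2f} s, "
          f"equal={equal_jitter(1.0, attempt, 20.0):.2f} s, "
          f"decorrelated={sleep:.2f} s")
```

Full and equal jitter are stateless functions of the attempt number, while decorrelated jitter needs the caller to remember the previous sleep.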

Worked example — 1000 clients hitting a transient outage

A backend service is briefly down at t=0. 1000 clients are mid-call; all fail. Each retries up to 4 times with backoff.

No jitter (synchronized retries):
  t = 1 s:  ALL 1000 retry simultaneously     → 1000 RPS spike
  t = 3 s:  ALL 1000 retry simultaneously     → 1000 RPS spike
  t = 7 s:  ALL 1000 retry simultaneously     → 1000 RPS spike
  t = 15 s: ALL 1000 retry simultaneously     → 1000 RPS spike

  Backend keeps falling under 1000 RPS spikes; recovery takes minutes.

Full jitter (each waits random 0 to base × 2^n):
  t = 0-1 s:   ~1000 retries spread uniformly → ~1000 / 1 s = 1000 RPS avg, but smooth
  t = 1-3 s:   ~remaining 500 retries spread  → ~250 RPS avg
  t = 3-7 s:   ~remaining 200 retries spread  →  ~50 RPS avg
  t = 7-15 s:  ~remaining 50 retries spread   →   ~6 RPS avg

  Backend handles smooth load, stays up, all clients eventually succeed.

The math: in retry round k (counting from 0), the N_k clients that are still failing spread their retries over a window of base × 2^k seconds, so the average request rate is about N_k / (base × 2^k). In the example above that is 1000/1, 500/2, 200/4, and 50/8. N_k shrinks while the window doubles, so the rate declines monotonically and gives the downstream a gradual recovery window, not a hammering wave.
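
A small simulation makes the difference in the first retry round concrete. It is a rough sketch under simplifying assumptions: only the first round is modeled, every client fails at t=0, and load is counted in 100 ms slices. The function name and parameters are illustrative.

```python
import random
from collections import Counter


def peak_retries_per_slice(n_clients, window, jitter, slice_len=0.1, seed=0):
    """Largest number of first-round retries landing in any one time slice."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    counts = Counter()
    for _ in range(n_clients):
        # Without jitter every client waits the full window; with full
        # jitter each client waits a uniform random time in [0, window].
        delay = rng.uniform(0, window) if jitter else window
        counts[int(delay / slice_len)] += 1
    return max(counts.values())


print("no jitter:  ", peak_retries_per_slice(1000, 1.0, jitter=False))
print("full jitter:", peak_retries_per_slice(1000, 1.0, jitter=True))
```

Without jitter all 1000 retries fall into a single slice; with full jitter the busiest slice holds on the order of a tenth of them.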

Idempotency — retry's prerequisite

A retry that re-runs a successful operation is sometimes worse than not retrying at all. Classic case: payment. Client sends "charge $100." Server processes it. Network drops the response. Client times out, retries. Now the customer is charged $200.

The solution: idempotency. An operation is idempotent if running it N times has the same effect as running it once.

  • GETs are idempotent by HTTP definition. Retry freely.
  • PUTs are idempotent by HTTP definition. PUT /user/42 {"name":"Alice"} twice == once.
  • DELETEs are idempotent by HTTP definition. Deleting the same resource twice is no-op the second time.
  • POSTs are NOT idempotent. Retrying a POST that may or may not have succeeded on the server can produce duplicates.

To make POST retryable, send an Idempotency-Key header — a UUID generated by the client. Stripe popularized this pattern: Idempotency-Key: 7e3a... . The server stores (key, result) in a TTL cache; if the same key arrives again, return the stored result without re-running the operation. Stripe's docs are the canonical reference.
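
The server side of this pattern can be sketched as a small result cache keyed by the idempotency key. This is an illustration only, not Stripe's implementation: the class name, the 24-hour TTL, and the in-memory dict are assumptions, and a real service would need a shared store with an atomic check-and-set so that two concurrent requests with the same key cannot both run.

```python
import time


class IdempotencyStore:
    """Remember the result of each keyed operation for ttl_seconds."""

    def __init__(self, ttl_seconds=24 * 3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._results = {}  # key -> (expires_at, result)

    def run_once(self, key, operation):
        now = self._clock()
        entry = self._results.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # same key seen before: replay the stored result
        result = operation()  # first time (or expired): actually run it
        self._results[key] = (now + self._ttl, result)
        return result


store = IdempotencyStore()
charges = []


def charge():
    charges.append(100)
    return {"status": "charged", "amount": 100}


first = store.run_once("key-123", charge)
second = store.run_once("key-123", charge)  # a retry with the same key
print(first == second, len(charges))
```

The retry receives the stored response, and the charge itself runs only once.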

Variants

  • Pure exponential backoff (no jitter). delay = base × 2^n. Simple, but produces thundering herd. Only safe with single-client workloads.
  • Full jitter. delay = random(0, base × 2^n). The AWS recommendation. Maximum spreading.
  • Equal jitter. delay = base × 2^(n-1) + random(0, base × 2^(n-1)). Half deterministic, half random.
  • Decorrelated jitter. delay = min(cap, random(base, previous_delay × 3)). Smoother throughput in load tests.
  • Capped backoff. Add delay = min(delay, max_delay) so the wait never exceeds a ceiling (typically 30-60 seconds).
  • Truncated exponential backoff. Use exponential for the first few retries, then constant delay. Common in throttle-respecting clients.
  • Honor Retry-After header. When the server sends HTTP 429 or 503 with a Retry-After header, use that value directly — it's the authoritative answer for "when will I succeed."
  • Hedged requests. Instead of waiting for failure, send a duplicate request after a short delay if the first hasn't returned. gRPC supports this natively; useful for tail-latency reduction.
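
Honoring Retry-After means handling both forms the header can take: a number of seconds or an HTTP date. The parser below is a hedged sketch using only the standard library; the function name and the choice to return None for unparseable values (so the caller falls back to its own backoff) are assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value, now=None):
    """Seconds to wait according to a Retry-After value, or None if unusable."""
    if header_value is None:
        return None
    value = header_value.strip()
    if value.isdecimal():  # delay-seconds form, e.g. "120"
        return float(value)
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:  # treat a naive date as UTC
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


print(retry_after_seconds("120"))
print(retry_after_seconds("not a date"))
```

A client would typically sleep for the returned value when it is present and use its jittered backoff otherwise.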

Backoff strategy comparison

Strategy            | Formula                                        | Use when
--------------------|------------------------------------------------|--------------------------------------
Fixed delay         | delay = constant                               | Single client, no scale concern
Linear backoff      | delay = base × n                               | Predictable per-attempt cost
Pure exponential    | delay = base × 2^n                             | Never in production (thundering herd)
Full jitter         | random(0, base × 2^n)                          | Default for client-side retry
Equal jitter        | base × 2^(n-1) + random(0, base × 2^(n-1))     | When some minimum delay is required
Decorrelated jitter | min(cap, random(base, prev × 3))               | Heavy load, smooth throughput goal
Respect Retry-After | server-told                                    | Always, if header present

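Library defaults differ and change between versions, so check the documentation for the client you use instead of assuming a preset. The AWS SDKs (boto3, aws-sdk-go, aws-sdk-java) retry with jittered exponential backoff out of the box. gRPC retries are configured through a retryPolicy in the service config (maxAttempts, initialBackoff, maxBackoff, backoffMultiplier), and the computed delay is randomized. Among HTTP libraries, Python's urllib3 Retry applies no backoff until you set backoff_factor, and Node's axios-retry provides an exponentialDelay function that adds jitter.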

Python implementation

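import functools
import random
import time
import uuid

import requests

def retry_with_backoff(func, *,
                       max_attempts=4,
                       base_delay=1.0,
                       max_delay=30.0,
                       retryable_errors=(requests.ConnectionError, requests.Timeout)):
    """Call func(); on a retryable error, sleep with full jitter and try again."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retryable_errors:
            if attempt == max_attempts - 1:
                raise  # retry budget exhausted: propagate the last error
            # Full jitter: uniform in [0, min(max_delay, base_delay * 2^attempt)]
            cap = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, cap))

def fetch_with_retry(url):
    def call():
        response = requests.get(url, timeout=5)
        # HTTPError is not in retryable_errors, so 4xx/5xx responses propagate.
        response.raise_for_status()
        return response  # raise_for_status() returns None, so return the response itself
    return retry_with_backoff(call, max_attempts=4, base_delay=1.0)

# Or as a decorator:
def retryable(max_attempts=4, base_delay=1.0):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return retry_with_backoff(
                lambda: func(*args, **kwargs),
                max_attempts=max_attempts,
                base_delay=base_delay
            )
        return wrapper
    return decorator

@retryable(max_attempts=4, base_delay=1.0)
def _post_charge(invoice, idempotency_key):
    # The key is passed in so every retry of one charge reuses the same key.
    # "/api/charge" is a placeholder path; requests needs an absolute URL.
    return requests.post("/api/charge", json=invoice, timeout=5, headers={
        "Idempotency-Key": idempotency_key
    }).json()

def charge_card(invoice):
    # Generate the key once per logical charge, outside the retried call.
    # Generating it inside the retried function would send a new key on each retry.
    return _post_charge(invoice, str(uuid.uuid4()))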

Java implementation (Resilience4j)

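import io.github.resilience4j.core.IntervalFunction;
import io.github.resilience4j.retry.Retry;
import io.github.resilience4j.retry.RetryConfig;
import java.io.IOException;
import java.time.Duration;
import java.util.function.Supplier;

// Response, HttpResponseException (with its code() accessor), paymentClient, and
// invoice are application types and objects, not part of Resilience4j.
RetryConfig config = RetryConfig.<Response>custom()
    .maxAttempts(4)                // total attempts, including the initial call
    .intervalFunction(IntervalFunction.ofExponentialRandomBackoff(
        Duration.ofSeconds(1),     // initial delay
        2.0,                       // multiplier (doubling)
        0.5                        // randomization factor: each delay varies by ±50% (not full jitter)
    ))
    .retryOnException(e ->
        e instanceof IOException ||
        (e instanceof HttpResponseException &&
            ((HttpResponseException) e).code() >= 500))
    .build();

Retry retry = Retry.of("paymentService", config);

// The charge must be idempotent (for example, via an Idempotency-Key) to be retried safely.
Supplier<Response> decorated = Retry.decorateSupplier(
    retry, () -> paymentClient.charge(invoice));

Response r = decorated.get();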

Common pitfalls

  • Retrying non-idempotent operations. Without an Idempotency-Key or equivalent, retrying a POST can double-charge, duplicate-create, or trigger workflows twice. Either use idempotent verbs (PUT instead of POST), send an Idempotency-Key, or accept that some POSTs cannot be retried.
  • Retrying everything that looks like an error. A 404 is not a transient error — retrying won't make the resource exist. A 401 won't change unless you refresh credentials. Filter retryable errors explicitly.
  • No max-retry cap. An open-ended retry loop on a permanently-failing operation pegs CPU and holds resources. Always set a hard cap (3-5 attempts).
  • No jitter. The single most common production bug in retry code. Without jitter, you have a synchronous thundering herd waiting to happen. Always add jitter when more than one client retries the same downstream.
  • Ignoring Retry-After. When a server explicitly says "retry after 30 seconds," retrying after 1 second wastes everyone's time and may trigger rate limits. Use Retry-After when present.
  • Retry inside a retry inside a retry. Nested retries multiply. A handler that retries 3 times, each call wrapped in a client that retries 3 times, makes 9 attempts and 9 × max-backoff wait time. Verify the total amplification factor end to end.
  • Retry without a circuit breaker. If a downstream stays sick for an hour, every retry round runs all attempts, hits all the backoff waits, and eventually fails — burning resources the whole time. Pair retries with a circuit breaker that trips OPEN after enough sustained failures.
  • Counting against rate limits. Each retry consumes a quota request. With 5 retries, one logical operation can cost up to 6 requests, so during an outage a 1000 req/hour limit covers as few as 166 operations. Make sure retry cost is accounted for in your rate-limit budget.
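
The amplification arithmetic behind the nested-retry and rate-limit pitfalls is simple enough to check in code. The helper names below are illustrative; the sketch assumes every layer exhausts its attempts, which is the worst case.

```python
from math import prod


def worst_case_calls(attempts_per_layer):
    """Downstream calls for one logical request when every layer exhausts its attempts."""
    return prod(attempts_per_layer)


def operations_per_quota(quota, retries):
    """Logical operations a quota covers if each one uses all of its retries."""
    return quota // (retries + 1)  # one initial call plus the retries


print(worst_case_calls([3, 3]))       # handler × client
print(worst_case_calls([3, 3, 3]))    # one more retrying layer
print(operations_per_quota(1000, 5))  # 1000 req/hour, 5 retries each
```

Two layers of 3 attempts produce 9 calls, a third layer produces 27, and a 1000 req/hour quota covers 166 fully retried operations.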

Performance and impact

The CPU cost of a retry loop is negligible: a few microseconds of bookkeeping per attempt. The real cost is wall-clock time. Exhausting 4 retries on the 1+2+4+8 schedule means up to 15 seconds of waiting (about half that on average with full jitter), plus the time spent in each attempt, before the user sees the failure. That's bad UX. For user-facing operations, set timeouts aggressively (200-500 ms) and cap retries at 2-3. For background batch jobs, longer retry budgets (20-60 seconds) are fine.

The thundering-herd impact at scale is dramatic. The simulations in the AWS Architecture Blog post "Exponential Backoff and Jitter" show that adding jitter sharply reduces both the number of calls clients make and the time they take to finish when many clients contend for the same resource. Google's SRE book gives the same advice in its chapters on handling overload and addressing cascading failures: schedule retries with randomized exponential backoff and bound them with a retry budget.

Combine retry with the rest of the resilience stack: bulkhead (resource isolation), timeout (per-call cap), circuit breaker (sustained-failure detection), and idempotency (safe-to-repeat). The result is a network client that handles brief failures gracefully without amplifying load when failures persist. A sensible starting point for most production services is a small retry count (around 3), full jitter, and a short base delay (on the order of 100 ms), tuned from there.

Frequently asked questions

What is exponential backoff in one sentence?

On each successive retry of a failed call, double the delay before the next attempt, so the waits grow as 1, 2, 4, 8, 16 seconds instead of hammering the downstream every millisecond. With jitter added, many clients hitting the same problem don't all retry at exactly the same moment.

Why is jitter necessary?

Without jitter, many clients that all failed at the same instant (because the downstream went briefly unavailable) all wait exactly 1 second, retry simultaneously, fail again, wait exactly 2 seconds, retry simultaneously — this is the thundering herd, and it overwhelms the recovering downstream every time. Adding random jitter — a random factor between 0 and the computed delay — spreads retry attempts across the recovery window, smoothing the load.

What's the difference between full jitter, equal jitter, and decorrelated jitter?

Full jitter: sleep = random(0, base × 2^attempt). Most aggressive spreading, simplest. Equal jitter: sleep = base × 2^attempt / 2 + random(0, base × 2^attempt / 2) — at least half the backoff is guaranteed, plus random for the rest. Decorrelated jitter (AWS Architecture Blog): sleep = min(cap, random(base, previous_sleep × 3)) — uses the previous delay as input rather than the attempt number, smoother distribution and better-shaped throughput in load tests. AWS recommends full jitter for most cases.

What should I cap the maximum retry count at?

Three to five is the usual range. AWS SDK defaults are in the low single digits and vary by SDK and retry mode. gRPC performs no configured retries until a retryPolicy is set, and it limits maxAttempts to 5. HTTP clients commonly use 3. Beyond 4-5 retries, the diminishing returns are sharp: if a downstream is still failing after 30 seconds of waiting, it's likely a sustained outage that retries won't fix. Pair retries with a circuit breaker so persistent failures stop hitting the downstream at all.

Which errors should be retried and which should not?

Retry transient failures: connection refused, timeout, HTTP 5xx (especially 503 Service Unavailable, 504 Gateway Timeout), HTTP 429 Too Many Requests (respect Retry-After header). Do NOT retry HTTP 4xx that indicate a client problem (400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found) — they won't succeed on retry. Special case: HTTP 408 Request Timeout and 425 Too Early may be retryable. Always require the operation to be idempotent before retrying.
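
The rules in this answer can be written as an explicit filter. This is a sketch of one reasonable reading, not a standard: the function name is made up, and treating every 5xx as retryable follows the answer above even though some services prefer a narrower list.

```python
RETRYABLE_4XX = frozenset({408, 425, 429})  # Request Timeout, Too Early, Too Many Requests


def is_retryable(status_code, idempotent):
    """True if a response with this status is worth retrying."""
    if not idempotent:
        return False  # never retry an operation that is unsafe to repeat
    return status_code in RETRYABLE_4XX or 500 <= status_code <= 599


for status in (404, 429, 503):
    print(status, is_retryable(status, idempotent=True))
```

Connection errors and timeouts have no status code, so a client would handle those separately, as the Python implementation above does with its retryable_errors tuple.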

What is idempotency and why does retry need it?

An operation is idempotent if running it twice has the same effect as running it once. GET requests are idempotent (reading data twice gives the same result). POST requests are typically NOT idempotent (charging a card twice charges twice). Without idempotency, retrying a POST that timed out — when the server actually succeeded but the response was lost — could double-charge. The fix: send an Idempotency-Key header (Stripe), or use HTTP PUT instead of POST (PUT is defined as idempotent), or use exactly-once message processing with deduplication.

How does retry interact with a circuit breaker?

Retry handles transient failures — brief blips. Circuit breaker handles sustained failures — a downstream that's persistently sick. They compose: retry inside the breaker means transient blips are retried, but if even retried calls keep failing, the breaker trips and stops calling at all. Without a breaker, persistent failures cause every request to take retry_count × max_backoff seconds, which is much worse than failing fast. Production stack: timeout → retry-with-backoff → circuit breaker, wrapped from inside out.
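
A minimal circuit breaker shows how the two compose. The sketch below is deliberately simplified and not taken from any library: the class and exception names, the threshold of 5 consecutive failures, and the 30-second reset timeout are all assumptions, and it is not thread-safe. The function passed to call would be the already-retried operation, so the breaker only counts failures that survived the retries.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is not attempted."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Reset timeout elapsed: let one trial call through (half-open).
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # trip, or re-open after a failed trial
            raise
        self._failures = 0  # any success closes the circuit
        self._opened_at = None
        return result


breaker = CircuitBreaker(failure_threshold=2, reset_timeout=30.0)


def always_down():
    # Stands in for a retried call that exhausted its attempts.
    raise ConnectionError("downstream unavailable")


for _ in range(3):
    try:
        breaker.call(always_down)
    except CircuitOpenError as exc:
        print("fast failure:", exc)
    except ConnectionError as exc:
        print("real failure:", exc)
```

The first two calls reach the downstream and fail; the third is rejected immediately without touching it.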