Systems

Deadlock

Two threads waiting for each other forever — the four conditions and how to break them

A deadlock happens when two or more threads each hold a resource the others need and are waiting for the others' resources — a circular wait where no one can make progress. Coffman's four conditions (mutual exclusion, hold and wait, no preemption, circular wait) all must hold simultaneously. Break any one and you prevent deadlock; the standard "lock ordering" trick breaks circular wait.

  • Required for deadlockAll 4 Coffman conditions hold simultaneously
  • DetectionWait-for graph cycle search — O(V + E)
  • Standard preventionLock ordering — always acquire in the same global order
  • Modern alternativeLock-free / wait-free algorithms via CAS atomics
  • Deadlock vs livelockDeadlock = stuck; livelock = active but no progress
  • Famous exampleDining philosophers (5 thinkers, 5 forks, all grab left first)

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

How a deadlock forms — Coffman's four conditions

Deadlock requires all four conditions to hold simultaneously:

  1. Mutual exclusion. The resources can be held in a non-shareable mode. Two threads can't simultaneously hold the same exclusive lock.
  2. Hold and wait. Threads can hold one resource while requesting another. They don't release everything to grab a single lock.
  3. No preemption. Resources are released only voluntarily. The system can't yank a lock away from a thread (unlike CPU scheduling, which can preempt).
  4. Circular wait. There's a cycle: thread A waits for B's resource, B waits for C's, ..., Z waits for A's.

Eliminating any single condition prevents deadlock. In practice, the easiest target is circular wait — that's why "lock ordering" is the dominant prevention strategy.

The dining philosophers problem

Five philosophers sit around a circular table. Each needs both adjacent forks to eat. There are 5 forks total — one between each pair.

// Naive — every philosopher does this:
acquire(left_fork);
acquire(right_fork);
eat();
release(right_fork);
release(left_fork);

If all five execute the first acquire simultaneously, each holds their left fork and waits for their right (which is the next philosopher's left). Circular wait. Deadlock.

Five solutions:

  1. Lock ordering. Number forks 1-5; require each philosopher to acquire the lower-numbered fork first. The 5th philosopher (between fork 5 and fork 1) acquires fork 1 first — breaks the circular pattern.
  2. Asymmetric philosophers. Most acquire left then right; one acquires right then left. Same effect as lock ordering.
  3. Resource hierarchy. Both forks must be acquired atomically (one syscall, all-or-nothing). Eliminates hold-and-wait.
  4. Referee. A waiter (semaphore initialized to 4) limits how many philosophers can try eating simultaneously. With at most 4 trying, at least one always has both forks available.
  5. Optimistic with rollback. Acquire left, try acquire right. If right unavailable, release left and retry after a random delay (livelock-resistant via randomization).

Prevention strategies — break each condition

Condition brokenStrategyReal-world example
Mutual exclusionUse shared resources (read/write locks, lock-free data structures)RCU in the Linux kernel; concurrent hash maps
Hold and waitAcquire all needed locks atomically; release-and-retry on failureDatabase two-phase locking with strict acquire-all-then-execute
No preemptionAllow timeouts; abort + restart on lock contentionDatabase deadlock detector kills victim transaction
Circular waitLock ordering — acquire in a globally consistent orderLinux lockdep validates this in kernel; standard practice in production code

Lock ordering is the most popular because it's verifiable statically (lockdep), preserves performance (no extra coordination), and doesn't require giving up exclusive locks.

Detecting deadlock at runtime

Build a wait-for graph — directed graph with one vertex per thread, edge from A to B if A is waiting for a lock held by B. Run cycle detection (DFS with three-color marking). Any cycle is a deadlock.

function detectDeadlock(waitForGraph) {
  const WHITE = 0, GRAY = 1, BLACK = 2;
  const color = new Map();
  for (const t of Object.keys(waitForGraph)) color.set(t, WHITE);

  function dfs(t) {
    color.set(t, GRAY);
    for (const blocker of waitForGraph[t] || []) {
      if (color.get(blocker) === GRAY) return [t, blocker]; // cycle
      if (color.get(blocker) === WHITE) {
        const result = dfs(blocker);
        if (result) return result;
      }
    }
    color.set(t, BLACK);
    return null;
  }

  for (const t of color.keys()) {
    if (color.get(t) === WHITE) {
      const cycle = dfs(t);
      if (cycle) return cycle;
    }
  }
  return null;
}

Databases (PostgreSQL, MySQL InnoDB) run this periodically. On detection, kill the youngest or least-work-done transaction. Linux's lockdep does similar analysis in the kernel for kernel-level locks; reports any cycle as a CONFIG_PROVE_LOCKING warning at boot or runtime.

When deadlock typically strikes

  • Database transactions with mixed lock orders. Two transactions update the same rows in different orders → wait-for cycle. The DB detects, kills one. Application retries. Fine in moderate doses; horrible at high contention.
  • Multi-resource workflows. Code that acquires file lock, then DB lock, then network lock — versus other code path that acquires them in different order. Lock-ordering audits in code review prevent this.
  • Producer-consumer with bounded buffers. Producer holds source lock waiting for buffer space; consumer holds buffer space waiting to acquire source. Use condition variables and proper signaling.
  • Recursive locks held across callbacks. Thread A holds a lock and calls a function that asynchronously calls back into code that wants the same lock. Re-entrant mutexes help; better solution is to release the lock before invoking callbacks.
  • Cross-process IPC with shared resources. Two processes that both lock files, message queues, semaphores. Deadlock detection often missing across process boundaries.

JavaScript: a deadlock-prone pattern (with mutex library)

// Using async-mutex library — typical deadlock scenario
const m1 = new Mutex();
const m2 = new Mutex();

// Thread A:
await m1.runExclusive(async () => {
  await m2.runExclusive(async () => { /* work */ });
});

// Thread B:
await m2.runExclusive(async () => {
  await m1.runExclusive(async () => { /* work */ });
});

// If both run simultaneously:
//   A holds m1, waits for m2
//   B holds m2, waits for m1
// Circular wait — deadlocked forever.

// Fix — establish a global order. Always acquire mutexes in (m1, m2) order:
// Thread A and Thread B both:
await m1.runExclusive(async () => {
  await m2.runExclusive(async () => { /* work */ });
});

Modern alternatives to traditional locking

  • Lock-free data structures (CAS-based). No mutex; use compare-and-swap atomics for state changes. Concurrent queues, stacks, hash maps. ConcurrentLinkedQueue (Java), crossbeam (Rust). Deadlock-free by construction.
  • Software transactional memory (STM). Wrap critical sections in transactions; runtime detects conflicts and rolls back losers. Clojure refs, Haskell STM. Popular in functional languages.
  • Actor model. Each actor processes messages serially; no shared state means no locks. Erlang, Akka, Pony. Deadlock can still happen at the message-passing level (deadlock by waiting for a reply that depends on us replying first), but conventional locking deadlocks vanish.
  • RCU (Read-Copy-Update). Readers proceed without locks; writers create a new version, atomically swap pointer. Used heavily in the Linux kernel. Super-fast reads; complex writes; relies on grace periods to free old data.

Common deadlock-related bugs

  • Inconsistent lock acquisition order across the codebase. Even one path that acquires (B, A) when everyone else does (A, B) can create deadlock. Static analysis tools (Coverity, lockdep) help; manual code review is still essential.
  • Holding a lock during long operations. Async I/O, network calls, slow disk reads — these increase the deadlock window dramatically. Refactor to release the lock before slow operations; re-acquire if needed.
  • Calling user code under a lock. If your library holds a lock and invokes a user-supplied callback, the callback might re-enter your library and try to acquire the same lock. Either use re-entrant mutexes (and accept the complexity) or ensure callbacks are invoked outside the lock.
  • Deadlock vs livelock confusion. Both block progress. Deadlock — threads stop. Livelock — they're active but spinning. Symptoms differ; diagnosis differs (live=high CPU, dead=zero CPU).
  • Convoy effect — non-deadlock but related. Many threads waiting on a lock held by a slow operation. Throughput collapses even without deadlock. Often confused with deadlock; lockdep traces show what's actually held.
  • Trying to detect deadlock in user code. Mostly futile in JS, Python, Java without specific runtime instrumentation. Better — use thread dumps in production (jstack, py-spy), look for threads blocked on locks held by other blocked threads.

Frequently asked questions

What are the four Coffman conditions?

All four must hold simultaneously for deadlock to occur. (1) Mutual exclusion — at least one resource is held in non-shareable mode (a mutex, not a read-only file). (2) Hold and wait — a thread holds at least one resource while waiting for additional ones. (3) No preemption — resources can't be forcibly taken; must be voluntarily released. (4) Circular wait — there's a cycle of threads each waiting for a resource held by the next. Eliminate any one condition and you prevent deadlock.

What's the dining philosophers problem?

Five philosophers sit around a circular table. Each needs two forks (one from the left, one from the right) to eat. There are only 5 forks total. If all philosophers simultaneously pick up their left fork and then wait for the right, everyone waits forever — circular wait deadlock. Solutions include lock ordering (always pick up the lower-numbered fork first), resource hierarchy, asymmetric philosophers (one always grabs right first), or a referee that allows at most 4 to try eating at once.

How does lock ordering prevent deadlock?

Number all locks globally; require every thread to acquire locks in increasing order. Cycles in the wait-for graph become impossible — if thread A holds lock 1 and waits for lock 5, no other thread can hold lock 5 and wait for anything ≤ 5 (because that thread must have acquired lock 5 before any others). Breaks the circular-wait condition. Standard practice in production code; static analysis can verify ordering at compile time.

What's the difference between deadlock and livelock?

Deadlock — threads stop, blocked forever, no CPU activity. Livelock — threads keep running but make no progress. Two threads each say "after you" forever — both active, neither advancing. Livelock often arises from naive "back off and retry" strategies where everyone backs off in the same way and re-races. Solution — randomized backoff (jittered retry intervals).

How do databases handle deadlock?

Detection plus rollback. The database builds a wait-for graph (which transaction is waiting on which lock); periodically (or on each new lock request) it checks for cycles. On detecting one, it picks a "victim" (often the youngest transaction or the one with least work done), aborts it, releases its locks, and lets the others continue. The victim's client gets a deadlock error and retries. PostgreSQL, MySQL InnoDB, Oracle all do this.

Are lock-free algorithms always deadlock-free?

Yes by definition. Lock-free means at least one thread always makes progress. Wait-free is even stronger — every thread makes progress in bounded time. Lock-free algorithms use CAS (compare-and-swap) primitives instead of mutexes; the CAS retry loop ensures progress. Trade-off — lock-free code is significantly harder to write and verify than lock-based code; ABA problem is the classic gotcha.

How do you detect a deadlock at runtime?

Build a wait-for graph — vertices are threads, edges are "thread A is waiting for resource held by thread B." Cycle detection (DFS) reveals deadlocks. Java's jstack and ThreadMXBean.findDeadlockedThreads() do this for the JVM. Linux's lockdep tracks lock acquisition order in the kernel and reports any inconsistency.