Systems
Mutex vs Semaphore
One says "I own this room." The other counts who's allowed in.
A mutex enforces mutual exclusion: one owner, one critical section, locked and unlocked by the same thread. A semaphore is a signed counter — threads decrement to enter and increment to leave, with no concept of ownership. They look superficially similar, but priority inheritance, error checking, and signalling semantics make them tools for different jobs.
- Mutex (uncontended)~25 ns
- Mutex (contended, futex wake)~1–10 µs
- Semaphore counter range0 to N
- Mutex has ownerYes
- Semaphore has ownerNo
Interactive visualization
Press play, or step through manually. The visualization is yours to drive — try it before reading on.
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
The fundamental difference
A mutex ("mutual exclusion") is a lock with an owner. When a thread lock()s it, the kernel records that thread as the holder. Only that thread may unlock(). Anyone else trying to unlock it is a bug — and on POSIX, PTHREAD_MUTEX_ERRORCHECK will return EPERM rather than corrupt state.
A semaphore is a non-negative integer with two atomic operations: wait (also called P or down) decrements it, blocking if it's already zero, and post (V or up) increments it, waking one waiter. There is no owner. Thread A can wait, thread B can post — that asymmetry is the whole point.
So the mental model is:
- Mutex: I'm entering a room nobody else may enter. I will leave the same way I came in.
- Semaphore (counting): There are N slots. I'm taking one. Someone — maybe me, maybe not — will return one later.
- Semaphore (binary, signalling): A doorbell. Someone rings, someone hears.
Mutex, semaphore, and friends
| Primitive | Owner? | Counter | Reentrant? | Typical use | Cost (uncontended) |
|---|---|---|---|---|---|
| Mutex | Yes | 0 / 1 | Optional (recursive) | Protect a critical section | ~25 ns |
| Binary semaphore | No | 0 / 1 | No | Signal between threads | ~30 ns |
| Counting semaphore | No | 0 to N | No | Bounded resource pool | ~30 ns |
| Spinlock | Implicit | 0 / 1 | No | Very short crit. sections, no sleeping (kernel) | ~10 ns + spin |
| Read-write lock | Yes (writer) | readers + writer flag | Often no | Many readers, rare writers | ~60 ns |
| Condition variable | No (paired with mutex) | — | — | Wait for predicate to become true | ~50 ns + wake |
| Futex (Linux primitive) | No | — | — | Building block under all of the above | ~25 ns fast path |
Most modern mutexes on Linux are implemented in userspace with a single atomic on the fast path; only on contention do they fall through to a futex(2) syscall. That's why uncontended locks measure tens of nanoseconds and contended locks measure microseconds — you've crossed the kernel boundary.
Producer-consumer, the classic two-semaphores-and-a-mutex pattern
The bounded-buffer producer-consumer is the textbook case where both primitives are needed at once. Two counting semaphores track filled and empty slots; one mutex protects the buffer's internal pointers.
// C / pthreads — the canonical version
#define N 16
int buf[N];
int in = 0, out = 0;
sem_t empty; // counts empty slots, init = N
sem_t filled; // counts filled slots, init = 0
pthread_mutex_t mu; // protects in/out
void* producer(void* _) {
for (int v = 0; ; v++) {
sem_wait(&empty); // block if buffer full
pthread_mutex_lock(&mu);
buf[in] = v; in = (in + 1) % N;
pthread_mutex_unlock(&mu);
sem_post(&filled); // wake a consumer
}
}
void* consumer(void* _) {
for (;;) {
sem_wait(&filled); // block if buffer empty
pthread_mutex_lock(&mu);
int v = buf[out]; out = (out + 1) % N;
pthread_mutex_unlock(&mu);
sem_post(&empty); // free a slot
process(v);
}
}
The order matters. sem_wait(&empty) must happen before the mutex; otherwise a producer holding the mutex while waiting on a full buffer would deadlock every consumer trying to lock the same mutex to drain it.
The same idea in Python — using threading.Semaphore (counting) and threading.Lock (mutex):
import threading, queue
N = 16
buf = [None] * N
in_idx = out_idx = 0
empty = threading.Semaphore(N) # counting
filled = threading.Semaphore(0)
mu = threading.Lock()
def producer():
global in_idx
v = 0
while True:
empty.acquire()
with mu:
buf[in_idx] = v
in_idx = (in_idx + 1) % N
filled.release()
v += 1
def consumer():
global out_idx
while True:
filled.acquire()
with mu:
v = buf[out_idx]
out_idx = (out_idx + 1) % N
empty.release()
process(v)
And in JavaScript (Node), where the only built-in primitives on shared SharedArrayBuffer memory are Atomics.wait/Atomics.notify — a futex by another name:
// Node Worker Threads — semaphore implemented atop Atomics
class Semaphore {
constructor(initial) {
this.buf = new SharedArrayBuffer(4);
this.view = new Int32Array(this.buf);
Atomics.store(this.view, 0, initial);
}
wait() {
while (true) {
const v = Atomics.load(this.view, 0);
if (v > 0 && Atomics.compareExchange(this.view, 0, v, v - 1) === v) return;
Atomics.wait(this.view, 0, 0); // sleep until non-zero
}
}
post() {
Atomics.add(this.view, 0, 1);
Atomics.notify(this.view, 0, 1);
}
}
Note the loop in wait(): after waking, the counter may have been stolen by another waiter, so we re-check. This is the classic spurious wakeup guard.
Real costs you should remember
- Uncontended mutex lock/unlock: ~25 ns on modern x86-64 Linux. A single CAS, no syscall.
- Contended mutex (one waiter): ~1–10 µs. The loser parks via
futex(FUTEX_WAIT), the winner wakes it viafutex(FUTEX_WAKE). That's a syscall plus at minimum a context switch back. - Hot contention (many waiters on one lock): latency degrades super-linearly. Cache-line ping-pong on the lock word costs ~100 ns per transfer; convoying can stretch tail latency to milliseconds.
- Semaphore wait/post: nearly identical to a mutex on Linux NPTL — both are futex-backed.
- Spinlock: ~10 ns when uncontended, but burns CPU on contention. Only sane if the critical section is shorter than a context switch (~1 µs).
Variants worth knowing
Mutex variants
- Recursive mutex (
PTHREAD_MUTEX_RECURSIVE) — the same thread can lock it N times and must unlock N times. Useful for callbacks that re-enter a locked API; usually a sign of a design problem. - Error-checking mutex (
PTHREAD_MUTEX_ERRORCHECK) — returnsEDEADLKon double-lock andEPERMon cross-thread unlock. Slower; great in tests. - Priority-inheritance (
PTHREAD_PRIO_INHERIT) — when a high-priority thread waits, the holder temporarily inherits its priority. The Mars Pathfinder rover famously rebooted itself in 1997 because of priority inversion in a non-PI mutex. - Adaptive mutex — spins briefly before parking, betting that the holder will release within a few hundred cycles. Glibc's default since 2009.
Semaphore variants
- POSIX unnamed (
sem_init) — process-local or shared memory. - POSIX named (
sem_open) — lives in/dev/shm, persists across process exit, ideal for cross-process signalling. - System V (
semget) — older, supports semaphore sets with batch operations and undo on process death.
Common bugs and edge cases
- Priority inversion. A medium-priority thread can starve a high-priority thread by preempting the low-priority lock holder. Solution: priority inheritance or priority ceiling protocols.
- Deadlock from inconsistent lock order. Thread A locks mu1 then mu2; thread B locks mu2 then mu1. Always lock in a globally consistent order — alphabetical, by address, by tier number.
- Forgotten unlock on exception. Use RAII (C++
std::lock_guard),defer(Go),with(Python), ortry/finally(Java/JS) — never raw lock/unlock pairs. - Posting a semaphore that nobody is waiting on. The post is "remembered" — the counter stays incremented. Posting a mutex from outside the holder is undefined behavior.
- Lost wakeup with condition variables. Always check the predicate in a
whileloop after wake-up; spurious wakes are real. - Holding a lock across a syscall. A blocked I/O syscall under a mutex turns every other thread into a hostage. Move the syscall outside the critical section, or use lock-free queues.
- Mutex destruction while held. Destroying a held mutex is undefined. Reference-count owners or use
shared_ptr/Arc.
A simple rule of thumb
If the same thread acquires and releases, and you're protecting state, reach for a mutex. If different threads operate on the primitive — especially if you're counting something or signalling across a boundary — reach for a semaphore. When you find yourself wanting "wait until this condition becomes true," that's a condition variable, paired with a mutex; not a clever semaphore hack.
Frequently asked questions
Is a binary semaphore the same as a mutex?
No. Both have two states, but a mutex has an owner — only the locking thread may unlock it. A binary semaphore has no owner, so any thread (or signal handler) can post it. That difference matters for priority inheritance, recursive locking, and error checking.
When should I use a semaphore instead of a mutex?
Use a semaphore when you're counting resources or signaling between threads — bounded buffer slots, a connection pool with N permits, or wake-up notification from one thread to another. Use a mutex when one thread enters a critical section and the same thread leaves it.
How expensive is locking a mutex?
An uncontended pthread mutex on Linux is roughly 25 nanoseconds — a single atomic compare-and-swap in the fast path, no syscall. A contended lock is dramatically slower: the loser sleeps via futex, costing one or more microseconds plus a context switch. Avoid contention, not locks.
What is priority inversion?
A low-priority thread holds a lock that a high-priority thread needs; meanwhile, a medium-priority thread runs and starves the low-priority holder. The high-priority thread waits indirectly on the medium one. Priority-inheritance mutexes (PTHREAD_PRIO_INHERIT) fix this by temporarily boosting the holder's priority.
Can I use a mutex across processes?
Only if it lives in shared memory and was initialized with PTHREAD_PROCESS_SHARED. Most mutexes are process-local. POSIX named semaphores (sem_open) are cross-process by design and are usually a simpler choice.
Why does a producer-consumer queue typically need both?
The mutex protects the queue's internal state during enqueue/dequeue. Two semaphores — one counting empty slots, one counting filled slots — let producers block when full and consumers block when empty without spinning. The mutex is short; the semaphores carry the wait.