Distributed Systems

Lamport Clocks

Each node has a counter; on send, increment + attach; on receive, max(local, received) + 1

A Lamport clock is a logical timestamp scheme that establishes a "happens-before" partial order across events in a distributed system without requiring synchronized physical clocks. Each process keeps a local counter L. Local event: L = L + 1. Send: L = L + 1, attach L to message. Receive: L = max(L, received) + 1. If event A's clock < event B's clock, A could have caused B (but the converse isn't guaranteed — to detect concurrent events, use vector clocks). Introduced by Leslie Lamport in 1978 ("Time, Clocks, and the Ordering of Events"), a foundational paper of distributed computing — established the happens-before relation and Lamport's bakery algorithm.

  • Typelogical scalar
  • EstablishedLamport 1978
  • Causal propertyA→B implies L(A) < L(B)
  • Cannot detectconcurrent events (use vector clocks)
  • Total orderextend with PID tiebreaker
  • Used inKafka, distributed locks

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

Why Lamport clocks matter

  • Ordering without synchronized clocks. Wall-clock time on different machines drifts by milliseconds — Lamport timestamps are exact, monotonic, and require no NTP.
  • Kafka offsets. Each partition's log offset is a Lamport-style monotonic counter that orders messages within a partition, even if producers run on machines with different clocks.
  • Distributed locks. Lamport's 1974 bakery algorithm and modern Paxos/Raft term-index pairs are direct descendants — a logical timestamp settles "who got the lock first."
  • Total order broadcast. Adding a tiebreaker (PID, IP, process UUID) to a Lamport timestamp produces a total order — every node agrees on the same sequence of events.
  • Causal consistency. Lamport timestamps are the cheapest way to enforce "if A caused B, A is delivered before B everywhere."
  • Foundational for vector clocks, HLC, and consensus. Almost every distributed-systems concept built after 1978 references Lamport's framework.
  • Hybrid logical clocks (HLC). CockroachDB and MongoDB use HLC — physical time + logical counter — to combine Lamport's correctness with human-readable timestamps.

The three update rules

A Lamport clock is just an integer per process. Three rules govern how it updates:

  1. Local event. Before any local event (computation, file write, internal step): L = L + 1.
  2. Send. Before sending a message: L = L + 1, then piggyback L on the message.
  3. Receive. When receiving a message with timestamp T: L = max(L, T) + 1, then process the event.

That's the entire algorithm. Three rules, two lines of code per rule, no synchronization between processes required.

The clock condition follows directly: if A → B (A happens-before B), then L(A) < L(B). Proof by cases on the three rules of happens-before. Within a process, the +1 on each event guarantees increasing timestamps. Across processes, the receive rule's max ensures the receiver's clock leapfrogs the sender's. Transitivity follows from arithmetic.

Worked example

Consider three processes P1, P2, P3 each starting at L=0.

  • P1 local event a → L1 = 1.
  • P1 sends message m1 to P2 → L1 = 2, m1 carries timestamp 2.
  • P2 local event b → L2 = 1 (independent of P1).
  • P2 receives m1 with T=2 → L2 = max(1, 2) + 1 = 3.
  • P2 sends m2 to P3 → L2 = 4, m2 carries timestamp 4.
  • P3 local event c → L3 = 1.
  • P3 receives m2 with T=4 → L3 = max(1, 4) + 1 = 5.

Causal chain: a (1) → m1 (2) → b's receive (3) → m2 (4) → c's receive (5). The timestamps are strictly increasing along the chain. Note that b at L2=1 and c at L3=1 are concurrent (no causal path between them), yet have the same timestamp — that's the well-known limitation of Lamport scalar clocks.

Lamport vs vector clocks

Vector clocks (Fidge 1988, Mattern 1989) generalize Lamport clocks to N integers per process — one entry per node. The update rules:

  • Local event: V[self] = V[self] + 1.
  • Send: V[self] = V[self] + 1, attach full vector V to message.
  • Receive: for each i, V[i] = max(V[i], received[i]); then V[self] = V[self] + 1.

The key gain: vector clocks satisfy the strong clock condition — A → B if and only if V(A) < V(B), where < is element-wise less-than-or-equal with at least one strict less-than. Concurrent events have incomparable vectors (each has a strictly greater entry somewhere). Cost: O(N) per timestamp instead of O(1).

Riak's bucket types tracked vector clocks per object until 2014; modern Riak uses dotted version vectors (a more compact variant). DynamoDB's earlier API exposed vector clocks; later versions hide them behind LWW (last-writer-wins) using physical timestamps.

Building total order from partial order

Lamport clocks give a partial order; sometimes you need a total order (e.g., to feed a state machine deterministically). Standard trick: append the process ID (or any deterministic tiebreaker) to the timestamp.

Define total order <T as: (L1, P1) <T (L2, P2) iff L1 < L2, or (L1 = L2 and P1 < P2). This is what Lamport's 1978 paper proposed for the distributed mutual exclusion algorithm.

Modern incarnations include Cassandra's writetime (timestamp + node-ID), Kafka's (offset, partition) pair within a topic, and Spanner's TrueTime (interval + leader-ID for tiebreaking after waitout).

Hybrid logical clocks

HLC (Kulkarni, Demirbas, Madappa, Avva, Leone, 2014) merges wall-clock time and Lamport logic into a single 64-bit timestamp. Encoding: high 48 bits are physical milliseconds; low 16 bits are a logical counter.

Update rules at process p (current physical time pt; current HLC l, c):

  1. Send / local event: l' = max(l, pt). If l' = l, c++; else c = 0. New HLC = (l', c).
  2. Receive (l_msg, c_msg): l' = max(l, l_msg, pt). If l' = l = l_msg, c = max(c, c_msg) + 1. If l' = l, c++. If l' = l_msg, c = c_msg + 1. Else c = 0.

Properties: HLC is always within a few milliseconds of wall-clock time (so dashboards show sensible timestamps), preserves causality (A → B implies HLC(A) < HLC(B)), and is bounded — the logical part doesn't grow unboundedly during clock skew. CockroachDB, YugabyteDB, MongoDB 3.6+, and many Raft implementations use HLC for transactional read timestamps.

Kafka offsets as Lamport clocks

Each Kafka partition has a single producer-of-record (the partition leader). Producers append messages and the leader assigns a strictly-monotonic 64-bit offset. This offset functions exactly like a per-partition Lamport clock — every consumer sees messages in offset order, regardless of producer wall-clock skew.

Cross-partition ordering is not guaranteed. To order across partitions, you'd need a global Lamport (or vector) clock — Kafka doesn't provide one because the cost would defeat the parallelism that makes it fast (a million messages/sec on commodity hardware). Some applications layer a transactional ID on top to coordinate across partitions, accepting the throughput tradeoff.

Common misconceptions

  • "Lamport timestamps are wall-clock times." Purely logical integers — they have no relation to seconds, minutes, or NTP-synchronized clocks.
  • "L(A) < L(B) implies A → B." Only the converse is guaranteed. Concurrent events can have ordered timestamps purely by coincidence.
  • "Lamport clocks need synchronized physical clocks." Opposite — they exist precisely to avoid that requirement.
  • "Vector clocks are always better." O(N) bytes per timestamp doesn't scale to thousands of nodes; for total order with tiebreaker, scalar Lamport is often enough.
  • "Lamport's 1978 paper is just about clocks." The paper also introduced the bakery algorithm, distributed mutual exclusion, and the foundation of state machine replication — one of the most cited papers in CS.
  • "Skip the +1 on receive." Without +1, the receive event tied with the send event in timestamp — strict less-than fails. The +1 is what makes the clock condition work.
  • "Lamport clocks solve consensus." They provide ordering, not agreement. Consensus needs a quorum protocol (Paxos, Raft) on top of timestamps for tiebreaking.

Real-world implementations

  • Kafka: 64-bit offset per partition; tens of millions of timestamps assigned per second per broker.
  • etcd: Raft term + index pair acts as a Lamport timestamp for every key change.
  • CockroachDB HLC: 64-bit (48-bit physical ms + 16-bit logical counter); read timestamps coordinate distributed transactions.
  • MongoDB 3.6+: ClusterTime is HLC; used for change streams, causal consistency sessions.
  • Spanner: TrueTime intervals (not pure Lamport, but builds on the same partial-order foundation with bounded uncertainty).
  • Cassandra: writetime is microsecond physical time + node-ID — works because nodes use NTP plus a node-ID tiebreaker.

Frequently asked questions

What is the happens-before relation?

Lamport defined happens-before (denoted →) as the smallest relation satisfying three rules. (1) If A and B occur in the same process and A precedes B, then A → B. (2) If A is a send and B is the matching receive, then A → B. (3) Transitivity: if A → B and B → C, then A → C. If neither A → B nor B → A, the events are concurrent (denoted A || B). Happens-before is a partial order, not a total order — concurrent events have no defined ordering, which is exactly the point: they couldn't have caused each other.

Why can't Lamport clocks detect concurrent events?

Lamport clocks compress causality into a single integer. The clock condition guarantees A → B implies L(A) < L(B). The converse fails: L(A) < L(B) does not imply A → B because two concurrent events on different processes can have ordered timestamps purely by coincidence. Process P1 increments to L=5; concurrently P2 increments to L=7 with no communication. L(P1)=5 < L(P2)=7 but the events are concurrent. To detect concurrency you need vector clocks, where each process tracks every other's counter.

How does Lamport clock differ from a vector clock?

Lamport: one integer per node. Cheap (O(1) bytes per message), but only consistent — A → B implies L(A) < L(B). Vector: array of N integers, one per node. Expensive (O(N) bytes), but strongly consistent — A → B iff V(A) < V(B). Vectors detect concurrency by comparing element-wise (V1[i] ≤ V2[i] for all i with at least one strict). Use Lamport for total ordering and tie-breaking; use vector clocks when you need to detect and reconcile concurrent updates (Dynamo, Riak).

What is a hybrid logical clock (HLC)?

HLC (Kulkarni et al., 2014) combines wall-clock time with a Lamport-style logical counter into one 64-bit timestamp. Most bits are physical milliseconds; lowest bits are a logical counter that increments when wall-clock didn't advance between events. HLC stays close to physical time (so values are human-readable timestamps), survives skewed clocks (because the logical counter takes over during clock drift), and preserves causality. CockroachDB, MongoDB 3.6+, and YugabyteDB use HLC for transactional ordering.

Where do production systems use Lamport clocks?

Kafka uses log offsets, which are per-partition Lamport clocks. Distributed lock servers (etcd, Zookeeper) use ZXID/Raft-term-and-index pairs that behave like Lamport timestamps. Paxos proposal numbers and Raft (term, index) tuples are Lamport-style — they break ties via PID/server-ID. Riak and Cassandra use Lamport-derived schemes for tombstone ordering. Even Git's commit graph is conceptually a vector-clock-on-DAG, with Lamport-style depth as a fallback total order.

Why is the +1 increment essential?

Without +1, two events at the same process or two send-receive pairs could collide on the same timestamp, breaking strict ordering. The receive rule L = max(L, received) + 1 ensures the receive event has a strictly greater timestamp than both the send and the prior local state. The send-then-receive sequence forms a strict chain. Skip the +1 and the relation A → B implies L(A) ≤ L(B) instead of <, which loses the ability to distinguish causally-ordered events from same-clock events.