Snowflake ID

64 bits, three fields, decentralized, time-sortable

A snowflake ID is a 64-bit integer assembled from a 41-bit timestamp, a 10-bit machine ID, and a 12-bit sequence counter — generated locally, sortable by time, and produced at 4096 IDs per millisecond per machine. Twitter introduced the format in 2010, and it begat ULID, KSUID, and UUIDv7.

  • Total size: 64 bits (one int64)
  • Bit layout: 1 sign + 41 ts + 10 machine + 12 seq
  • Twitter epoch: 2010-11-04T01:42:54.657Z
  • Per-machine throughput: 4,096,000 IDs/sec
  • Cluster capacity: 1,024 machines × 4,096,000 IDs/sec ≈ 4.2B IDs/sec
  • Lifespan from epoch: ~69.7 years (2^41 ms)

The 64-bit layout

A snowflake ID is a single 64-bit integer divided into four fields. Read left-to-right (most significant bit first):

| 1 bit | 41 bits          | 10 bits     | 12 bits         |
| sign  | timestamp (ms)   | machine ID  | sequence per ms |
|   0   | since epoch      | 0–1023      | 0–4095          |

The sign bit is always zero — keeping the ID positive lets it slot into a Java long or a Postgres bigint without sign-extension surprises. The timestamp field counts milliseconds since a custom epoch (Twitter chose 2010-11-04T01:42:54.657Z — the moment the format went live). The machine ID is the generator's identity, baked into config at deploy time. The sequence is a per-millisecond counter that resets to zero each time the timestamp advances.

Why those exact widths? 41 bits buys 69.7 years from the epoch — long enough that no one has to migrate format mid-decade. 10 bits = 1024 generators, comfortable for a fleet of dozens or low hundreds with room to grow. 12 bits = 4096 IDs per millisecond, which works out to 4 million per second per machine — well past any real per-node throughput. Adjust the widths and you get the variants Discord, Instagram, and Sony use.
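
The arithmetic behind those widths is easy to check. The sketch below derives the capacity figures from the three field widths. The constant and variable names are illustrative, and the lifespan figure assumes a 365-day year.

```javascript
// Field widths from the layout above (41/10/12).
const TIMESTAMP_BITS = 41n;
const MACHINE_BITS = 10n;
const SEQ_BITS = 12n;

const maxMachines = 1n << MACHINE_BITS;                            // 1024 generators
const idsPerMs = 1n << SEQ_BITS;                                   // 4096 IDs per ms
const idsPerSecondPerMachine = idsPerMs * 1000n;                   // 4,096,000
const clusterIdsPerSecond = idsPerSecondPerMachine * maxMachines;  // 4,194,304,000

// 2^41 milliseconds expressed in (365-day) years.
const MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365;
const lifespanYears = Number(1n << TIMESTAMP_BITS) / MS_PER_YEAR;

console.log(maxMachines, idsPerMs, idsPerSecondPerMachine, clusterIdsPerSecond);
console.log(lifespanYears.toFixed(1)); // "69.7"
```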

Why not just use UUIDv4?

UUIDv4 is 128 bits, 122 of them random (the other 6 encode the version and variant). Two huge wins: trivially decentralized (any process can mint one) and effectively collision-free (with 122 random bits, the birthday bound sits near 2^61 IDs). The catch is database indexes.

A B-tree index over UUIDv4 sees insertions land at random positions. Every insert dirties a different page; the index's working set grows to the entire index. Postgres benchmarks routinely show UUIDv4 insertions running 2–5× slower than monotonically-increasing IDs once the index exceeds RAM. Snowflake IDs avoid this: because the timestamp occupies the high bits, new IDs land at or near the right edge of the index, so inserts are append-mostly and cache-hot, and page splits are confined to the rightmost leaf.

The downside is coordination. Each generator needs a unique machine ID and a roughly-synchronized clock. UUIDv4 needs neither.

Snowflake variants in the wild

  • Twitter (original). 41/10/12. Custom epoch 2010-11-04. Open-sourced in Scala; the original open-source release was later retired and its repository archived.
  • Discord. 41/10/12 with epoch 2015-01-01; Discord's documentation splits the 10 machine bits into a 5-bit worker ID and a 5-bit process ID. The IDs in every Discord URL (/channels/<guild>/<channel>) are snowflakes.
  • Instagram. 41/13/10: 13-bit shard ID (8192 shards), 10-bit sequence (1024/ms). Designed to encode their PostgreSQL logical-shard topology directly into the ID.
  • Sonyflake. 39/8/16: 39 bits of 10ms-precision timestamp (174 years), 8-bit sequence (256 IDs per 10 ms), 16-bit machine ID (65,536 hosts). Trades resolution and per-node burst capacity for lifespan and a much larger fleet.
  • Snowflake-style ULID. 128 bits — 48 bits ms timestamp, 80 bits random. Same time-sortability, no machine-ID coordination, twice the storage.
  • UUIDv7 (RFC 9562, 2024). 128 bits: 48 bits Unix ms, 4 version bits, 12 bits that may be random or carry sub-millisecond time or a counter, 2 variant bits, and 62 bits random. The IETF-standard answer to "I want snowflake but in UUID format."
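
Since the 64-bit variants differ only in field widths, one packing routine can cover several of them. The following is a minimal sketch, not any vendor's actual code: `pack`, `unpack`, and the layout descriptors are hypothetical helpers, and the two layouts are the Twitter and Instagram widths listed above.

```javascript
// Pack named fields into one BigInt, most significant field first.
function pack(layout, values) {
  let id = 0n;
  for (const { name, bits } of layout) {
    const value = BigInt(values[name]);
    const max = (1n << BigInt(bits)) - 1n;
    if (value < 0n || value > max) {
      throw new RangeError(`${name} must fit in ${bits} bits`);
    }
    id = (id << BigInt(bits)) | value;
  }
  return id;
}

// Reverse of pack: peel fields off the low end, last field first.
function unpack(layout, id) {
  const out = {};
  let rest = BigInt(id);
  for (const { name, bits } of [...layout].reverse()) {
    out[name] = rest & ((1n << BigInt(bits)) - 1n);
    rest >>= BigInt(bits);
  }
  return out;
}

const TWITTER = [
  { name: 'timestamp', bits: 41 },
  { name: 'machine', bits: 10 },
  { name: 'sequence', bits: 12 },
];
const INSTAGRAM = [
  { name: 'timestamp', bits: 41 },
  { name: 'shard', bits: 13 },
  { name: 'sequence', bits: 10 },
];

// Timestamps here are already epoch-relative milliseconds.
const a = pack(TWITTER, { timestamp: 1000, machine: 42, sequence: 7 });
const b = pack(INSTAGRAM, { timestamp: 1000, shard: 1341, sequence: 7 });
console.log(a, unpack(TWITTER, a));   // 4194476039n and the original fields
console.log(b, unpack(INSTAGRAM, b)); // 8389981191n and the original fields
```

Keeping the timestamp as the first (highest) field in every layout is what preserves time ordering; only the split of the remaining bits changes.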

When snowflake is the right choice

  • You need IDs faster than a central database can mint them. A MySQL auto-increment column funnels every insert through one counter on one server; snowflake is local arithmetic at four million per second per node.
  • Your IDs land in a clustered B-tree index. Time-sortable IDs keep insert latency flat as the index grows; random IDs degrade as RAM pressure rises.
  • You want time-encoded debugging. Decoding a snowflake gives you the creation timestamp for free — no separate created_at column needed for ordering.
  • You can pre-allocate machine IDs. Hard-coding a generator ID per host (or per Kubernetes pod via downward-API) is straightforward; ZooKeeper-based dynamic assignment is the alternative for elastic clusters.
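
One way to pre-allocate machine IDs is to reuse an ordinal the platform already hands out. The sketch below assumes pods are named `<name>-<ordinal>`, as Kubernetes StatefulSet pods are, and that the pod name reaches the process somehow (for example through an environment variable). `machineIdFromHostname` is a hypothetical helper, not part of any library.

```javascript
// Derive a machine ID from a hostname that ends in "-<ordinal>".
// Throws if there is no ordinal or it does not fit in the machine-ID field.
function machineIdFromHostname(hostname, machineBits = 10) {
  const match = /-(\d+)$/.exec(hostname);
  if (!match) {
    throw new Error(`no trailing ordinal in hostname "${hostname}"`);
  }
  const ordinal = Number(match[1]);
  const max = 2 ** machineBits - 1; // 1023 for a 10-bit field
  if (!Number.isSafeInteger(ordinal) || ordinal > max) {
    throw new RangeError(`ordinal ${match[1]} exceeds ${max}`);
  }
  return ordinal;
}

console.log(machineIdFromHostname('idgen-7'));    // 7
console.log(machineIdFromHostname('idgen-1023')); // 1023
```

Failing loudly at boot when the ordinal is out of range is deliberate: a generator that silently wraps its machine ID can collide with another one.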

Snowflake is the wrong choice when you need IDs that don't leak creation time (replace with UUIDv4 or unguessable random), when your fleet has more than ~1024 generators without ID-space partitioning, or when generators come and go fast enough that machine-ID assignment becomes a problem.

Snowflake vs UUID variants

|                          | Snowflake             | UUIDv4    | UUIDv7                 | ULID      | KSUID     | Auto-increment      |
| Size                     | 64 bits               | 128 bits  | 128 bits               | 128 bits  | 160 bits  | 32–64 bits          |
| Time-sortable            | Yes (ms)              | No        | Yes (ms)               | Yes (ms)  | Yes (s)   | Yes                 |
| Decentralized            | Yes (with machine ID) | Yes       | Yes                    | Yes       | Yes       | No (single counter) |
| Embeds machine ID        | Yes (10 bits)         | No        | No                     | No        | No        | No                  |
| Encodes creation time    | Yes (ms)              | No        | Yes (ms)               | Yes (ms)  | Yes (s)   | No                  |
| Coordination needed      | Machine ID assignment | None      | None                   | None      | None      | Central source      |
| Per-generator throughput | 4M/sec                | Unlimited | Unlimited (sub-ms ctr) | Unlimited | Unlimited | Central limit       |

The right pick depends on what tradeoff you can afford. Snowflake's small footprint and embedded machine ID make it ideal for high-volume, well-controlled fleets. UUIDv7 is the modern default for general-purpose decentralized IDs because the size penalty (8 extra bytes) is rarely the bottleneck and the standards body has blessed it.

Pseudo-code

// Constants
EPOCH        = 1288834974657   // 2010-11-04T01:42:54.657Z in Unix ms
MACHINE_BITS = 10              // 1024 machines
SEQ_BITS     = 12              // 4096 IDs per ms per machine

MAX_MACHINE = (1 << MACHINE_BITS) - 1   // 1023
MAX_SEQ     = (1 << SEQ_BITS) - 1       // 4095

TIMESTAMP_SHIFT = MACHINE_BITS + SEQ_BITS  // 22
MACHINE_SHIFT   = SEQ_BITS                 // 12

// Generator state
machineId  = (assigned at boot, 0–1023)
lastMs     = 0
sequence   = 0

generate():
    now = current_unix_ms()
    if now < lastMs:
        panic("clock moved backward")
    if now == lastMs:
        sequence = (sequence + 1) & MAX_SEQ
        if sequence == 0:
            // exhausted this ms — spin until the next ms
            while now <= lastMs:
                now = current_unix_ms()
    else:
        sequence = 0
    lastMs = now

    id = ((now - EPOCH) << TIMESTAMP_SHIFT)
       | (machineId << MACHINE_SHIFT)
       | sequence
    return id

JavaScript implementation

// Uses BigInt because JS Number can't precisely represent 64-bit values.

class SnowflakeGenerator {
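  constructor(machineId, epochMs = 1288834974657n) {
    // Validate as a Number first so fractions and NaN are rejected with a clear error.
    const id = Number(machineId);
    if (!Number.isInteger(id) || id < 0 || id > 1023) {
      throw new RangeError('machineId must be an integer in 0-1023');
    }
    this.machineId = BigInt(id);
    this.epoch = BigInt(epochMs);  // accept a Number or a BigInt
    this.lastMs = 0n;              // last timestamp used to mint an ID
    this.seq = 0n;                 // per-millisecond sequence counter
  }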

  nextId() {
    let now = BigInt(Date.now());
    if (now < this.lastMs) {
      throw new Error('clock moved backward: refusing to mint');
    }
    if (now === this.lastMs) {
      this.seq = (this.seq + 1n) & 0xfffn;
      if (this.seq === 0n) {
        // sequence exhausted — spin to next ms
        while (BigInt(Date.now()) <= this.lastMs) {}
        now = BigInt(Date.now());
      }
    } else {
      this.seq = 0n;
    }
    this.lastMs = now;

    return ((now - this.epoch) << 22n)
         | (this.machineId << 12n)
         | this.seq;
  }

  static decode(id, epochMs = 1288834974657n) {
    const bi = BigInt(id);
    return {
      timestamp: Number((bi >> 22n) + epochMs),
      machineId: Number((bi >> 12n) & 0x3ffn),
      sequence:  Number(bi & 0xfffn),
    };
  }
}

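// Usage
const gen = new SnowflakeGenerator(42);
const id = gen.nextId();             // a BigInt; the value depends on the current time
SnowflakeGenerator.decode(id);
// → { timestamp: <Date.now() at mint time>, machineId: 42, sequence: 0 }

// Decoding a fixed ID is deterministic:
SnowflakeGenerator.decode(1869539350270877696n);
// → { timestamp: 1734567890123, machineId: 42, sequence: 0 }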

Common pitfalls

  • Returning the ID as a JSON number. JavaScript's Number is IEEE 754 double — only 53 bits of integer precision. A 64-bit snowflake silently rounds when parsed by JSON.parse. Always serialize snowflakes as strings ("1729012345678901234") on the wire.
  • Reusing machine IDs. Two generators with the same machine ID can collide in the same millisecond. Pin the machine ID via deployment config (Kubernetes downward API for pod-ordinal), ZooKeeper ephemeral nodes, or environment variable injected by your scheduler.
  • Ignoring clock skew. NTP can step the clock forward (no big deal — sequence resets) or backward (catastrophic — duplicate IDs possible). Always check now >= lastMs and refuse to mint until the wall clock catches up.
  • Leap seconds. Pre-2017, some kernels could repeat a second during a leap. Use a leap-smearing NTP source (Google Public NTP, AWS Time Sync) to avoid the issue entirely.
  • Confusing snowflake bit ordering across language implementations. Some libraries put sequence in the high bits (anti-pattern — breaks sortability). Always verify the layout by decoding a known timestamp.
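
The JSON pitfall is easy to reproduce. The sketch below parses the example ID once as a bare JSON number and once as a string; the variable names are illustrative.

```javascript
const id = 1729012345678901234n;

// Bare number on the wire: JSON.parse yields a double, which cannot hold this value.
const asNumber = JSON.parse('{"id": 1729012345678901234}').id;
const survivedAsNumber = BigInt(asNumber) === id;

// String on the wire: lossless round trip.
// (JSON.stringify throws on a raw BigInt, so convert explicitly.)
const wire = JSON.stringify({ id: id.toString() });
const restored = BigInt(JSON.parse(wire).id);

console.log(Number.isSafeInteger(asNumber)); // false
console.log(survivedAsNumber);               // false
console.log(restored === id);                // true
```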

Performance characteristics

  • ~50 ns per ID on modern x86. The hot path is one read of the wall clock (not a monotonic clock, since the timestamp has to be comparable across machines), one masking-and-shifting expression, and a 64-bit store. Zero allocations, zero network round-trips.
  • 4096 IDs per ms hard cap per generator. Twelve bits of sequence space. Exceeded? Spin until the next ms, or widen the sequence field at the cost of timestamp resolution.
  • 69.7 years from epoch. Twitter's snowflakes run out around 2080-07-10. Discord's around 2084. Plan a format migration before you hit the wall.
  • 8 bytes per ID at rest, at most 19 decimal digits as text. A UUID takes 16 bytes in binary form and 36 characters in its canonical hex-and-hyphen form, so snowflake roughly halves ID storage and bandwidth, which adds up at scale.
  • O(1) decoding. Splitting a snowflake into (timestamp, machine, sequence) is three masking operations — no parsing, no allocation, sub-nanosecond on modern CPUs.

Frequently asked questions

What is a Twitter snowflake ID?

A Twitter snowflake is a 64-bit integer that uniquely identifies a record without coordinating with any central service. The bit layout is: 1 sign bit (unused, always 0), 41 bits of timestamp (milliseconds since the Twitter epoch 2010-11-04T01:42:54.657Z), 10 bits of machine ID (1024 distinct generators), and 12 bits of per-millisecond sequence (4096 IDs per ms per machine). Twitter open-sourced it in 2010 to retire MySQL auto-increment IDs that bottlenecked the tweet pipeline.

Why use snowflake instead of UUIDv4?

UUIDv4 is fully random — great for uniqueness, terrible for database indexes. Insertions land at random positions in a B-tree index, causing page splits and cache misses. A snowflake ID is roughly time-sortable because the timestamp occupies the high 41 bits, so new IDs append to the right side of the index. Append-mostly insertions keep the tree's working set small and the cache hot. Postgres benchmarks show 2-5x higher insert throughput for time-sortable IDs vs UUIDv4.

What is the snowflake epoch?

Twitter chose 2010-11-04T01:42:54.657Z as the snowflake epoch: the start of millisecond zero in the timestamp field. With 41 bits of timestamp, the format can encode roughly 69 years of milliseconds (2^41 / 1000 / 60 / 60 / 24 / 365 ≈ 69.7), so Twitter's snowflakes run out around 2080-07-10. Discord uses 2015-01-01T00:00:00.000Z as their epoch. Instagram's published example uses a custom epoch in August 2011. Custom epochs let each company push their roll-over date further into the future.
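
The roll-over date follows directly from the epoch and the timestamp width. A minimal sketch, assuming a 41-bit millisecond timestamp; `rolloverDate` is an illustrative helper name:

```javascript
// First instant that no longer fits in the timestamp field.
function rolloverDate(epochMs, timestampBits = 41) {
  return new Date(epochMs + 2 ** timestampBits);
}

const twitterEpoch = 1288834974657;           // 2010-11-04T01:42:54.657Z
const discordEpoch = Date.UTC(2015, 0, 1);    // 2015-01-01T00:00:00.000Z

console.log(rolloverDate(twitterEpoch).toISOString()); // 2080-07-10T17:30:30.209Z
console.log(rolloverDate(discordEpoch).toISOString()); // 2084-09-06T15:47:35.552Z
```

Both sums stay well below 2^53, so plain Number arithmetic is exact here and BigInt is not needed.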

How many IDs can one snowflake generator produce per second?

Up to 4096 IDs per millisecond per machine, which is 4,096,000 IDs per second per generator node, far more than any real workload needs. With 10 bits of machine ID, a cluster of up to 1024 generators can produce about 4.2 billion IDs per second in aggregate without coordination. To go past either limit you rebalance the field widths: take bits from the sequence to address more machines, or go the other way for more per-node burst (some forks use 8 bits → 256 machines, 14 bits sequence → 16k/ms). Alternatively, stripe the address space hierarchically.

What happens when the clock moves backward?

If the system clock jumps backward (NTP step, manual change, leap second handling), a naive snowflake generator could emit IDs with timestamps in the past — risking duplicates if the timestamp+sequence combination repeats an earlier one. Robust implementations track the highest emitted timestamp and refuse to generate IDs until wall clock catches back up, or extend the sequence into the prior millisecond. Twitter's reference implementation logs a warning and throws if wall_clock < last_emitted_ms.
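
The "refuse until the clock catches up" policy can be sketched as a small wrapper around the clock. This is an illustration under stated assumptions, not Twitter's implementation: `ClockGuard`, `maxWaitMs`, and `scriptedClock` are hypothetical names, and the scripted clock exists only to make the behavior reproducible.

```javascript
// Wraps a millisecond clock so callers never see time go backward.
class ClockGuard {
  constructor(clock = Date.now, maxWaitMs = 5) {
    this.clock = clock;         // function returning current time in ms
    this.maxWaitMs = maxWaitMs; // largest backward step we are willing to wait out
    this.lastMs = 0;            // highest timestamp handed out so far
  }

  now() {
    let t = this.clock();
    if (t < this.lastMs) {
      const drift = this.lastMs - t;
      if (drift > this.maxWaitMs) {
        throw new Error(`clock moved backward by ${drift} ms: refusing to mint`);
      }
      // Small step: busy-wait until the clock reaches the last emitted timestamp.
      while (t < this.lastMs) t = this.clock();
    }
    this.lastMs = t;
    return t;
  }
}

// Deterministic stand-in for Date.now: replays the given readings in order.
function scriptedClock(ticks) {
  let i = 0;
  return () => ticks[Math.min(i++, ticks.length - 1)];
}

const guard = new ClockGuard(scriptedClock([100, 98, 99, 100, 101]));
const seen = [guard.now(), guard.now(), guard.now()];
console.log(seen); // [ 100, 100, 101 ]
```

The second call returns the same millisecond as the first, so the generator's sequence counter must keep incrementing rather than reset. That is what keeps the IDs unique across the stall.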

What is ULID and how does it compare to snowflake?

ULID (Universally Unique Lexicographically Sortable Identifier, 2016) is a 128-bit alternative to snowflake. The high 48 bits are millisecond timestamp; the low 80 bits are cryptographically random. ULID gets you snowflake's time-sortability plus UUID-level uniqueness with no machine ID coordination required — any process can mint one. The cost is twice the storage and no built-in machine identity (which snowflake provides for debugging). KSUID (Segment 2017) and UUIDv7 (RFC 9562, 2024) are similar ideas.

Are snowflake IDs actually sortable?

Roughly. Within a single generator they are strictly increasing — the timestamp+sequence ordering guarantees it. Across generators they are only sortable to millisecond precision: two IDs minted on different machines in the same millisecond have machine-ID and sequence positions that can interleave arbitrarily. If you need a strict total order across all generators, use a single coordinator or a consensus log (Raft). If you need "newest first" to within a millisecond, snowflake's high-bit timestamp is exactly what you want.
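
The interleaving is easy to see with hand-built IDs. In the sketch below, `compose` is an illustrative helper using the 41/10/12 layout, and the timestamps are epoch-relative milliseconds chosen for readability.

```javascript
// Build an ID directly from its three fields (41/10/12 layout).
const compose = (ms, machine, seq) =>
  (BigInt(ms) << 22n) | (BigInt(machine) << 12n) | BigInt(seq);

// IDs listed in the order they were actually minted.
const minted = [
  compose(1000, 7, 0), // machine 7, first in ms 1000
  compose(1000, 3, 0), // machine 3, later in the same ms
  compose(1000, 7, 1), // machine 7 again, still ms 1000
  compose(1001, 3, 0), // next millisecond
];

// BigInts need an explicit comparator; the default sort compares strings.
const sorted = [...minted].sort((a, b) => (a < b ? -1 : a > b ? 1 : 0));
const decoded = sorted.map((id) => [id >> 22n, (id >> 12n) & 0x3ffn, id & 0xfffn]);

console.log(decoded);
// [ [ 1000n, 3n, 0n ], [ 1000n, 7n, 0n ], [ 1000n, 7n, 1n ], [ 1001n, 3n, 0n ] ]
```

Sorting by ID puts machine 3's ID ahead of machine 7's even though machine 7 minted first. The order is still correct across milliseconds and within each single machine.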