Databases

Snapshot Isolation

Every transaction reads a frozen world — until two of them quietly disagree

Snapshot isolation gives each transaction a frozen, consistent view of the database as of the moment it began, so readers never block writers — but it permits one anomaly, write skew, that serializable isolation forbids.

  • ReadsLock-free, from a snapshot
  • Write-write conflictFirst-committer-wins
  • PreventsDirty / non-repeatable reads, lost updates
  • AllowsWrite skew
  • Built onMVCC

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

How snapshot isolation works

When a transaction begins under snapshot isolation (SI), the database hands it a logical photograph of the entire database — every committed value, exactly as it stood at that instant. The transaction reads from that photograph for its whole lifetime. Other transactions can commit hundreds of changes underneath it; the snapshot doesn't move. Two reads of the same row, ten seconds apart, return the same value. That stability is the whole point.

The mechanism is multi-version concurrency control. Instead of overwriting a row in place, every UPDATE writes a brand new version of the row, tagged with the transaction ID that created it (xmin) and, once superseded, the ID that retired it (xmax). A transaction with start timestamp T reads the newest version whose creator committed before T and whose retirement happened after T (or hasn't happened at all). Because the reader walks an immutable version chain, it never takes a shared lock and never waits on a writer.

Writes are where SI earns its keep. When two transactions try to modify the same row, the database enforces first-committer-wins: whichever commits first succeeds, and the other gets a serialization error the moment it tries to commit. This single rule is what makes SI immune to the lost-update anomaly that READ COMMITTED suffers from. The catch — and the entire reason this concept is interesting — is that the rule only fires on the same row. Two transactions writing different rows never collide, even when their reads overlapped and their decisions were entangled. That gap is write skew.

The visibility rule, precisely

SI is defined by two timestamps per transaction and one visibility predicate. Let start(Ti) be the snapshot timestamp and commit(Ti) the commit timestamp. A row version created by transaction Tj is visible to Ti if and only if:

visible(version_by_Tj, to=Ti) ⇔
    Tj committed   AND
    commit(Tj) < start(Ti)   AND
    Tj is not in Ti's "in-flight at snapshot" set

The last clause matters: the snapshot isn't just "every commit before timestamp T." It is the set of transactions that had already committed at the instant the snapshot was taken. Any transaction still running when Ti started is invisible to Ti forever, even if it commits a microsecond later. PostgreSQL records this as a snapshot triple (xmin, xmax, xip_list) — the smallest still-active ID, the next ID to be assigned, and the list of in-progress IDs in between.

The write rule is the dual: at commit time, transaction Ti aborts if any row it modified was also modified and committed by some Tj with start(Ti) < commit(Tj) < commit(Ti). That's a write-write intersection check, and it is the only conflict SI tests. Read-write and write-read dependencies — the ones serializability cares about — are never inspected. That asymmetry is exactly why SI is cheaper than serializable, and exactly why it's not serializable.

When to choose snapshot isolation

  • Read-heavy OLTP and reporting — long analytical reads run against a stable snapshot without blocking the OLTP writers hammering the same tables.
  • Mixed workloads where read latency is sacred — dashboards, APIs, and "show me my account" reads never queue behind a write lock.
  • Workloads whose invariants live inside single rows — if every consistency rule can be expressed as "this one row's value must satisfy X," first-committer-wins already protects you, and you get serializable-equivalent behavior for free.
  • When you'd otherwise reach for READ COMMITTED but can't tolerate lost updates — SI is the cheapest level that closes the lost-update hole.

Avoid leaning on plain SI when correctness depends on a multi-row invariant — "the sum of these balances stays non-negative," "at least one of these rows is true," "no two of these overlap." Those are write-skew traps. Either upgrade to Serializable Snapshot Isolation or materialize the conflict (see below).

Snapshot isolation vs other isolation levels

READ COMMITTEDREPEATABLE READ (ANSI)Snapshot IsolationSerializable (SSI)
Dirty readPreventedPreventedPreventedPrevented
Non-repeatable readAllowedPreventedPreventedPrevented
Phantom readAllowedAllowed (ANSI)PreventedPrevented
Lost updateAllowedPrevented (with locks)Prevented (first-committer-wins)Prevented
Write skewAllowedAllowedAllowedPrevented
Reads block writers?No (MVCC) / Yes (locks)Often yesNoNo (tracks dependencies)
Conflict detectionNone on readsRead locksWrite-write onlyRead-write dependency graph
Cost over SICheaperSimilarBaseline+5–20% abort/tracking overhead

The subtle trap is the second column. ANSI's REPEATABLE READ and "snapshot isolation" are different guarantees that vendors freely conflate. ANSI RR is defined by which phenomena it forbids and famously still permits phantoms; SI forbids phantoms (your snapshot simply never sees rows committed after it started) but permits write skew, which the ANSI table doesn't even name. PostgreSQL's REPEATABLE READ level is in fact snapshot isolation. Berenson et al.'s 1995 paper A Critique of ANSI SQL Isolation Levels coined "snapshot isolation" precisely to expose this mess.

What the numbers actually say

  • Write skew is not rare. The canonical Cahill–Röhm–Fekete SIGMOD 2008 paper measured a TPC-C++ benchmark where, under plain SI, integrity violations accumulated continuously; their Serializable SI implementation eliminated them while staying within roughly 10% of SI throughput.
  • SSI's overhead is modest. PostgreSQL's implementation of Serializable Snapshot Isolation (shipped in 9.1, 2011) adds predicate locks and a read-write dependency tracker; published benchmarks put the throughput cost at single-digit-to-low-double-digit percent for most OLTP, with the price paid in transaction aborts rather than blocking.
  • Aborts, not waits, are the failure mode. Under SI and SSI, a conflicting transaction doesn't deadlock-and-wait; it fails fast with ERROR: could not serialize access (SQLSTATE 40001). The application must retry. A retry loop that gives up after, say, 3 attempts is standard.
  • Snapshot age has a storage cost. A transaction that holds a snapshot open for an hour pins every row version created in that hour from being vacuumed. On a high-write Postgres table this is the classic "long-running transaction → table bloat" incident; a single forgotten BEGIN can stall vacuum across the whole database.

JavaScript: write skew demonstrated

Here is a tiny in-memory MVCC store that implements snapshot reads and first-committer-wins, then reproduces the doctors-on-call write skew. Each row keeps a version list; a transaction reads the newest version committed before its start timestamp.

let clock = 0;
const nextTs = () => ++clock;

class Store {
  constructor() { this.rows = new Map(); }   // key -> [{value, commitTs}]
  put(key, value, commitTs) {
    const chain = this.rows.get(key) || [];
    chain.push({ value, commitTs });
    this.rows.set(key, chain);
  }
  // newest version committed strictly before the snapshot
  read(key, snapshotTs) {
    const chain = this.rows.get(key) || [];
    let best = null;
    for (const v of chain)
      if (v.commitTs < snapshotTs && (!best || v.commitTs > best.commitTs)) best = v;
    return best ? best.value : null;
  }
}

class Txn {
  constructor(store) { this.store = store; this.start = nextTs(); this.writes = new Map(); }
  read(key)  { return this.writes.has(key) ? this.writes.get(key) : this.store.read(key, this.start); }
  write(key, value) { this.writes.set(key, value); }

  commit() {
    // first-committer-wins: abort if any written row got a newer committed version
    for (const key of this.writes.keys()) {
      const chain = this.store.rows.get(key) || [];
      if (chain.some(v => v.commitTs > this.start))
        throw new Error(`serialization failure on "${key}"`);  // SQLSTATE 40001
    }
    const ts = nextTs();
    for (const [key, value] of this.writes) this.store.put(key, value, ts);
    return ts;
  }
}

// Rule: at least one doctor must stay on call.
const db = new Store();
db.put('alice_oncall', true, nextTs());
db.put('bob_oncall',   true, nextTs());

const t1 = new Txn(db);   // Alice wants to go off call
const t2 = new Txn(db);   // Bob wants to go off call, concurrently

// Each checks the OTHER row in its own snapshot — both see a second doctor.
if ([t1.read('alice_oncall'), t1.read('bob_oncall')].filter(Boolean).length >= 2)
  t1.write('alice_oncall', false);
if ([t2.read('alice_oncall'), t2.read('bob_oncall')].filter(Boolean).length >= 2)
  t2.write('bob_oncall', false);

t1.commit();   // succeeds — wrote alice_oncall
t2.commit();   // ALSO succeeds — wrote bob_oncall, a DIFFERENT row

console.log(db.read('alice_oncall', nextTs()), db.read('bob_oncall', nextTs()));
// → false false   ❌ invariant broken: nobody is on call

Both commits succeed because first-committer-wins only checks the row each transaction wrote, and they wrote disjoint rows. The invariant spanned two rows, so SI never saw the conflict. Under true serializability, one of these must abort.

Python: the same store, plus the fix

The same model in Python, with a one-line fix that defeats write skew on plain SI: have both transactions touch a shared sentinel row (a materialized conflict), so first-committer-wins fires.

clock = 0
def next_ts():
    global clock; clock += 1; return clock

class Store:
    def __init__(self): self.rows = {}          # key -> list[(value, commit_ts)]
    def put(self, key, value, commit_ts):
        self.rows.setdefault(key, []).append((value, commit_ts))
    def read(self, key, snapshot_ts):
        best = None
        for value, ts in self.rows.get(key, []):
            if ts < snapshot_ts and (best is None or ts > best[1]):
                best = (value, ts)
        return best[0] if best else None

class Txn:
    def __init__(self, store):
        self.store, self.start, self.writes = store, next_ts(), {}
    def read(self, key):
        return self.writes[key] if key in self.writes else self.store.read(key, self.start)
    def write(self, key, value): self.writes[key] = value
    def commit(self):
        for key in self.writes:                  # first-committer-wins
            if any(ts > self.start for _, ts in self.store.rows.get(key, [])):
                raise RuntimeError(f"serialization failure on {key!r}")  # 40001
        ts = next_ts()
        for key, value in self.writes.items():
            self.store.put(key, value, ts)
        return ts

# FIX: both transactions also bump a shared "oncall_count" row.
db = Store()
db.put("oncall_count", 2, next_ts())

t1, t2 = Txn(db), Txn(db)
if t1.read("oncall_count") >= 2: t1.write("oncall_count", t1.read("oncall_count") - 1)
if t2.read("oncall_count") >= 2: t2.write("oncall_count", t2.read("oncall_count") - 1)

t1.commit()                       # succeeds
try:
    t2.commit()                   # both wrote the SAME row now
except RuntimeError as e:
    print("aborted:", e)          # → aborted: serialization failure on 'oncall_count'
print("on call:", db.read("oncall_count", next_ts()))   # → 1  ✅ invariant held

The pseudocode for that fix is one sentence: if your invariant spans multiple rows, force every transaction that participates in the invariant to write a common row, so the write-write check can collapse two logical conflicts into one physical one. That is exactly what SELECT ... FOR UPDATE on the read set does under the hood.

Variants worth knowing

Serializable Snapshot Isolation (SSI). Cahill, Röhm, and Fekete's 2008 design adds runtime tracking of read-write dependencies. It looks for a "dangerous structure" — two consecutive rw-antidependency edges between three transactions — which is provably necessary for any non-serializable SI execution, and aborts a participant. It is what PostgreSQL 9.1+ runs at SERIALIZABLE, achieving true serializability while keeping SI's lock-free reads.

Write-snapshot / "Serializable SI" with read-set validation. Instead of tracking dependencies, validate at commit that no row the transaction read was overwritten by a transaction that committed during its lifetime. This catches write skew directly but can over-abort (it flags benign read-write overlaps too).

Read Committed Snapshot Isolation (RCSI). SQL Server's READ_COMMITTED_SNAPSHOT takes a fresh snapshot at the start of each statement rather than each transaction. It removes read locks like SI but, being statement-scoped, still allows non-repeatable reads. It's a popular default precisely because it's cheaper than full SI.

External consistency / Spanner's commit-wait. Google Spanner layers SI on globally synchronized TrueTime clocks so snapshots are consistent across data centers, and adds commit-wait to make the system externally (linearizably) serializable across the whole planet.

Common bugs and edge cases

  • Assuming SERIALIZABLE means serializable. In Oracle, and in PostgreSQL before 9.1, asking for SERIALIZABLE gives you snapshot isolation — write skew and all. Read the manual for your exact version.
  • Multi-row invariants enforced by application checks. "Read the other rows, verify the rule, then write" is the write-skew recipe. The verification ran against a stale snapshot. Use FOR UPDATE, a materialized conflict, or SSI.
  • Forgetting to retry on 40001. SSI and first-committer-wins surface conflicts as commit-time errors, not blocking waits. Code that doesn't catch and retry serialization failures will leak errors to users under load.
  • Long-running snapshots starving vacuum. An open transaction pins old row versions. A stray BEGIN; SELECT 1; left open in a connection pool can bloat a busy table by gigabytes and tank performance database-wide.
  • Read-only transaction anomaly. Counter-intuitively, even a transaction that only reads can observe an SI anomaly when sandwiched between two read-write transactions (Fekete et al., 2004). "It's just a SELECT" is not a safety argument under SI.
  • Phantoms via predicate writes. SI prevents phantoms within a single snapshot read, but two transactions inserting rows that match each other's WHERE predicate is the insert-flavored cousin of write skew — same trap, same fixes.

Frequently asked questions

What anomalies does snapshot isolation prevent, and which one does it allow?

Snapshot isolation prevents dirty reads, non-repeatable reads, lost updates, and the phantom anomaly that plagues weaker levels. It still permits write skew: two transactions read an overlapping set of rows, each writes to a disjoint row, and the pair commits a state that violates an invariant neither broke alone.

Is snapshot isolation the same as the SERIALIZABLE isolation level?

No. Snapshot isolation is strictly weaker than true serializability. Because of write skew and read-only transaction anomalies, an SI schedule can produce an outcome that no serial ordering of the same transactions could. Oracle and PostgreSQL pre-9.1 actually run snapshot isolation when you ask for SERIALIZABLE; PostgreSQL 9.1+ closes the gap with Serializable Snapshot Isolation.

What is write skew in snapshot isolation?

Write skew happens when two concurrent transactions read the same data, then each updates a different row based on what they read. The classic case: two doctors are on call, a rule requires at least one, and both transactions check the rule, see the other doctor, and remove themselves — leaving zero doctors on call. First-committer-wins doesn't catch it because they write disjoint rows.

How does snapshot isolation handle write-write conflicts?

With first-committer-wins (or first-updater-wins). If two transactions update the same row, only the first to commit succeeds; the second aborts with a serialization failure when it tries to commit. This is what makes SI immune to lost updates, unlike plain READ COMMITTED.

Why are reads free of locks under snapshot isolation?

Each transaction reads from a multi-version snapshot frozen at its start timestamp. A concurrent writer creates a new row version with a higher transaction ID, but the reader ignores it, so the reader needs no shared locks and never blocks the writer. This is why SI is built on MVCC.

How do you fix write skew if you must stay on snapshot isolation?

Materialize the conflict so first-committer-wins can see it: add an explicit SELECT ... FOR UPDATE on the rows you read but logically depend on, write to a shared counter/summary row both transactions touch, or upgrade to Serializable Snapshot Isolation which detects the dangerous read-write dependency structure and aborts one transaction automatically.