Question 1

Why is 2PC blocking?

Accepted Answer

Once a participant has voted yes in phase 1 and durably logged its prepared state, it can no longer abort or commit on its own; only the coordinator's phase-2 decision can resolve it. If the coordinator crashes after collecting yes votes but before sending commit/abort, every prepared participant must hold its locks and wait, possibly indefinitely, until the coordinator recovers. Surviving participants can ask each other for the outcome (cooperative termination): a peer that already knows the decision can pass it on, and a peer that has not yet voted can safely abort. But if every reachable peer is itself prepared and uncertain, none of them can decide without risking inconsistency. This is the canonical failure mode of 2PC and the reason 'blocking' is the protocol's defining flaw.
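The cooperative termination rule can be sketched as a small decision function. This is a minimal illustration, not a full protocol: the `State` enum and the `cooperative_termination` name are hypothetical, and the function assumes the caller is itself in the prepared state and has already collected the states of whichever peers it can reach.

```python
from enum import Enum
from typing import Iterable, Optional


class State(Enum):
    INITIAL = "initial"      # has not voted yet
    PREPARED = "prepared"    # voted yes, outcome unknown
    COMMITTED = "committed"  # knows the decision was commit
    ABORTED = "aborted"      # knows the decision was abort


def cooperative_termination(reachable_peers: Iterable[State]) -> Optional[State]:
    """Outcome a prepared participant may adopt, or None if it must keep waiting."""
    states = list(reachable_peers)
    if State.COMMITTED in states:
        return State.COMMITTED   # a peer already learned the decision
    if State.ABORTED in states:
        return State.ABORTED
    if State.INITIAL in states:
        # A peer that never voted can abort on its own, so the coordinator
        # cannot have decided commit; aborting is safe.
        return State.ABORTED
    # Every reachable peer is prepared and uncertain: blocked.
    return None
```

A result of `None` is the blocking case: the participant keeps its locks and retries until the coordinator or a better-informed peer becomes reachable.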

Question 2

What is the prepare phase actually doing?

Accepted Answer

Prepare is a promise to be able to commit. When a participant receives prepare, it: (1) acquires all locks needed for the transaction's writes, if it does not hold them already; (2) writes the redo and undo log entries to durable storage with fsync; (3) writes a 'prepared' record naming the transaction and the coordinator; only then does it reply yes. After this point, the participant must be able to either commit or abort on demand, even after a crash: recovery finds the prepared record in the log and restores the transaction to its in-doubt state. The cost is real: each prepare incurs a forced log write (on the order of milliseconds on spinning disk, roughly tens to hundreds of microseconds on NVMe), and every participant pays it. Saying yes is a contract.
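The three steps can be sketched with a toy participant that appends JSON records to a local file. Everything here is an assumption made for illustration: the `Participant` class, the record layout, and the in-memory lock set are hypothetical, and a real engine would use its own write-ahead log and lock manager. The only real calls are `os.fsync` and the standard file and `json` APIs.

```python
import json
import os


class Participant:
    """Toy participant; log format and names are hypothetical."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.locks = set()

    def _force_log(self, record):
        # Append one record and force it to stable storage before returning.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())

    def prepare(self, txn_id, coordinator, writes):
        """writes maps key -> (old_value, new_value). Returns 'yes' or 'no'."""
        # (1) Acquire every lock, or vote no on any conflict.
        if any(key in self.locks for key in writes):
            return "no"
        self.locks.update(writes)
        # (2) Force the undo and redo images to the log.
        self._force_log({
            "type": "undo_redo",
            "txn": txn_id,
            "writes": {k: {"undo": old, "redo": new}
                       for k, (old, new) in writes.items()},
        })
        # (3) Force the prepared record, then (and only then) vote yes.
        self._force_log({"type": "prepared", "txn": txn_id,
                         "coordinator": coordinator})
        return "yes"

    def recover_prepared(self):
        """After a restart: map each prepared txn to its coordinator."""
        prepared = {}
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                if record["type"] == "prepared":
                    prepared[record["txn"]] = record["coordinator"]
        return prepared
```

The sketch forces the log twice for clarity; a real implementation would likely batch both records into a single forced write. It also omits commit and abort records, so `recover_prepared` reports every transaction that ever prepared rather than only the unresolved ones.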

Question 3

How does 3PC fix the blocking problem?

Accepted Answer

Three-phase commit (Skeen, 1981) inserts a pre-commit phase between prepare and commit. After all participants vote yes, the coordinator sends pre-commit. Each participant acknowledges, then waits for the final commit. The added phase ensures that no participant commits while another is still uncertain, so if the coordinator fails, the survivors can elect a new coordinator and decide based on whether any peer has reached pre-commit (in which case all survivors commit) or none have (all abort). 3PC is non-blocking under fail-stop assumptions, but under a network partition the two sides can reach different decisions. It also adds a full round trip: counting acknowledgements, roughly 6n messages versus 4n for 2PC. In practice, systems tend to favor consensus-based designs such as Paxos Commit (Gray and Lamport 2006), which uses Paxos to make the coordinator role itself fault-tolerant.
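The survivors' termination rule is small enough to write down. The following is a sketch of the rule as described in the answer above, under fail-stop assumptions; the `Phase` enum and the `terminate` function are hypothetical names, and leader election itself is left out.

```python
from enum import Enum


class Phase(Enum):
    PREPARED = 1       # voted yes, has not seen pre-commit
    PRE_COMMITTED = 2  # acknowledged pre-commit
    COMMITTED = 3
    ABORTED = 4


def terminate(survivor_phases):
    """Decision the newly elected coordinator imposes on the survivors."""
    phases = set(survivor_phases)
    if not phases:
        raise ValueError("need at least one survivor")
    if Phase.ABORTED in phases:
        return "abort"
    if Phase.PRE_COMMITTED in phases or Phase.COMMITTED in phases:
        # Someone saw pre-commit, so every participant voted yes.
        return "commit"
    # Nobody reached pre-commit, so nobody can have committed.
    return "abort"


# Partition hazard: each side applies the same rule to a different view.
side_a = terminate([Phase.PRE_COMMITTED, Phase.PREPARED])
side_b = terminate([Phase.PREPARED])
```

With these inputs `side_a` is `"commit"` and `side_b` is `"abort"`, which is the partition weakness in miniature: each side follows the rule correctly and the two still disagree.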

Question 4

What is XA in JTA / javax.transaction?

Accepted Answer

XA is the X/Open standard from 1991 that defines the interface between a transaction manager (TM) and a resource manager (RM, e.g. a database or message queue). It encodes the 2PC protocol as a set of C functions, including xa_prepare, xa_commit, xa_rollback, and xa_recover. The TM acts as the coordinator; each XA-compliant RM is a participant. In Java this surfaces as javax.transaction.UserTransaction (JTA; jakarta.transaction.UserTransaction in Jakarta EE 9 and later), with resource managers exposed through javax.transaction.xa.XAResource. Implementations ship in JBoss/WildFly, Atomikos, Bitronix, and Oracle WebLogic. XA is the reason your enterprise app can put a JMS send and a JDBC update in the same transaction. It is also why those apps can hang in 'in-doubt transaction' state when a coordinator dies; recovering them by hand from RM logs is a real operational chore.
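To make the TM/RM split concrete, here is a conceptual sketch in Python rather than a real XA binding. The classes `ToyResourceManager` and `ToyTransactionManager` are hypothetical; only the method names mirror the XA functions listed above, and return values are simplified to booleans and strings instead of XA return codes. The recovery step assumes the TM's decision log survived the crash and rolls back any branch it has no commit decision for.

```python
class ToyResourceManager:
    """Stand-in for an XA-capable database or queue (hypothetical)."""

    def __init__(self, name, will_prepare=True):
        self.name = name
        self.will_prepare = will_prepare
        self.branches = {}  # xid -> "prepared" | "committed" | "rolled_back"

    def xa_prepare(self, xid):
        ok = self.will_prepare
        self.branches[xid] = "prepared" if ok else "rolled_back"
        return ok

    def xa_commit(self, xid):
        self.branches[xid] = "committed"

    def xa_rollback(self, xid):
        self.branches[xid] = "rolled_back"

    def xa_recover(self):
        # Branches that voted yes and never heard the outcome (in-doubt).
        return [xid for xid, s in self.branches.items() if s == "prepared"]


class ToyTransactionManager:
    """Coordinator; self.decisions stands in for its durable log."""

    def __init__(self):
        self.decisions = {}

    def complete(self, xid, rms, crash_before_phase2=False):
        for rm in rms:                      # phase 1: collect votes
            if not rm.xa_prepare(xid):
                self.decisions[xid] = "rollback"
                for other in rms:
                    other.xa_rollback(xid)
                return "rollback"
        self.decisions[xid] = "commit"      # decision is logged first
        if crash_before_phase2:
            return "in-doubt"               # simulated TM crash
        for rm in rms:                      # phase 2: deliver the decision
            rm.xa_commit(xid)
        return "commit"

    def recover(self, rms):
        # After restart: ask each RM for in-doubt branches and resolve them.
        for rm in rms:
            for xid in rm.xa_recover():
                if self.decisions.get(xid) == "commit":
                    rm.xa_commit(xid)
                else:
                    rm.xa_rollback(xid)
```

Calling `complete(..., crash_before_phase2=True)` leaves both toy RMs reporting the xid from `xa_recover()`, which is the in-doubt state described above; `recover` then plays the part of the TM's restart logic or of an operator resolving branches by hand.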

Question 5

How does Spanner combine 2PC and Paxos?

Accepted Answer

Google Spanner layers the two protocols: 2PC for atomicity across shards, Paxos for fault tolerance of each shard's state. Each shard is a Paxos group (for example, 5 replicas across data centers) that internally agrees on its own log; one replica is the leader. For a multi-shard transaction, one shard's Paxos group acts as the 2PC coordinator and the others as participants. The coordinator's 2PC state is replicated through Paxos, so if the coordinator group's leader crashes, a new leader can resume the 2PC decision; the protocol no longer blocks on a single coordinator machine. TrueTime gives global timestamp ordering on top. The price is latency: a multi-shard write is a 2PC over Paxos, taking tens of milliseconds round-trip.

Question 6

Why is 2PC discouraged in microservices (saga pattern)?

Accepted Answer

2PC requires every participant to expose a prepare/commit/abort protocol, log durably, and hold locks across phase boundaries. That is fine for two databases under a transaction manager but doesn't compose for HTTP-based microservices: services are owned by different teams, may have private storage technologies (some without prepared-state semantics), can be slow or temporarily down, and locking across them holds resources for human-perceptible durations. The saga pattern trades atomic commit for eventual consistency: each step is a local transaction, with a compensating action defined for each forward step (debit-account → refund-account). On failure, you run the compensations for the steps that already completed, in reverse order. You lose isolation but gain liveness and operational simplicity.
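A saga runner can be sketched in a few lines. This is an in-process illustration under simplifying assumptions: `run_saga` and the step functions are hypothetical names, the steps are plain function calls rather than HTTP requests, and compensations are assumed never to fail (a real orchestrator would persist its progress and retry them).

```python
def run_saga(steps):
    """steps: list of (action, compensation) callables. True on success."""
    compensations = []
    for action, compensation in steps:
        try:
            action()  # each action is its own local transaction
        except Exception:
            # Undo the completed steps, most recent first.
            for undo in reversed(compensations):
                undo()
            return False
        compensations.append(compensation)
    return True


balance = {"alice": 100}
events = []


def debit_account():
    balance["alice"] -= 30
    events.append("debit-account")


def refund_account():
    balance["alice"] += 30
    events.append("refund-account")


def reserve_stock():
    raise RuntimeError("out of stock")  # simulated failure in step 2


def release_stock():
    events.append("release-stock")


ok = run_saga([(debit_account, refund_account),
               (reserve_stock, release_stock)])
```

Here `ok` is `False`, the balance is back at 100, and `events` records the debit followed by the refund. Between those two events any other reader could have seen the balance at 70, which is the isolation a saga gives up.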

Two-Phase Commit (2PC)

Why 2PC matters

The protocol step by step

Failure modes and recovery

2PC vs Paxos / Raft

Common misconceptions

XA in 30 lines, conceptually

Frequently asked questions