Distributed Systems

Event Sourcing

Q: Why store events instead of state?

State loses history. A row UPDATE overwrites the previous balance and the reason for the change is gone. Events preserve every change with intent: 'CustomerDeposited $100,' 'CustomerWithdrew $30.' Five killer benefits: (1) full audit trail by construction, no separate audit log; (2) time-travel — derive any past state by replaying to a specific event; (3) multiple read models from one event log (search index, cache, dashboard); (4) easy debugging — replay events into a dev environment to reproduce production bugs; (5) natural fit for Kafka, EventBridge, Kinesis. The cost is added complexity: you build projections instead of just running SELECT.

Q: What is CQRS and how does it pair with ES?

CQRS (Command Query Responsibility Segregation) splits the model: the write side accepts commands, validates business rules, and emits events. The read side consumes events and builds materialized views optimized for queries. ES + CQRS is a natural pair — events are the integration layer. The same OrderPlaced event powers the orders-by-customer view, the daily-revenue dashboard, the search index, and the recommendation feature, each as an independent projection. Read models can use different storage (Postgres for OLTP, Elasticsearch for search, Redis for cache) and update independently. Greg Young popularized both ES and CQRS together in 2010.

Q: How do snapshots accelerate replay?

Replaying 10,000 events to rebuild one aggregate is slow. Snapshots periodically save the current state of an aggregate so replay only needs the events since the last snapshot. Common policy: snapshot every 100 events, or every 24 hours. Cost: extra storage (~1KB per snapshot per aggregate × snapshot count) and write overhead. Benefit: cold load drops from O(n) events to O(snapshot interval). Eventstore, Axon Framework, and Marten implement snapshotting natively. Snapshots are themselves derived data — you can rebuild them by replaying from the start.

Q: How do you handle event schema evolution?

Events are immutable but business needs change. Five strategies: (1) Add new fields with defaults — old events carry implicit defaults on read; (2) Versioned event types: OrderPlaced_v1 → OrderPlaced_v2 with explicit migration code; (3) Upcasters: a function that transforms v1 events into v2 events at read time (Axon's approach); (4) Copy-and-replace: write a new event log from a transformation of the old (rare, high cost); (5) Weak-schema events (JSON with optional fields, validated at projection time). Most production systems use upcasters because they keep the historical log intact while letting projections work with current shapes.

Q: Why is Kafka a natural backend for event sourcing?

Kafka's primitive is exactly what ES needs — an append-only, partitioned, durable, replicated log with consumer offsets. Events get partitioned by aggregate ID (one partition per customer, order, account) preserving per-aggregate order. Consumers read at their own pace, replay from offset 0 to rebuild views, and Kafka retains data indefinitely if compaction is disabled. Latencies: ~10ms append, ~5ms tail read; throughput up to 1M events/sec per cluster. Confluent and Eventuate Tram explicitly market Kafka-as-event-store. Caveat: Kafka isn't a transactional database — you still need application-level idempotency and exactly-once semantics on consumers.

Q: When is event sourcing overkill?

Simple CRUD apps with no audit requirement, no temporal queries, and no high-throughput projections do not benefit. The 3-5x complexity tax (events + projections + snapshots + schema evolution) is wasted. Use ES when at least two are true: (1) regulators or auditors require complete history; (2) you need time-travel ('what was inventory at midnight'); (3) multiple specialized read models; (4) high-frequency writes that benefit from append-only storage; (5) Kafka or similar log infrastructure already in place; (6) domain is event-driven naturally (banking, IoT, gaming, ride-sharing). For a typical CRUD admin tool, plain Postgres is the right answer.

Don't store state — store events; replay them to reconstruct any version, anytime

Event sourcing is an architectural pattern where the source of truth is an append-only log of immutable events (e.g., OrderPlaced, PaymentReceived, OrderShipped), not a mutable state table. Current state is derived by replaying events from the beginning, optionally accelerated by snapshots. Provides full audit history, time-travel debugging, multiple specialized read models (CQRS), and natural integration with Kafka. Foundational in domain-driven design (Eric Evans, 2003; Greg Young popularized event sourcing 2010). Used in banking core systems (every debit/credit is an event), version control (Git is essentially event sourcing of file changes), and Walmart's transaction backbone.

Source of truthappend-only event log
Pattern partnerCQRS
Snapshotsperiodic state checkpoints
Versioningper-event schema evolution
Used inbanking, Git, Walmart, Kafka
Greg Young popularized2010

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

Why event sourcing matters

Audit logs by construction. Every state change is an event with timestamp and actor. No separate audit table; the log is the audit. Banks, healthcare, and trading systems mandate this.
Time travel. "What was inventory at 3pm yesterday?" Replay events up to that timestamp. Impossible with mutable state.
Multiple specialized read views. One log feeds an orders-by-customer view, daily-revenue dashboard, search index, recommendation features — each independent.
DDD aggregates. Domain-driven design treats aggregates as state machines that emit events on commands; ES persists exactly the events DDD already wants.
Bug reproduction. Copy production events to a dev environment; replay to reproduce any production bug deterministically.
Kafka and stream processing. Events are first-class citizens of Kafka, Flink, Materialize, ksqlDB — ES integrates natively.
Banking core systems. Wells Fargo, Capital One, and modern fintech (Monzo, Revolut) treat the ledger as an event log of debits and credits.
Version control. Git is event sourcing of files — commits are events; HEAD is a derived projection.

Anatomy of an event-sourced aggregate

An aggregate (DDD term for a consistency boundary) in ES has four parts:

Command handler. Receives a command (e.g., DepositMoney(account_id, amount)). Loads the aggregate. Validates business rules.
Event emission. If valid, emits one or more events: MoneyDeposited(account_id, amount, timestamp).
Event apply. Each event has an apply() method that updates in-memory aggregate state. Pure function: (state, event) → state.
Persist. Append events to the log atomically (must be all-or-nothing per command).

To load an aggregate: read all its events from the log, apply them in order to a zero state, return the final state. To make changes: handle command, emit events, append, apply.

A bank account aggregate might handle commands Deposit, Withdraw, Transfer. Events are MoneyDeposited, MoneyWithdrawn, MoneyTransferred. State is just the balance (plus invariants like is_frozen). Hot loading 10,000 events takes ~50-200ms; with snapshots every 100 events, it drops to under 5ms.

CQRS — separating writes and reads

CQRS (Command Query Responsibility Segregation) split the data model into write side (commands → events) and read side (events → projections). Greg Young popularized CQRS+ES together; the pair is symbiotic.

Write side: small (one aggregate per logical entity), strongly consistent (append events to log atomically), focused on validation. Pure business logic.

Read side: many denormalized views, each optimized for one query pattern. Eventually consistent (lag from event emission to projection update measured in milliseconds). One event might update 5+ projections.

Example projections from an orders event log:

orders_by_customer (Postgres): index on customer_id for "show my orders" queries.
daily_revenue (Postgres): aggregate of total revenue per day for executive dashboards.
orders_search (Elasticsearch): full-text search over order line items, status, customer name.
orders_cache (Redis): hot order summary for the cart-page widget.
recommendation_features (feature store): "customers who bought X also bought Y" derived signals.

Each projection consumer is independent — adding a new projection means writing one consumer that replays events from offset 0. No DB migrations on the write side.

Snapshots

Replaying 10,000 events to rebuild one aggregate is too slow for hot paths. Snapshots cache the current state at periodic intervals.

Common snapshot policy: every 100 events, or every 24 hours, or whichever comes first. On load: read latest snapshot, apply events since the snapshot's offset, return state.

Storage cost: each snapshot is ~1-10KB per aggregate × snapshot count. For 1M aggregates with snapshots every 100 events × ~10 snapshots retained = 10M snapshot rows × 5KB = 50GB. Manageable.

Snapshots are derived data — they can be regenerated by replay. If a snapshot becomes corrupted (rare) or the aggregate's apply() logic changes (more common), you regenerate from the log.

Eventstore, Axon Framework (Java), Marten (.NET on Postgres), and Eventuate Tram all implement snapshotting natively. Custom implementations layer this on top of Kafka or Postgres.

Schema evolution

Events are immutable; the world is not. After 5 years you'll have OrderPlaced events from an early schema with missing fields, fields renamed, types changed. Five strategies:

Add fields with defaults. A new field added in v2 reads as the default value when projecting v1 events. Cheapest; works for most evolution.
Versioned event types. OrderPlaced_v1 and OrderPlaced_v2 are distinct types; consumers handle both. Cumbersome but explicit.
Upcasters. A function that reads v1 from the log and presents it as v2 to projections. Axon Framework's approach. Keeps the log intact while consumers see one shape.
Copy-and-replace. Build a new log by transforming the old; switch over. High cost; only used when fundamental structure changes (e.g., merging two event streams).
Weak-schema (JSON with optional fields). Skip strict schemas; validate at projection time. Flexible but pushes validation into many consumers.

Most production systems use upcasters — they keep the historical log intact while projections work with current shapes. Schema-versioning frameworks (Avro, Protocol Buffers, JSON Schema with versioning) help; pure JSON forces hand-rolled migration.

Kafka as event store

Kafka's primitives — append-only, partitioned, durable, replicated, consumer offsets — match ES exactly. The pattern:

One topic per aggregate type (orders, payments, accounts) or one per logical stream.
Partition by aggregate ID so all events for one entity land in the same partition (preserves order).
Compaction off (or compaction with retention) if you need full history; with compaction, only the latest event per key is retained.
Consumers are projection builders, each tracking its own offset.
Replay by resetting a consumer's offset to 0.

Performance: Kafka can sustain ~1M events/sec per cluster on commodity hardware, p99 append latency <10ms, p99 read latency <5ms. Retention is configurable (default 7 days; ES typically configures forever).

Caveats: Kafka isn't a transactional database. To prevent duplicate events on producer retries, use Kafka's transactional API or rely on application-level idempotency keys. For exactly-once consumer semantics, combine Kafka transactions with idempotent projection writes (UPSERT, conditional INSERT).

Banking core systems

Modern banking treats the ledger as the canonical event log: every debit and credit is an event. Account balance is a projection — DBalanceAtT = sum of all debits and credits up to T.

Why ES wins for banking:

Regulatory. SOX, PCI, GDPR, MiFID II all mandate complete change history. ES gives this for free.
Reconciliation. Every dispute or audit replays the log from the disputed timestamp.
Multi-currency. The same debit event projects into USD-balance, EUR-balance, and base-currency dashboards by separate projection logic.
Backdated corrections. Insert a "correction" event with a different effective date; rerun affected projections.

Monzo's bank ledger is event-sourced (publicly described). Capital One's modernization replaced batch nightly-close with event-sourced real-time ledger. SQS/Kinesis at AWS, Pub/Sub at Google power similar patterns at scale.

Git as event sourcing

Git is event sourcing of file content. Each commit is an event with a parent hash, author, message, and tree (snapshot of files). HEAD is a projection — the current state derived by walking commits.

Branches are projections at different offsets. git checkout main rebuilds the working tree from the events along the main branch's commit graph. git revert emits a compensating event (a new commit that inverts an old one). git rebase rewrites events (controversial — ES purists would call this a copy-and-replace migration).

Snapshots: each commit stores a complete tree, which is a snapshot. Pack files compress the storage, but conceptually every commit knows the full state at that point. Git is faster than pure event replay because it doesn't need to compose deltas — it has snapshots at every event.

Common misconceptions

"ES is just a log." The log is necessary but the projections are where business value lives. Forgetting to invest in projection quality means you have an unusable history dump.
"ES is always good for auditing." Events may contain PII (customer names, card numbers). GDPR's right-to-erasure clashes with immutability — use crypto-shredding (encrypt PII, throw away the key) or pseudonymization at write time.
"State stays small." An aggregate that lives 10 years and emits 100 events/day = 365,000 events. Without snapshots, replay is slow. Always snapshot.
"Events are easy to design." Naming events well — past tense, business-relevant, granular but not too granular — is genuinely hard. Bad event taxonomy haunts projects for years.
"ES gives you free strong consistency." Within an aggregate, yes (the log is atomic). Across aggregates, no — projections lag, and your write side may need sagas to coordinate multi-aggregate operations.
"Snapshots are optional." For aggregates with more than ~1000 events, snapshots are mandatory for hot-path performance.
"You need Kafka." Postgres with an events table works fine for many systems; Marten and Eventuate run on Postgres. Kafka shines when you have many independent projection consumers and high write throughput.

Real-world numbers

Walmart inventory: ~1B events/day across stores, projected into 50+ specialized views.
Monzo ledger: ~10M events/day; account loads <50ms p99 with snapshots every 200 events.
Eventstore DB: 50,000 events/sec single-node, <5ms append latency; clustered scales to 1M+ events/sec.
Kafka as ES: 1M events/sec/cluster, retention indefinite, p99 read <5ms.
Axon Server: 10,000 events/sec/aggregate, 100K+ aggregates per server.

Frequently asked questions

Why store events instead of state?

State loses history. A row UPDATE overwrites the previous balance and the reason for the change is gone. Events preserve every change with intent: 'CustomerDeposited $100,' 'CustomerWithdrew $30.' Five killer benefits: (1) full audit trail by construction, no separate audit log; (2) time-travel — derive any past state by replaying to a specific event; (3) multiple read models from one event log (search index, cache, dashboard); (4) easy debugging — replay events into a dev environment to reproduce production bugs; (5) natural fit for Kafka, EventBridge, Kinesis. The cost is added complexity: you build projections instead of just running SELECT.

What is CQRS and how does it pair with ES?

CQRS (Command Query Responsibility Segregation) splits the model: the write side accepts commands, validates business rules, and emits events. The read side consumes events and builds materialized views optimized for queries. ES + CQRS is a natural pair — events are the integration layer. The same OrderPlaced event powers the orders-by-customer view, the daily-revenue dashboard, the search index, and the recommendation feature, each as an independent projection. Read models can use different storage (Postgres for OLTP, Elasticsearch for search, Redis for cache) and update independently. Greg Young popularized both ES and CQRS together in 2010.

How do snapshots accelerate replay?

Replaying 10,000 events to rebuild one aggregate is slow. Snapshots periodically save the current state of an aggregate so replay only needs the events since the last snapshot. Common policy: snapshot every 100 events, or every 24 hours. Cost: extra storage (~1KB per snapshot per aggregate × snapshot count) and write overhead. Benefit: cold load drops from O(n) events to O(snapshot interval). Eventstore, Axon Framework, and Marten implement snapshotting natively. Snapshots are themselves derived data — you can rebuild them by replaying from the start.

How do you handle event schema evolution?

Events are immutable but business needs change. Five strategies: (1) Add new fields with defaults — old events carry implicit defaults on read; (2) Versioned event types: OrderPlaced_v1 → OrderPlaced_v2 with explicit migration code; (3) Upcasters: a function that transforms v1 events into v2 events at read time (Axon's approach); (4) Copy-and-replace: write a new event log from a transformation of the old (rare, high cost); (5) Weak-schema events (JSON with optional fields, validated at projection time). Most production systems use upcasters because they keep the historical log intact while letting projections work with current shapes.

Why is Kafka a natural backend for event sourcing?

Kafka's primitive is exactly what ES needs — an append-only, partitioned, durable, replicated log with consumer offsets. Events get partitioned by aggregate ID (one partition per customer, order, account) preserving per-aggregate order. Consumers read at their own pace, replay from offset 0 to rebuild views, and Kafka retains data indefinitely if compaction is disabled. Latencies: ~10ms append, ~5ms tail read; throughput up to 1M events/sec per cluster. Confluent and Eventuate Tram explicitly market Kafka-as-event-store. Caveat: Kafka isn't a transactional database — you still need application-level idempotency and exactly-once semantics on consumers.

When is event sourcing overkill?

Simple CRUD apps with no audit requirement, no temporal queries, and no high-throughput projections do not benefit. The 3-5x complexity tax (events + projections + snapshots + schema evolution) is wasted. Use ES when at least two are true: (1) regulators or auditors require complete history; (2) you need time-travel ('what was inventory at midnight'); (3) multiple specialized read models; (4) high-frequency writes that benefit from append-only storage; (5) Kafka or similar log infrastructure already in place; (6) domain is event-driven naturally (banking, IoT, gaming, ride-sharing). For a typical CRUD admin tool, plain Postgres is the right answer.