Distributed Patterns
Bulk Insert Batching
One round-trip instead of a thousand — the universal high-write speedup
Bulk insert batching folds many single-row INSERTs into one multi-row statement, paying network and commit costs once. The result: 100-1000x throughput, with no schema changes.
- Speedup100-1000x typical
- Optimal batch500-10,000 rows
- Postgres COPY200k-500k rows/sec single-stream
- Parameter cap65,535 in Postgres extended protocol
- Used inKafka Connect, Logstash, Telegraf
- SavingTCP RTT + parse + WAL fsync
Interactive visualization
Press play, or step through manually. The visualization is yours to drive — try it before reading on.
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
Why batching wins so hard
A single-row INSERT to a remote Postgres looks like a one-line operation in your code. Inside the database, it pays a long list of costs that are constant per statement, not per row:
- TCP round-trip. 0.1 ms local, 1-5 ms cross-AZ, 30-100 ms cross-region. The client must wait for the server to acknowledge before sending the next row.
- Statement parsing and planning. Postgres lexes the SQL, builds an AST, validates types, picks a plan. Usually 0.1-0.3 ms cached, longer cold.
- Locking and constraint checks. Row lock acquisition, FK validation, unique index probes.
- WAL record + commit. Each transaction writes a commit record to the write-ahead log and fsyncs to disk. That fsync alone is 0.5-2 ms on SSD, 5-15 ms on spinning disk.
Add it up: a typical single-row INSERT over the network costs ~3-10 ms wall clock. That caps you at 100-300 inserts/sec/connection — embarrassingly slow on hardware that can do tens of thousands of rows/sec internally. The database isn't slow; you're paying the overhead a thousand times.
Batching collapses all of those costs to one. Same parser cost. Same plan. Same WAL fsync. The only thing scaling with row count is the actual work of writing the data — which is what you wanted in the first place.
The three batching shapes
You can batch at three different layers, each with different speedups:
- Multi-row INSERT. One SQL statement with many VALUES tuples:
INSERT INTO metrics (host, ts, v) VALUES ('a',1,2), ('b',3,4), ('c',5,6), .... Works everywhere. 25-100x speedup over per-row INSERTs. Hits parameter limits at thousands of rows wide. - Prepared statement batching. Driver-side: prepare INSERT once, send N parameter sets, server replays the plan for each. JDBC
PreparedStatement.addBatch()+executeBatch(). WithreWriteBatchedInserts=truePostgres JDBC rewrites into a single multi-row INSERT for you. - Bulk loaders (COPY, LOAD DATA). Stream raw rows in a tight binary or CSV format. No SQL parsing per row. Postgres
COPYsustains 200k-500k rows/sec single-stream; MySQLLOAD DATA INFILEis in the same ballpark. The catch: COPY is one big transaction and has limited per-row conflict handling.
A concrete worked example
You want to ingest 1,000,000 metric data points into Postgres on a 1 ms-latency network link. Three implementations:
- Per-row INSERT, autocommit. Each row pays 1 ms RTT + 0.2 ms parse + 1 ms fsync ≈ 2.2 ms. Total: 2200 seconds (≈37 minutes). Throughput: ~450 rows/sec.
- Per-row INSERT, single transaction. 1 ms RTT × 1,000,000 = 1000 sec, but fsync only happens once. Total: ~1000 seconds. Still dominated by network round-trips.
- 1000-row multi-row INSERT, in batches. 1000 batches × (1 ms RTT + 0.2 ms parse + 1 ms fsync + 1000 row-writes at ~1 µs) = 1000 × ~3 ms = 3 seconds. ~700x faster.
- COPY in one stream. No parse-per-row, no per-row commit. About 0.5-2 seconds for 1M rows. ~1000-4000x faster.
The astonishing leap from "per-row autocommit" to "COPY" is the lived experience of every team that retrofits batching onto an ingestion pipeline. The fastest fix in modern databases is almost always "stop sending one row at a time."
Picking a batch size
Batch too small and per-batch overhead (parse, plan, RTT) still dominates. Batch too large and you stack up problems: long transactions hold row locks, WAL grows, and a single failure forces you to redo a lot of work.
Rule of thumb by workload:
- OLTP with constraints. 500-2000 rows. Keeps transactions short so other writers aren't blocked.
- Time-series ingest. 5,000-50,000 rows. Time-series tables usually have no unique constraints to validate; the only cost scaling is data volume.
- ETL / batch load. 100,000+ rows or COPY the whole file. You don't care about lock duration; you care about wall clock.
Hard cap to remember: Postgres extended-query protocol limits a single statement to 65,535 parameters. A 20-column INSERT is therefore capped at ~3,200 rows per execute. Bypassing this needs string-concat INSERT or COPY.
COPY vs multi-row INSERT
Multi-row INSERT is the universal answer. COPY is the high-end speedup with one catch: it can't handle per-row UPSERTs or row-level error continuation. If a single row in your COPY stream violates a constraint, the whole COPY aborts.
The production trick: COPY into a staging table, then INSERT-SELECT into the destination with ON CONFLICT. You get COPY's raw speed (no parser per row) plus per-row conflict semantics:
BEGIN;
COPY metrics_staging FROM STDIN;
INSERT INTO metrics SELECT * FROM metrics_staging
ON CONFLICT (host, ts) DO UPDATE SET v = excluded.v;
TRUNCATE metrics_staging;
COMMIT;
This is the pattern Kafka Connect's JDBC sink, dbt, and most ETL frameworks use under the hood.
Pseudo-code: the production buffer-and-flush loop
// Buffer rows by time AND size; flush whichever trips first.
class BatchSink:
BATCH_SIZE = 5000
FLUSH_INTERVAL = 500ms
buffer = []
last_flush = now()
def insert(row):
buffer.append(row)
if len(buffer) >= BATCH_SIZE or now() - last_flush >= FLUSH_INTERVAL:
flush()
def flush():
if not buffer: return
BEGIN
multi_row_insert(buffer) // ONE statement, N rows
COMMIT
buffer.clear()
last_flush = now()
The two-trigger flush (size OR time) is the canonical pattern. Without the timer, low-volume periods never flush. Without the size cap, bursts blow memory.
Python: psycopg2 execute_values
import psycopg2
from psycopg2.extras import execute_values
conn = psycopg2.connect("postgres://...")
cur = conn.cursor()
rows = [(host, ts, value) for ... in stream] # e.g. 5000 rows
# ONE INSERT with 5000 VALUES tuples; one round-trip; one commit.
execute_values(
cur,
"INSERT INTO metrics (host, ts, v) VALUES %s "
"ON CONFLICT (host, ts) DO UPDATE SET v = excluded.v",
rows,
page_size=5000,
)
conn.commit()
On a local Postgres, this jumps from ~2,000 rows/sec (per-row INSERT) to ~50,000 rows/sec (batched) — a real measurement, not a marketing number. With COPY, the same machine hits ~250,000 rows/sec.
Batching strategies compared
| Per-row INSERT (autocommit) | Per-row INSERT (1 transaction) | Multi-row INSERT (batch) | COPY / LOAD DATA | |
|---|---|---|---|---|
| Throughput (1M rows) | ~450 rows/sec | ~1000 rows/sec | ~50,000 rows/sec | ~250,000 rows/sec |
| Network round-trips | N | N | N / batch_size | 1 (streaming) |
| WAL fsyncs | N | 1 | ~N / batch_size | 1 |
| Per-row conflict handling | Yes | Yes | Yes (ON CONFLICT) | No (abort on first error) |
| Lock duration | Per-row | Total runtime | Per-batch | Total runtime |
| Memory footprint | None | None | batch_size rows | None (streaming) |
| Code change required | Baseline | Minimal | Small | Moderate (driver-specific) |
Notice the middle column: just wrapping per-row INSERTs in one transaction doubles throughput thanks to a single fsync. That's the cheapest possible fix when you can't change the SQL shape.
Real-world deployments
- Kafka Connect JDBC Sink. Default
batch.size = 3000, configurable up to ~10,000. Sustains 50-200k rows/sec per worker on commodity Postgres. - Logstash JDBC output. Default
flush_size = 500,idle_flush_time = 1s. Buffer-and-flush from day one. - Telegraf for InfluxDB.
metric_batch_size = 1000default,flush_interval = 10s. Hits 500k metrics/sec/agent. - ClickHouse INSERT. Explicitly designed for bulk loads. Recommended batch ≥ 1,000 rows; smaller batches are penalized with insert blocks. Documented throughput: 100MB/sec/node.
- Snowflake COPY INTO. Bulk loads from S3 staged files; 100GB / minute on a Medium warehouse is routine.
Common misconceptions and traps
- "My ORM batches for me." Many don't, by default. Hibernate needs
hibernate.jdbc.batch_size+order_inserts=true. SQLAlchemy usesexecutemanywhich is per-row unless you enableinsertmanyvalues(Postgres) or usebulk_save_objects(). Check the wire log. - "Bigger is always faster." No. Past ~50,000 rows you start hitting parameter limits, transaction bloat, and lock duration. Throughput plateaus or regresses.
- "COPY is always the answer." COPY aborts on the first bad row. For dirty data with mixed validity, multi-row INSERT with ON CONFLICT is safer.
- "Each row needs its own transaction for safety." Wrong instinct. A batch transaction is atomic — either all rows commit or none — which is usually what you want. Per-row autocommit is the opposite of safety: you can leave half a logical operation written.
- "Batching breaks ordering." Within one INSERT, rows are inserted in VALUES order. Across batches, you control the order by controlling flush order in your client.
- "I can't batch because I need the row ID back."
INSERT ... RETURNING idworks in multi-row INSERT. You get N ids back in VALUES order. - "Batching breaks for UPSERTs."
ON CONFLICT (...) DO UPDATEworks on every row of a multi-row INSERT independently. Useexcluded.colto reference the proposed values.
Performance characteristics
- Throughput scaling: linear in batch size up to ~5,000 rows, sub-linear thereafter, eventual plateau and regression past ~50,000 rows.
- p99 latency: dominated by batch wall-clock — typical 5-50 ms for a 5,000-row INSERT into a small table; longer with FKs or many indexes.
- WAL volume: ~the same as per-row inserts (data has to be logged either way) — but commit-record overhead drops drastically.
- Memory: batch_size × row_width on the client. 5,000 rows × 200 bytes ≈ 1 MB, negligible.
- Replication lag: batched writes are friendlier to streaming replication because there are fewer commits, and each commit carries proportionally less metadata overhead.
Frequently asked questions
How much faster is a batched insert than per-row inserts?
Typical speedups are 100-1000x for medium batches and even higher for COPY-class bulk loaders. A single-row INSERT over TCP to a remote Postgres pays one round-trip (typically 1-5 ms cross-AZ), one parse/plan cycle (~0.1-0.3 ms), and one commit + WAL flush (~0.5-2 ms). Across 1000 rows that adds up to several seconds. A single INSERT with 1000 rows pays those costs ONCE. Real-world measurements: psycopg2 batch_insert_values on a local Postgres goes from ~2000 rows/sec individual to ~50,000 rows/sec batched (25x). Postgres COPY hits 200,000-500,000 rows/sec single-stream because it skips parser entirely.
What is the optimal batch size?
Most systems plateau in the 100-10,000 rows per batch range. Below 100, fixed per-batch overhead dominates and you leave throughput on the table. Above 10,000, transaction size and lock duration start to bite — long-running inserts hold row-level locks, block VACUUM, and bloat the WAL. The sweet spot is usually 500-2000 rows for OLTP, 10,000+ for time-series. Hard limits matter too: Postgres extended query protocol caps parameters at 65,535, so a 20-column INSERT maxes at ~3,200 rows per execute. JDBC reWriteBatchedInserts has no hard cap. Benchmark with your row width and network latency.
When should I use COPY instead of multi-row INSERT?
Use COPY (Postgres) or LOAD DATA INFILE (MySQL) when you're loading 10,000+ rows at once and don't need per-row ON CONFLICT handling. COPY skips SQL parsing entirely, streams a binary or CSV payload directly into the table, and bypasses most trigger overhead. Typical numbers: COPY runs 5-10x faster than multi-row INSERT for the same data. The trade-off: COPY is one big transaction (no partial commits) and has limited conflict handling. Common production pattern: COPY into a staging table, then INSERT...SELECT...ON CONFLICT into the destination — gets you the speed of COPY plus upsert semantics.
What about transaction grouping for inserts?
Even without multi-row INSERT, wrapping N inserts in one BEGIN/COMMIT yields 10-50x speedup over autocommit, because the WAL is fsynced once per commit, not once per row. Multi-row INSERT and transaction batching are independent optimizations and compose: a single multi-row INSERT inside its own transaction is the strict superset. The classic pattern for ingestion pipelines: accumulate rows in memory for 100-500ms, then issue one multi-row INSERT inside one transaction. Kafka Connect's JDBC sink, Logstash, and Telegraf all do exactly this.
Why does single-row insert have so much overhead?
Per insert, the database does: (1) network read of the SQL text; (2) lexing + parsing the statement; (3) planning (constraint check, index lookup, foreign key validation); (4) acquiring locks; (5) writing a WAL record; (6) flushing WAL to disk on commit (fsync, the killer); (7) returning success over TCP. Across the network the round-trip alone is often >1ms. Multiply by 1000 rows and you've spent several seconds just on overhead unrelated to the data. Batching amortizes (1)-(6) across many rows; bulk loaders like COPY skip (2)-(3) entirely. The disk fsync alone is typically 0.5-2ms on SSDs and is the single biggest per-commit cost.
How does Kafka Connect / Logstash batch inserts in production?
Both buffer incoming records in memory by time and size — typically 'flush every 500ms OR every 5000 records, whichever comes first.' On flush, they issue one multi-row INSERT (or COPY) inside a single transaction. Failure handling: on partial failure they retry the entire batch (idempotent inserts via UPSERT) or split-and-bisect to isolate poison records. Production knobs: batch.size, batch.timeout.ms, max.in.flight.requests. Logstash JDBC output and Kafka Connect JDBC sink both default to 3000-5000 rows per batch and routinely sustain 50,000-200,000 rows/sec per worker on commodity Postgres.