Distributed Patterns

Bulk Insert Batching

One round-trip instead of a thousand — the universal high-write speedup

Bulk insert batching folds many single-row INSERTs into one multi-row statement, paying network and commit costs once. The result: 100-1000x throughput, with no schema changes.

  • Speedup100-1000x typical
  • Optimal batch500-10,000 rows
  • Postgres COPY200k-500k rows/sec single-stream
  • Parameter cap65,535 in Postgres extended protocol
  • Used inKafka Connect, Logstash, Telegraf
  • SavingTCP RTT + parse + WAL fsync

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

Why batching wins so hard

A single-row INSERT to a remote Postgres looks like a one-line operation in your code. Inside the database, it pays a long list of costs that are constant per statement, not per row:

  • TCP round-trip. 0.1 ms local, 1-5 ms cross-AZ, 30-100 ms cross-region. The client must wait for the server to acknowledge before sending the next row.
  • Statement parsing and planning. Postgres lexes the SQL, builds an AST, validates types, picks a plan. Usually 0.1-0.3 ms cached, longer cold.
  • Locking and constraint checks. Row lock acquisition, FK validation, unique index probes.
  • WAL record + commit. Each transaction writes a commit record to the write-ahead log and fsyncs to disk. That fsync alone is 0.5-2 ms on SSD, 5-15 ms on spinning disk.

Add it up: a typical single-row INSERT over the network costs ~3-10 ms wall clock. That caps you at 100-300 inserts/sec/connection — embarrassingly slow on hardware that can do tens of thousands of rows/sec internally. The database isn't slow; you're paying the overhead a thousand times.

Batching collapses all of those costs to one. Same parser cost. Same plan. Same WAL fsync. The only thing scaling with row count is the actual work of writing the data — which is what you wanted in the first place.

The three batching shapes

You can batch at three different layers, each with different speedups:

  1. Multi-row INSERT. One SQL statement with many VALUES tuples: INSERT INTO metrics (host, ts, v) VALUES ('a',1,2), ('b',3,4), ('c',5,6), .... Works everywhere. 25-100x speedup over per-row INSERTs. Hits parameter limits at thousands of rows wide.
  2. Prepared statement batching. Driver-side: prepare INSERT once, send N parameter sets, server replays the plan for each. JDBC PreparedStatement.addBatch() + executeBatch(). With reWriteBatchedInserts=true Postgres JDBC rewrites into a single multi-row INSERT for you.
  3. Bulk loaders (COPY, LOAD DATA). Stream raw rows in a tight binary or CSV format. No SQL parsing per row. Postgres COPY sustains 200k-500k rows/sec single-stream; MySQL LOAD DATA INFILE is in the same ballpark. The catch: COPY is one big transaction and has limited per-row conflict handling.

A concrete worked example

You want to ingest 1,000,000 metric data points into Postgres on a 1 ms-latency network link. Three implementations:

  • Per-row INSERT, autocommit. Each row pays 1 ms RTT + 0.2 ms parse + 1 ms fsync ≈ 2.2 ms. Total: 2200 seconds (≈37 minutes). Throughput: ~450 rows/sec.
  • Per-row INSERT, single transaction. 1 ms RTT × 1,000,000 = 1000 sec, but fsync only happens once. Total: ~1000 seconds. Still dominated by network round-trips.
  • 1000-row multi-row INSERT, in batches. 1000 batches × (1 ms RTT + 0.2 ms parse + 1 ms fsync + 1000 row-writes at ~1 µs) = 1000 × ~3 ms = 3 seconds. ~700x faster.
  • COPY in one stream. No parse-per-row, no per-row commit. About 0.5-2 seconds for 1M rows. ~1000-4000x faster.

The astonishing leap from "per-row autocommit" to "COPY" is the lived experience of every team that retrofits batching onto an ingestion pipeline. The fastest fix in modern databases is almost always "stop sending one row at a time."

Picking a batch size

Batch too small and per-batch overhead (parse, plan, RTT) still dominates. Batch too large and you stack up problems: long transactions hold row locks, WAL grows, and a single failure forces you to redo a lot of work.

Rule of thumb by workload:

  • OLTP with constraints. 500-2000 rows. Keeps transactions short so other writers aren't blocked.
  • Time-series ingest. 5,000-50,000 rows. Time-series tables usually have no unique constraints to validate; the only cost scaling is data volume.
  • ETL / batch load. 100,000+ rows or COPY the whole file. You don't care about lock duration; you care about wall clock.

Hard cap to remember: Postgres extended-query protocol limits a single statement to 65,535 parameters. A 20-column INSERT is therefore capped at ~3,200 rows per execute. Bypassing this needs string-concat INSERT or COPY.

COPY vs multi-row INSERT

Multi-row INSERT is the universal answer. COPY is the high-end speedup with one catch: it can't handle per-row UPSERTs or row-level error continuation. If a single row in your COPY stream violates a constraint, the whole COPY aborts.

The production trick: COPY into a staging table, then INSERT-SELECT into the destination with ON CONFLICT. You get COPY's raw speed (no parser per row) plus per-row conflict semantics:

BEGIN;
  COPY metrics_staging FROM STDIN;
  INSERT INTO metrics SELECT * FROM metrics_staging
    ON CONFLICT (host, ts) DO UPDATE SET v = excluded.v;
  TRUNCATE metrics_staging;
COMMIT;

This is the pattern Kafka Connect's JDBC sink, dbt, and most ETL frameworks use under the hood.

Pseudo-code: the production buffer-and-flush loop

// Buffer rows by time AND size; flush whichever trips first.

class BatchSink:
    BATCH_SIZE      = 5000
    FLUSH_INTERVAL  = 500ms
    buffer          = []
    last_flush      = now()

    def insert(row):
        buffer.append(row)
        if len(buffer) >= BATCH_SIZE or now() - last_flush >= FLUSH_INTERVAL:
            flush()

    def flush():
        if not buffer: return
        BEGIN
            multi_row_insert(buffer)         // ONE statement, N rows
        COMMIT
        buffer.clear()
        last_flush = now()

The two-trigger flush (size OR time) is the canonical pattern. Without the timer, low-volume periods never flush. Without the size cap, bursts blow memory.

Python: psycopg2 execute_values

import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect("postgres://...")
cur  = conn.cursor()

rows = [(host, ts, value) for ... in stream]   # e.g. 5000 rows

# ONE INSERT with 5000 VALUES tuples; one round-trip; one commit.
execute_values(
    cur,
    "INSERT INTO metrics (host, ts, v) VALUES %s "
    "ON CONFLICT (host, ts) DO UPDATE SET v = excluded.v",
    rows,
    page_size=5000,
)
conn.commit()

On a local Postgres, this jumps from ~2,000 rows/sec (per-row INSERT) to ~50,000 rows/sec (batched) — a real measurement, not a marketing number. With COPY, the same machine hits ~250,000 rows/sec.

Batching strategies compared

Per-row INSERT (autocommit)Per-row INSERT (1 transaction)Multi-row INSERT (batch)COPY / LOAD DATA
Throughput (1M rows)~450 rows/sec~1000 rows/sec~50,000 rows/sec~250,000 rows/sec
Network round-tripsNNN / batch_size1 (streaming)
WAL fsyncsN1~N / batch_size1
Per-row conflict handlingYesYesYes (ON CONFLICT)No (abort on first error)
Lock durationPer-rowTotal runtimePer-batchTotal runtime
Memory footprintNoneNonebatch_size rowsNone (streaming)
Code change requiredBaselineMinimalSmallModerate (driver-specific)

Notice the middle column: just wrapping per-row INSERTs in one transaction doubles throughput thanks to a single fsync. That's the cheapest possible fix when you can't change the SQL shape.

Real-world deployments

  • Kafka Connect JDBC Sink. Default batch.size = 3000, configurable up to ~10,000. Sustains 50-200k rows/sec per worker on commodity Postgres.
  • Logstash JDBC output. Default flush_size = 500, idle_flush_time = 1s. Buffer-and-flush from day one.
  • Telegraf for InfluxDB. metric_batch_size = 1000 default, flush_interval = 10s. Hits 500k metrics/sec/agent.
  • ClickHouse INSERT. Explicitly designed for bulk loads. Recommended batch ≥ 1,000 rows; smaller batches are penalized with insert blocks. Documented throughput: 100MB/sec/node.
  • Snowflake COPY INTO. Bulk loads from S3 staged files; 100GB / minute on a Medium warehouse is routine.

Common misconceptions and traps

  • "My ORM batches for me." Many don't, by default. Hibernate needs hibernate.jdbc.batch_size + order_inserts=true. SQLAlchemy uses executemany which is per-row unless you enable insertmanyvalues (Postgres) or use bulk_save_objects(). Check the wire log.
  • "Bigger is always faster." No. Past ~50,000 rows you start hitting parameter limits, transaction bloat, and lock duration. Throughput plateaus or regresses.
  • "COPY is always the answer." COPY aborts on the first bad row. For dirty data with mixed validity, multi-row INSERT with ON CONFLICT is safer.
  • "Each row needs its own transaction for safety." Wrong instinct. A batch transaction is atomic — either all rows commit or none — which is usually what you want. Per-row autocommit is the opposite of safety: you can leave half a logical operation written.
  • "Batching breaks ordering." Within one INSERT, rows are inserted in VALUES order. Across batches, you control the order by controlling flush order in your client.
  • "I can't batch because I need the row ID back." INSERT ... RETURNING id works in multi-row INSERT. You get N ids back in VALUES order.
  • "Batching breaks for UPSERTs." ON CONFLICT (...) DO UPDATE works on every row of a multi-row INSERT independently. Use excluded.col to reference the proposed values.

Performance characteristics

  • Throughput scaling: linear in batch size up to ~5,000 rows, sub-linear thereafter, eventual plateau and regression past ~50,000 rows.
  • p99 latency: dominated by batch wall-clock — typical 5-50 ms for a 5,000-row INSERT into a small table; longer with FKs or many indexes.
  • WAL volume: ~the same as per-row inserts (data has to be logged either way) — but commit-record overhead drops drastically.
  • Memory: batch_size × row_width on the client. 5,000 rows × 200 bytes ≈ 1 MB, negligible.
  • Replication lag: batched writes are friendlier to streaming replication because there are fewer commits, and each commit carries proportionally less metadata overhead.

Frequently asked questions

How much faster is a batched insert than per-row inserts?

Typical speedups are 100-1000x for medium batches and even higher for COPY-class bulk loaders. A single-row INSERT over TCP to a remote Postgres pays one round-trip (typically 1-5 ms cross-AZ), one parse/plan cycle (~0.1-0.3 ms), and one commit + WAL flush (~0.5-2 ms). Across 1000 rows that adds up to several seconds. A single INSERT with 1000 rows pays those costs ONCE. Real-world measurements: psycopg2 batch_insert_values on a local Postgres goes from ~2000 rows/sec individual to ~50,000 rows/sec batched (25x). Postgres COPY hits 200,000-500,000 rows/sec single-stream because it skips parser entirely.

What is the optimal batch size?

Most systems plateau in the 100-10,000 rows per batch range. Below 100, fixed per-batch overhead dominates and you leave throughput on the table. Above 10,000, transaction size and lock duration start to bite — long-running inserts hold row-level locks, block VACUUM, and bloat the WAL. The sweet spot is usually 500-2000 rows for OLTP, 10,000+ for time-series. Hard limits matter too: Postgres extended query protocol caps parameters at 65,535, so a 20-column INSERT maxes at ~3,200 rows per execute. JDBC reWriteBatchedInserts has no hard cap. Benchmark with your row width and network latency.

When should I use COPY instead of multi-row INSERT?

Use COPY (Postgres) or LOAD DATA INFILE (MySQL) when you're loading 10,000+ rows at once and don't need per-row ON CONFLICT handling. COPY skips SQL parsing entirely, streams a binary or CSV payload directly into the table, and bypasses most trigger overhead. Typical numbers: COPY runs 5-10x faster than multi-row INSERT for the same data. The trade-off: COPY is one big transaction (no partial commits) and has limited conflict handling. Common production pattern: COPY into a staging table, then INSERT...SELECT...ON CONFLICT into the destination — gets you the speed of COPY plus upsert semantics.

What about transaction grouping for inserts?

Even without multi-row INSERT, wrapping N inserts in one BEGIN/COMMIT yields 10-50x speedup over autocommit, because the WAL is fsynced once per commit, not once per row. Multi-row INSERT and transaction batching are independent optimizations and compose: a single multi-row INSERT inside its own transaction is the strict superset. The classic pattern for ingestion pipelines: accumulate rows in memory for 100-500ms, then issue one multi-row INSERT inside one transaction. Kafka Connect's JDBC sink, Logstash, and Telegraf all do exactly this.

Why does single-row insert have so much overhead?

Per insert, the database does: (1) network read of the SQL text; (2) lexing + parsing the statement; (3) planning (constraint check, index lookup, foreign key validation); (4) acquiring locks; (5) writing a WAL record; (6) flushing WAL to disk on commit (fsync, the killer); (7) returning success over TCP. Across the network the round-trip alone is often >1ms. Multiply by 1000 rows and you've spent several seconds just on overhead unrelated to the data. Batching amortizes (1)-(6) across many rows; bulk loaders like COPY skip (2)-(3) entirely. The disk fsync alone is typically 0.5-2ms on SSDs and is the single biggest per-commit cost.

How does Kafka Connect / Logstash batch inserts in production?

Both buffer incoming records in memory by time and size — typically 'flush every 500ms OR every 5000 records, whichever comes first.' On flush, they issue one multi-row INSERT (or COPY) inside a single transaction. Failure handling: on partial failure they retry the entire batch (idempotent inserts via UPSERT) or split-and-bisect to isolate poison records. Production knobs: batch.size, batch.timeout.ms, max.in.flight.requests. Logstash JDBC output and Kafka Connect JDBC sink both default to 3000-5000 rows per batch and routinely sustain 50,000-200,000 rows/sec per worker on commodity Postgres.