Networking
Nagle's Algorithm
Buffer small writes until ACK or full segment — saves bandwidth, but causes 40 ms latency spikes
Nagle's algorithm (RFC 896, 1984, John Nagle) reduces small-packet inefficiency in TCP by holding back small writes until either (a) a full MSS (maximum segment size, ~1460 bytes) is ready or (b) all previously sent data has been ACKed. It was designed to fix the "tinygram" problem of interactive Telnet: without it, every keystroke sends a 41-byte packet (1 byte of payload + 40 bytes of headers), about 2.4% efficiency. Combined with delayed ACK (the receiver holds ACKs for up to 40 ms on Linux or 200 ms on BSD-derived stacks, hoping to piggyback them on data), Nagle can add a 40 ms latency spike to small request-response traffic. The usual fix is the TCP_NODELAY socket option, which HTTP servers, gRPC, Redis, and Postgres clients set almost universally.
- RFC: 896 (1984)
- Buffers: until ACK or MSS
- Delayed ACK pairing: 40-200 ms latency
- Solution: TCP_NODELAY
- MSS typical: 1460 bytes
- Tinygram: 1 byte payload, 40 byte overhead
The actual rule
The core of Nagle's algorithm fits in one paragraph. Paraphrasing RFC 896:
If there is unacknowledged data already in flight (i.e., previously sent and not yet ACKed), and the data the application has just submitted is smaller than the maximum segment size, then buffer the new data and do not send it until either the previous data is ACKed or the buffer fills to MSS.
Stated as code:
// Pseudo-Linux TCP send path
if (data.len >= mss
|| nothing_in_flight()
|| tcp_nodelay_set()) {
send_segment(data);
} else {
buffer.append(data); // Nagle holds it
// Will flush when ACK arrives or buffer reaches mss
}
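The same rule can be exercised as a toy model. The sketch below is not kernel code: the `NagleSender` class, its fields, and the assumption that one ACK covers everything in flight are all illustrative simplifications.

```python
from dataclasses import dataclass, field

MSS = 1460  # typical Ethernet MSS, as quoted above


@dataclass
class NagleSender:
    """Toy model of the send-side rule; not a real TCP implementation."""
    nodelay: bool = False
    in_flight: int = 0  # bytes sent but not yet ACKed
    pending: bytearray = field(default_factory=bytearray)
    sent: list = field(default_factory=list)  # sizes of segments put on the wire

    def write(self, data: bytes) -> None:
        self.pending += data
        self._try_send()

    def on_ack(self) -> None:
        # Simplification: a single ACK acknowledges everything in flight.
        self.in_flight = 0
        self._try_send()

    def _try_send(self) -> None:
        while self.pending:
            full = len(self.pending) >= MSS
            if not (full or self.in_flight == 0 or self.nodelay):
                return  # Nagle holds the small remainder
            segment = bytes(self.pending[:MSS])
            del self.pending[:MSS]
            self.in_flight += len(segment)
            self.sent.append(len(segment))


s = NagleSender()
s.write(b"h" * 4)     # nothing in flight: sent at once
s.write(b"b" * 200)   # 4 bytes unacked and 200 < MSS: held
held = list(s.sent)
s.on_ack()            # ACK arrives: the 200 bytes are released
print(held, s.sent)   # [4] [4, 200]
```

Constructing the sender with `nodelay=True` makes both writes go out immediately, which is the behavior TCP_NODELAY selects.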
The original problem — tinygrams
It is 1984. John Nagle is at Ford Aerospace. Engineers across the Atlantic use Telnet over a leased line back to California. Each keystroke generates one IP packet:
| Field | Bytes |
|---|---|
| Ethernet header | 14 |
| IPv4 header | 20 |
| TCP header | 20 |
| Payload (one keystroke) | 1 |
| Total on the wire | 55 (or 41 at the IP layer, excluding Ethernet) |
The link spent 98% of its bytes on protocol overhead. Worse, packet rate, not bit rate, was the limit on routers of the era. Nagle's solution — coalesce while a previous segment is unacked — reduced the packet rate dramatically without harming the perceived interactivity, because the next keystroke usually arrived before the previous one's ACK did.
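The overhead figures can be checked with a few lines of arithmetic. This is only a worked calculation using the header sizes from the table; the 20-keystroke coalesced case is an illustrative assumption, not a number from RFC 896.

```python
IP_HEADER = 20
TCP_HEADER = 20
ETHERNET_HEADER = 14


def efficiency(payload: int, overhead: int) -> float:
    """Fraction of transmitted bytes that are application payload."""
    return payload / (payload + overhead)


ip_level = efficiency(1, IP_HEADER + TCP_HEADER)                   # 1 / 41
on_wire = efficiency(1, ETHERNET_HEADER + IP_HEADER + TCP_HEADER)  # 1 / 55
coalesced = efficiency(20, IP_HEADER + TCP_HEADER)                 # 20 / 60 (assumed batch)

print(f"{ip_level:.1%} {on_wire:.1%} {coalesced:.1%}")  # 2.4% 1.8% 33.3%
```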
Delayed ACK — the second algorithm
RFC 1122 (1989) introduced delayed ACK: the receiver may hold an ACK for up to 500 ms (capped at 200 ms in BSD, 40 ms in modern Linux) hoping to piggyback the ACK onto a response data packet. Independently sensible — saves a packet on most request-response protocols.
The two algorithms were designed independently and never tested together at the time. They interact pathologically.
The 40 ms problem step by step
Consider an application that does two writes for one logical message — a 4-byte header followed by a 200-byte body. With Nagle on the sender and delayed ACK on the receiver:
- t = 0 ms. Sender writes 4-byte header. Nagle sees no data in flight, so sends it immediately as one segment.
- t = 1 ms. Receiver receives the 4-byte header. It has no data to send back, so delayed ACK starts the 40 ms timer.
- t = 1 ms. Sender writes 200-byte body. Nagle sees 4 bytes still in flight (unacked), and 200 < MSS, so it buffers the body.
- t = 41 ms. Receiver's delayed-ACK timer fires. ACK for header is sent.
- t = 42 ms. Sender receives the ACK; Nagle releases the buffered body.
- t = 43 ms. Receiver gets the body and processes the request.
Net latency: 43 ms instead of 1-2 ms. The same pathology hits any protocol that issues two or more small writes before waiting for a reply (write-write-read), which is how it can show up in ssh keystrokes, Redis pings, gRPC heartbeats, and X11 events.
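The timeline above reduces to a small calculation. The function below is a sketch under the same assumptions as the walkthrough (1 ms one-way delay, body written 1 ms after the header, receiver with nothing to send back); the name `body_arrival_ms` and its parameters are hypothetical.

```python
def body_arrival_ms(nagle: bool, delayed_ack_ms: float = 40.0,
                    one_way_ms: float = 1.0) -> float:
    """Time at which the receiver holds both header and body."""
    header_arrives = one_way_ms
    body_written = one_way_ms
    if not nagle:
        # Body leaves as soon as it is written.
        return body_written + one_way_ms
    ack_sent = header_arrives + delayed_ack_ms  # delayed-ACK timer fires
    ack_arrives = ack_sent + one_way_ms         # Nagle releases the body
    return max(body_written, ack_arrives) + one_way_ms


print(body_arrival_ms(nagle=True))    # 43.0
print(body_arrival_ms(nagle=False))   # 2.0
print(body_arrival_ms(nagle=True, delayed_ack_ms=200.0))  # 203.0
```

Changing `delayed_ack_ms` shows why the size of the spike depends on the receiver's delayed-ACK timer rather than on anything the sender does.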
The four fixes
| Fix | Where | Effect |
|---|---|---|
| TCP_NODELAY | Sender socket | Disable Nagle entirely; small writes go immediately |
| TCP_QUICKACK (Linux) | Receiver socket | Disable delayed ACK temporarily; ACKs sent immediately |
| Application-level buffering | Sender code | Build the whole message in one buffer, single write() |
| TCP_CORK + uncork | Sender socket | Force-buffer until you uncork; the opposite of NODELAY |
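Application-level buffering is the fix that needs no socket options at all. The sketch below assumes a hypothetical wire format (a 4-byte big-endian length prefix followed by the body) and uses `socket.socketpair()` only so the example runs without a network; the helper names are illustrative.

```python
import socket
import struct


def frame(body: bytes) -> bytes:
    """Prefix body with a 4-byte big-endian length header (assumed format)."""
    return struct.pack("!I", len(body)) + body


def send_message(sock: socket.socket, body: bytes) -> None:
    # One sendall() hands header and body to the kernel together, so Nagle
    # never sees a small header followed by a separately written body.
    sock.sendall(frame(body))


def recv_exact(sock: socket.socket, n: int) -> bytes:
    data = bytearray()
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("peer closed before the full message arrived")
        data += chunk
    return bytes(data)


def recv_message(sock: socket.socket) -> bytes:
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


a, b = socket.socketpair()
send_message(a, b"x" * 200)
received = recv_message(b)
a.close()
b.close()
print(len(received))  # 200
```

The design point is that the header and body are concatenated before the system call, rather than written with two separate `send()` calls.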
What major libraries do by default
| Library / Server | Default |
|---|---|
| nginx | TCP_NODELAY on; TCP_CORK during sendfile |
| Apache httpd | TCP_NODELAY on |
| Node.js net | Nagle on by default; opt in with socket.setNoDelay(true) |
| Go net/http | TCP_NODELAY on |
| Python requests / urllib3 | TCP_NODELAY on (HTTPSConnection sets it) |
| Redis (server + most clients) | TCP_NODELAY on |
| PostgreSQL | TCP_NODELAY on (libpq) |
| gRPC (every language) | TCP_NODELAY on |
| OpenSSH | TCP_NODELAY on for interactive sessions |
Setting TCP_NODELAY in practice
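// C
#include <netinet/in.h>   // IPPROTO_TCP
#include <netinet/tcp.h>  // TCP_NODELAY
#include <sys/socket.h>   // setsockopt
int flag = 1;  // fd is an open TCP socket
setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));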
# Python
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
// Go
conn, err := net.Dial("tcp", "host:80")
if err != nil {
	log.Fatal(err)
}
conn.(*net.TCPConn).SetNoDelay(true) // already the default in Go
// Rust
use std::net::TcpStream;
let stream = TcpStream::connect("host:80")?;
stream.set_nodelay(true)?;
// Node
const net = require('net');
const socket = net.connect({ host: 'h', port: 80 });
socket.setNoDelay(true); // net sockets start with Nagle enabled, so this call is needed
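Whatever the language, it can be worth reading the option back to confirm it took effect. A minimal Python sketch using only the standard `socket` module; the helper name `nodelay_enabled` is illustrative.

```python
import socket


def nodelay_enabled(sock: socket.socket) -> bool:
    """Return True if TCP_NODELAY is set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    before = nodelay_enabled(s)  # fresh sockets have Nagle enabled
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    after = nodelay_enabled(s)

print(before, after)
```

The same check works on a socket obtained from a library, provided the library exposes the underlying socket object.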
When Nagle still wins
- Bulk file transfer with many small writes. If the application can't easily coalesce (e.g. a streaming compressor that emits 4-byte chunks), Nagle prevents tinygram bandwidth waste.
- Backwards compatibility. Old protocols designed pre-NODELAY assumed Nagle would coalesce; disabling it can break their assumptions about message boundaries on small writes.
- Mobile / cellular links. Per-packet radio overhead in LTE/5G means coalescing helps battery and cost — though modern apps coalesce at the application layer instead.
Diagnosing Nagle latency
# Inspect per-connection TCP state on Linux (ss reports negotiated options, cwnd and RTT, not the TCP_NODELAY flag itself)
ss -i -t -n | head -20
# Look for: ts sack ... cwnd 10 ssthresh 7 ...
# Trace small-packet writes
strace -e trace=write,sendto,sendmsg -p $(pgrep myapp) 2>&1 | awk '$NF < 100'
# Tcpdump shows small segments and the resulting delayed ACK (-ttt prints the gap since the previous packet)
tcpdump -i any -nn -ttt 'tcp port 6379'
# Look for: Flags [P.], length 4 ... then 40 ms gap ... Flags [.], ack ...
# bpftrace: histogram of write sizes (power-of-two buckets)
bpftrace -e 'tracepoint:syscalls:sys_enter_write { @bytes = hist(args->count); }'
Why Nagle's algorithm matters
- Interactive performance. Knowing about Nagle/delayed-ACK saves debugging hours when a Redis or Postgres client suddenly shows 40 ms tail latency.
- Web servers. Default TCP_NODELAY plus TCP_CORK during response transmission gives both small-message latency and bulk efficiency.
- RPC frameworks. gRPC, Thrift, and any other framework that does request-response over TCP need NODELAY, or tail latency suffers under load.
- Game and real-time protocols. Many use UDP, which sidesteps Nagle altogether; those that use TCP typically disable it.
- Mobile data plans. Coalescing at the application layer (which is the modern alternative to Nagle) saves battery and metered bytes.
Common misconceptions
- "Nagle's makes things faster." Saves bandwidth; usually hurts latency. Faster only on bandwidth-constrained, packet-cost-limited tinygram traffic — a regime that essentially no longer exists on modern wired networks.
- "Always disable it." For interactive applications, yes. For bulk transfer with many small writes per second, leaving Nagle on can save measurable bandwidth — though application-layer batching is preferable.
- "Nagle is the same as kernel send buffering." Different layer. Nagle is a TCP-level coalescing rule; the socket send buffer is a separate kernel-side queue. NODELAY does not bypass the send buffer.
- "TCP_NODELAY is dangerous." The Linux kernel has supported it cleanly since the 1990s; it does not break correctness, only the implicit coalescing assumption.
- "40 ms always." Linux delayed-ACK timer is 40 ms; macOS and BSD have used 200 ms historically; some older kernels capped at 500 ms. The exact spike depends on the receiver's OS.
- "HTTP/2 needs Nagle." Almost all HTTP/2 servers disable Nagle and rely on H2 frame batching at the application layer.
- "Nagle exists in HTTP/3." No. QUIC is over UDP; Nagle is a TCP-only rule. QUIC has its own pacing logic but no equivalent ACK-coalescing pathology.
Frequently asked questions
Why does Nagle's algorithm exist?
John Nagle wrote RFC 896 in 1984 to fix Ford Aerospace's congested network. Their interactive terminal traffic was sending one TCP segment per keystroke: a 41-byte packet (1 byte of data + 20 bytes IP header + 20 bytes TCP header) for every typed character. On a 9600 baud modem this was about 2.4% efficient and saturated their links with 'tinygrams'. Nagle's algorithm coalesces those one-byte writes into larger segments, raising efficiency dramatically while still flushing whenever the receiver acknowledges or the buffer fills.
What is delayed ACK and why does it interact badly with Nagle?
Delayed ACK (RFC 1122) lets the receiver hold an acknowledgment for up to 500 ms (200 ms in BSD-derived stacks, often 40 ms in modern Linux) hoping to piggyback the ACK onto a response data packet, saving a packet. Independently sensible. Nagle waits for an ACK before sending more small data; delayed ACK waits for data before sending an ACK. They stall each other on small request-response traffic: the sender has more small data, the receiver has nothing to send back, and both wait. The 40-200 ms delayed-ACK timer eventually fires, then Nagle releases. The result is a latency spike of 40 ms or more on each affected request.
When should you set TCP_NODELAY?
Set TCP_NODELAY (disabling Nagle's algorithm) any time application latency matters more than packet efficiency. Examples: HTTP servers, gRPC services, Redis clients, Postgres connections, SSH, real-time games, financial trading. It is the default in nginx, Apache, gRPC, and Redis. Modern applications already buffer at the application layer (write a complete request, then flush), so Nagle adds nothing but latency. Leave Nagle on only if you genuinely have throughput-oriented traffic made of many tiny socket writes that the application cannot coalesce itself.
What is the 40 ms problem?
The classic Nagle + delayed-ACK pathology. Application sends two small writes — say a 4-byte header and a 200-byte body — without flushing. Nagle sends the first; the receiver delays ACK hoping to piggyback. Nagle waits for the ACK before sending the second. The delayed-ACK timer fires at 40 ms (Linux default, formerly 200 ms on BSD); Nagle then releases the second write. Net latency: 40+ ms instead of <1 ms. The fix is either TCP_NODELAY on the sender, TCP_QUICKACK on the receiver, or coalescing the writes into one buffer at the application layer.
Does HTTP/2 fix this?
HTTP/2 mostly avoids the problem because frames are usually large enough that Nagle never withholds them — and almost every HTTP/2 server sets TCP_NODELAY anyway. HTTP/3 over QUIC sidesteps it entirely: QUIC is over UDP, so Nagle's algorithm (a TCP-only feature) does not apply. QUIC has its own pacing logic but no equivalent 40 ms pathology with delayed ACK because QUIC's ACK frames are sent more eagerly.
Should libraries enable TCP_NODELAY by default?
Yes, for almost every modern client and server. The Linux kernel ships with Nagle on by default for backward compatibility, but many widely used libraries (libcurl, Go net/http, Python requests via urllib3, .NET HttpClient) set TCP_NODELAY on connect, and database drivers such as psycopg (through libpq) and redis-py do the same. The main exception is bulk file-transfer protocols where the application does many small writes per second and benefits from coalescing; even there, application-level buffering is usually preferable.
What is TCP_CORK and how does it differ from TCP_NODELAY?
TCP_CORK (Linux) and TCP_NOPUSH (BSD) are roughly the opposite of TCP_NODELAY. They tell the kernel to actively buffer all writes until you uncork, which is useful when you know you are about to write a complete logical message in multiple system calls (HTTP headers + body, sendfile + trailer). nginx and Apache use this to coalesce response chunks. Setting TCP_CORK and TCP_NODELAY on the same socket is allowed in modern Linux; CORK takes precedence while set, and NODELAY behavior resumes after uncorking.
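Corking pairs naturally with a scope: set the option, issue the writes, clear it. The sketch below wraps that in a context manager. The name `corked` is hypothetical, `socket.TCP_CORK` is only defined on Linux, and the demonstration uses an unconnected socket purely to show the flag toggling.

```python
import contextlib
import socket


@contextlib.contextmanager
def corked(sock: socket.socket):
    """Hold partial segments until the block exits (Linux-only TCP_CORK)."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)
    try:
        yield sock
    finally:
        # Clearing the option lets the kernel flush whatever was queued.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 0)


if hasattr(socket, "TCP_CORK"):  # skip on platforms without TCP_CORK
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        with corked(s):
            inside = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_CORK)
        outside = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_CORK)
    print(bool(inside), bool(outside))
```

On a connected socket, the intended use is to put the header write and the body write inside one `with corked(conn):` block so they leave as full segments once the block exits.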