
MTU and IP Fragmentation

Ethernet's 1500-byte Maximum Transmission Unit forces routers (or senders) to split larger packets

The MTU (Maximum Transmission Unit) is the largest packet a network link can carry without fragmentation. Ethernet defaults to 1500 bytes; every IPv4 host must accept datagrams of at least 576 bytes, and every IPv6 link must carry at least 1280. When a packet exceeds the next-hop MTU, an IPv4 router fragments it: the payload is split into smaller packets, each with its own IP header, and reassembled at the destination. IPv6 forbids router fragmentation, so the sender must run Path MTU Discovery (PMTUD), shrinking its packets until ICMPv6 "Packet Too Big" messages stop arriving. Common pitfalls: misconfigured firewalls drop ICMP, which breaks PMTUD (a "black hole"); VPN and tunnel encapsulation reduces the effective MTU; large UDP packets (DNS responses over 1500 bytes) often fail.

Ethernet MTU	1500 bytes
IPv4 minimum	576 bytes
IPv6 minimum	1280 bytes
Jumbo frames	9000 bytes (some networks)
PMTUD	ICMP-based
Fragmentation overhead	20-40 bytes per fragment

What MTU actually means

Every physical or logical link has a hard upper bound on the size of a single packet it can carry. Ethernet says 1500 bytes of IP payload (1518 bytes including the 14-byte header and 4-byte CRC). Wi-Fi inherits the same 1500. PPP over a 56k modem can be as little as 296. A high-speed datacenter Ethernet with jumbo frames enabled can be 9000.

The MTU is the limit at the link layer. The IP layer above it must produce packets that fit. When an IP packet does not fit, there are two strategies:

Fragment. Split the IP packet into multiple smaller packets at the boundary, each carrying a piece. Reassemble at the destination. IPv4 routers do this by default; IPv6 forbids it at routers.
Drop and tell. Drop the oversized packet, send ICMP back to the sender saying "too big, try ≤ X bytes." The sender shrinks and retransmits.

The 1500-byte budget — where each byte goes

Layer	Header bytes (IPv4)	Header bytes (IPv6)
Ethernet (link)	14	14
IP	20 (no options)	40 (fixed)
TCP	20 (no options)	20
TLS record overhead (typical)	~25-40	~25-40
App payload (HTTP body)	~1400	~1380
Ethernet CRC	4	4
Total wire	1518	1518

The standard MSS (Maximum Segment Size) for TCP over IPv4 Ethernet is 1500 − 20 (IP) − 20 (TCP) = 1460. For IPv6 it's 1440. Adding TCP options (timestamps, SACK) shaves another 12-20 bytes.
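The MSS arithmetic above can be checked with a few lines of shell. This is a minimal sketch: the function name `mss_for_mtu` is made up for illustration, and it assumes a 20-byte IPv4 header, a 40-byte IPv6 header, and a 20-byte TCP header, all without options.

```shell
# mss_for_mtu MTU [IPVER]   (hypothetical helper, not a standard tool)
# Prints the TCP MSS for a link MTU, assuming no IP or TCP options.
mss_for_mtu() (
  mtu=$1
  ipver=${2:-4}
  case $ipver in
    4) ip_hdr=20 ;;   # IPv4 header without options
    6) ip_hdr=40 ;;   # IPv6 fixed header
    *) echo "unknown IP version: $ipver" >&2; exit 1 ;;
  esac
  tcp_hdr=20          # TCP header without options
  echo $(( mtu - ip_hdr - tcp_hdr ))
)

mss_for_mtu 1500 4   # prints 1460
mss_for_mtu 1500 6   # prints 1440
mss_for_mtu 1492 4   # PPPoE-sized link, prints 1452
```

TCP options are not subtracted here; with timestamps enabled, the usable payload per segment is smaller than the printed MSS by the size of the options.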

IPv4 router fragmentation step by step

A 4000-byte UDP datagram on a 1500-MTU link gets split into three IPv4 fragments. Each fragment carries its own 20-byte IP header, so the two extra headers add 40 bytes of overhead in total. Every fragment carries these IP-header fields:

More Fragments (MF). 1 on every fragment except the last.
Fragment Offset. Where this fragment's payload sits inside the original datagram, in 8-byte units.
Identification. The 16-bit IP ID. All fragments of one datagram share the same ID; the destination uses it to reassemble.
Don't Fragment (DF). Set by the sender. If set, the router does not fragment: it drops oversized packets and sends ICMP Type 3 Code 4 instead.

Reassembly happens only at the destination. If any fragment is lost, the whole datagram is lost — there is no IP-level retransmission. TCP retransmits the segment; UDP applications must do their own.
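The split described above can be reproduced with a short calculation. The sketch below is illustrative only: `plan_fragments` is a hypothetical helper, it assumes a 20-byte IPv4 header without options, and it treats the 4000 bytes as the IP payload (UDP header included). It prints the Fragment Offset, sizes, and MF bit each fragment would carry.

```shell
# plan_fragments PAYLOAD_LEN MTU   (hypothetical helper)
# Prints one line per IPv4 fragment for an IP payload of PAYLOAD_LEN bytes.
plan_fragments() (
  payload=$1
  mtu=$2
  ihl=20                              # IPv4 header without options
  max=$(( (mtu - ihl) / 8 * 8 ))      # per-fragment payload, rounded down to a multiple of 8
  [ "$max" -gt 0 ] || exit 1          # MTU too small to carry any payload
  off=0
  while [ "$off" -lt "$payload" ]; do
    left=$(( payload - off ))
    if [ "$left" -gt "$max" ]; then
      len=$max; mf=1                  # more fragments follow
    else
      len=$left; mf=0                 # last fragment
    fi
    # Fragment Offset is expressed in 8-byte units
    echo "offset=$(( off / 8 )) payload=$len total=$(( len + ihl )) MF=$mf"
    off=$(( off + len ))
  done
)

plan_fragments 4000 1500
# offset=0 payload=1480 total=1500 MF=1
# offset=185 payload=1480 total=1500 MF=1
# offset=370 payload=1040 total=1060 MF=0
```

The three totals sum to 4060 bytes on the wire, against 4020 for the unfragmented packet, which is the 40 bytes of added header overhead.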

Path MTU Discovery — how senders learn

PMTUD lets the sender find the smallest MTU on the path without ever sending a fragmented packet. The mechanism:

Sender sets DF=1 on every packet (the kernel does this by default for TCP since the late 1990s).
Sender starts at the local interface MTU — typically 1500.
If a router along the path has a smaller next-hop MTU, it drops the packet and emits an ICMP back to the sender:
- IPv4: ICMP Type 3 Code 4 ("Fragmentation Needed and DF Set"), with the next-hop MTU in the message.
- IPv6: ICMPv6 Type 2 ("Packet Too Big"), again with the MTU.
Sender's kernel updates its route cache: "for destination X, MTU is now Y." TCP shrinks its MSS to fit; UDP sockets that send with DF set then see EMSGSIZE on oversized writes.
The path MTU entry stays cached for ~10 minutes (Linux default), then re-probes upward to detect path changes.
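The loop above can be simulated without touching the network. In this sketch the hop MTUs are invented inputs and `simulate_pmtud` is a hypothetical name; each "round" stands for one dropped packet plus one ICMP message that reports the next-hop MTU. It assumes every ICMP message reaches the sender.

```shell
# simulate_pmtud START_MTU HOP_MTU...   (hypothetical helper)
# Models a sender that lowers its packet size each time a hop reports
# "too big", and prints the path MTU it converges on.
simulate_pmtud() (
  mtu=$1; shift
  rounds=0
  while :; do
    blocked=""
    for hop in "$@"; do               # walk the path in order
      if [ "$hop" -lt "$mtu" ]; then
        blocked=$hop                  # first hop that cannot carry the packet
        break
      fi
    done
    [ -n "$blocked" ] || break        # packet fit every hop: done
    mtu=$blocked                      # ICMP carried the next-hop MTU
    rounds=$(( rounds + 1 ))
  done
  echo "path_mtu=$mtu icmp_rounds=$rounds"
)

simulate_pmtud 1500 1500 1492 1400 1500
# path_mtu=1400 icmp_rounds=2
```

Two rounds are needed here because the 1492-byte hop is hit before the 1400-byte one; the sender only learns about one bottleneck per dropped packet.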

ICMP black holes — the recurring pain

PMTUD is fundamentally fragile. It depends on ICMP Type 3 Code 4 (or ICMPv6 Type 2) reaching the sender. Many firewall admins, conflating "ICMP echo" (ping) with "ICMP unreachable," block all ICMP. The result is a black hole:

Sender's small packets (TCP handshake, TLS Hello, first HTTP headers) are below path MTU and reach the destination fine.
Larger packets (full TLS certificates, file downloads, request bodies over a certain size) are dropped by the small-MTU router.
ICMP "too big" message is dropped by the firewall on the way back to the sender.
Sender retransmits at the same large size, which is dropped again. The connection hangs without any error; eventually times out.

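Symptom: handshake works, page partially loads, then stalls. Diagnosis: on Linux with a 1500-byte interface MTU, run ping -s 1472 -M do destination, which sends a 1500-byte IPv4 packet with DF=1 (1472 bytes of payload plus 28 bytes of ICMP and IP headers). If smaller sizes get replies but this one times out without any "Frag needed" error, you have a black hole.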

TCP MSS clamping — the standard fix

If you cannot rely on PMTUD (typical inside firewalled enterprise networks, behind VPN tunnels, on cellular), MSS clamping is the workaround. The router on the path inspects every TCP SYN passing through and lowers the MSS option to a value that fits its own outgoing link, so neither endpoint ever sends a segment too large for that hop:

# Linux iptables — clamp to PMTU automatically
iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
  -j TCPMSS --clamp-mss-to-pmtu

# Or to a fixed value (e.g. behind a 1420-byte IPsec tunnel)
iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
  -j TCPMSS --set-mss 1380

MSS clamping handles TCP cleanly. UDP has no equivalent — UDP applications must either fit in 512 bytes (classic DNS), implement their own path-MTU logic, or accept fragmentation.
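The fixed value in the second rule follows directly from the tunnel MTU. As a rough sketch, the hypothetical helper below derives it and prints the rule instead of applying it, so the output can be reviewed first. It assumes IPv4 with 20-byte IP and TCP headers; the FORWARD chain is taken from the example above.

```shell
# mss_clamp_rule TUNNEL_MTU   (hypothetical helper)
# Prints, without applying, an iptables rule that pins the MSS to
# TUNNEL_MTU - 40 (20-byte IPv4 header + 20-byte TCP header).
mss_clamp_rule() (
  mss=$(( $1 - 40 ))
  echo "iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss $mss"
)

mss_clamp_rule 1420   # the printed rule ends with: --set-mss 1380
```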

IPv6 — sender-only fragmentation

IPv6 (RFC 8200) made three deliberate changes:

Routers never fragment. A router that receives a too-big packet drops it and emits ICMPv6 Packet Too Big.
Path minimum is 1280 bytes. Every IPv6 link must support at least 1280-byte MTU. Senders may always send up to 1280 without doing PMTUD.
Sender fragmentation via extension header. If a sender knows its packet exceeds path MTU and cannot reduce, it can split into IPv6 fragments using the Fragment Extension Header. This moves all reassembly state to the destination — never the routers.

In practice IPv6 PMTUD has the same ICMP black hole problem as IPv4. ICMPv6 Type 2 must reach the sender. Many enterprise firewalls block "all ICMPv6" the same way; common operational guidance is to allow ICMPv6 Types 1, 2, 3, 4 always.
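One way to express that guidance on a Linux router is a handful of accept rules. The sketch below only prints them; `print_pmtud_allow_rules` is a hypothetical helper, and the choice of the FORWARD chain is an assumption (an end host would use INPUT, and rule ordering relative to existing drop rules still matters).

```shell
# print_pmtud_allow_rules   (hypothetical helper)
# Prints, without applying, ACCEPT rules for the ICMP messages PMTUD relies on.
print_pmtud_allow_rules() {
  # ICMPv6 types 1-4: Destination Unreachable, Packet Too Big,
  # Time Exceeded, Parameter Problem
  for t in 1 2 3 4; do
    echo "ip6tables -A FORWARD -p icmpv6 --icmpv6-type $t -j ACCEPT"
  done
  # IPv4 counterpart: ICMP Type 3 Code 4 ("Fragmentation Needed")
  echo "iptables -A FORWARD -p icmp --icmp-type fragmentation-needed -j ACCEPT"
}

print_pmtud_allow_rules
```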

DNS and the 512-byte legacy

Original DNS (RFC 1035, 1987) limited UDP responses to 512 bytes, chosen as a safe minimum below all known MTUs of the era. EDNS0 (RFC 2671, 1999; revised as RFC 6891 in 2013) lets a resolver advertise a larger UDP buffer, and 4096 bytes became the common default. But EDNS0 over UDP is one of the largest sources of fragmentation problems on the modern Internet:

A 2000-byte DNSSEC response from an authoritative server fragments into two IPv4 fragments. Many home routers and embedded firewalls drop fragments. The response never arrives.
The resolver retries over TCP (port 53) — standard fallback, but adds 1+ RTT and breaks if TCP/53 is firewalled.
DNS Flag Day 2020 dropped EDNS UDP buffer recommendations from 4096 to 1232 to avoid fragmentation. 1232 = 1280 (IPv6 min) − 40 (IPv6 header) − 8 (UDP header).
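The 1232 figure is an instance of a general rule: the largest UDP payload that fits in one packet is the MTU minus the IP and UDP headers. A minimal sketch, with `max_udp_payload` as a hypothetical helper and header sizes assumed to be option-free:

```shell
# max_udp_payload MTU [IPVER]   (hypothetical helper)
# Prints the largest UDP payload that fits in one unfragmented packet.
max_udp_payload() (
  mtu=$1
  case ${2:-4} in
    4) ip_hdr=20 ;;               # IPv4 header without options
    6) ip_hdr=40 ;;               # IPv6 fixed header
    *) exit 1 ;;
  esac
  echo $(( mtu - ip_hdr - 8 ))    # 8-byte UDP header
)

max_udp_payload 1280 6   # prints 1232, the Flag Day 2020 buffer size
max_udp_payload 1500 4   # prints 1472

# To query with that buffer size (example.com is a placeholder):
#   dig +bufsize=1232 +dnssec example.com
```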

Jumbo frames inside data centers

Jumbo frames (typically 9000 bytes) are commonly enabled on storage networks, RDMA fabrics, and intra-DC backups. Benefits:

Metric	1500-byte MTU	9000-byte MTU
Header overhead	~3%	~0.5%
Packets per GB transferred	~700,000	~115,000
Interrupts per GB (no GRO)	~700,000	~115,000
CPU for line-rate 10 Gbps	1-2 cores	0.3-0.5 cores

Caveat: every link on the path must be configured for 9000. One 1500-MTU hop forces fragmentation (or PMTUD shrink) and erases the gain. Jumbo frames are deliberately scoped — never enabled on the Internet edge.
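The first two rows of the table can be approximated from first principles. The sketch below uses two hypothetical helpers and makes its assumptions explicit: a gigabyte is 10^9 bytes of TCP payload, and only the 20-byte IP and 20-byte TCP headers are counted as overhead (Ethernet framing is ignored). The results land close to the rounded figures in the table.

```shell
# Both helpers are hypothetical and assume 20-byte IP + 20-byte TCP headers.

# packets_per_gb MTU: packets needed to move 10^9 bytes of TCP payload
packets_per_gb() (
  payload=$(( $1 - 40 ))
  echo $(( (1000000000 + payload - 1) / payload ))   # ceiling division
)

# header_overhead_pct MTU: IP+TCP header bytes as a percentage of the MTU
header_overhead_pct() {
  awk -v mtu="$1" 'BEGIN { printf "%.2f\n", 40 / mtu * 100 }'
}

for mtu in 1500 9000; do
  echo "MTU $mtu: $(packets_per_gb "$mtu") packets/GB, $(header_overhead_pct "$mtu")% header overhead"
done
# MTU 1500: 684932 packets/GB, 2.67% header overhead
# MTU 9000: 111608 packets/GB, 0.44% header overhead
```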

Diagnosing MTU problems

# Find current interface MTU
ip link show eth0
# mtu 1500

# Test path MTU with DF=1 (won't fragment)
ping -s 1472 -M do 8.8.8.8     # 1472 + 28 = 1500 bytes IPv4
ping -s 1473 -M do 8.8.8.8     # should fail if link MTU is 1500

# Step down through candidate sizes; the largest size that prints OK is a lower bound on the path MTU
for size in 1500 1400 1300 1200 1100 1000; do
  ping -c1 -W2 -s $((size - 28)) -M do 8.8.8.8 >/dev/null 2>&1 && echo "$size OK"
done

# Show the kernel's cached path MTU for a destination (present once PMTUD has learned one)
ip route get 8.8.8.8
# 8.8.8.8 dev eth0 src ... cache mtu 1492 expires 600sec

# Display current TCP MSS for active connections
ss -tin | grep -i mss

# Wireshark filter for fragments
ip.flags.mf == 1 || ip.frag_offset > 0

# Tracepath uses PMTUD probes specifically
tracepath 8.8.8.8
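
The size loop above only tries a fixed list of values. A binary search narrows the path MTU to the exact byte in about ten probes. The sketch below is one possible way to do it: `bisect_pmtu` and `ping_probe` are hypothetical helpers, `TARGET` is an assumed variable, and the search presumes that the lower bound passes and that success is monotonic in packet size.

```shell
# bisect_pmtu LO HI PROBE   (hypothetical helper)
# Binary-searches for the largest packet size in [LO, HI] for which
# "PROBE size" succeeds. Assumes LO itself passes.
bisect_pmtu() (
  lo=$1; hi=$2; probe=$3
  while [ "$lo" -lt "$hi" ]; do
    mid=$(( (lo + hi + 1) / 2 ))      # round up so the loop always makes progress
    if "$probe" "$mid"; then
      lo=$mid                         # mid fits: answer is mid or larger
    else
      hi=$(( mid - 1 ))               # mid is too big
    fi
  done
  echo "$lo"
)

# Probe for real paths (Linux iputils ping): sends one DF=1 packet of the
# given total IPv4 size. TARGET is an assumed variable set by the caller.
ping_probe() {
  ping -c 1 -W 2 -M do -s "$(( $1 - 28 ))" "$TARGET" >/dev/null 2>&1
}

# Usage against a real host:
#   TARGET=8.8.8.8
#   bisect_pmtu 576 1500 ping_probe
```

Passing the probe as a function name keeps the search logic separate from the network call, so it can be exercised with a fake probe before being pointed at a real host.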

Why MTU matters

VPNs and tunnels. Every encapsulation layer typically adds 24-80 bytes; a mis-set inner MTU causes the mysterious "small downloads work, large ones hang" failure.
Large UDP applications. DNS, QUIC handshakes, video streaming initial bursts must respect path MTU or fall back to TCP.
DNSSEC reliability. Large signed responses fragment; fragmented UDP is unreliable on the modern Internet. Drives the move to UDP=1232 plus TCP fallback.
Storage and RDMA. Jumbo frames inside data centers cut CPU per GB by 3-5×.
QUIC (HTTP/3). QUIC requires the path to carry UDP payloads of at least 1200 bytes, pads its Initial handshake datagrams to at least that size, and does its own path MTU discovery by sending padded probe packets.
Mobile carriers. Some operators MSS-clamp to 1400 to leave room for IPsec; check your effective MSS if performance is unexpectedly low.

Common misconceptions

"Always 1500 bytes." 1500 is the Ethernet default. PPPoE links are 1492, IPsec tunnels can be 1380, GRE adds 24 bytes, mobile carriers vary. Production code must not assume 1500.
"Fragmentation is fine." Performance: extra headers, more packets, retransmit-the-whole-datagram on any loss. Security: fragment-overlap attacks, firewall evasion, reassembly DoS. Most modern operators discourage IPv4 fragmentation outright.
"ICMP doesn't matter." ICMP Type 3 Code 4 (IPv4) and ICMPv6 Type 2 are required for PMTUD. Blocking them creates black holes that are hard to diagnose.
"Jumbo frames are universally faster." Only inside controlled networks. On any path with a 1500 hop, jumbos fragment or get dropped, a net loss.
"MTU is the same as MSS." MTU is link-layer; MSS is TCP-level. MSS = MTU − IP header − TCP header. Different numbers; both matter.
"DF=1 means no fragmentation ever." DF=1 only means routers will not fragment the packet. An oversized packet that reaches a link with a smaller MTU is dropped there instead of being delivered, and the sender learns about it only if the ICMP "Fragmentation Needed" reply gets back.
"IPv6 has no fragmentation." IPv6 has sender-only fragmentation via the Fragment Extension Header; routers do not fragment, but senders still can when needed.

Frequently asked questions

Why is Ethernet MTU 1500 bytes specifically?

Historical decision from the 1980 DIX Ethernet specification. Ethernet's frame format had a 14-byte header and 4-byte CRC; payloads from 46 to 1500 bytes fit in one frame. The 1500 cap was chosen as a compromise: large enough to make per-frame overhead acceptable (~3% header tax), small enough that one bad frame on a coaxial bus did not waste much retransmission time. The number stuck. When Ethernet outgrew shared media into switched networks, jumbo frames (9000 bytes) were proposed but never standardized — switches and NICs that support them are configured per-network.

What is Path MTU Discovery?

PMTUD (RFC 1191 for IPv4, RFC 8201 for IPv6) finds the smallest MTU along a path. The sender sets the Don't Fragment flag (DF=1) on every IP packet. If a router on the path has a smaller next-hop MTU, it drops the packet and sends back ICMP Type 3 Code 4 (IPv4) or ICMPv6 Type 2 (IPv6) 'Packet Too Big' with the MTU it can carry. The sender shrinks its segment size and retransmits. After a few rounds the sender knows the path MTU and uses it for the rest of the connection.

Why does IPv6 forbid router fragmentation?

IPv4 router fragmentation was an operational disaster. Routers had to do extra work, fragments could arrive out of order or be lost (leading to whole-packet retransmits), and reassembly at the destination is a denial-of-service vector. IPv6 (RFC 8200) moved fragmentation entirely to the sender via PMTUD and the optional Fragment Extension Header. Routers along the path simply drop oversized packets and emit ICMPv6 Packet Too Big. The trade-off: PMTUD is brittle when ICMPv6 is filtered.

What is an ICMP black hole and why does it break things?

An ICMP black hole is a path on which some router or firewall drops ICMP unreachable messages, often because admins blocked all ICMP for 'security'. The effect is that PMTUD stops working: the sender never learns to shrink its segment size, so packets keep going out at the original size, get dropped silently somewhere on the path, and the connection hangs without an error. Symptom: small responses (HTTP headers, TLS Hello) work, large responses (file downloads, certificates over a certain size) hang. Fix: do not block ICMP Type 3 Code 4 / ICMPv6 Type 2; or use TCP MSS clamping at the firewall to force smaller segments.

When are jumbo frames useful (data center, storage)?

Jumbo frames (typically 9000 bytes) cut the per-frame header overhead from ~3% to ~0.5% and reduce CPU per gigabyte transferred (one interrupt per 9000 bytes instead of one per 1500). They are valuable inside controlled environments: storage SAN traffic (iSCSI, NFS, NVMe-oF), backup networks, RDMA fabrics, and intra-data-center east-west traffic. They are not standardized for the public Internet because every link on the path must agree; one 1500-byte segment forces fragmentation and erases the gain.

How does GSO/TSO offload affect this?

TCP Segmentation Offload (TSO) and Generic Segmentation Offload (GSO) let the kernel hand the NIC a logical segment of up to 64 KB and let the NIC chop it into MTU-sized packets. This saves CPU dramatically (one syscall and one set of headers instead of dozens). The kernel still computes the path MTU; TSO just defers the segmentation. UDP has UFO and GSO equivalents. For receive, GRO (Generic Receive Offload) merges incoming small segments back into a larger logical packet for the kernel to process. End to end, the MTU still limits what crosses the wire; offloads only move where segmentation happens.

How does VPN encapsulation affect MTU?

Every tunnel adds its own headers. IPsec ESP adds ~50-80 bytes (varying by mode and crypto), GRE adds 24 bytes, WireGuard adds 32 bytes of its own framing (60 bytes once the outer IPv4 and UDP headers are counted, 80 over IPv6), and OpenVPN adds 41+ bytes. Inside the tunnel, the effective MTU is the underlay MTU minus those headers. If the underlay is 1500, the inner MTU is typically 1420-1460. If the inner application sends a 1500-byte packet expecting Ethernet, the tunnel either fragments (slow) or drops. Standard fix: set the inner interface MTU to underlay-minus-overhead, or enable TCP MSS clamping so TCP segments stay below the inner MTU.
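
The underlay-minus-overhead rule is simple enough to script. In this sketch `tunnel_budget` is a hypothetical helper and the overhead values are inputs the caller must supply. The 24-byte GRE figure comes from the text; the 60-byte WireGuard figure is an assumption that counts the outer IPv4 and UDP headers on top of WireGuard's own 32 bytes. The MSS assumes IPv4 inside the tunnel.

```shell
# tunnel_budget UNDERLAY_MTU OVERHEAD   (hypothetical helper)
# Prints the inner interface MTU and the TCP MSS that fits inside it
# (inner MTU minus 20-byte IPv4 and 20-byte TCP headers).
tunnel_budget() (
  inner=$(( $1 - $2 ))
  echo "inner_mtu=$inner mss=$(( inner - 40 ))"
)

tunnel_budget 1500 24   # GRE over IPv4:        inner_mtu=1476 mss=1436
tunnel_budget 1500 60   # WireGuard over IPv4:  inner_mtu=1440 mss=1400
tunnel_budget 1492 60   # same tunnel on PPPoE: inner_mtu=1432 mss=1392
```

Stacked tunnels subtract cumulatively, so the overhead argument should be the sum of every encapsulation layer on the path.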