Networking

HTTP/2

One connection, dozens of conversations at once

HTTP/2 multiplexes many requests over one TCP connection using binary framing and HPACK header compression, eliminating HTTP/1.1's head-of-line blocking at the application layer.

  • Standardized: RFC 7540 (2015)
  • Connections per origin: 1 (vs 6 in HTTP/1.1)
  • Wire format: Binary frames
  • Header compression: HPACK
  • Concurrent streams: 100+ default

How HTTP/2 works

Open your browser's network panel on any modern site and you'll see the change HTTP/2 made: thirty assets that all start downloading at the same moment over a single connection. Under HTTP/1.1 the browser had to open up to six TCP connections per origin and shuttle requests through them one at a time, because the protocol was a strictly ordered text stream — response two could not begin until response one finished. HTTP/2, standardized in RFC 7540 in May 2015 and derived from Google's experimental SPDY protocol, threw out that ordering constraint without changing a single piece of HTTP semantics. Same methods, same status codes, same headers. What changed is the wire.

The core idea is a three-layer split. At the top, the familiar request/response with its method, path, status, and headers. In the middle, a stream: a logical, bidirectional sequence of frames that carries exactly one request/response exchange, identified by a stream ID. At the bottom, the single TCP connection. Many streams live on one connection at once, and the protocol interleaves their frames freely. Because each frame is stamped with its stream ID, the receiver can demultiplex them back into separate responses even though they arrived shuffled together.

That interleaving is what kills application-layer head-of-line blocking. A slow 4-second API call no longer freezes the queue behind it; its frames simply trickle out between the frames of the fast CSS file that finishes in 40 ms. Both make progress on the same connection in the same round trips.
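The interleaving can be sketched in a few lines of Python. This is a toy model, not real HTTP/2: the `interleave` and `demux` helpers, the 4-byte frame size, and the round-robin schedule are assumptions made for illustration (real senders choose their own schedule and frame sizes).

```python
from collections import deque

def interleave(streams, frame_size=4):
    """Round-robin scheduler: yield (stream_id, chunk) frames from several bodies."""
    queue = deque((sid, memoryview(body)) for sid, body in streams.items())
    while queue:
        sid, remaining = queue.popleft()
        yield sid, bytes(remaining[:frame_size])
        if len(remaining) > frame_size:
            # This stream still has bytes left: send it to the back of the line.
            queue.append((sid, remaining[frame_size:]))

def demux(frames):
    """Reassemble each stream's body using only the stream ID on each frame."""
    bodies = {}
    for sid, chunk in frames:
        bodies[sid] = bodies.get(sid, b"") + chunk
    return bodies

bodies = {1: b"slow-api-response", 3: b"css"}
wire = list(interleave(bodies))
print(wire)         # stream 3 completes in the second frame, long before stream 1
print(demux(wire))  # both bodies come back intact
```

The short response on stream 3 finishes after its first frame even though stream 1 started earlier and is still sending.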

Binary framing — the precise mechanism

Everything in HTTP/2 is a frame, and every frame has the same 9-byte header followed by a payload:

 +-----------------------------------------------+
 |                 Length (24 bits)              |   payload size, max 2^24-1
 +---------------+---------------+---------------+
 |   Type (8)    |   Flags (8)   |
 +-+-------------+---------------+-------------------------------+
 |R|                 Stream Identifier (31 bits)                 |
 +=+=============================================================+
 |                   Frame Payload (Length bytes)              ...
 +---------------------------------------------------------------+

The Type field selects the frame's job. The ones that carry traffic are HEADERS (an HPACK-compressed header block, opening a stream) and DATA (the body bytes). The rest run the connection: SETTINGS negotiates parameters, WINDOW_UPDATE grants flow-control credit, RST_STREAM cancels a single stream, PING measures round-trip time, and GOAWAY shuts the connection down gracefully.

Two rules make multiplexing safe. First, stream IDs from the client are odd (1, 3, 5, …) and server-initiated streams are even, so the two sides never collide on an ID. IDs only increase, so once a stream closes its number is retired forever. Second, the parsing cost is fixed: the receiver reads exactly 9 bytes, learns the length, and reads exactly that many more — no scanning for a \r\n\r\n delimiter as in HTTP/1.1. Framing turns parsing from O(n) line-scanning into O(1) length-prefixed reads.
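To make the 9-byte header concrete, here is a minimal sketch that packs and parses it with Python's `struct` module. The helper names `pack_frame` and `parse_frame` are hypothetical; the type codes are the ones assigned by the HTTP/2 spec. Padding, CONTINUATION frames, and the SETTINGS_MAX_FRAME_SIZE limit are deliberately ignored.

```python
import struct

# Frame type codes from the HTTP/2 spec (subset).
FRAME_TYPES = {0x0: "DATA", 0x1: "HEADERS", 0x3: "RST_STREAM", 0x4: "SETTINGS",
               0x6: "PING", 0x7: "GOAWAY", 0x8: "WINDOW_UPDATE"}

HEADER = struct.Struct(">BHBBI")  # 24-bit length as 8+16 bits, type, flags, R+stream ID

def pack_frame(frame_type, flags, stream_id, payload=b""):
    """Serialize one frame: 9-byte header followed by the payload."""
    length = len(payload)
    if length >= 1 << 24:
        raise ValueError("payload exceeds the 24-bit length field")
    return HEADER.pack(length >> 16, length & 0xFFFF, frame_type, flags,
                       stream_id & 0x7FFFFFFF) + payload

def parse_frame(buf, offset=0):
    """Return (frame_dict, next_offset), or None if the frame is not complete yet."""
    if len(buf) - offset < HEADER.size:
        return None
    hi, lo, frame_type, flags, raw_id = HEADER.unpack_from(buf, offset)
    length = (hi << 16) | lo
    end = offset + HEADER.size + length
    if end > len(buf):
        return None  # payload still in flight
    frame = {"type": FRAME_TYPES.get(frame_type, hex(frame_type)),
             "flags": flags,
             "stream_id": raw_id & 0x7FFFFFFF,  # drop the reserved bit
             "payload": buf[offset + HEADER.size:end]}
    return frame, end

wire = pack_frame(0x0, 0x1, 3, b"hello")  # DATA frame, END_STREAM flag, stream 3
frame, next_offset = parse_frame(wire)
print(frame, next_offset)
```

The parser never looks inside the payload to find where the frame ends; the length field alone decides how far to advance.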

HPACK header compression

Headers are the hidden tax of HTTP. A typical request carries 500–800 bytes of headers (cookies, User-Agent, Accept lines), and almost all of it is identical on every request to the same origin. HTTP/1.1 sent that block verbatim and uncompressed every time, because Content-Encoding such as gzip applies only to the message body, never to the headers. HPACK (RFC 7541) fixes this with three tools:

  • A 61-entry static table of the most common header fields. :method: GET is index 2; :path: / is index 4. Either one goes on the wire as a single byte instead of the full field.
  • A per-connection dynamic table that both peers maintain in lockstep. The first time you send cookie: sess=abc..., it's added to the table; every subsequent request references it by index, so a 400-byte cookie becomes a 1–2 byte reference.
  • Huffman coding for the string literals that still must be sent in full, using a fixed code optimized for HTTP characters.

The "lockstep" detail matters: HPACK is stateful and order-dependent, so the dynamic table on the encoder and decoder must evolve identically. HPACK also deliberately avoids generic DEFLATE, which SPDY used for headers and which TLS-level compression applied to HTTP/1.1 traffic. Both were vulnerable to the CRIME attack, where an attacker inferred secret header values by watching the compressed length change. HPACK was designed from scratch to resist that class of attack: it matches only whole header fields against its tables, never arbitrary substrings.
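Why a table reference costs only one or two bytes follows from HPACK's prefix-coded integers (RFC 7541, section 5.1). The sketch below implements that encoding; the function names `encode_integer` and `indexed_field` are hypothetical, and only the fully indexed representation (leading bit 1, 7-bit prefix) is shown.

```python
def encode_integer(value, prefix_bits, first_byte_flags=0):
    """Encode a non-negative integer with an N-bit prefix (RFC 7541, section 5.1)."""
    max_prefix = (1 << prefix_bits) - 1
    if value < max_prefix:
        return bytes([first_byte_flags | value])      # fits in the prefix: one byte
    out = bytearray([first_byte_flags | max_prefix])  # prefix saturated, continue
    value -= max_prefix
    while value >= 128:
        out.append((value % 128) + 128)               # 7 bits per byte, MSB = "more"
        value //= 128
    out.append(value)
    return bytes(out)

def indexed_field(index):
    """Indexed header field: a leading 1 bit, then the table index (7-bit prefix)."""
    return encode_integer(index, 7, 0x80)

print(indexed_field(2).hex())         # 82 -> ":method: GET" from the static table
print(indexed_field(62).hex())        # be -> newest dynamic-table entry
print(encode_integer(1337, 5).hex())  # 1f9a0a -> multi-byte example from the RFC
```

Any index below 127 fits in a single byte, which covers the whole static table and the most recent dynamic entries.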

When HTTP/2 helps — and when it doesn't

  • Pages with many small assets. The classic win: a page pulling 50 icons, fonts, and CSS chunks no longer fights the 6-connection cap. All 50 stream concurrently.
  • High-latency links. Multiplexing amortizes round trips. One TLS handshake, then everything shares it — no per-asset connection setup.
  • APIs with chatty repeated headers. HPACK pays off most when the same large auth/cookie headers repeat across hundreds of calls.

It helps less, or even hurts, when: the connection is lossy (TCP head-of-line blocking bites hard — see below); you have one giant file (a single stream gains nothing from multiplexing); or you'd already sharded across domains for HTTP/1.1, in which case domain sharding now hurts by forcing multiple connections that defeat HPACK and multiplexing. Un-shard before adopting HTTP/2.

HTTP/2 vs HTTP/1.1 vs HTTP/3

Feature                | HTTP/1.1               | HTTP/2              | HTTP/3
-----------------------+------------------------+---------------------+---------------------------
Year                   | 1997 (RFC 2068)        | 2015 (RFC 7540)     | 2022 (RFC 9114)
Transport              | TCP                    | TCP                 | QUIC over UDP
Wire format            | Text, line-delimited   | Binary frames       | Binary frames
Multiplexing           | None (≤6 conns/origin) | Streams on 1 conn   | Streams on 1 conn
App-layer HoL blocking | Yes                    | No                  | No
Transport HoL blocking | Per-connection         | Yes (shared TCP)    | No (per-stream loss)
Header compression     | None                   | HPACK               | QPACK
Connection setup       | TCP + TLS (2–3 RTT)    | TCP + TLS (2–3 RTT) | QUIC (1 RTT, 0-RTT resume)
Connection migration   | No                     | No                  | Yes (survives IP change)

The headline: HTTP/2 fixed head-of-line blocking at the application layer but not the transport layer. Because every stream rides one TCP connection, the kernel's TCP stack still delivers bytes strictly in order, so a single lost packet stalls all streams until it's retransmitted, even streams whose data already arrived. HTTP/3 solves this by replacing TCP with QUIC, which orders delivery per stream, so a lost packet holds up only the streams whose data it carried. On a clean network HTTP/2 and HTTP/3 perform almost identically; the gap opens on lossy mobile links.
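The difference can be shown with a toy model of what the application can read before a retransmission arrives. Everything here is an illustrative assumption: the packet tuples, the two `delivered_*` functions, and the simplification that each packet carries data for exactly one stream. Real TCP and QUIC stacks also handle acknowledgements, congestion control, and retransmission, none of which is modelled.

```python
def delivered_tcp(packets, lost):
    """One ordered byte stream: nothing after the first gap reaches the application."""
    out = {}
    for seq, stream_id, data in packets:
        if seq in lost:
            break  # every later packet waits in the kernel buffer
        out[stream_id] = out.get(stream_id, b"") + data
    return out

def delivered_quic_like(packets, lost):
    """Per-stream ordering: a gap only blocks the stream whose data went missing."""
    out, blocked = {}, set()
    for seq, stream_id, data in packets:
        if seq in lost:
            blocked.add(stream_id)
        elif stream_id not in blocked:
            out[stream_id] = out.get(stream_id, b"") + data
    return out

# (sequence number, stream ID, payload) in send order; packet 1 is dropped.
packets = [(0, 1, b"a1"), (1, 3, b"b1"), (2, 1, b"a2"), (3, 3, b"b2"), (4, 5, b"c1")]
lost = {1}
print(delivered_tcp(packets, lost))        # only stream 1's first chunk
print(delivered_quic_like(packets, lost))  # streams 1 and 5 keep flowing
```

In the shared-TCP case, streams 1 and 5 are stalled by a loss that only affected stream 3.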

What the numbers actually say

  • Header overhead drops ~80–90% after the first request on a connection. A request whose raw headers are 700 bytes can compress to under 100 bytes once cookies and user-agent are in the dynamic table — and to a handful of bytes if nothing changed.
  • One connection replaces six. HTTP/1.1 browsers capped at ~6 connections per origin; each costs a TCP + TLS handshake (~2–3 round trips, often 100–300 ms on a mobile link). HTTP/2 pays that once and reuses it for 100+ concurrent streams.
  • ~100 concurrent streams. SETTINGS_MAX_CONCURRENT_STREAMS has no spec default (initially unlimited), but RFC 7540 recommends at least 100, and servers settle near it — Apache defaults to 100, nginx to 128; the spec allows up to 2³¹−1.
  • Frame parsing is fixed-cost. 9 header bytes, then a length-prefixed read — no delimiter scan, so a malicious oversized header line can't force quadratic parsing the way it could in some HTTP/1.1 stacks.
  • Packet loss is the equalizer. Studies of HTTP/2 vs HTTP/3 found that above roughly 2% packet loss, HTTP/3 pulls clearly ahead precisely because HTTP/2's shared TCP connection serializes recovery.
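The header-overhead figure in the first bullet can be sanity-checked with quick arithmetic. The sketch below reuses the article's illustrative numbers (700 raw bytes, about 100 bytes once the dynamic table is warm) and assumes, pessimistically, that the first request gets no compression at all; the function name and the 50-request page are hypothetical.

```python
def header_bytes(requests, raw=700, compressed=100):
    """Total header bytes for N requests without and with a warm HPACK table."""
    without = requests * raw
    with_hpack = raw + (requests - 1) * compressed  # first request sent in full
    saving = 1 - with_hpack / without
    return without, with_hpack, saving

without, with_hpack, saving = header_bytes(50)
print(without, with_hpack, f"{saving:.0%}")  # 35000 5600 84%
```

For a 50-request page the estimate lands inside the 80–90% range quoted above; real savings depend on how much of each header block actually repeats.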

JavaScript: a Node.js HTTP/2 server and a frame demuxer

Node ships HTTP/2 in the core http2 module. A multiplexing server is just a stream handler:

import http2 from 'node:http2';
import { readFileSync } from 'node:fs';

const server = http2.createSecureServer({
  key: readFileSync('key.pem'),
  cert: readFileSync('cert.pem'),
  // negotiates "h2" via ALPN during the TLS handshake
});

// Each request is its own stream, multiplexed on one connection.
server.on('stream', (stream, headers) => {
  const path = headers[':path'];        // pseudo-headers start with ':'
  console.log('stream', stream.id, headers[':method'], path);

  stream.respond({
    ':status': 200,
    'content-type': 'text/plain',
  });
  // HEADERS frame above + DATA frame below, both tagged with stream.id
  stream.end(`served ${path} on stream ${stream.id}\n`);
});

server.listen(8443);

The interesting part is what the protocol does underneath. Here's a minimal frame-header reader showing how a receiver demultiplexes interleaved frames back into per-stream buffers — the heart of multiplexing:

const FRAME_TYPE = { DATA: 0x0, HEADERS: 0x1, SETTINGS: 0x4, WINDOW_UPDATE: 0x8 };
const streams = new Map();  // streamId -> accumulated DATA bytes

// Parse frames out of a contiguous buffer of HTTP/2 traffic.
function demux(buf) {
  let off = 0;
  while (off + 9 <= buf.length) {
    // 9-byte fixed header: length(24) | type(8) | flags(8) | R(1)+streamId(31)
    const length   = (buf[off] << 16) | (buf[off + 1] << 8) | buf[off + 2];
    const type     = buf[off + 3];
    const streamId = buf.readUInt32BE(off + 5) & 0x7fffffff;  // mask reserved bit
    if (off + 9 + length > buf.length) break;  // incomplete frame: wait for more bytes
    const payload  = buf.subarray(off + 9, off + 9 + length);

    if (type === FRAME_TYPE.DATA) {
      const prev = streams.get(streamId) ?? Buffer.alloc(0);
      streams.set(streamId, Buffer.concat([prev, payload]));  // reassemble by ID
    }
    off += 9 + length;   // O(1) advance — no delimiter scan
  }
  return streams;
}

Notice the demux never cares how frames from different streams are interleaved. Stream 3's DATA can sit between two of stream 1's DATA frames; each lands in its own buffer keyed by stream ID. Only the order within a single stream matters, and TCP preserves that. That single fact is the whole multiplexing trick.

Python: serving HTTP/2 and inspecting HPACK

Python's h2 library is a pure sans-IO protocol state machine — you drive it and do the socket work yourself, which makes the frame events explicit:

import socket
from h2.config import H2Configuration
from h2.connection import H2Connection
from h2.events import RequestReceived, DataReceived, StreamEnded

def handle(sock):
    config = H2Configuration(client_side=False)   # server-side config
    conn = H2Connection(config=config)
    conn.initiate_connection()
    sock.sendall(conn.data_to_send())

    streams = {}                       # stream_id -> bytes received
    while True:
        data = sock.recv(65535)
        if not data:
            break
        for event in conn.receive_data(data):   # demultiplexes frames
            if isinstance(event, RequestReceived):
                # event.headers already HPACK-decoded into (name, value) pairs
                streams[event.stream_id] = b""
            elif isinstance(event, DataReceived):
                streams[event.stream_id] += event.data
                conn.acknowledge_received_data(   # return flow-control credit
                    event.flow_controlled_length, event.stream_id)
            elif isinstance(event, StreamEnded):
                conn.send_headers(event.stream_id, [(":status", "200")])
                conn.send_data(event.stream_id, b"ok", end_stream=True)
        sock.sendall(conn.data_to_send())

And a tiny illustration of why HPACK is so effective — the dynamic table as a bounded, indexed cache of recently-seen header fields:

class DynamicTable:
    """HPACK dynamic table: newest entries get the lowest dynamic index."""
    STATIC = {(":method", "GET"): 2, (":path", "/"): 4}   # excerpt of 61 entries

    def __init__(self, max_size=4096):
        self.entries = []          # list of (name, value), newest first
        self.size = 0
        self.max_size = max_size

    def encode(self, name, value):
        if (name, value) in self.STATIC:
            return f"idx {self.STATIC[(name, value)]}"          # 1 byte
        for i, (n, v) in enumerate(self.entries):
            if (n, v) == (name, value):
                return f"dyn {62 + i}"                          # ~1-2 bytes
        # First sight: send literal AND insert into the table for next time.
        self.entries.insert(0, (name, value))
        self.size += len(name) + len(value) + 32               # RFC 7541 overhead
        while self.size > self.max_size:                       # evict oldest
            n, v = self.entries.pop()
            self.size -= len(n) + len(v) + 32
        return f"literal {name}: {value}"

t = DynamicTable()
print(t.encode("cookie", "sess=abc"))   # literal cookie: sess=abc (first sight, sent in full)
print(t.encode("cookie", "sess=abc"))   # dyn 62 (cached, ~1-byte index reference)

Variants and features worth knowing

Server Push (deprecated). A server could send a PUSH_PROMISE on an open request stream, reserving an even-numbered stream for a resource the client hadn't requested yet, say, pushing style.css alongside index.html. In practice it often pushed assets browsers already had cached. Chrome disabled it in 2022; rel=preload and 103 Early Hints replaced it.

Stream prioritization (RFC 7540 tree, then RFC 9218). The original spec let clients build a dependency tree with weights so the server knew to send the critical CSS before a below-the-fold image. It was so complex and inconsistently implemented that it was effectively replaced by the far simpler Extensible Prioritization Scheme (urgency 0–7 plus an incremental flag) in RFC 9218.
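As a rough illustration of how small the RFC 9218 scheme is, the sketch below parses a Priority header value such as "u=1, i". It is a simplified parser written for this example, not a full Structured Fields implementation: the `parse_priority` name is hypothetical, and malformed or out-of-range values simply fall back to the RFC's defaults (urgency 3, non-incremental).

```python
def parse_priority(value):
    """Parse a Priority header value into (urgency, incremental)."""
    urgency, incremental = 3, False          # RFC 9218 defaults
    for item in value.split(","):
        key, _, raw = item.strip().partition("=")
        if key == "u" and len(raw) == 1 and raw in "01234567":
            urgency = int(raw)               # 0 is most urgent, 7 least
        elif key == "i":
            incremental = raw in ("", "?1")  # bare "i" means true; "?0" means false
    return urgency, incremental

requests = [("/hero.jpg", "u=5, i"), ("/app.css", "u=1"), ("/index.html", "")]
ordered = sorted(requests, key=lambda r: parse_priority(r[1])[0])
print([path for path, _ in ordered])  # ['/app.css', '/index.html', '/hero.jpg']
```

A server following this scheme only needs one small integer and one flag per request, instead of maintaining a dependency tree.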

h2c — cleartext HTTP/2. The spec defines HTTP/2 without TLS, negotiated via the HTTP/1.1 Upgrade header. No browser implements it; it survives only behind reverse proxies and in service meshes where TLS terminates at the edge.

ALPN. Application-Layer Protocol Negotiation is the TLS extension where client and server agree on the protocol during the handshake. The client offers ["h2", "http/1.1"] and the server picks. This avoids an extra round trip just to discover whether the server speaks HTTP/2.
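From the client side, offering those protocols takes one call in Python's standard `ssl` module. The sketch below is a minimal example; the helper names are hypothetical, and `negotiated_protocol` needs network access and a real host, so it is defined but not called here.

```python
import socket
import ssl

def make_client_context():
    """TLS client context that offers h2 first, then http/1.1, via ALPN."""
    ctx = ssl.create_default_context()
    ctx.set_alpn_protocols(["h2", "http/1.1"])
    return ctx

def negotiated_protocol(host, port=443, timeout=5.0):
    """Complete a TLS handshake and return the protocol the server picked."""
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with make_client_context().wrap_socket(raw, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()  # "h2", "http/1.1", or None

ctx = make_client_context()
print(type(ctx).__name__)  # SSLContext
```

If the server picks "h2", the client can start sending the HTTP/2 connection preface on the same socket with no further negotiation.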

Common bugs and edge cases

  • Leaving domain sharding on. Sharding assets across img1.example.com, img2… was an HTTP/1.1 hack to dodge the 6-connection cap. Under HTTP/2 it forces multiple connections, splitting HPACK tables and defeating multiplexing. Consolidate to one origin.
  • Assuming HoL blocking is gone entirely. It's gone at the app layer, not the transport layer. On lossy networks one dropped TCP segment still stalls every stream. If you see HTTP/2 underperforming on mobile, that's why — and the fix is HTTP/3.
  • The HPACK bomb / decompression DoS. A malicious peer can reference the dynamic table to expand a tiny header block into megabytes. Servers must cap decompressed header size and reset offending streams.
  • Rapid Reset (CVE-2023-44487). An attacker opens a stream and immediately sends RST_STREAM, repeating millions of times to exhaust server resources while staying under the concurrent-stream limit. It powered record-breaking DDoS attacks in 2023; the fix is rate-limiting stream resets.
  • Forgetting flow control. Each stream and the whole connection have independent flow-control windows. If you never send WINDOW_UPDATE, the sender stalls once it exhausts the initial 65,535-byte (roughly 64 KB) window: a classic "downloads freeze at 64 KB" bug.
  • Pseudo-header ordering. The :method, :scheme, :authority, :path pseudo-headers must come before all regular headers in the block, or strict parsers reject the stream with a protocol error.
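The flow-control bullet above is easier to see with the bookkeeping written out. This is a sender-side sketch under simplifying assumptions: the `SendWindow` class and `send` helper are hypothetical, and SETTINGS_INITIAL_WINDOW_SIZE changes and frame-size limits are left out.

```python
MAX_WINDOW = 2**31 - 1

class SendWindow:
    """Sender-side credit for one flow-control window (a stream or the connection)."""
    def __init__(self, initial=65_535):
        self.available = initial

    def window_update(self, increment):
        """Apply a WINDOW_UPDATE from the peer, enforcing the spec's bounds."""
        if not 1 <= increment <= MAX_WINDOW:
            raise ValueError("increment must be between 1 and 2^31-1")
        if self.available + increment > MAX_WINDOW:
            raise ValueError("window overflow (FLOW_CONTROL_ERROR)")
        self.available += increment

def send(nbytes, stream, connection):
    """Return how many DATA bytes may go out now; both windows are debited."""
    allowed = min(nbytes, stream.available, connection.available)
    stream.available -= allowed
    connection.available -= allowed
    return allowed

stream, connection = SendWindow(), SendWindow()
first = send(100_000, stream, connection)   # 65535: the initial window is used up
stalled = send(34_465, stream, connection)  # 0: frozen until the peer grants credit
stream.window_update(65_535)
connection.window_update(65_535)
resumed = send(34_465, stream, connection)  # 34465: the rest of the body goes out
print(first, stalled, resumed)
```

Note that credit has to be returned on both the stream and the connection; updating only one of them leaves the sender stalled.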

Frequently asked questions

How does HTTP/2 multiplexing work over a single connection?

Every request opens a new stream identified by an odd-numbered stream ID. Requests and responses are chopped into frames, each frame tagged with its stream ID, and all streams' frames are interleaved on the one TCP connection. The receiver reassembles each stream by its ID, so dozens of responses arrive concurrently instead of queuing behind one another.

Does HTTP/2 fully eliminate head-of-line blocking?

It eliminates application-layer head-of-line blocking — slow responses no longer block fast ones in the request queue. But it does not eliminate transport-layer blocking: because all streams share one TCP connection, a single lost packet stalls every stream until TCP retransmits it. Fixing that required moving to QUIC, which is why HTTP/3 exists.

What is HPACK and why does HTTP/2 need it?

HPACK is HTTP/2's header compression scheme. It uses a 61-entry static table of common header fields plus a per-connection dynamic table, and Huffman-encodes string literals. Because HTTP requests repeat huge headers (cookies, user-agent) on every request, HPACK typically shrinks header overhead by 80–90% after the first request on a connection.

Is HTTP/2 always faster than HTTP/1.1?

Usually, especially for pages with many small assets, because it kills the 6-connections-per-origin limit and the resulting queuing. But on lossy networks the shared-connection design can be slower than HTTP/1.1's parallel connections, since one dropped packet stalls all streams instead of just one connection.

Why was HTTP/2 Server Push removed?

Server Push let servers send resources before the client asked, but it routinely pushed assets the browser already had cached, wasting bandwidth and rarely beating well-tuned preload hints. Chrome disabled it in 2022, and the rel=preload link header plus 103 Early Hints replaced it as the standard preloading mechanism.

Does HTTP/2 require HTTPS?

The spec allows cleartext HTTP/2 (h2c), but every major browser only negotiates HTTP/2 over TLS using the ALPN extension during the handshake. In practice HTTP/2 means HTTPS, and the protocol identifier advertised in ALPN is the string h2.