Networking

WebRTC & NAT Traversal

How two browsers behind two routers find each other — and talk directly

WebRTC lets two browsers exchange audio, video, and data peer-to-peer with no media server, using ICE to test candidate paths, STUN to discover public addresses, and TURN to relay when NATs refuse a direct hole-punch.

  • Transport: UDP (SRTP / SCTP-over-DTLS)
  • NAT traversal: ICE (RFC 8445)
  • Direct-connect rate: ≈ 80–92%
  • TURN fallback: ≈ 8–20% of sessions
  • Encryption: Mandatory (DTLS-SRTP)

The problem: nobody can be reached

Open two browser tabs on opposite sides of the planet and ask them to send each other video. Neither one has a public IP address. Both sit behind a home router doing Network Address Translation, which rewrites the source address of outgoing packets and only lets replies in if they match a connection the inside host started first. A packet that arrives unsolicited at your router is dropped. So neither peer can simply connect to the other — there is no reachable address to connect to, and the firewall would refuse it anyway.

WebRTC is the browser API and protocol suite that solves this. Its goal is a direct path between the two peers — no media server in the middle relaying your call frame by frame — so latency stays low and your bandwidth bill stays near zero. The trick is to make both routers open a hole at the same time by having both peers send outbound packets at each other simultaneously. Each router sees an outbound packet, creates a mapping, and now treats the incoming packet from the other side as a "reply." That is UDP hole punching, and orchestrating it reliably is what ICE, STUN, and TURN are for.

One thing WebRTC does not solve: the bootstrap. Two peers who can't reach each other obviously can't tell each other their addresses either. You supply a signaling channel — typically a WebSocket to a tiny server both peers already connect to — and WebRTC hands you opaque blobs (SDP and ICE candidates) to ferry across it. Signaling is your problem; the media path is WebRTC's.

ICE: gather, pair, probe

ICE — Interactive Connectivity Establishment, standardized in RFC 8445 — is the engine. It runs in three phases.

1. Gather candidates. Each peer collects every address it might be reachable on:

  • Host candidates — the machine's own LAN addresses (e.g. 192.168.1.7:51234). Free, instant, work only if peers share a network.
  • Server-reflexive (srflx) candidates — the public IP:port the NAT exposed, learned by asking a STUN server "what address did my packet come from?" This is the candidate that enables hole punching.
  • Relayed candidates — an IP:port allocated on a TURN server that will forward traffic. The last resort.

2. Pair and prioritize. Every local candidate is paired with every remote candidate. With 3 candidates each you get up to 9 pairs. Each pair gets a 64-bit priority computed so that host×host pairs sort highest, then srflx, then relay — and the answerer's ordering matches the offerer's, so both sides probe the same pair first.

3. Connectivity checks. ICE sends STUN Binding request/response handshakes across each candidate pair, highest priority first. Any pair on which a check succeeds in both directions becomes a valid pair. The controlling agent (normally the offerer) nominates the best valid pair, and both sides route media over it. Because both sides are firing checks, the simultaneous outbound packets are exactly what punches the holes.
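The pair-and-prioritize step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an ICE implementation: the Candidate class, the sample addresses, and the function names are invented for the example, and the candidate priorities are taken as given. The pair-priority expression is the one in RFC 8445, where G is the controlling agent's candidate priority and D is the controlled agent's.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Candidate:
    kind: str       # 'host', 'srflx', or 'relay'
    address: str    # illustrative addresses only
    priority: int   # candidate priority, assumed already computed

def pair_priority(g: int, d: int) -> int:
    """RFC 8445 pair priority: g is the controlling side's candidate priority, d the controlled side's."""
    return (1 << 32) * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)

def form_checklist(controlling, controlled):
    """Pair every candidate with every remote candidate and sort best-first."""
    pairs = [(a, b, pair_priority(a.priority, b.priority))
             for a, b in product(controlling, controlled)]
    return sorted(pairs, key=lambda p: p[2], reverse=True)

# Hypothetical candidates for two peers (three each, so nine pairs).
alice = [Candidate("host", "192.168.1.7:51234", 2130706431),
         Candidate("srflx", "203.0.113.5:40000", 1694498815),
         Candidate("relay", "198.51.100.9:50000", 16777215)]
bob = [Candidate("host", "10.0.0.4:52000", 2130706431),
       Candidate("srflx", "198.51.100.77:41000", 1694498815),
       Candidate("relay", "198.51.100.9:50001", 16777215)]

checklist = form_checklist(alice, bob)
for local, remote, prio in checklist:
    print(f"{local.kind:>5} x {remote.kind:<5} priority={prio}")
```

Because the smaller of the two candidate priorities dominates the result, any pair that involves a relay sinks to the bottom of the list. Real agents also prune redundant pairs before checking, which this sketch leaves out.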

The per-candidate priority formula from RFC 8445 is worth seeing, because it explains why a relay is always tried last (a pair's priority is then derived from the priorities of its two candidates):

priority = (2^24) * type_preference
         + (2^8)  * local_preference
         + (2^0)  * (256 - component_id)

type_preference:  host = 126, srflx = 100, relay = 0

A relayed candidate's type_preference of 0 buries it beneath every host and reflexive pair. ICE only ends up on TURN when nothing cheaper survives its checks.
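To make the formula concrete, here is a small Python sketch that evaluates it for each candidate type. The function name is made up for the example, and the defaults are assumptions: a local_preference of 65535 (the value suggested for a host with a single address) and component_id 1.

```python
# Type preferences as listed in the formula above.
TYPE_PREFERENCE = {"host": 126, "srflx": 100, "relay": 0}

def candidate_priority(kind: str, local_preference: int = 65535,
                       component_id: int = 1) -> int:
    """Evaluate the candidate priority formula (shifts are the powers of two)."""
    return ((TYPE_PREFERENCE[kind] << 24)
            + (local_preference << 8)
            + (256 - component_id))

for kind in ("host", "srflx", "relay"):
    print(kind, candidate_priority(kind))
# host 2130706431
# srflx 1694498815
# relay 16777215
```

The type preference occupies the top bits, so no choice of local_preference can lift a relayed candidate above a host or reflexive one.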

STUN vs TURN — discovery vs relay

STUN (Session Traversal Utilities for NAT, RFC 8489) is almost trivially cheap. The client sends one UDP Binding request; the server copies the packet's observed source IP:port into the reply's XOR-MAPPED-ADDRESS. That is the public address the NAT mapped you to. Two packets, no media, no state to speak of — Google runs stun.l.google.com:19302 for free.
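The exchange is small enough to show at the byte level. The following Python sketch builds a Binding request and decodes XOR-MAPPED-ADDRESS from a response, handling IPv4 only. The helper names are invented for the example, and build_binding_response only imitates what a server would send so the round trip can be shown without a network. Treat it as an illustration of the message layout rather than a complete STUN client: it skips transaction-ID verification, retransmission, and the optional integrity and fingerprint attributes.

```python
import os
import socket
import struct
from typing import Optional, Tuple

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020

def build_binding_request(transaction_id: Optional[bytes] = None) -> bytes:
    """20-byte header: type, attribute length (0), magic cookie, 12-byte transaction ID."""
    tid = transaction_id if transaction_id is not None else os.urandom(12)
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + tid

def build_binding_response(request: bytes, ip: str, port: int) -> bytes:
    """Imitates a server reply carrying the observed source address (IPv4 only)."""
    xport = port ^ (MAGIC_COOKIE >> 16)
    xaddr = struct.unpack("!I", socket.inet_aton(ip))[0] ^ MAGIC_COOKIE
    value = struct.pack("!BBHI", 0, 0x01, xport, xaddr)   # reserved, family, port, address
    attr = struct.pack("!HH", XOR_MAPPED_ADDRESS, len(value)) + value
    return struct.pack("!HHI", BINDING_SUCCESS, len(attr), MAGIC_COOKIE) + request[8:20] + attr

def parse_xor_mapped_address(response: bytes) -> Tuple[str, int]:
    """Walk the attributes and un-XOR the mapped address."""
    msg_type, length, cookie = struct.unpack("!HHI", response[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN Binding success response")
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", response[offset:offset + 4])
        value = response[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            xport, xaddr = struct.unpack("!HI", value[2:8])
            ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return ip, xport ^ (MAGIC_COOKIE >> 16)
        offset += 4 + attr_len + (-attr_len % 4)   # attributes are padded to 4 bytes
    raise ValueError("no IPv4 XOR-MAPPED-ADDRESS attribute")

request = build_binding_request()
response = build_binding_response(request, "203.0.113.5", 40000)
print(parse_xor_mapped_address(response))
```

Against a real server you would send the request over a UDP socket and parse whatever comes back. The address is XORed with the magic cookie so that NATs which rewrite addresses found in packet payloads leave it alone.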

TURN (Traversal Using Relays around NAT, RFC 8656) is a full media relay. When direct paths fail, each peer sends its SRTP to the TURN server, which forwards it to the other peer. Every byte of audio and video transits the relay, so TURN is the part of the system that actually costs money to run: a single 720p video call can push 1.5–3 Mbps in each direction through the relay.

Property | STUN | TURN
What it provides | Your public IP:port (one fact) | A relay address that forwards media
On the media path? | No, peers connect directly afterward | Yes, every packet flows through it
Bandwidth cost to operator | ≈ 0 (two small packets) | Full call bitrate, both directions
Latency added | None to the call | One extra hop (often 10–60 ms)
Works through symmetric NAT | No (mapping is destination-specific) | Yes (always; it's a fixed relay)
Transport | UDP (TCP/TLS optional) | UDP, TCP, or TLS-over-443 to dodge firewalls
RFC | RFC 8489 | RFC 8656

The relationship is hierarchical: TURN includes STUN. A TURN server speaks STUN too, and a TURN allocation also hands back a server-reflexive candidate. In practice you configure both as ICE servers and let ICE pick the cheapest path that works.

When to reach for WebRTC

  • Real-time A/V calls — video conferencing, live tutoring, telehealth. Sub-200 ms mouth-to-ear latency is the whole point, and a relay-everything architecture can't hit it cheaply at scale.
  • Peer-to-peer data channels — file transfer, game state, screen sharing. The RTCDataChannel gives you an ordered-or-unordered, reliable-or-unreliable SCTP stream that bypasses your servers entirely.
  • Low-latency, cost-sensitive fan-out — when you'd rather not pay to relay every stream, mesh or SFU topologies built on WebRTC keep media off your origin.

Reach for something else when you don't need real-time. If you're streaming one-to-many at scale, HLS/DASH over a CDN is cheaper and more robust than WebRTC fan-out. If you need a simple reliable browser-to-server channel, a plain WebSocket or QUIC/HTTP-3 is far less machinery. WebRTC's complexity only pays off when you genuinely need direct, low-latency, peer-to-peer media.

What the numbers actually say

  • Direct-connect succeeds for roughly 80–92% of sessions. The remainder fall back to TURN. Symmetric NAT on both ends, UDP-blocking enterprise firewalls, and carrier-grade NAT are the usual culprits.
  • TURN carries ≈ 8–20% of calls but a disproportionate share of cost. A relayed 720p call at ~2.5 Mbps each way is ~5 Mbps of relay traffic, or about 2.25 GB per hour; at typical cloud egress of $0.08–0.09/GB, an hour-long relayed call costs the operator roughly $0.18–0.20 in bandwidth alone, versus essentially $0 for a STUN-only direct call.
  • ICE check timing. A check transaction uses an RTO that starts around 500 ms and backs off; candidate gathering plus checks typically converge in a few hundred milliseconds to a couple of seconds, which is why a WebRTC call doesn't connect instantly.
  • Candidate explosion. With k candidates per side, ICE forms up to k² pairs and runs O(k²) connectivity checks. That is fine for the handful of addresses a typical host has, but it is a reason aggressive candidate gathering (every VPN and virtual NIC) can slow setup.
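The relay-cost figure is simple arithmetic, worked here in Python. The function name is hypothetical, and the sketch assumes that only the relay's outbound traffic is billed and that 1 GB means 10^9 bytes. Real pricing tiers will shift the result.

```python
def relay_cost_per_hour(mbps_each_way: float, usd_per_gb: float) -> float:
    """Bandwidth cost of relaying one two-party call for an hour."""
    relayed_mbps = 2 * mbps_each_way                    # the relay forwards both directions
    gb_per_hour = relayed_mbps * 1e6 * 3600 / 8 / 1e9   # bits/s -> bytes/hour -> GB
    return gb_per_hour * usd_per_gb

for price in (0.08, 0.09):
    print(f"${price:.2f}/GB -> ${relay_cost_per_hour(2.5, price):.2f} per hour")
```

At 2.5 Mbps each way the relay forwards 2.25 GB per hour, which comes to about $0.18 at the lower price and about $0.20 at the higher one, in line with the estimate in the list above.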

JavaScript: a minimal peer connection

The browser API hides ICE, STUN, and TURN behind a few calls. You configure the ICE servers, create the connection, and shuttle the offer/answer and candidates over your own signaling channel.

// Both STUN (discovery) and TURN (relay fallback) are configured here.
const pc = new RTCPeerConnection({
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' },
    { urls: 'turn:turn.example.com:3478',
      username: 'user', credential: 'pass' },   // TURN needs auth
  ],
});

// 1. Trickle ICE: fire each candidate to the peer as it is gathered.
pc.onicecandidate = ({ candidate }) => {
  if (candidate) signaling.send({ type: 'ice', candidate });
};

// 2. Watch the ICE connection state. To learn whether TURN won, inspect the
//    selected candidate pair in pc.getStats() once the state is 'connected'.
pc.oniceconnectionstatechange = () => {
  console.log('ICE state:', pc.iceConnectionState); // checking → connected
};

// 3. Caller side: attach media, make an SDP offer, send it.
const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
stream.getTracks().forEach(t => pc.addTrack(t, stream));

const offer = await pc.createOffer();
await pc.setLocalDescription(offer);          // starts ICE gathering
signaling.send({ type: 'offer', sdp: offer });

// 4. Handle whatever the peer sends back over signaling.
signaling.onmessage = async ({ type, sdp, candidate }) => {
  if (type === 'answer') await pc.setRemoteDescription(sdp);
  else if (type === 'ice') await pc.addIceCandidate(candidate);
};

The callee mirrors this: on receiving the offer it calls setRemoteDescription(offer), createAnswer(), setLocalDescription(answer), and sends the answer back. Notice you never write a single line of NAT-traversal logic — you only choose which STUN/TURN servers ICE may use. The trickle ICE pattern above (sending candidates as they're discovered, rather than waiting for all of them) is what shaves seconds off connection setup.

Python (aiortc): inspecting the chosen pair

Outside the browser, aiortc gives the same model in Python and lets you read back which candidate pair ICE actually nominated — useful for confirming whether you hole-punched or fell back to a relay.

import asyncio
from aiortc import RTCPeerConnection, RTCConfiguration, RTCIceServer

async def main():
    config = RTCConfiguration(iceServers=[
        RTCIceServer(urls="stun:stun.l.google.com:19302"),
        RTCIceServer(urls="turn:turn.example.com:3478",
                     username="user", credential="pass"),
    ])
    pc = RTCPeerConnection(configuration=config)

    @pc.on("iceconnectionstatechange")
    async def on_state():
        print("ICE state:", pc.iceConnectionState)
        if pc.iceConnectionState == "completed":
            # aiortc has no getSelectedCandidatePair(); read the nominated
            # pair from the underlying aioice Connection (component 1).
            ice = pc.sctp.transport.transport             # RTCDtlsTransport -> RTCIceTransport
            pair = ice._connection._nominated.get(1)      # aioice CandidatePair or None
            if pair:
                # candidate type is 'host', 'srflx', 'prflx', or 'relay'
                print(f"chose {pair.local_candidate.type} <-> {pair.remote_candidate.type}")

    # ... create data channel / offer, exchange SDP over your signaling ...
    pc.createDataChannel("chat")
    offer = await pc.createOffer()
    await pc.setLocalDescription(offer)
    # send pc.localDescription.sdp to the peer, await their answer, set it.

asyncio.run(main())

The pair.local_candidate.type / pair.remote_candidate.type readout is the ground truth of NAT traversal: host<->host means same network, srflx on either side means a successful hole-punch, and relay anywhere means TURN is forwarding your call.
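That reading rule fits in a small helper. The function below is a hypothetical sketch, not part of aiortc or aioice. It also treats prflx (peer-reflexive candidates, discovered during the checks themselves) the same as srflx, which is an assumption made for this example.

```python
KNOWN_TYPES = {"host", "srflx", "prflx", "relay"}

def classify_path(local_type: str, remote_type: str) -> str:
    """Summarize what a nominated pair's candidate types say about the path."""
    types = {local_type, remote_type}
    if not types <= KNOWN_TYPES:
        raise ValueError(f"unknown candidate type in {types}")
    if "relay" in types:
        return "relayed via TURN"
    if types & {"srflx", "prflx"}:
        return "direct (hole-punched)"
    return "direct (same network)"

print(classify_path("host", "host"))     # direct (same network)
print(classify_path("srflx", "host"))    # direct (hole-punched)
print(classify_path("srflx", "relay"))   # relayed via TURN
```

Logging this label for each session is one simple way to measure your own direct-connect rate.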

NAT types and topologies worth knowing

The four classic NAT behaviors. Hole punching's success depends largely on how the NAT maps ports. Full-cone, restricted-cone, and port-restricted-cone NATs reuse the same external port for a given internal socket regardless of destination, so STUN-learned ports stay valid and direct connect works. Symmetric NATs allocate a fresh external port per destination, so the port STUN saw (toward the STUN server) differs from the port used toward the peer, and the punched hole is the wrong one. RFC 4787 requires "endpoint-independent mapping" precisely so that this kind of hole punching works; most modern home routers comply.
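A toy model makes the difference visible. The Nat class below is invented for illustration and models only the port-mapping rule. It ignores inbound filtering and mapping timeouts, and the addresses come from documentation ranges.

```python
from itertools import count

class Nat:
    """Toy NAT that only models how external ports are assigned."""

    def __init__(self, public_ip: str, symmetric: bool, first_port: int = 40000):
        self.public_ip = public_ip
        self.symmetric = symmetric
        self._ports = count(first_port)
        self._mappings = {}

    def map_outbound(self, internal, destination):
        # Endpoint-independent: one mapping per internal socket.
        # Symmetric: one mapping per (internal socket, destination) pair.
        key = (internal, destination) if self.symmetric else internal
        if key not in self._mappings:
            self._mappings[key] = next(self._ports)
        return (self.public_ip, self._mappings[key])

INSIDE = ("192.168.1.7", 51234)
STUN_SERVER = ("198.51.100.1", 3478)
PEER = ("203.0.113.9", 41000)

cone = Nat("203.0.113.5", symmetric=False)
sym = Nat("203.0.113.5", symmetric=True)

for name, nat in (("endpoint-independent", cone), ("symmetric", sym)):
    seen_by_stun = nat.map_outbound(INSIDE, STUN_SERVER)
    used_toward_peer = nat.map_outbound(INSIDE, PEER)
    print(name, seen_by_stun, used_toward_peer, seen_by_stun == used_toward_peer)
```

With the endpoint-independent NAT, the address learned from STUN is the one the peer will actually see. With the symmetric NAT the two differ, so the srflx candidate advertises a port the peer's packets will never be admitted on.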

Mesh vs SFU vs MCU. For multi-party calls, a full mesh has every peer send to every other (n·(n−1) streams in total, which is fine for 3–4 people and brutal beyond). An SFU (Selective Forwarding Unit) is a server that receives each peer's one upstream and forwards copies. It is far cheaper on the sender's uplink, and every leg is still an ordinary WebRTC connection. An MCU mixes everything into one composited stream: cheapest on clients, heaviest on the server, and because the server must decode and re-encode, the media is no longer end-to-end.
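The per-peer load of each topology follows directly from those descriptions. Here is a rough Python sketch that counts streams only. The function name is hypothetical, and it ignores simulcast, audio-only participants, and bitrate differences between streams.

```python
def streams_per_peer(n: int) -> dict:
    """Media streams each participant sends and receives in an n-party call."""
    if n < 2:
        raise ValueError("a call needs at least two participants")
    return {
        "mesh": {"up": n - 1, "down": n - 1},   # one stream to and from every other peer
        "sfu":  {"up": 1,     "down": n - 1},   # one upstream, forwarded copies of the rest
        "mcu":  {"up": 1,     "down": 1},       # one upstream, one composited downstream
    }

for n in (3, 4, 8):
    counts = streams_per_peer(n)
    total_mesh = n * counts["mesh"]["up"]       # n * (n - 1) streams across the mesh
    print(n, counts, "mesh total:", total_mesh)
```

The uplink column is the one that usually hurts first: at eight participants a mesh asks every peer to encode and upload seven streams, while an SFU or MCU asks for one.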

Trickle ICE vs vanilla ICE. Vanilla ICE waits to gather every candidate before sending the SDP — simple but slow. Trickle ICE (RFC 8838) sends the offer immediately and streams candidates as they arrive, overlapping gathering with checking and cutting setup latency dramatically. Every modern stack trickles.

ICE-TCP and TURN-over-TLS-443. Some networks block UDP outright. ICE can fall back to TCP candidates, and TURN can run over TLS on port 443 so it looks like ordinary HTTPS — the escape hatch that lets WebRTC traverse the strictest corporate firewalls.
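In configuration terms, the escape hatch is mostly a matter of which TURN URLs you offer. The Python sketch below builds an iceServers list that a backend could serialize and hand to clients. The helper name is hypothetical, turn.example.com is a placeholder, and it assumes your TURN server is actually set up to listen for TLS on port 443.

```python
import json

def ice_servers(turn_host: str, username: str, credential: str) -> list:
    """Build an iceServers list offering UDP, TCP, and TLS-on-443 relay transports."""
    return [
        {"urls": "stun:stun.l.google.com:19302"},
        {
            "urls": [
                f"turn:{turn_host}:3478?transport=udp",   # preferred: plain UDP relay
                f"turn:{turn_host}:3478?transport=tcp",   # for networks that block UDP
                f"turns:{turn_host}:443?transport=tcp",   # TLS on 443, resembles HTTPS traffic
            ],
            "username": username,
            "credential": credential,
        },
    ]

print(json.dumps(ice_servers("turn.example.com", "user", "pass"), indent=2))
```

Listing all three lets ICE try the cheaper transports first and use the TLS relay only when the others fail their checks.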

Common bugs and edge cases

  • No TURN server configured. The single most common "works for me, fails for users" bug. On the same LAN, host candidates connect and everything looks fine; ship it, and the 8–20% of users behind symmetric NAT or UDP-blocking firewalls can never connect. Always configure TURN before launch.
  • Forgetting it's turn:, not stun:, for the relay. A stun: URL to your TURN server only yields a reflexive candidate, never a relayed one. The relay needs the turn: scheme and valid credentials, or allocation silently fails.
  • Signaling race — the "glare" problem. If both peers call createOffer() at once, their states collide. Use the perfect negotiation pattern: designate one peer "polite," and have it roll back its local offer when a competing offer arrives.
  • Adding ICE candidates before the remote description. Calling addIceCandidate() before setRemoteDescription() throws or drops the candidate. Buffer incoming candidates until the remote description is set.
  • mDNS host candidates surprising your STUN-less LAN test. Browsers now hide local IPs behind .local mDNS names for privacy. If your peers can't resolve mDNS, host-candidate pairing fails and you may be forced onto srflx/relay even on the same network.
  • Leaving TURN credentials long-lived. Static TURN passwords get scraped and abused as open relays. Use short-lived time-limited credentials (the REST API ephemeral-credential scheme) so a leaked secret expires in minutes.
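The ephemeral-credential scheme from the last bullet is short enough to sketch. In the commonly used shared-secret variant (supported, for example, by coturn), the username embeds an expiry timestamp and the password is an HMAC-SHA1 of that username under a secret shared between your backend and the TURN server. The function name, the 600-second default, and the sample secret below are assumptions for illustration. Check your server's documentation for the exact format it expects.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional, Tuple

def ephemeral_turn_credentials(shared_secret: str, user_id: str,
                               ttl_seconds: int = 600,
                               now: Optional[float] = None) -> Tuple[str, str]:
    """Return (username, credential) that expire ttl_seconds from now."""
    current = time.time() if now is None else now
    expiry = int(current + ttl_seconds)
    username = f"{expiry}:{user_id}"                  # expiry timestamp, then the user id
    digest = hmac.new(shared_secret.encode(), username.encode(), hashlib.sha1).digest()
    return username, base64.b64encode(digest).decode()

username, credential = ephemeral_turn_credentials("s3cret", "alice")
print(username, credential)
```

Your backend would issue these per session and pass them to the client as the username and credential of the TURN entry. The shared secret itself never leaves the server side, so a leaked credential stops working once its timestamp passes.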

Frequently asked questions

What is the difference between STUN and TURN?

STUN only tells a peer its own public IP and port as seen from outside the NAT, so the two peers can attempt a direct connection — it carries no media. TURN is a full relay: when a direct path is impossible, both peers send their media to a TURN server that forwards it. STUN is nearly free; TURN pays for and routes every byte of the call.

What is ICE in WebRTC?

ICE (Interactive Connectivity Establishment, RFC 8445) is the algorithm that gathers every possible address for a peer — local, STUN-reflexive, and TURN-relayed — pairs them with the other peer's candidates, and runs connectivity checks on each pair in priority order until one succeeds. It is the part of WebRTC that actually "punches through" the NAT.

Why does WebRTC need a signaling server if it's peer-to-peer?

Two browsers cannot talk to each other until they have each other's address and crypto parameters, but they have no shared channel to exchange them yet. The signaling server (over WebSocket, HTTP, or anything you like) carries the SDP offer/answer and ICE candidates between them. Once the peer connection is live, signaling is no longer on the media path.

What percentage of WebRTC calls fall back to a TURN relay?

Published measurements from large deployments put TURN usage at roughly 8–20% of sessions, depending on the network mix. Symmetric NATs, enterprise firewalls that block UDP, and carrier-grade NAT push that number up; consumer home routers with endpoint-independent mapping rarely need a relay.

Why can't two peers behind symmetric NATs connect directly?

A symmetric NAT assigns a different external port for each distinct destination. The port a peer learns from STUN (talking to the STUN server) is not the port the NAT will use when sending to the other peer, so the hole that was punched is useless. Two symmetric NATs together almost always force a TURN relay.

Is WebRTC media encrypted?

Yes, mandatorily. Media flows over SRTP keyed by DTLS-SRTP, and the SCTP data channel runs over DTLS. There is no unencrypted mode — a WebRTC stack that skipped encryption would be non-conformant. Even a TURN-relayed call stays end-to-end encrypted; the relay forwards opaque ciphertext.