Reverse Proxies

One public door in front of many private servers — terminating TLS, balancing load, caching, and hiding your topology

A reverse proxy is a server that sits in front of your backend servers, accepting client requests and forwarding them to one of several origins — terminating TLS, load balancing, caching, and hiding your topology behind a single public endpoint.

Layer: L7 (HTTP) or L4 (TCP)
Client sees: one IP / one cert
Adds per request: ~0.1–1 ms proxy hop
TLS handshakes saved: N backends → 1 cert
Common software: NGINX, Envoy, HAProxy, Caddy

How a reverse proxy works

Picture a busy restaurant. Diners never walk into the kitchen — they talk to the host, who takes the order, decides which line cook handles it, and brings the food back. The diners never learn how many cooks there are, which one is sick today, or that the dessert station is in a different building. A reverse proxy is that host for your servers.

Concretely: a client opens a TCP connection to a single public IP and asks for https://shop.example.com/cart. That connection lands on the reverse proxy, not on any application server. The proxy completes the TLS handshake, reads the HTTP request, decides which backend should answer (based on the path, the hostname, a cookie, or a health check), opens or reuses its own connection to that backend, forwards the request, streams the response back to the client, and closes the loop. The client never learns the backend's address, port, or even how many backends exist.

The word "reverse" is the only confusing part. A forward proxy sits in front of clients and represents them to the internet (think a corporate web filter). A reverse proxy sits in front of servers and represents them to clients. Same machine in the middle, opposite side it is hiding.

Because the proxy parses HTTP, it sits at Layer 7 of the OSI model and can make decisions on the request's content. A simpler Layer 4 proxy just shuttles TCP segments without reading them — faster, but blind to URLs and headers.

The four jobs it does at the edge

A reverse proxy earns its place by collapsing four cross-cutting concerns into one box, so the backends can stay simple:

TLS termination. The expensive asymmetric handshake (ECDHE key agreement plus a certificate signature) happens once, at the proxy. One certificate, one cipher policy, one place to enforce TLS 1.3 and HSTS. Backends speak plaintext HTTP on a trusted network — or get re-encrypted on the second hop if the network isn't trusted.
Load balancing. The proxy spreads requests across a pool of identical backends using round-robin, least-connections, or a hash of the client IP. Dead backends are pulled from rotation by active health checks (GET /healthz every few seconds) so traffic never lands on a crashed process.
Caching. Cacheable responses (per Cache-Control) are stored at the proxy, so identical requests are served from RAM in microseconds instead of regenerating them on a backend.
Topology hiding. Internal hostnames, ports, server software versions, and the count of machines stay invisible. The attack surface is one hardened endpoint, not N application servers exposed to the open internet.
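
The three load-balancing strategies named above can be sketched in a few lines each. This is an illustrative sketch, not how any particular proxy implements them: the `Backend` class, its field names, and the addresses are assumptions made for the example, and the `healthy` flag is presumed to be maintained by a separate health checker.

```python
import hashlib
import itertools
from dataclasses import dataclass


@dataclass
class Backend:
    address: str          # hypothetical "host:port" label
    active: int = 0       # connections currently in flight
    healthy: bool = True  # flipped by a health checker (not shown)


def round_robin(pool):
    """Return a picker that cycles through the pool, skipping unhealthy backends."""
    cycle = itertools.cycle(pool)

    def pick():
        for _ in range(len(pool)):
            backend = next(cycle)
            if backend.healthy:
                return backend
        return None  # every backend is down

    return pick


def least_connections(pool):
    """Pick the healthy backend with the fewest in-flight connections."""
    healthy = [b for b in pool if b.healthy]
    return min(healthy, key=lambda b: b.active) if healthy else None


def ip_hash(pool, client_ip):
    """Map a client IP to a stable backend so repeat requests land on the same one."""
    healthy = [b for b in pool if b.healthy]
    if not healthy:
        return None
    digest = hashlib.sha256(client_ip.encode()).digest()
    return healthy[int.from_bytes(digest[:8], "big") % len(healthy)]
```

Note that this simple modulo hash remaps many clients whenever the healthy set changes; real proxies often offer consistent hashing to limit that churn.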

The precise request path and its cost

A single proxied request walks a deterministic pipeline. The latency it adds is small but worth knowing precisely:

client ──TLS──▶ [ reverse proxy ]──plaintext──▶ backend
                 1. accept TCP, complete TLS handshake     (~1 RTT, amortized by keep-alive)
                 2. parse request line + headers           (~microseconds)
                 3. match a route / virtual host           (O(1) hash or O(log n) trie)
                 4. consult cache; HIT → return, MISS → 5
                 5. pick a healthy backend                 (O(1) round-robin / least-conn)
                 6. reuse a pooled upstream connection      (no new handshake)
                 7. stream request, stream response back
                 8. (optionally) store response in cache

The added latency of steps 2–7 on a warm path is typically 0.1–1 ms, mostly header parsing and a memory lookup. Connection-pool reuse is the big win: without it, every request to the backend would pay a fresh TCP handshake (1 round trip) plus, if the hop is encrypted, a TLS handshake (1 more round trip with TLS 1.3). On a link with a 20 ms round-trip time, such as a cross-region hop, that is 20–40 ms per request. With pooling, the cost is paid once per long-lived connection and amortized across thousands of requests.
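
The amortization argument is easy to check with arithmetic. The helper below is a hypothetical name introduced for this sketch; it assumes the handshake cost is simply round trips multiplied by the round-trip time and ignores CPU time.

```python
def handshake_overhead_ms(rtt_ms, handshake_round_trips, requests_per_connection):
    """Average handshake cost per request when one connection serves many requests."""
    handshake_ms = rtt_ms * handshake_round_trips
    return handshake_ms / requests_per_connection


# No pooling: every request opens a new connection (TCP + TLS = 2 round trips).
no_pool = handshake_overhead_ms(rtt_ms=20, handshake_round_trips=2, requests_per_connection=1)

# Pooling: one connection is reused for 1,000 requests.
pooled = handshake_overhead_ms(rtt_ms=20, handshake_round_trips=2, requests_per_connection=1000)

print(f"without pooling: {no_pool:.2f} ms per request")
print(f"with pooling:    {pooled:.2f} ms per request")
```

Under these assumptions the per-request handshake cost drops from 40 ms to a few hundredths of a millisecond.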

When to put one in front

More than one backend. The moment you scale from one app server to two, you need something to spread traffic and survive a single failure.
Many services, one domain. Route /api to the API cluster, /static to a file server, /ws to a WebSocket service — all under one hostname, one certificate.
You want TLS, caching, compression, and rate limiting handled once rather than re-implemented in every service.
You need to hide a legacy or sensitive backend behind a hardened, observable, single endpoint.
Blue-green or canary deploys — shift a percentage of traffic to a new backend version by changing one weight at the proxy.

You can skip it for a single static site (a CDN already is a reverse proxy) or for a tiny internal tool with one server and no TLS. But almost any production HTTP system has at least one reverse proxy somewhere in the path — often several stacked.
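
Two of the use cases above, path-based routing and canary weighting, come down to very little logic. The sketch below assumes a hypothetical route table and pool names (`api-cluster`, `file-server`, and so on); real proxies express the same idea in configuration rather than code.

```python
import random

# Hypothetical route table: the longest matching path prefix wins.
ROUTES = {
    "/api": "api-cluster",
    "/static": "file-server",
    "/ws": "websocket-service",
    "/": "web-app",
}


def match_route(path):
    """Return the pool for the longest prefix that matches on a segment boundary."""
    best = "/"
    for prefix in ROUTES:
        on_boundary = path == prefix or path.startswith(prefix.rstrip("/") + "/")
        if on_boundary and len(prefix) > len(best):
            best = prefix
    return ROUTES[best]


def pick_version(canary_weight, rng=random):
    """Send roughly canary_weight (0.0 to 1.0) of requests to the new version."""
    return "canary" if rng.random() < canary_weight else "stable"
```

Matching on a segment boundary means `/apiary` falls through to the default pool instead of being mistaken for `/api`. Raising `canary_weight` from 0.01 toward 1.0 is the "one weight" that shifts traffic during a rollout.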

Reverse proxy vs the things people confuse it with

	Reverse proxy	Forward proxy	L4 load balancer	API gateway	CDN
Sits in front of	Servers	Clients	Servers	Servers (APIs)	Servers (at the edge)
Hides whom	The origin	The user	The origin	The origin	The origin
OSI layer	L7 (or L4)	L7	L4 (TCP/UDP)	L7	L7
Reads HTTP content	Yes	Yes	No — blind to URLs	Yes, deeply	Yes
TLS termination	Yes	Sometimes	Passthrough or yes	Yes	Yes
Caching	Optional	Optional	No	Optional	Core feature
Auth / rate limit / metering	Basic	Egress filtering	No	Core feature	Basic (WAF)
Geographic distribution	Single site	Single site	Single site	Single site	Hundreds of PoPs

The relationships are nested, not exclusive: an API gateway is a reverse proxy with API-management features bolted on; a CDN is a globally distributed reverse proxy tuned for caching; an L7 load balancer is a reverse proxy whose headline feature is balancing. The L4 load balancer is the odd one out: it forwards TCP or UDP traffic without parsing HTTP, so it can't route by URL, header, or cookie, even in products that can terminate TLS.

What the numbers actually say

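TLS handshake cost: roughly 1–4 ms of CPU per new connection. A full ECDHE-RSA handshake with a 2048-bit key costs on the order of milliseconds of server CPU, dominated by the RSA signature; with session resumption (session tickets, or a TLS 1.3 PSK, optionally with 0-RTT early data) a returning client skips the certificate exchange and signature, which removes most of that cost. Centralizing this at one proxy lets you provision TLS capacity (faster cores, AES-NI, kernel TLS) once instead of on every backend.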
Connection pooling saves a full RTT per request. On a 0.5 ms intra-rack link the handshake is cheap; across a 20 ms regional link, reusing a pooled upstream connection turns a 40 ms penalty into ~0 ms.
Cache hits are ~1000× cheaper than origin regeneration. Serving a cached 50 KB response from the proxy's memory is tens of microseconds; regenerating it on a backend that hits a database might be 50–500 ms. A 90% hit ratio cuts backend load to a tenth.
A single NGINX instance handles tens of thousands of concurrent connections on commodity hardware thanks to its event-driven (epoll) worker model, but it is still one instance to size for peak concurrency, not just throughput.
The proxy hop is real but small. Budget 0.1–1 ms of added latency on the warm path; if you measure 10 ms, you have a misconfiguration (DNS lookups per request, no keep-alive, or buffering the whole body).
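
The cache figures above combine into a simple weighted average. The function names below are introduced for this sketch, and the inputs (0.05 ms for a hit, 50 ms for a miss) are taken from the ranges quoted in the list rather than from any measurement.

```python
def backend_load_fraction(hit_ratio):
    """Fraction of requests that still reach a backend."""
    return 1.0 - hit_ratio


def mean_latency_ms(hit_ratio, hit_ms, miss_ms):
    """Expected response time across cache hits and misses."""
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_ms


for ratio in (0.0, 0.5, 0.9, 0.99):
    load = backend_load_fraction(ratio)
    latency = mean_latency_ms(ratio, hit_ms=0.05, miss_ms=50.0)
    print(f"hit ratio {ratio:.2f}: backend load {load:.2f}, mean latency {latency:.2f} ms")
```

At a 90% hit ratio the mean latency is about 5 ms and is dominated almost entirely by the misses, which is why the last few points of hit ratio matter more than shaving the hit path.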

A minimal reverse proxy in JavaScript (Node)

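The core of a reverse proxy is surprisingly small: accept a request, pick a backend, re-issue it, pipe the response back. Here is a round-robin L7 proxy in pure Node, no framework. It listens in plaintext for brevity; a real deployment would use node:https with a certificate, or sit behind a TLS terminator: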

import http from 'node:http';

// Pool of identical backends.
const backends = [
  { host: '10.0.0.11', port: 8080, alive: true, inflight: 0 },
  { host: '10.0.0.12', port: 8080, alive: true, inflight: 0 },
  { host: '10.0.0.13', port: 8080, alive: true, inflight: 0 },
];
let rr = 0;

// Round-robin over the *alive* backends.
function pickBackend() {
  for (let i = 0; i < backends.length; i++) {
    const b = backends[(rr++) % backends.length];
    if (b.alive) return b;
  }
  return null; // all dead → 503
}

const proxy = http.createServer((clientReq, clientRes) => {
  const b = pickBackend();
  if (!b) { clientRes.writeHead(503); return clientRes.end('no healthy backend'); }

  b.inflight++;
  const headers = { ...clientReq.headers };
  // Preserve the client's real IP for the backend (append, don't overwrite).
  const prior = headers['x-forwarded-for'];
  const clientIp = clientReq.socket.remoteAddress;
  headers['x-forwarded-for'] = prior ? `${prior}, ${clientIp}` : clientIp;
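  // Report the scheme the client actually used: 'https' only if this listener terminated TLS.
  headers['x-forwarded-proto'] = clientReq.socket.encrypted ? 'https' : 'http';
  // The original Host is already in the copied headers, so it passes through unchanged.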

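  // Release the in-flight slot exactly once, whichever way the exchange ends.
  let released = false;
  const release = () => { if (!released) { released = true; b.inflight--; } };

  const upstream = http.request(
    { host: b.host, port: b.port, method: clientReq.method, path: clientReq.url, headers },
    (upRes) => {
      clientRes.writeHead(upRes.statusCode, upRes.headers);
      upRes.pipe(clientRes);                     // stream, never buffer the whole body
      upRes.on('error', () => clientRes.destroy()); // upstream died mid-body
      upRes.on('close', release);                // fires on normal end and on abort
    }
  );
  upstream.on('error', () => {
    b.alive = false;                             // eject on failure
    release();
    if (!clientRes.headersSent) { clientRes.writeHead(502); clientRes.end('bad gateway'); }
    else clientRes.destroy();                    // too late for a 502: drop the connection
  });
  clientReq.pipe(upstream);                       // stream the request body upstream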
});

// Active health check: re-admit a backend when /healthz returns 200.
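setInterval(() => {
  for (const b of backends) {
    const probe = http.get({ host: b.host, port: b.port, path: '/healthz', timeout: 1000 },
      (r) => { b.alive = r.statusCode === 200; r.resume(); });
    // The timeout option only emits 'timeout'; destroy the probe so it fails as an error.
    probe.on('timeout', () => probe.destroy(new Error('health check timed out')));
    probe.on('error', () => { b.alive = false; });
  }
}, 3000);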

proxy.listen(443, () => console.log('reverse proxy on :443'));

Two details that separate a toy from a real proxy. First, the body is streamed (pipe), never buffered — buffering a 2 GB upload would OOM the proxy. Second, the client's IP is appended to X-Forwarded-For rather than overwritten, so a chain of proxies preserves the full path. A production proxy adds connection pooling, timeouts on every phase, circuit breaking, and request-size limits.

The same idea as configuration (NGINX) and Python

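In practice you almost never hand-write the proxy loop; you configure NGINX, Envoy, or HAProxy. A rough equivalent of the code above in NGINX (using least-connections instead of round-robin, and passive failure detection instead of an active /healthz probe) is a few lines: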

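upstream app {
    least_conn;                       # send to the backend with fewest active conns
    # max_fails/fail_timeout are passive checks: 3 failed attempts within 10s
    # take the server out of rotation for 10s.
    server 10.0.0.11:8080 max_fails=3 fail_timeout=10s;
    server 10.0.0.12:8080 max_fails=3 fail_timeout=10s;
    server 10.0.0.13:8080 max_fails=3 fail_timeout=10s;
    keepalive 32;                     # idle upstream connections kept open per worker
}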

server {
    listen 443 ssl;
    server_name shop.example.com;
    ssl_certificate     /etc/ssl/shop.pem;     # TLS terminated here
    ssl_certificate_key /etc/ssl/shop.key;
    ssl_protocols TLSv1.2 TLSv1.3;

    location / {
        proxy_pass         http://app;          # forward to the upstream pool
        proxy_set_header   Host              $host;
        proxy_set_header   X-Real-IP         $remote_addr;
        proxy_set_header   X-Forwarded-For   $proxy_add_x_forwarded_for;
        proxy_set_header   X-Forwarded-Proto $scheme;
        proxy_http_version 1.1;                 # keep-alive to the backend
        proxy_set_header   Connection "";       # enable upstream keep-alive
    }
}

And a compact async reverse proxy in Python, for when you want the logic in code (here with aiohttp):

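import asyncio
import itertools
from aiohttp import web, ClientSession, ClientError
from multidict import CIMultiDict

BACKENDS = ["http://10.0.0.11:8080", "http://10.0.0.12:8080", "http://10.0.0.13:8080"]
rr = itertools.cycle(BACKENDS)            # simple round-robin

# Hop-by-hop headers describe a single connection and must not be forwarded.
HOP_BY_HOP = {"connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
              "te", "trailer", "transfer-encoding", "upgrade"}

def forwardable(headers):
    # CIMultiDict keeps repeated headers such as Set-Cookie intact.
    return CIMultiDict((k, v) for k, v in headers.items() if k.lower() not in HOP_BY_HOP)

async def handle(request):
    target = next(rr) + request.rel_url.path_qs
    headers = forwardable(request.headers)
    # Append the client's IP; trust the existing chain only from known proxies.
    xff = headers.get("X-Forwarded-For")
    peer = request.remote or "unknown"
    headers["X-Forwarded-For"] = f"{xff}, {peer}" if xff else peer
    headers["X-Forwarded-Proto"] = request.scheme   # "https" only if TLS ended here

    session: ClientSession = request.app["session"]
    body = await request.read()           # small bodies only; stream for large
    resp = None
    try:
        async with session.request(request.method, target, headers=headers,
                                   data=body, allow_redirects=False) as upstream:
            resp = web.StreamResponse(status=upstream.status,
                                      headers=forwardable(upstream.headers))
            await resp.prepare(request)
            async for chunk in upstream.content.iter_chunked(64 * 1024):
                await resp.write(chunk)   # stream the response back
            await resp.write_eof()
            return resp
    except (ClientError, asyncio.TimeoutError):
        if resp is not None and resp.prepared:
            raise                         # headers already sent; too late for a clean 502
        return web.Response(status=502, text="bad gateway")

async def close_session(app):
    await app["session"].close()

async def make_app():
    app = web.Application()
    # auto_decompress=False passes compressed bodies through untouched, so the
    # forwarded Content-Encoding and Content-Length headers stay accurate.
    app["session"] = ClientSession(auto_decompress=False)
    app.on_cleanup.append(close_session)
    app.router.add_route("*", "/{tail:.*}", handle)
    return app

web.run_app(make_app(), port=443)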

Variants and the stack they form

L4 vs L7. An L4 proxy (HAProxy in TCP mode, AWS NLB) forwards segments without reading them — lower latency, protocol-agnostic, but can't route by URL or terminate application TLS. An L7 proxy reads the HTTP request and routes on it. Many deployments stack an L4 balancer in front of several L7 proxies.
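
To make the L4 side of that contrast concrete, here is a minimal TCP relay sketch using Python's asyncio. It copies bytes in both directions and never looks at them, so it works for any protocol but cannot route on a URL. The `BACKEND` address and the listening port 9000 are placeholders chosen for the example, and production concerns (timeouts, half-close handling, backend selection) are left out.

```python
import asyncio

BACKEND = ("127.0.0.1", 8080)  # hypothetical upstream address


async def pump(reader, writer):
    """Copy bytes one way until EOF, then close the destination."""
    try:
        while chunk := await reader.read(64 * 1024):
            writer.write(chunk)
            await writer.drain()
    finally:
        writer.close()


async def handle(client_reader, client_writer, backend=BACKEND):
    """Relay one client connection to the backend without parsing anything."""
    try:
        upstream_reader, upstream_writer = await asyncio.open_connection(*backend)
    except OSError:
        client_writer.close()  # backend unreachable: just drop the client
        return
    await asyncio.gather(
        pump(client_reader, upstream_writer),
        pump(upstream_reader, client_writer),
        return_exceptions=True,  # a reset on one side should not crash the relay
    )


async def main():
    server = await asyncio.start_server(handle, "0.0.0.0", 9000)
    async with server:
        await server.serve_forever()
```

Calling `asyncio.run(main())` starts the relay. Compared with the L7 examples, there is no header parsing, no `X-Forwarded-For`, and no way to tell `/api` from `/static`.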

API gateway. A reverse proxy specialized for APIs: per-route authentication, API keys, quota and rate limiting, request/response transformation, and usage metering. Kong, Apigee, AWS API Gateway, and Envoy-based gateways are common examples.

Service mesh sidecar. In Kubernetes, a tiny reverse proxy (Envoy under Istio, or Linkerd's own Rust micro-proxy) runs next to every pod, intercepting all in/out traffic to provide mutual TLS, retries, and per-service observability — a reverse proxy turned inside-out across the whole cluster.

CDN. A reverse proxy replicated to hundreds of points of presence worldwide, optimized for caching static and cacheable dynamic content close to users. See the dedicated CDN explainer.

Ingress controller. The Kubernetes component that programs a reverse proxy (often NGINX or Envoy) from declarative Ingress rules, acting as the cluster's front door.

Common bugs and edge cases

Trusting X-Forwarded-For blindly. If the backend reads the leftmost X-Forwarded-For value from any source, a client can forge its IP, bypass rate limits, or poison logs. Trust the header only when it arrives from your proxy's IP, and parse from the right.
Buffering large bodies. Reading an entire upload or download into memory before forwarding can OOM the proxy under load. Always stream; cap request size.
The proxy as a single point of failure. One proxy in the path means one outage takes down healthy backends too. Run at least two behind a VIP (keepalived/VRRP) with fast health-check failover.
WebSocket and HTTP/2 upgrades dropped. A proxy configured only for HTTP/1.0 request/response will fail the Connection: Upgrade handshake. WebSockets need explicit upgrade handling and long read timeouts.
Mismatched timeouts. If the proxy's upstream read timeout is shorter than the backend's processing time, slow-but-valid requests get 504s. Tune connect, send, and read timeouts per route.
Forgetting the Host header. Name-based virtual hosting on the backend breaks if the proxy forwards its own hostname instead of the original Host. Always pass Host: $host through.
Caching authenticated responses. Caching a response that varied by cookie or auth header without a correct Vary can leak one user's data to another. Mark private responses Cache-Control: private, no-store.
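
The caching pitfall in the last item can be guarded against with a conservative storability check. The sketch below is deliberately stricter than the HTTP caching rules require, the function names are introduced for this example, and it assumes header dictionaries whose keys are already lowercased.

```python
def parse_cache_control(value):
    """Split a Cache-Control header into a {directive: argument-or-None} dict."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"') or None
    return directives


def shared_cache_may_store(method, status, request_headers, response_headers):
    """Conservative check: store only responses that are explicitly safe to share."""
    if method != "GET" or status != 200:
        return False
    cc = parse_cache_control(response_headers.get("cache-control", ""))
    if {"no-store", "private", "no-cache"} & cc.keys():
        return False
    if "set-cookie" in response_headers:
        return False  # likely per-user; do not share
    if "authorization" in request_headers and not (
            {"public", "s-maxage", "must-revalidate"} & cc.keys()):
        return False  # authenticated request without explicit permission to share
    if response_headers.get("vary", "").strip() == "*":
        return False
    return "max-age" in cc or "s-maxage" in cc  # require an explicit lifetime
```

A response that passes this check still needs a cache key that includes every request header named in `Vary`; the check only decides whether storing is allowed at all.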

Frequently asked questions

What is the difference between a reverse proxy and a forward proxy?

A forward proxy sits in front of clients and hides them from the internet — it acts on behalf of the user (a corporate web filter, a VPN egress). A reverse proxy sits in front of servers and hides them from clients — it acts on behalf of the origin. The traffic direction is the same, but who it shields is reversed: forward proxy protects the requester, reverse proxy protects the responder.

Is a reverse proxy the same as a load balancer?

Load balancing is one feature a reverse proxy can offer, not a synonym for it. A reverse proxy usually operates at L7 (HTTP) and can also terminate TLS, cache, rewrite headers, and route by URL path. A pure load balancer can be L4 (TCP/UDP), just forwarding packets to a backend without parsing them. Every L7 load balancer is a reverse proxy; not every load balancer is L7.

Why terminate TLS at the reverse proxy instead of the backend?

Centralizing TLS at the proxy means you manage one certificate and one cipher policy instead of N. The expensive RSA/ECDHE handshake — a few milliseconds of CPU each — happens once at the edge, and the proxy reuses keep-alive connections to the backends. You also get a single place to enforce TLS 1.3, HSTS, and OCSP stapling. The trade-off is the proxy-to-backend hop is now plaintext unless you re-encrypt.

How does a reverse proxy know the client's real IP after it rewrites the connection?

Because the backend sees the proxy's IP as the source, the proxy injects the original client address into an X-Forwarded-For header (or the standardized Forwarded header from RFC 7239, or the PROXY protocol for L4). The backend must be configured to trust that header only from the proxy's IP — otherwise a client can spoof X-Forwarded-For and forge its apparent origin.
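
On the backend side, "trust the header only from the proxy" can be sketched as follows. The trusted network `10.0.0.0/8` and the function names are assumptions for illustration; the idea is to ignore the header unless the TCP peer is one of your proxies, then walk the list from the right and stop at the first address you do not control.

```python
import ipaddress

# Hypothetical: the networks your own proxies live in.
TRUSTED_PROXIES = [ipaddress.ip_network("10.0.0.0/8")]


def is_trusted(addr):
    """True if addr parses as an IP inside one of the trusted proxy networks."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False
    return any(ip in net for net in TRUSTED_PROXIES)


def client_ip(peer_addr, x_forwarded_for):
    """Resolve the client address by walking X-Forwarded-For from the right."""
    if not is_trusted(peer_addr) or not x_forwarded_for:
        return peer_addr  # header absent, or sent by someone we don't trust
    hops = [h.strip() for h in x_forwarded_for.split(",") if h.strip()]
    for hop in reversed(hops):
        if not is_trusted(hop):
            return hop    # first address not controlled by us
    return hops[0] if hops else peer_addr
```

With this approach a client that sends `X-Forwarded-For: 6.6.6.6` through the proxy is still identified by the address the proxy appended, because the forged value sits to the left of it.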

What is the difference between a reverse proxy and an API gateway?

An API gateway is a reverse proxy with API-specific features layered on: per-route authentication, rate limiting, request/response transformation, API key management, and usage metering. Under the hood it is still a reverse proxy doing TLS termination and routing — the gateway is the application-aware superset built for managing many microservice APIs.

Can a reverse proxy become a single point of failure?

Yes. If every request funnels through one proxy and it dies, the whole site is down even when all backends are healthy. Production deployments run at least two proxy instances behind a virtual IP (keepalived/VRRP) or an L4 load balancer, with health checks failing over in seconds. The proxy must be sized for peak connection counts, not just bandwidth.