TCP BBR Congestion Control
Stop guessing from packet loss — measure the pipe and pace to it
TCP BBR is a model-based congestion control algorithm that probes for the bottleneck bandwidth and minimum round-trip time directly, pacing data to the bandwidth-delay product instead of treating packet loss as the signal for congestion.
- Full name: Bottleneck Bandwidth and RTT
- Origin: Google, 2016
- Congestion signal: Rate + RTT model, not loss
- Target inflight: ≈ 1 × BDP
- Linux kernel since: 4.9 (pluggable)
The problem BBR solves
For thirty years, TCP treated a dropped packet as the definition of congestion. Reno, then CUBIC — the default in Linux since 2006 — grow the congestion window until the network loses a packet, then cut the window and start climbing again. This is the famous sawtooth. It worked beautifully on the slow, shallow-buffered links of the 1980s, where a drop really did mean a buffer had just overflowed.
Two things broke that assumption. First, deep buffers: modern routers and home gateways ship with megabytes of buffer, so a loss-based sender keeps the queue full for hundreds of milliseconds before a packet ever drops. This is bufferbloat, and it adds enormous latency to every flow sharing the link. Second, long fat networks: on a 10 Gbps transcontinental path, random non-congestive loss of just 1 packet in 10 million is enough to keep CUBIC from ever reaching full speed, because each loss triggers a multiplicative backoff.
BBR — Bottleneck Bandwidth and Round-trip propagation time — was published by Neal Cardwell, Yuchung Cheng, and colleagues at Google in 2016 (ACM Queue, "BBR: Congestion-Based Congestion Control"). Its thesis is that loss is a lagging, noisy proxy. Instead, build an explicit model of the path — how fast is the bottleneck link, and what is its base latency? — and send at exactly that rate.
Kleinrock's optimal point and the BBR model
BBR is built on a result Leonard Kleinrock proved in 1979: a connection runs at maximum throughput and minimum delay simultaneously when the amount of data in flight equals the bandwidth-delay product (BDP). Below that, you leave the pipe partly empty; above it, the excess just sits in a queue, adding latency without adding throughput.
So BBR continuously estimates two parameters:
- BtlBw (bottleneck bandwidth): the maximum delivery rate observed over a sliding window of about 10 round trips. The delivery rate for any ACK is Δ delivered_bytes / Δ time, measured between when a packet was sent and when its ACK arrives.
- RTprop (round-trip propagation delay): the minimum RTT observed over a sliding window of about 10 seconds. The minimum is used because any RTT above the floor is queuing, not propagation.
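A minimal sketch of the per-ACK delivery-rate sample described above, assuming each packet remembers the connection's delivered-byte count at the moment it was sent. `SentPacket` and `RateSampler` are hypothetical names for illustration; production stacks add safeguards (for example against ACK compression) that are omitted here.

```python
from dataclasses import dataclass


@dataclass
class SentPacket:
    size: int               # bytes in this packet
    sent_time: float        # when it left the sender (any consistent time unit)
    delivered_at_send: int  # connection's delivered-byte count at send time


class RateSampler:
    """Tracks cumulative delivered bytes and yields one rate sample per ACK."""

    def __init__(self):
        self.delivered = 0  # total bytes ACKed so far

    def on_send(self, size, now):
        return SentPacket(size, now, self.delivered)

    def on_ack(self, pkt, now):
        self.delivered += pkt.size
        interval = now - pkt.sent_time
        if interval <= 0:
            return None  # no usable sample
        delta = self.delivered - pkt.delivered_at_send
        return delta / interval, interval  # (delivery rate, RTT sample)


# Toy stream: one 1500-byte packet per millisecond, fixed 40 ms RTT.
sampler = RateSampler()
RTT_MS, N_PACKETS = 40, 100
in_flight, rates, rtts = {}, [], []
for now in range(N_PACKETS + RTT_MS):
    sent_at = now - RTT_MS
    if sent_at in in_flight:  # the ACK for the packet sent RTT_MS ago arrives
        rate, rtt = sampler.on_ack(in_flight.pop(sent_at), now)
        rates.append(rate)
        rtts.append(rtt)
    if now < N_PACKETS:
        in_flight[now] = sampler.on_send(1500, now)

btlbw = max(rates)   # max filter: best rate achieved (bytes per ms)
rtprop = min(rtts)   # min filter: lowest RTT seen (ms)
print(btlbw, rtprop)
```

In this toy stream the earliest samples underestimate the rate because the pipe is still filling. The max filter keeps the best sample, which is why BtlBw is a windowed maximum rather than an average.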
The model's key quantity is then:
BDP = BtlBw × RTprop // bytes that fill the pipe, zero queue
target_cwnd ≈ cwnd_gain × BDP // BBR caps inflight near 2 × BDP as a safety bound
pacing_rate = pacing_gain × BtlBw // the rate packets actually leave the sender
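As a worked instance of the three formulas, the sketch below plugs in an assumed 1 Gbps bottleneck with a 50 ms RTprop. The function name `bbr_model`, the 1500-byte packet size, and the example path are illustrative assumptions.

```python
PACKET_BYTES = 1500  # assumed full-size packet


def bbr_model(btlbw_bits_per_s, rtprop_s, pacing_gain=1.0, cwnd_gain=2.0):
    """Evaluate BDP, target cwnd and pacing rate for one path estimate."""
    btlbw = btlbw_bits_per_s / 8          # bytes per second
    bdp = btlbw * rtprop_s                # bytes that fill the pipe
    return {
        "bdp_bytes": bdp,
        "bdp_packets": bdp / PACKET_BYTES,
        "target_cwnd_bytes": cwnd_gain * bdp,
        "pacing_rate_bytes_per_s": pacing_gain * btlbw,
    }


model = bbr_model(1e9, 0.050)
for name, value in model.items():
    print(f"{name:>24}: {value:,.0f}")
```

For this path the BDP is 6.25 MB (about 4,167 packets), the cwnd cap is 12.5 MB, and the cruising pacing rate is 125 MB/s.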
Crucially, BBR paces: rather than dumping a whole window of packets and letting them burst into the bottleneck queue, it spaces each packet out in time at pacing_rate. The congestion window becomes a safety cap, not the primary control knob — the pacing rate is.
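The effect of pacing can be illustrated with a toy bottleneck model. The sketch below assumes a single FIFO link that needs a fixed service time per packet and compares a 100-packet burst with the same packets spaced at the pacing interval. `pacing_interval`, `departure_times` and `peak_queue_packets` are hypothetical helpers written for this example.

```python
def pacing_interval(packet_bytes, pacing_rate):
    """Seconds between packet departures at pacing_rate (bytes/sec)."""
    return packet_bytes / pacing_rate


def departure_times(n_packets, packet_bytes, pacing_rate, start=0.0):
    gap = pacing_interval(packet_bytes, pacing_rate)
    return [start + i * gap for i in range(n_packets)]


def peak_queue_packets(arrival_times, service_time):
    """Peak number of packets waiting ahead of an arrival at a FIFO bottleneck."""
    free_at = 0.0  # time at which the link has drained everything queued so far
    peak = 0
    for t in arrival_times:
        backlog_time = max(0.0, free_at - t)  # work still queued at arrival
        peak = max(peak, round(backlog_time / service_time))
        free_at = max(free_at, t) + service_time
    return peak


BW = 12.5e6                             # bottleneck: 12.5 MB/s (100 Mbps)
gap = pacing_interval(1500, BW)         # 120 microseconds per 1500-byte packet
burst = [0.0] * 100                     # whole window sent at once
paced = departure_times(100, 1500, BW)  # same packets, spaced at the pacing rate

burst_peak = peak_queue_packets(burst, gap)
paced_peak = peak_queue_packets(paced, gap)
print(f"gap={gap * 1e6:.0f} us  burst peak={burst_peak}  paced peak={paced_peak}")
```

In this model the burst leaves the last packet waiting behind 99 others, while the paced stream never finds a packet ahead of it.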
Why the two measurements fight each other
There is an inherent tension. To measure maximum bandwidth you must keep the pipe full — which builds a queue and inflates RTT. To measure the true minimum RTT you must drain that queue — which means temporarily sending below capacity. You cannot do both at once. BBR resolves this by alternating, spending the overwhelming majority of its time in a bandwidth-probing state and dipping briefly to re-measure RTT.
The steady state is ProbeBW, which cycles pacing_gain through a fixed eight-phase pattern, one phase per RTprop:
pacing_gain phases (BBRv1): [1.25, 0.75, 1, 1, 1, 1, 1, 1]
  1.25: probe up
  0.75: drain the queue the probe just made
  1 (×6): cruise at the estimated rate
The 1.25 phase sends 25% faster to test whether more bandwidth is available; if the delivery rate rises, BtlBw updates. The 0.75 phase immediately sends 25% slower to drain whatever queue the probe created, keeping the standing queue near zero on average. The other phases cruise at the estimated rate.
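To see how the 1.25 phase discovers new capacity, here is a small sketch under a strong simplifying assumption: the delivery rate measured in each round is just min(pacing rate, true bottleneck rate), and the max filter never expires samples. `probe_bw_rounds` is a hypothetical helper, not part of any BBR implementation.

```python
PACING_GAIN = [1.25, 0.75, 1, 1, 1, 1, 1, 1]


def probe_bw_rounds(estimate, true_bw, rounds):
    """Return (gain, pacing_rate, estimate_after_round) for each round."""
    history = []
    for i in range(rounds):
        gain = PACING_GAIN[i % len(PACING_GAIN)]
        pacing = gain * estimate
        delivered = min(pacing, true_bw)     # the link cannot deliver more than it has
        estimate = max(estimate, delivered)  # max filter (window expiry omitted)
        history.append((gain, pacing, estimate))
    return history


# Estimate starts at 10 MB/s; the path actually offers 20 MB/s.
history = probe_bw_rounds(estimate=10.0, true_bw=20.0, rounds=32)
for i, (gain, pacing, est) in enumerate(history):
    if gain == 1.25:
        print(f"round {i:2d}: probe at {pacing:6.2f} MB/s -> estimate {est:6.2f} MB/s")
```

Only the probing rounds move the estimate. It climbs 12.5, 15.625, 19.53 and then 20 MB/s, one step per eight-round cycle, so finding a doubled bottleneck takes about four cycles in this model.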
Roughly every 10 seconds, if no new RTprop minimum has been seen, BBR enters ProbeRTT: it cuts inflight data to about four packets for at least 200 ms, draining the bottleneck queue so it can read the genuine propagation delay. Coordinated across flows, these synchronized drains let everyone re-measure a clean RTprop together. The full state machine is Startup → Drain → ProbeBW ⇄ ProbeRTT.
- Startup doubles the rate each RTT (pacing_gain = 2/ln2 ≈ 2.89), an exponential ramp like slow start, and exits when the delivery rate plateaus for three RTTs.
- Drain uses the inverse gain to flush the queue Startup built.
- ProbeBW is the steady-state cycle above.
- ProbeRTT is the periodic queue-draining dip.
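The transitions between the four states can be sketched as a small state machine. This is a simplified illustration, not a faithful implementation: the 25% growth threshold used to detect the Startup plateau and the "inflight at or below BDP" exit from Drain are assumptions drawn from common descriptions of BBRv1, and `StateMachine` is a hypothetical class.

```python
import math
from enum import Enum


class State(Enum):
    STARTUP = "Startup"
    DRAIN = "Drain"
    PROBE_BW = "ProbeBW"
    PROBE_RTT = "ProbeRTT"


HIGH_GAIN = 2 / math.log(2)  # ≈ 2.885, Startup pacing gain
PROBE_RTT_INTERVAL = 10.0    # seconds without a fresh RTprop minimum
PROBE_RTT_DURATION = 0.2     # seconds spent at minimal inflight
GROWTH_THRESHOLD = 1.25      # assumed: "still growing" means at least +25% per round
PLATEAU_ROUNDS = 3


class StateMachine:
    def __init__(self):
        self.state = State.STARTUP
        self.full_bw = 0.0
        self.plateau_rounds = 0
        self.rtprop_stamp = 0.0     # when RTprop was last refreshed
        self.probe_rtt_done = None

    def pacing_gain(self):
        if self.state is State.STARTUP:
            return HIGH_GAIN
        if self.state is State.DRAIN:
            return 1 / HIGH_GAIN
        return 1.0  # ProbeBW gain cycle omitted in this sketch

    def on_round(self, btlbw, inflight, bdp, now, new_min_rtt=False):
        """Advance the state once per round trip and return the new state."""
        if new_min_rtt:
            self.rtprop_stamp = now
        if self.state is State.STARTUP:
            if btlbw >= self.full_bw * GROWTH_THRESHOLD:
                self.full_bw, self.plateau_rounds = btlbw, 0
            else:
                self.plateau_rounds += 1
                if self.plateau_rounds >= PLATEAU_ROUNDS:
                    self.state = State.DRAIN
        elif self.state is State.DRAIN:
            if inflight <= bdp:
                self.state = State.PROBE_BW
        elif self.state is State.PROBE_BW:
            if now - self.rtprop_stamp >= PROBE_RTT_INTERVAL:
                self.state = State.PROBE_RTT
                self.probe_rtt_done = now + PROBE_RTT_DURATION
        elif self.state is State.PROBE_RTT:
            if now >= self.probe_rtt_done:
                self.rtprop_stamp = now
                self.state = State.PROBE_BW
        return self.state


sm = StateMachine()
now, trace = 0.0, []
for bw in [1, 2, 4, 8, 8, 8, 8]:  # rate doubles, then plateaus
    trace.append(sm.on_round(bw, inflight=3 * bw, bdp=bw, now=now))
    now += 0.04
after_drain = sm.on_round(8, inflight=8, bdp=8, now=now)
after_10s = sm.on_round(8, inflight=8, bdp=8, now=10.0)
after_probe = sm.on_round(8, inflight=4, bdp=8, now=10.3)
print([s.value for s in trace], after_drain.value, after_10s.value, after_probe.value)
```

The run stays in Startup while the rate doubles, moves to Drain after three flat rounds, enters ProbeBW once inflight falls to the BDP, and dips into ProbeRTT when ten seconds pass without a fresh RTprop sample.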
When to choose BBR
- Long fat networks — transcontinental, satellite, or cellular paths with high bandwidth-delay product, where loss-based control can't fill the pipe.
- Links with non-congestive loss — Wi-Fi, LTE/5G, and lossy last miles, where CUBIC backs off on errors that aren't congestion.
- Deep-buffered paths where latency matters — CDN egress, video streaming, RPC backends. BBR keeps queues short, so interactive traffic on the same link stays responsive.
- Bulk transfers from a controlled sender — you only need BBR on the sending side, so a CDN or cloud egress can deploy it unilaterally.
Be cautious when sharing a shallow-buffered bottleneck with many CUBIC flows, or when strict fairness with legacy TCP is a hard requirement — BBRv1 can be aggressive there. Prefer BBRv2/v3 in those settings.
BBR vs loss-based and delay-based control
| | BBR | CUBIC | Reno / NewReno | Vegas | BBRv2 / v3 |
|---|---|---|---|---|---|
| Congestion signal | Rate + min-RTT model | Packet loss | Packet loss | RTT increase (delay) | Model + loss + ECN |
| Window growth | Paced to BtlBw | Cubic function of time | Additive increase | Adjusts to keep small queue | Paced, inflight-capped |
| Standing queue | ≈ 0 (targets 1 BDP) | Fills the buffer | Fills the buffer | Small | ≈ 0, bounded |
| Reaction to random loss | Largely ignores it (v1) | Cuts window by ~30% (multiplicative) | Halves window | Still backs off on loss | Bounded loss response |
| Bufferbloat | Avoids it | Causes it | Causes it | Avoids it | Avoids it |
| Fairness with CUBIC | Often unfair (v1) | — | Loses to BBR/CUBIC | Starved by loss-based flows | Improved, still imperfect |
| Real-world use | Google, YouTube, QUIC | Linux default since 2006 | Historical / teaching | Research, niche | Rolling out on Google traffic |
The key axis is the signal. Loss-based (CUBIC, Reno) waits for damage and overfills buffers. Pure delay-based (Vegas) reads queuing early but gets starved when it shares a link with loss-based flows — it backs off while they keep pushing. BBR is neither: it builds an explicit throughput-and-latency model and paces to Kleinrock's optimal point, which is why it can sit at high throughput and low latency at the same time.
What the numbers actually say
- Google's WAN gains. When Google switched its B4 internal WAN and public-facing traffic from CUBIC to BBR (2016–2017), it reported throughput improvements of 2–25× on some paths, with the largest wins on long, lossy links. YouTube median RTT on affected connections dropped, and the published case studies cited median-latency reductions on the order of tens of milliseconds.
- Loss tolerance. In the test published with the original BBR article (a 100 Mbps, 100 ms RTT link with random loss), CUBIC's throughput falls roughly tenfold at 0.1% loss and stalls above about 1%. BBR holds near the achievable rate up to about 5% loss and stays close up to about 15% before loss starts cutting into its bandwidth estimate: orders of magnitude more tolerant.
- Bufferbloat. On a path with a 250 ms dumb buffer, CUBIC drives queuing delay toward the full buffer (hundreds of ms added latency under load), while BBR holds the path near its base RTprop, typically within tens of milliseconds.
- Cost of the model. BBR's per-ACK work is O(1): update two windowed-max/min filters and recompute a pacing rate. The state is a handful of values per connection — no per-packet history needed beyond the sliding-window filters (about 3 samples each).
- The fairness caveat. Independent studies (e.g., the 2017 Hock et al. measurements) found a single BBRv1 flow can occupy 40%+ of a shallow-buffer link even against many CUBIC flows, and that two BBR flows can each hold up to ~1.5 BDP inflight in deep buffers — the motivation for the v2 inflight cap.
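The sensitivity of loss-based control to random loss can be estimated with the classic Mathis et al. square-root approximation, throughput ≈ (MSS / RTT) · C / √p. Note the assumptions: the formula models Reno-style halving (C ≈ 1.22), not CUBIC, which tolerates somewhat more loss but follows the same general shape; the 1460-byte MSS and the function name `reno_throughput_bps` are illustrative choices.

```python
import math


def reno_throughput_bps(mss_bytes, rtt_s, loss_rate, c=math.sqrt(3 / 2)):
    """Mathis et al. approximation for a Reno-style flow, in bits per second."""
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate)


MSS, RTT = 1460, 0.100  # assumed segment size and a 100 ms path
one_in_ten_million = reno_throughput_bps(MSS, RTT, 1e-7)
one_in_thousand = reno_throughput_bps(MSS, RTT, 1e-3)
print(f"p = 1e-7: {one_in_ten_million / 1e6:8.1f} Mbps")
print(f"p = 1e-3: {one_in_thousand / 1e6:8.1f} Mbps")
```

Under this model a 100 ms flow losing one packet in ten million is capped at roughly 450 Mbps regardless of link capacity, and at 0.1% loss it falls to a few Mbps. A controller that does not treat each loss as a congestion signal is not bound by this ceiling.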
JavaScript: the core BBR update loop
This is a teaching model of BBR's per-ACK math — the windowed-max bandwidth filter, the windowed-min RTT filter, and the ProbeBW pacing-gain cycle. It is not production code (no Startup/Drain/ProbeRTT state machine), but it shows the engine.
// Windowed filters: max-bandwidth over RTT_WINDOW, min-RTT over RTPROP_WINDOW.
class WindowedMax {
  constructor(windowMs) { this.windowMs = windowMs; this.samples = []; }
  update(value, nowMs) {
    this.samples.push({ value, t: nowMs });
    // drop samples older than the window
    while (this.samples.length && nowMs - this.samples[0].t > this.windowMs)
      this.samples.shift();
    return this.samples.reduce((m, s) => Math.max(m, s.value), -Infinity);
  }
}

class WindowedMin {
  constructor(windowMs) { this.windowMs = windowMs; this.samples = []; }
  update(value, nowMs) {
    this.samples.push({ value, t: nowMs });
    while (this.samples.length && nowMs - this.samples[0].t > this.windowMs)
      this.samples.shift();
    return this.samples.reduce((m, s) => Math.min(m, s.value), Infinity);
  }
}

const PACING_GAIN = [1.25, 0.75, 1, 1, 1, 1, 1, 1]; // ProbeBW cycle

class BBR {
  constructor() {
    this.btlBwFilter = new WindowedMax(10 * /* RTTs ≈ */ 200); // ~10 RTT window (ms)
    this.rtPropFilter = new WindowedMin(10_000);               // 10 s window (ms)
    this.btlBw = 0;         // bytes per ms
    this.rtProp = Infinity; // ms
    this.cycleIdx = 0;
  }

  // Called once per ACK. `deliveredBytes` and `intervalMs` come from rate sampling.
  onAck({ deliveredBytes, intervalMs, rttMs }, nowMs) {
    if (intervalMs > 0) {
      const rate = deliveredBytes / intervalMs; // delivery rate this sample
      this.btlBw = this.btlBwFilter.update(rate, nowMs);
    }
    this.rtProp = this.rtPropFilter.update(rttMs, nowMs);
  }

  // Advance the gain cycle once per RTprop and return the next pacing rate.
  nextPacingRate() {
    const gain = PACING_GAIN[this.cycleIdx];
    this.cycleIdx = (this.cycleIdx + 1) % PACING_GAIN.length;
    return gain * this.btlBw; // bytes per ms
  }

  // Inflight cap: cwnd_gain · BDP keeps a ceiling even if ACKs are bursty.
  targetCwnd(cwndGain = 2) {
    const bdp = this.btlBw * this.rtProp;        // bytes
    return Math.max(cwndGain * bdp, 4 * 1500);   // never below ~4 packets
  }
}
The two details that make this BBR rather than a generic rate controller: bandwidth uses a max filter (you want the best rate you've recently achieved, since lower samples just mean you were under-sending), while RTT uses a min filter (anything above the floor is queue, not path). Mixing those up — a min on bandwidth or a max on RTT — silently destroys the model.
Python: simulating the ProbeBW cycle
A compact event-driven sketch showing how the gain cycle keeps the standing queue near zero: the 1.25 probe builds a little queue, the 0.75 drain removes it.
import math
from collections import deque

PACING_GAIN = [1.25, 0.75, 1, 1, 1, 1, 1, 1]
STARTUP_GAIN = 2 / math.log(2)  # ≈ 2.885; shown for reference, unused in this ProbeBW-only toy


class WindowedExtreme:
    """Sliding-window max (op=max) or min (op=min) over a time window."""

    def __init__(self, window, op):
        self.window, self.op = window, op
        self.samples = deque()  # (value, time)

    def update(self, value, now):
        self.samples.append((value, now))
        # drop samples older than the window
        while self.samples and now - self.samples[0][1] > self.window:
            self.samples.popleft()
        return self.op(v for v, _ in self.samples)


class BBR:
    def __init__(self, rtt_window=2.0, rtprop_window=10.0):
        self.btlbw_filt = WindowedExtreme(rtt_window, max)
        self.rtprop_filt = WindowedExtreme(rtprop_window, min)
        self.btlbw = 0.0             # bytes/sec
        self.rtprop = float('inf')   # sec
        self.idx = 0

    def on_ack(self, delivered_bytes, interval, rtt, now):
        if interval > 0:
            rate = delivered_bytes / interval
            self.btlbw = self.btlbw_filt.update(rate, now)
        self.rtprop = self.rtprop_filt.update(rtt, now)

    def pacing_rate(self):
        # return the rate for the current phase, then advance the gain cycle
        gain = PACING_GAIN[self.idx]
        self.idx = (self.idx + 1) % len(PACING_GAIN)
        return gain * self.btlbw  # bytes/sec

    def bdp(self):
        return self.btlbw * self.rtprop  # bytes "in the pipe"


# Toy run: bottleneck 12.5 MB/s (100 Mbps), base RTT 40 ms.
bbr = BBR()
BW, RTPROP = 12.5e6, 0.040
t = 0.0
for step in range(16):
    # pretend each RTT we measure the true link and report it
    bbr.on_ack(delivered_bytes=BW * RTPROP, interval=RTPROP, rtt=RTPROP, now=t)
    rate = bbr.pacing_rate()
    queue = max(0.0, (rate - BW) * RTPROP)  # bytes briefly queued this RTT
    print(f"t={t:5.3f}s pacing={rate/1e6:5.1f} MB/s queue={queue:6.0f} B")
    t += RTPROP
Run it and you'll see the queue spike only on the 1.25 phase and immediately fall back to zero on the 0.75 phase — the standing queue averages out to roughly nothing, which is the whole point. BDP here is 12.5 MB/s × 0.04 s = 500 KB, about 333 full-size packets in flight.
Variants worth knowing
BBRv2. The big rework (2019) that addressed v1's fairness and shallow-buffer problems. It adds an explicit loss bound and ECN response, and an inflight_hi cap so a BBR flow won't keep ballooning inflight in a deep buffer. It coexists far better with CUBIC and Reno, at a small throughput cost on perfectly clean links.
BBRv3. The 2023 iteration Google now runs in production, folding in v2's lessons with bug fixes around the ProbeBW phases and the loss/ECN logic; it's the version converging toward IETF standardization.
TCP Vegas. The intellectual ancestor — a 1990s delay-based scheme that watches RTT and backs off before loss. It's the cautionary tale: pure delay-sensing is too polite and gets starved by loss-based flows. BBR borrows the "watch the RTT" instinct but keeps probing rather than retreating.
CUBIC. Not a BBR variant but its main competitor and the default it usually replaces. CUBIC's window grows as a cubic function of time-since-last-loss, which makes it scale better than Reno on fast links — but it still uses loss as the signal and still fills buffers.
Copa and PCC. Other "rethink the signal" schemes. Copa targets a delay objective with a tunable competitiveness knob; PCC (Performance-oriented Congestion Control) runs online learning, micro-experimenting with rates and keeping whatever empirically maximizes a utility function.
Common bugs and misconceptions
- "BBR ignores loss entirely." Not quite. BBRv1 doesn't use loss as the primary signal, but extreme loss still erodes the delivery-rate samples that feed BtlBw, and v2/v3 add an explicit bounded loss response. The accurate statement is "loss is not the control signal."
- Forgetting that pacing, not cwnd, is the knob. BBR needs a working packet pacer (the fair-queue qdisc fq in Linux, or TCP internal pacing). Without pacing, a window's worth of packets bursts into the bottleneck and the model's queue estimate is wrong.
- Using a max filter for RTT or a min filter for bandwidth. The asymmetry is load-bearing: max for bandwidth (you achieved at least this rate), min for RTprop (anything higher is queue). Swapping them produces a model that chases queues.
- ProbeRTT starving throughput. Cutting to ~4 packets every 10 s costs a brief throughput dip. On very high-BDP paths that dip is visible; it's a known tradeoff, and v2/v3 tuned how aggressively the inflight drops.
- Expecting fairness against CUBIC out of the box. A BBRv1 flow can crowd out CUBIC in shallow buffers or hog deep ones. If you deploy BBR on a shared bottleneck with legacy senders, use v2/v3 and measure the split.
- Deploying it on the receiver. Congestion control governs the sender. Setting BBR only on the side that downloads does nothing; it must be on whichever endpoint is pushing the bulk data.
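Because the algorithm is chosen on the sending endpoint, it can also be requested per socket. The sketch below uses the Linux `TCP_CONGESTION` socket option through Python's `socket` module. It assumes a Linux host where the BBR module is available to the process; on other systems, or where BBR is not loaded, the helper simply reports failure. `try_set_congestion_control` and `current_congestion_control` are hypothetical helper names.

```python
import socket


def try_set_congestion_control(sock, name="bbr"):
    """Ask the kernel to use `name` for this socket; return True on success."""
    opt = getattr(socket, "TCP_CONGESTION", None)  # constant exists on Linux only
    if opt is None:
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, opt, name.encode())
        return True
    except OSError:
        return False  # algorithm not available or not permitted


def current_congestion_control(sock):
    """Return the algorithm name in effect for this socket, or None if unknown."""
    opt = getattr(socket, "TCP_CONGESTION", None)
    if opt is None:
        return None
    raw = sock.getsockopt(socket.IPPROTO_TCP, opt, 16)
    return raw.split(b"\x00", 1)[0].decode()


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    applied = try_set_congestion_control(sock, "bbr")
    in_effect = current_congestion_control(sock) if applied else None
    print("bbr applied:", applied, "in effect:", in_effect)
finally:
    sock.close()
```

On Linux the system-wide default is controlled by the `net.ipv4.tcp_congestion_control` sysctl. The per-socket option is useful when only some of a server's connections, such as bulk egress, should use BBR.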
Frequently asked questions
How is BBR different from loss-based congestion control like CUBIC?
CUBIC grows its window until a packet is dropped, then cuts it multiplicatively (to about 70% of its previous size in the standard parameterization), so it deliberately fills router buffers and uses loss as the congestion signal. BBR does not use loss as its primary signal. It measures the bottleneck bandwidth and the minimum round-trip time directly, then paces sending at that bandwidth with about one BDP in flight, so it neither overfills buffers nor backs off on random non-congestive loss.
What does BBR actually measure?
Two things: BtlBw, the bottleneck bandwidth, taken as the maximum delivery rate seen over the last ~10 round trips; and RTprop, the round-trip propagation delay, taken as the minimum RTT seen over the last ~10 seconds. Their product, BtlBw × RTprop, is the bandwidth-delay product — the amount of data that fills the pipe without queuing.
Why can't BBR measure bandwidth and minimum RTT at the same time?
To measure max bandwidth you must keep the pipe full, which creates a queue and inflates RTT. To measure the true minimum RTT you must drain that queue, which means sending below capacity. The two probes are mutually exclusive, so BBR alternates: it spends most of its time probing bandwidth, then briefly drops the inflight data to roughly four packets every ~10 seconds to re-measure RTprop.
Is BBR unfair to CUBIC flows?
It can be. In deep-buffered links BBRv1 often grabs more than its fair share because it doesn't retreat on loss the way CUBIC does, and in shallow buffers it can cause high retransmission rates. BBRv2 and v3 added explicit loss and ECN response plus an inflight cap to share more fairly with Reno/CUBIC, but cross-flow fairness remains the algorithm's hardest open problem.
Does BBR help with bufferbloat?
Yes — that is its headline win. Because BBR targets one bandwidth-delay product in flight rather than filling the buffer, it keeps standing queues small. On a path with a large dumb buffer, CUBIC can add hundreds of milliseconds of queuing latency, while BBR holds the same path near its base RTT, dramatically lowering latency under load.
Where is BBR deployed in the real world?
Google deploys BBR on google.com and YouTube, and it is the congestion control for QUIC across much of Google's traffic. It shipped in the Linux kernel in 4.9 (2016) as a pluggable module and is widely used on CDN and cloud egress where high-throughput, low-latency transfers over long fat networks matter.