Border Gateway Protocol (BGP)
Routes 950,000+ IPv4 prefixes across 75,000+ ASes: the protocol holding the Internet together
BGP is the path-vector routing protocol that exchanges routing information between autonomous systems (ASes), the ~75,000 networks (ISPs, enterprises, content providers) that compose the Internet. Each AS announces which IP prefixes it can deliver, attaching the AS path over which they are reached. The current Internet routing table contains ~950,000 IPv4 prefixes (April 2026, MRT data) and 200,000+ IPv6 prefixes. BGP-4 was first standardized in 1995 (RFC 1771) and is now specified by RFC 4271 (2006). BGP is known for route leaks and outages (AS7007 in 1997, Pakistan-YouTube in 2008, the 2021 Facebook outage in which BGP withdrawals cut off access for about 6 hours) and for being the routing layer targeted by BGP hijacks. RPKI (Resource Public Key Infrastructure) and BGPsec are security overlays added after the fact.
- Standard: RFC 4271 (BGP-4, 2006)
- IPv4 prefixes: ~950,000 (2026)
- ASes: ~75,000
- Path-vector: per-AS path tracking
- TCP port: 179
- Security overlay: RPKI, BGPsec
What BGP actually does
The Internet is not one network. It is roughly 75,000 separately operated networks called autonomous systems (ASes), each identified by a 16- or 32-bit AS number (ASN). Comcast is AS7922. Cloudflare is AS13335. A lab or internal network might use a private-use ASN such as AS65001. To deliver a packet from one AS to another, the packet must traverse a chain of ASes that carry it for one another, and each AS along the way must know where each IP prefix lives.
BGP is the protocol that distributes this knowledge. Every BGP-speaking router maintains a TCP session (port 179) with each configured neighbor. Over that session it exchanges UPDATE messages saying "I can deliver prefix X via this AS path." Each router stores all received routes in its RIB (Routing Information Base), runs a deterministic best-path selection per prefix, and installs the chosen route into its forwarding table.
The BGP best-path selection algorithm
When multiple routes for the same prefix arrive, BGP picks one according to a strict tiebreaker chain. The exact list varies slightly between vendors but is usually:
| Step | Tiebreaker | Wins if |
|---|---|---|
| 1 | Highest WEIGHT (Cisco-only, local) | Set on this router |
| 2 | Highest LOCAL_PREF (within AS) | Operator policy |
| 3 | Locally-originated route | Configured here |
| 4 | Shortest AS_PATH | Fewer hops |
| 5 | Lowest ORIGIN code | IGP < EGP < INCOMPLETE |
| 6 | Lowest MED | Inbound preference from peer |
| 7 | eBGP over iBGP | External preferred |
| 8 | Lowest IGP cost to next-hop | Closest exit |
| 9 | Lowest router ID | Final tiebreak |
Operators rarely tune steps 5 and 7-9 directly. The main levers are LOCAL_PREF (set high to prefer a customer over a peer over an upstream, the "customer > peer > transit" rule), AS_PATH prepending (lengthen a path to make it less attractive to other networks), and MED (signal a preferred entry point to a neighboring AS).
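The tiebreaker chain in the table can be sketched as a sort key. This is a minimal illustration, not any vendor's implementation: the `Route` record and its field names are hypothetical, and it simplifies MED, which real routers usually compare only between routes learned from the same neighboring AS.

```python
from dataclasses import dataclass

# ORIGIN codes ranked as in the table: IGP < EGP < INCOMPLETE.
ORIGIN_RANK = {"IGP": 0, "EGP": 1, "INCOMPLETE": 2}


@dataclass(frozen=True)
class Route:
    """Hypothetical route record; field names are illustrative only."""
    prefix: str
    as_path: tuple = ()
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    origin: str = "IGP"
    med: int = 0
    is_ebgp: bool = True
    igp_cost: int = 0
    router_id: int = 0


def preference_key(route: Route) -> tuple:
    """Build a sort key where a smaller tuple means a more preferred route."""
    return (
        -route.weight,                 # 1. highest WEIGHT
        -route.local_pref,             # 2. highest LOCAL_PREF
        not route.locally_originated,  # 3. locally originated first
        len(route.as_path),            # 4. shortest AS_PATH
        ORIGIN_RANK[route.origin],     # 5. lowest ORIGIN code
        route.med,                     # 6. lowest MED (simplified)
        not route.is_ebgp,             # 7. eBGP over iBGP
        route.igp_cost,                # 8. lowest IGP cost to next hop
        route.router_id,               # 9. lowest router ID
    )


def best_path(candidates: list) -> Route:
    """Return the winning route among candidates for a single prefix."""
    return min(candidates, key=preference_key)


short_via_transit = Route("203.0.113.0/24", as_path=(64500,), local_pref=100, router_id=1)
long_via_customer = Route("203.0.113.0/24", as_path=(64501, 64502, 64503), local_pref=200, router_id=2)
winner = best_path([short_via_transit, long_via_customer])
print(winner.as_path)  # (64501, 64502, 64503)
```

The longer path wins because LOCAL_PREF is compared before AS_PATH length, which is the "policy first" behavior described above.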
eBGP vs iBGP
BGP comes in two flavors that differ only by the relationship between the two routers:
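- eBGP (external). Between routers in different ASes. TTL 1 by default, so the neighbor must be a single hop away. AS_PATH gets prepended with the local ASN on send. LOCAL_PREF is not carried across eBGP; the receiving AS assigns its own value, commonly defaulting to 100.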
- iBGP (internal). Between routers inside the same AS — for example, between two border routers that both speak eBGP outward and need to share what they learned. AS_PATH is not prepended (loop prevention works differently). iBGP requires a full mesh or a route reflector / confederation, because routes learned via iBGP are not re-advertised to other iBGP peers.
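The two behaviors that separate eBGP from iBGP, prepending on send and the ban on re-advertising iBGP-learned routes to iBGP peers, can be sketched as follows. The function names are hypothetical, and the route-reflector flag is a deliberate simplification: real reflectors distinguish clients from non-clients.

```python
def as_path_on_send(as_path: tuple, local_asn: int, session: str) -> tuple:
    """Return the AS_PATH as it leaves this router over an 'ebgp' or 'ibgp' session."""
    if session == "ebgp":
        return (local_asn,) + tuple(as_path)  # eBGP prepends the local ASN
    return tuple(as_path)                     # iBGP leaves the path untouched


def may_readvertise(learned_via: str, send_via: str, is_route_reflector: bool = False) -> bool:
    """Apply the iBGP rule: iBGP-learned routes are not passed to other iBGP peers.

    A route reflector relaxes this rule (simplified here to a single flag).
    """
    if learned_via == "ibgp" and send_via == "ibgp":
        return is_route_reflector
    return True


def full_mesh_sessions(routers: int) -> int:
    """Number of iBGP sessions needed when every router peers with every other."""
    return routers * (routers - 1) // 2


print(as_path_on_send((64501,), 64500, "ebgp"))  # (64500, 64501)
print(may_readvertise("ibgp", "ibgp"))           # False
print(full_mesh_sessions(10))                    # 45
```

The quadratic growth of `full_mesh_sessions` is why larger networks tend to move to route reflectors or confederations.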
The four BGP message types
| Message | Purpose | Frequency |
|---|---|---|
| OPEN | Negotiate version, ASN, hold time, capabilities | Once per session |
| UPDATE | Announce or withdraw prefixes | Per route change |
| KEEPALIVE | Confirm session is alive | Every hold-time / 3 |
| NOTIFICATION | Error, terminating session | On error |
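A common vendor default is a 180-second hold time with a KEEPALIVE every 60 seconds (RFC 4271 itself suggests 90 and 30); the two peers use the lower of the hold times they propose in OPEN. If the hold timer expires with no KEEPALIVE or UPDATE received, which at these settings means three missed keepalives in a row, the session resets: every prefix learned via that session is withdrawn from the RIB and the best-path computation re-runs for every affected route.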
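As a rough sketch of the wire format behind these messages: every BGP message starts with a 19-byte header (a 16-byte marker of all ones, a 2-byte length, and a 1-byte type), and a KEEPALIVE is that header and nothing else. The helper names below are hypothetical, and only the four message types from the table are mapped.

```python
import struct

# Type codes for the four base message types.
MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}

# Network byte order: 16-byte marker, 2-byte length, 1-byte type (19 bytes total).
HEADER = struct.Struct("!16sHB")
MARKER = b"\xff" * 16


def parse_header(data: bytes) -> tuple:
    """Return (message type name, total message length) from a BGP header."""
    marker, length, msg_type = HEADER.unpack_from(data)
    if marker != MARKER:
        raise ValueError("marker must be 16 bytes of 0xFF")
    if not 19 <= length <= 4096:
        raise ValueError("length outside the 19..4096 range of base BGP-4")
    return MESSAGE_TYPES.get(msg_type, "UNKNOWN"), length


def negotiated_timers(local_hold: int, peer_hold: int) -> tuple:
    """Return (hold time, keepalive interval): lower proposal wins, keepalive is a third."""
    hold = min(local_hold, peer_hold)
    return hold, hold // 3


keepalive = HEADER.pack(MARKER, 19, 4)
print(parse_header(keepalive))      # ('KEEPALIVE', 19)
print(negotiated_timers(180, 90))   # (90, 30)
```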
Why BGP convergence is slow
When a prefix is withdrawn somewhere on the Internet, the news propagates surprisingly slowly. Three knobs add latency:
- MRAI (Minimum Route Advertisement Interval). RFC 4271 recommends 30 seconds between successive UPDATEs about the same prefix to a peer. Damps oscillation; raises floor on convergence.
- Path exploration. When the primary path disappears, routers try every alternative AS path they have heard of before declaring the prefix unreachable. In the worst case this walks a factorial number of paths and the prefix flutters in and out of the table for tens of seconds.
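- Route flap damping. If a prefix flaps repeatedly (announce / withdraw / announce …), some operators suppress it under route flap damping (RFC 2439, with parameter recommendations revised in RFC 7196) until its accumulated penalty decays, which can take from tens of minutes to hours depending on configuration. Designed to protect against unstable customers; can also delay legitimate failover.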
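Route flap damping is usually modeled as a penalty that grows with each flap and decays exponentially over time. The sketch below assumes illustrative parameters (1000 penalty per flap, a 15-minute half-life, reuse below 750); these are assumptions chosen for the arithmetic, not values taken from this article or mandated by any RFC.

```python
import math

# Illustrative parameters (assumptions, not normative values).
PENALTY_PER_FLAP = 1000.0
HALF_LIFE_S = 900.0      # penalty halves every 15 minutes
REUSE_LIMIT = 750.0      # route is usable again once the penalty falls below this


def decayed_penalty(penalty: float, elapsed_s: float, half_life_s: float = HALF_LIFE_S) -> float:
    """Exponentially decay a damping penalty over elapsed_s seconds."""
    return penalty * 0.5 ** (elapsed_s / half_life_s)


def minutes_until_reuse(penalty: float, reuse_limit: float = REUSE_LIMIT,
                        half_life_s: float = HALF_LIFE_S) -> float:
    """Minutes of quiet needed before the penalty decays to the reuse limit."""
    if penalty <= reuse_limit:
        return 0.0
    return half_life_s * math.log2(penalty / reuse_limit) / 60.0


three_quick_flaps = 3 * PENALTY_PER_FLAP
print(minutes_until_reuse(three_quick_flaps))  # 30.0
```

Under these assumed numbers, three flaps in quick succession keep a prefix suppressed for half an hour even if it is perfectly stable afterwards, which is how damping can delay legitimate failover.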
Empirically, RIPE NCC's RIS measurement shows median full-Internet convergence after a single withdrawal at 30 seconds to 3 minutes. During that window, some networks still have the dead route; some have a working alternative; some have nothing.
Famous BGP incidents
| Year | Incident | Effect |
|---|---|---|
| 1997 | AS7007 incident — small ISP misconfigured, announced /24s of the entire Internet | Many of the world's BGP routers picked the more-specific /24, traffic collapsed for hours |
| 2008 | Pakistan Telecom announced YouTube's /24 to block YouTube domestically; leaked the announcement upstream | YouTube globally unreachable for ~2 hours |
| 2017 | Google leaked routes for Japanese networks, including NTT, to Verizon, pulling Japan-bound traffic onto the wrong path | ~40 min of Japan partial outage |
| 2018 | BGP hijack of Amazon Route 53 address space redirected MyEtherWallet users; ~$150,000 stolen | Showed BGP hijacks being used for direct theft |
| 2021 | Facebook withdrew the prefixes of its own DNS servers during maintenance; could not reach DCs to revert | ~6 hours, $60M+ revenue impact |
| 2022 | KlaySwap DeFi hijack via Korean ISP, ~$2M stolen | Crypto wallet rerouting via false BGP announcements |
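Several of these incidents hinge on longest-prefix matching: forwarding always follows the most specific route covering a destination, so a leaked or hijacked /24 beats a legitimate but broader announcement. A minimal sketch, using private addresses and documentation ASNs as made-up examples:

```python
import ipaddress


def longest_match(table: dict, destination: str):
    """Return the (network, origin) entry with the longest prefix covering destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, origin) for net, origin in table.items() if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda entry: entry[0].prefixlen)


# Hypothetical table: the legitimate holder announces a /22.
table = {ipaddress.ip_network("10.20.0.0/22"): "AS64500 (legitimate)"}
print(longest_match(table, "10.20.1.7")[1])  # AS64500 (legitimate)

# A more-specific /24 from another AS now wins for addresses inside it.
table[ipaddress.ip_network("10.20.1.0/24")] = "AS64501 (more-specific)"
print(longest_match(table, "10.20.1.7")[1])  # AS64501 (more-specific)
print(longest_match(table, "10.20.2.7")[1])  # AS64500 (legitimate)
```

Note that best-path attributes never come into play here: the two announcements are for different prefixes, so they are not compared against each other at all.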
RPKI and BGPsec — the security overlay
BGP itself is unauthenticated. The main defenses bolted on after the fact:
- RPKI (RFC 6480). Each Regional Internet Registry (ARIN, RIPE NCC, APNIC, LACNIC, AFRINIC) operates a cryptographic certificate authority for the address space it allocates. An operator holding a prefix, for example one allocated by ARIN, can publish a Route Origin Authorization (ROA) saying "AS X is authorized to originate prefix P up to max length L." Routers running RPKI origin validation check each announcement against the ROAs covering its prefix: if covering ROAs exist but none matches the origin ASN and prefix length, the route is invalid and is dropped by networks that enforce validation; if no ROA covers the prefix, the route is not-found and is normally still accepted. As of 2026, about 50% of IPv4 prefixes have ROAs and major transit providers (Cogent, NTT, Tata, Telia, AT&T) drop invalids.
- BGPsec (RFC 8205). Goes further: each AS signs the path as it forwards an announcement, so the entire AS_PATH is cryptographically verifiable. Adoption is essentially zero: it is too costly in CPU and key management for the marginal gain over RPKI origin validation.
- ASPA (Autonomous System Provider Authorization). Newer overlay (draft, 2024+) that authorizes the customer-provider relationships, catching route leaks where the origin is correct but the path passes through the wrong upstream.
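Route origin validation against ROAs can be sketched in a few lines. This follows the three-state outcome (valid, invalid, not-found) described in RFC 6811; the `ROA` record and function name are hypothetical, and the prefixes and ASNs are documentation values rather than real allocations.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class ROA:
    """Hypothetical in-memory form of a validated ROA payload."""
    prefix: object      # ipaddress.IPv4Network or IPv6Network
    max_length: int
    asn: int


def validate_origin(prefix: str, origin_asn: int, roas: list) -> str:
    """Classify an announcement as 'valid', 'invalid', or 'not-found'."""
    net = ipaddress.ip_network(prefix)
    # A ROA covers the announcement when the announced prefix sits inside the ROA prefix.
    covering = [r for r in roas
                if r.prefix.version == net.version and net.subnet_of(r.prefix)]
    if not covering:
        return "not-found"
    for roa in covering:
        if roa.asn == origin_asn and net.prefixlen <= roa.max_length:
            return "valid"
    return "invalid"


roas = [ROA(ipaddress.ip_network("192.0.2.0/24"), 24, 64500)]
print(validate_origin("192.0.2.0/24", 64500, roas))     # valid
print(validate_origin("192.0.2.0/24", 64501, roas))     # invalid (wrong origin)
print(validate_origin("192.0.2.0/25", 64500, roas))     # invalid (too specific)
print(validate_origin("198.51.100.0/24", 64500, roas))  # not-found
```

The max-length check is what blocks the more-specific trick: even the rightful origin cannot announce a /25 here, so an impostor claiming that origin cannot either.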
Inspecting BGP from the command line
# What's the AS path to 8.8.8.8 from my vantage?
mtr --aslookup 8.8.8.8
# Free public looking glass — see the global table
curl 'https://lg.he.net/?ip=1.1.1.1'
# Query the RADb routing registry for route objects originated by AS13335
whois -h whois.radb.net -- '-i origin AS13335' | head
# Cisco IOS
show ip bgp summary
show ip bgp 1.1.1.0/24
show ip bgp neighbors 192.0.2.1
# FRRouting / Quagga / BIRD
vtysh -c 'show ip bgp'
birdc 'show route protocol ebgp_provider'
# Validate RPKI status of a prefix
whois -h rpki-validator.ripe.net -- '8.8.8.0/24'
Why BGP matters
- Internet backbone. Without BGP no AS could reach any other AS. There is no fallback protocol.
- Multi-homing. An enterprise with two upstream ISPs uses BGP to advertise the same prefix to both, gaining failover and traffic engineering.
- Traffic engineering. AS_PATH prepending, MED, communities, selective announcements all let operators steer inbound and outbound traffic by ISP, by region, by customer.
- Peering vs transit economics. Settlement-free peering at IXPs costs ~$1k/month for a port; transit costs ~$0.50-$2 per Mbps per month. CDNs (Cloudflare, Akamai, Netflix) peer aggressively to push transit costs near zero.
- Anycast. Critical infrastructure (DNS roots, public resolvers, CDN edges) announces the same prefix from many locations; BGP routes each user to the closest.
- DDoS mitigation via BGP blackholing. Trigger an emergency announcement that null-routes attack traffic upstream. RFC 7999 standardizes a community for this.
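Blackholing and many traffic-engineering signals ride on BGP communities, which in their standard form are 32-bit values written as two 16-bit halves. As a small sketch with hypothetical helper names, the BLACKHOLE community from RFC 7999 is written 65535:666:

```python
def encode_community(high: int, low: int) -> int:
    """Pack a standard community written as high:low into its 32-bit value."""
    if not (0 <= high <= 0xFFFF and 0 <= low <= 0xFFFF):
        raise ValueError("each half of a standard community is 16 bits")
    return (high << 16) | low


def decode_community(value: int) -> tuple:
    """Split a 32-bit community value back into its (high, low) halves."""
    return value >> 16, value & 0xFFFF


BLACKHOLE = encode_community(65535, 666)
print(hex(BLACKHOLE))               # 0xffff029a
print(decode_community(BLACKHOLE))  # (65535, 666)
```

Whether an upstream honors the community, and for which prefix lengths, is a matter of that provider's policy rather than anything the encoding guarantees.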
Common misconceptions
- "BGP picks the shortest path." AS_PATH length is one of nine tiebreakers, and most operators override it with LOCAL_PREF for policy. The actual selection is "policy first, length later."
- "BGP is secure by default." BGP authenticates nothing. Anyone with a peering session can announce anything; only RPKI ROAs (an opt-in overlay) and operator filtering catch hijacks.
- "Convergence is fast." Inside a data center IGP converges in seconds; full-Internet BGP convergence takes minutes due to MRAI, path exploration, and damping.
- "Bigger AS_PATH is always worse." Operators routinely prepend their own ASN multiple times to de-prefer a path, so a long path often reflects policy rather than distance. Prepending is a tool, not a pure metric.
- "All ASes are equal." About 30 Tier-1 ASes form the default-free zone and reach the entire Internet without buying transit; the other 75,000 are customers somewhere up the chain.
- "IPv6 BGP is the same as IPv4." The protocol is multi-protocol BGP (RFC 4760); IPv6 uses MP_REACH_NLRI in the same UPDATE format. Operationally close enough that the same daemons handle both.
- "BGP is for ISPs only." Cloud providers, large enterprises, exchanges, anycast deployments, even some home routers (with /29 from RIPE) speak BGP. Anyone with an ASN does.
Frequently asked questions
Why is BGP path-vector and not link-state?
Link-state protocols like OSPF flood the full topology to every router and recompute shortest paths with Dijkstra's algorithm. That works inside a single organization with hundreds of routers but cannot scale to 75,000 autonomous systems with diverging trust and policy. BGP advertises only reachability: each route carries path attributes, chiefly the AS path (the list of ASes the route traversed), and each operator applies local policy (prefer this peer, avoid that transit, never advertise this prefix here). Path-vector also makes loop detection trivial: if your own AS number is already in the path, drop the announcement.
What is an AS path and how does it prevent loops?
Every BGP UPDATE carries an AS_PATH attribute listing the autonomous systems traversed; each AS prepends its own number as it passes the route to an external neighbor. A route originated by AS65001 and passed on through AS65002 and AS65003 carries AS_PATH = [65003, 65002, 65001] once AS65003 advertises it. When the announcement reaches a router whose own ASN is already in the path, BGP drops it, which prevents loops without any global view of topology. The path also doubles as a tiebreaker: shorter paths win, all else equal.
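The prepend-and-check mechanism is small enough to sketch directly. The function names are hypothetical; the ASNs are the private-use examples from the paragraph above.

```python
def advertise(as_path: tuple, sender_asn: int) -> tuple:
    """Return the AS_PATH a neighbor sees after sender_asn prepends itself."""
    return (sender_asn,) + tuple(as_path)


def accepts(as_path: tuple, receiver_asn: int) -> bool:
    """Loop check: a receiver rejects any path that already contains its own ASN."""
    return receiver_asn not in as_path


# AS65001 originates the route; AS65002 and AS65003 each pass it along.
path = ()
for asn in (65001, 65002, 65003):
    path = advertise(path, asn)

print(path)                  # (65003, 65002, 65001)
print(accepts(path, 65004))  # True: a new AS can use the route
print(accepts(path, 65002))  # False: the route has looped back to AS65002
```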
What was the Facebook 2021 BGP outage?
On October 4, 2021, a routine maintenance command on Facebook's backbone caused all of Facebook's authoritative DNS servers to stop announcing their IP prefixes via BGP. Without BGP announcements, the rest of the Internet had no route to facebook.com, instagram.com, whatsapp.com — all DNS lookups failed at the resolver level. Worse, internal tools, badge-access systems, and remote login also depended on the same DNS, so engineers physically had to reach the data center to revert the change. Outage lasted about 6 hours.
What is a BGP hijack and how does RPKI prevent it?
A BGP hijack is when an AS announces a prefix it does not hold, either by accident (Pakistan Telecom announced YouTube's /24 in 2008) or maliciously (cryptocurrency hijacks reroute payments). BGP itself trusts every announcement. RPKI (Resource Public Key Infrastructure, RFC 6480) lets the prefix holder publish cryptographically signed Route Origin Authorizations (ROAs), anchored in a certificate chain from the regional Internet registry that allocated the prefix. Routers validating RPKI reject announcements for a ROA-covered prefix whose origin AS or prefix length matches no ROA. This stops origin hijacks but not an attacker who forges a path ending in the legitimate origin. As of 2026, ~50% of IPv4 prefixes have valid ROAs and major networks (Cloudflare, AT&T, Telia) drop invalid routes.
How is BGP different from interior protocols (OSPF, IS-IS)?
Interior gateway protocols (IGPs) like OSPF and IS-IS run inside one AS, optimizing shortest paths inside a trust boundary. They flood link-state updates and recompute on changes within seconds. BGP is an exterior gateway protocol (EGP); it runs between ASes, where neighbors are not trusted, paths are chosen by policy not metrics, and convergence is intentionally slow to dampen instability. A typical large operator runs IS-IS or OSPF inside its backbone and BGP at every customer and peering edge.
Why does BGP take minutes to converge globally?
Three reasons. First, MRAI (minimum route advertisement interval) timers — typically 30 seconds for eBGP — deliberately throttle update bursts. Second, path exploration: when a route disappears, neighbors try every alternate path before giving up, walking a combinatorial set of AS paths. Third, sheer scale — an UPDATE has to propagate through tens of thousands of ASes, each running its own policy filters. RIPE measurements show median global convergence of 30 seconds to 3 minutes after a single prefix withdrawal.
What is the difference between peering and transit?
A transit relationship is one AS paying another to carry its traffic to the rest of the Internet: the upstream announces the full table (~950k routes) to the customer, and the customer announces its own prefixes to the upstream. A peering relationship is two ASes exchanging only their own and their customers' routes, settlement-free, usually at an Internet exchange point (IXP) like AMS-IX or DE-CIX. Cloudflare and Netflix peer with thousands of networks to keep traffic off paid transit; small ISPs typically buy transit from one or two upstreams.
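These business relationships translate into a simple export rule, often called valley-free routing: routes that earn money (your own and your customers') go to everyone, while routes learned from peers or providers go only to customers. The sketch below is an idealized model with hypothetical names; real policies add many exceptions.

```python
def should_export(learned_from: str, export_to: str) -> bool:
    """Decide whether to announce a route to a neighbor.

    learned_from: 'self', 'customer', 'peer', or 'provider'
    export_to:    'customer', 'peer', or 'provider'
    """
    if learned_from in ("self", "customer"):
        return True                    # own and customer routes go to everyone
    return export_to == "customer"     # peer and provider routes go only to customers


for source in ("self", "customer", "peer", "provider"):
    targets = [t for t in ("customer", "peer", "provider") if should_export(source, t)]
    print(f"{source:>8} routes exported to: {', '.join(targets)}")
```

Under this rule a peer only ever hears your own and your customers' routes, and only a paying customer receives everything, which matches the peering and transit definitions above.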