Networking

OSPF (Open Shortest Path First)

Every router builds the same map — then runs Dijkstra on it

OSPF is a link-state interior gateway protocol where routers flood link-state advertisements, build an identical map of the network, and run Dijkstra's algorithm to compute the shortest path to every destination.

  • TypeLink-state IGP
  • Path algorithmDijkstra (SPF)
  • SPF complexityO((V + E) log V)
  • Protocol numberIP 89
  • StandardRFC 2328 (OSPFv2)

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

How OSPF builds a map and finds shortest paths

A distance-vector protocol like RIP is rumor-based routing: a router tells its neighbors "I can reach network X at distance 3," the neighbor adds one and passes it on, and nobody ever sees the whole picture. That's slow to converge and prone to routing loops. OSPF, standardized by John Moy in RFC 1131 (1989) and refined into RFC 2328 (1998, OSPFv2), takes the opposite approach: give every router the complete map, then let each one compute paths for itself.

The process runs in three phases:

  1. Discover neighbors. Routers multicast Hello packets (to 224.0.0.5) every 10 seconds on broadcast links. When two routers see each other in each other's Hellos, they form an adjacency.
  2. Flood link states. Each router originates a Link-State Advertisement (LSA) describing its own links and their costs. LSAs are flooded reliably to every router in the area, so all of them end up with an identical Link-State Database (LSDB) — a literal graph of the network.
  3. Run SPF. Each router runs Dijkstra's shortest-path-first algorithm on the LSDB, placing itself at the root. The result is a shortest-path tree; the next hop along each branch becomes the route installed in the forwarding table.

The key insight: because the databases are identical and every router computes against the same graph with the same algorithm, all the trees are mutually consistent. No two routers can disagree about the best path, so the network is loop-free by construction — no count-to-infinity, no holddown timers.

The SPF computation, precisely

OSPF models the network as a weighted directed graph G = (V, E) where vertices are routers (and transit networks) and edge weights are link costs. Cost is an integer, never negative — which is exactly the precondition Dijkstra's algorithm needs to be correct.

Each router runs Dijkstra rooted at itself. With a binary-heap priority queue, the running time is:

O((V + E) · log V)

For an OSPF area with, say, 200 routers and 600 links, that's a few thousand heap operations — microseconds of CPU. The expensive part of reconvergence is never the math; it's failure detection and flooding latency. The default cost formula is:

cost = reference_bandwidth / interface_bandwidth

Cisco's reference bandwidth defaults to 100 Mbps, so a 10 Mbps Ethernet link costs 100/10 = 10, a 100 Mbps link costs 1, and a 1 Gbps link also costs 1 — a well-known trap. Cost is the integer floor, with a minimum of 1, so any interface at or above the reference bandwidth collapses to the same cost. The fix is auto-cost reference-bandwidth 100000 (100 Gbps), which makes 1G cost 100, 10G cost 10, and 100G cost 1.

Path cost is the sum of the outgoing interface costs along the path. A path that crosses three links of cost 1, 10, and 1 has total cost 12. When two paths tie, OSPF installs both and load-balances (Equal-Cost Multi-Path, ECMP), up to a platform limit (commonly 16 or 32 next hops).

When to choose OSPF

  • Inside a single organization — a campus, a data center, an enterprise WAN — where you control every router and want fast, automatic shortest-path routing.
  • When you need sub-second reconvergence. Link-state protocols react to topology changes far faster than distance-vector ones.
  • When you run a multi-vendor network. OSPF is an open IETF standard, unlike EIGRP, which was Cisco-proprietary until 2013.
  • When you want hierarchy. The area mechanism scales a flat protocol to thousands of routers by partitioning the flooding domain.

Don't use OSPF to route between autonomous systems on the public Internet — that's BGP's job, because inter-domain routing is about policy and trust, not shortest path. And on very large or very dynamic topologies, IS-IS (the other major link-state IGP) is often preferred by large service providers for its cleaner extensibility.

OSPF vs other routing protocols

OSPFIS-ISRIPEIGRPBGP
FamilyLink-state IGPLink-state IGPDistance-vector IGPAdvanced distance-vector IGPPath-vector EGP
Path algorithmDijkstra (SPF)Dijkstra (SPF)Bellman-FordDUALPolicy + AS-path
MetricCost (bandwidth)Cost (configurable)Hop count (≤15)Composite (BW + delay)Attributes / policy
ConvergenceSub-second (tuned)Sub-second (tuned)Tens of secondsFast (feasible successors)Minutes (by design)
Scaling unitAreas (~50–350 routers)Levels (L1/L2)Flat, tiny networksFlat, summarizationWhole Internet (1M+ prefixes)
StandardRFC 2328 (open)ISO 10589 (open)RFC 2453 (open)RFC 7868 (informational)RFC 4271 (open)
TransportIP protocol 89Directly on layer 2UDP 520IP protocol 88TCP 179
Loop handlingLoop-free by shared mapLoop-free by shared mapSplit horizon, holddownFeasibility conditionAS-path loop detection

The headline split is link-state vs distance-vector. Distance-vector protocols share routes; link-state protocols share the map and compute routes locally. The map approach costs more memory and CPU per router but converges faster and never loops. OSPF and IS-IS are siblings — same Dijkstra core, different packaging.

What the numbers actually say

  • SPF runs in microseconds, not the bottleneck. For a 200-node area, Dijkstra is a few thousand heap operations — well under a millisecond on any modern CPU. Reconvergence time is dominated by failure detection, not computation.
  • Default failure detection: up to 40 seconds. The Dead interval is 4 × Hello (4 × 10s) on broadcast links. That's why production networks add BFD, which detects failure in ~50 ms by exchanging tiny keepalives 20× per second.
  • SPF throttling, not a fixed timer. Modern OSPF uses exponential backoff — Cisco's spf-throttle defaults to a 5 ms initial delay, doubling up to a 5 s ceiling, so a flapping link can't trigger constant full recomputation.
  • Area sizing: roughly 50–350 routers. Beyond that, the LSDB grows large and every change forces every router in the area to re-run SPF; partition into areas to contain the blast radius.
  • Hierarchy is mandatory above one area. All non-backbone areas must attach to area 0, the backbone, which carries inter-area summary routes (Type 3 LSAs).

JavaScript: the SPF core (Dijkstra on the LSDB)

This is exactly what each OSPF router does after flooding settles: build the graph from the link-state database, then run Dijkstra rooted at itself to produce a next-hop table. A real OSPF stack uses a binary heap; this version uses a simple priority extraction for clarity.

// lsdb: { routerId: [ { to, cost }, ... ] }  — identical on every router
function ospfSpf(lsdb, root) {
  const dist = new Map();      // shortest cost to each router
  const nextHop = new Map();   // first hop on the shortest path
  const visited = new Set();
  for (const id of Object.keys(lsdb)) dist.set(id, Infinity);
  dist.set(root, 0);

  while (visited.size < Object.keys(lsdb).length) {
    // extract-min over unvisited (a heap makes this O(log V))
    let u = null, best = Infinity;
    for (const [id, d] of dist) {
      if (!visited.has(id) && d < best) { best = d; u = id; }
    }
    if (u === null) break;     // remaining nodes unreachable
    visited.add(u);

    for (const { to, cost } of lsdb[u] || []) {
      if (cost < 0) throw new Error('OSPF cost must be non-negative');
      const alt = dist.get(u) + cost;
      if (alt < dist.get(to)) {
        dist.set(to, alt);
        // inherit u's first hop, unless u IS the root (then `to` is the first hop)
        nextHop.set(to, u === root ? to : nextHop.get(u));
      }
    }
  }
  return { dist, nextHop };
}

The nextHop bookkeeping is the part textbooks skip: Dijkstra naturally yields total costs, but a router only needs the first hop toward each destination. We inherit the root's neighbor as the next hop and propagate it down the shortest-path tree.

Python: SPF with a real priority queue and ECMP

The same computation with heapq for the O((V + E) log V) bound, plus equal-cost multipath — when two paths tie, OSPF keeps both and load-balances.

import heapq

def ospf_spf(lsdb, root):
    """lsdb: {router: [(neighbor, cost), ...]} identical on every router."""
    dist = {r: float('inf') for r in lsdb}
    next_hops = {r: set() for r in lsdb}   # set → supports ECMP
    dist[root] = 0
    pq = [(0, root)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue                        # stale heap entry
        for v, cost in lsdb.get(u, ()):
            assert cost >= 0, "OSPF cost must be non-negative"
            alt = d + cost
            if alt < dist[v]:
                dist[v] = alt
                next_hops[v] = {v} if u == root else set(next_hops[u])
                heapq.heappush(pq, (alt, v))
            elif alt == dist[v]:            # equal-cost path → ECMP
                next_hops[v] |= {v} if u == root else next_hops[u]
    return dist, next_hops

# Example: a 4-router square with a diagonal shortcut
lsdb = {
    'A': [('B', 1), ('C', 4)],
    'B': [('A', 1), ('D', 1)],
    'C': [('A', 4), ('D', 1)],
    'D': [('B', 1), ('C', 1)],
}
dist, hops = ospf_spf(lsdb, 'A')
print(dist)   # {'A': 0, 'B': 1, 'C': 3, 'D': 2}  — A reaches C via B-D (cost 3), not direct (cost 4)

Notice OSPF picks A → B → D → C at cost 3 over the direct A → C link at cost 4. That's the whole point: the shortest path isn't the fewest hops, it's the lowest total cost.

Variants and key mechanisms worth knowing

OSPFv3 (RFC 5340). The IPv6 rework. It decouples the protocol from the addresses it carries — topology and reachability are separate LSA types — so it can carry both IPv4 and IPv6 over the same adjacencies. Runs over IPv6 link-local addresses.

Areas and the backbone. Area 0 is the backbone; every other area must connect to it, directly or via a virtual link. Area Border Routers (ABRs) sit on the boundary and summarize one area's routes into another with Type 3 LSAs, hiding intra-area churn.

Stub, totally-stubby, and NSSA areas. Optimizations that suppress external (Type 5) LSAs inside an area and inject a default route instead — shrinking the LSDB on edge routers that don't need full external visibility. NSSA (Not-So-Stubby Area) is the compromise that still allows a local external route via Type 7 LSAs.

DR and BDR on broadcast links. On a shared Ethernet segment with N routers, full mesh adjacency would mean O(N²) LSAs. OSPF elects a Designated Router (and Backup) so everyone adjoins only the DR — flooding becomes O(N). The DR originates a Network (Type 2) LSA representing the segment as a pseudo-node.

IS-IS. Not a variant but the cousin: same Dijkstra core, but runs directly on layer 2 (no IP), uses a simpler two-level hierarchy, and is favored by large carriers for its TLV-based extensibility.

Common bugs and edge cases

  • Mismatched Hello/Dead timers. Two routers won't form an adjacency if their Hello and Dead intervals differ. They'll sit in INIT or EXSTART forever. The timers are exchanged in the Hello packet and must match.
  • MTU mismatch stuck in EXSTART/EXCHANGE. Database Description packets carry the interface MTU; if they disagree, the adjacency hangs in EXSTART. A classic, infuriating misconfiguration.
  • Duplicate Router IDs. Each router needs a unique 32-bit Router ID. Two routers with the same ID corrupt the LSDB and cause routes to flap. IDs are usually derived from the highest loopback IP.
  • The bandwidth-cost trap. With the default 100 Mbps reference bandwidth, a 1 Gbps and a 100 Gbps link both cost 1, so OSPF can't tell them apart. Raise the reference bandwidth network-wide — and keep it consistent, or routers compute different costs for the same link.
  • Area 0 partition. If the backbone splits in two, inter-area routing breaks. A virtual link can stitch a disconnected area back to area 0, but it's a band-aid, not a design.
  • Negative or zero costs. Dijkstra is only correct for non-negative weights, so OSPF cost is clamped to a minimum of 1. Don't try to model a "preferred" link with cost 0 — use a low positive cost instead.

Frequently asked questions

What is the difference between OSPF and BGP?

OSPF is an interior gateway protocol (IGP): it routes inside a single administrative domain by computing shortest paths from a shared link-state map. BGP is an exterior gateway protocol (EGP): it routes between domains using policy and AS-path vectors, not shortest path. You run OSPF inside your network and BGP at its edges to talk to the rest of the Internet.

Why does OSPF use Dijkstra's algorithm instead of Bellman-Ford?

Because every OSPF router holds an identical, complete map of the topology, each one can run Dijkstra locally to compute loop-free shortest paths. Bellman-Ford (used by distance-vector protocols like RIP) only shares distance estimates with neighbors, which converges slowly and suffers from count-to-infinity loops. OSPF link costs are non-negative, which is exactly the condition Dijkstra requires.

How does OSPF calculate link cost?

By default cost = reference_bandwidth / interface_bandwidth, where Cisco's reference bandwidth is 100 Mbps. So a 10 Mbps link costs 10, a 100 Mbps link costs 1, and anything 100 Mbps or faster also costs 1 unless you raise the reference bandwidth. On modern 10G/100G networks you set 'auto-cost reference-bandwidth' to a higher value so fast links get distinct, lower costs.

What are OSPF areas and why do they exist?

An area is a group of routers that share a full link-state database and run SPF among themselves. Areas exist to limit the cost of flooding and recomputation: a topology change inside one area only forces routers in that area to re-run Dijkstra. All areas must connect to a backbone area (area 0), which carries inter-area routes between them.

How fast does OSPF reconverge after a link fails?

Sub-second is achievable. Default Hello/Dead timers detect failure in up to 40 seconds, but BFD plus tuned SPF throttling (initial delay measured in milliseconds) brings end-to-end reconvergence under 200 ms on well-engineered networks. The SPF computation itself is O((V + E) log V) and runs in microseconds for typical area sizes.

Why is the OSPF topology limited to a few hundred routers per area?

Every router stores every other router's LSAs and re-runs SPF on each change, so memory and CPU grow with area size. A flapping link in a 1,000-router single area triggers constant full SPF runs across all of them. Splitting into areas of roughly 50–350 routers contains the blast radius and keeps each database small.