Algorithms

Tarjan's Strongly Connected Components

Find every cycle of mutual reachability in one DFS pass

Tarjan's algorithm finds all strongly connected components of a directed graph in a single DFS pass — O(V + E). It detects cycles in dependency graphs, condenses scrambled call graphs into a DAG, and underpins linear-time 2-SAT solvers. Robert Tarjan's 1972 paper has aged remarkably well.

  • TimeO(V + E)
  • SpaceO(V)
  • Passes over graph1 (vs. 2 for Kosaraju)
  • Output orderReverse topological
  • Year published1972

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

How Tarjan's algorithm works

A strongly connected component (SCC) of a directed graph is a maximal set of vertices where every pair is mutually reachable. Imagine a module-import graph: any cluster of files that import each other directly or transitively forms an SCC. Files outside the cluster don't.

The naive way to find all SCCs is to run a DFS from every vertex and intersect reachable sets — O(V·(V+E)). Tarjan's insight is that one DFS, augmented with two integer fields per vertex, is enough.

The DFS visits vertices in some order and assigns each vertex an index (the time it was first seen). It also maintains a low-link: the smallest index reachable from the vertex's DFS subtree by tree edges plus at most one back-edge to a vertex still on the algorithm's stack. Crucially, low-link(v) equals index(v) iff v is the root of an SCC. When the DFS finishes at such a root, the algorithm pops everything above and including v from the stack — that's the SCC.

  1. Initialize a counter idx = 0, an empty stack S, and per-vertex index / lowlink / onStack fields.
  2. For each unvisited vertex v, call DFS(v).
  3. Inside DFS(v): assign index[v] = lowlink[v] = idx++, push v onto S, mark on-stack.
  4. For each edge vw: if w unvisited, recurse and set lowlink[v] = min(lowlink[v], lowlink[w]). Else if w is on stack, set lowlink[v] = min(lowlink[v], index[w]).
  5. After the loop, if lowlink[v] == index[v], pop the stack until v comes off; the popped vertices are one SCC.

Why one pass beats two

Kosaraju's algorithm — the textbook simpler-to-explain SCC algorithm — needs two passes of DFS over the graph, one on the original and one on the transpose. That requires either storing the transpose (extra O(V+E) memory) or building it on the fly (twice the cache traffic). Tarjan does the same job in a single pass over the original adjacency list. On large graphs that fit in RAM but not in cache, the single-pass advantage is roughly 2× wall-clock speed.

Tarjan vs Kosaraju vs Gabow

Tarjan (1972)Kosaraju (1978)Gabow / path-based (2000)
Time complexityO(V + E)O(V + E)O(V + E)
DFS passes121
Needs transpose graphNoYesNo
Auxiliary storage2 ints + bool / vertexfinish-time list + transpose2 stacks of vertex IDs
Output orderReverse topologicalReverse topologicalReverse topological
Conceptual simplicityMedium (low-link)High (two DFSes)Low (two stacks invariant)
Practical speedFastest (typical)~2× slowerComparable to Tarjan

Gabow's path-based SCC algorithm is mathematically equivalent to Tarjan but uses two stacks instead of an integer low-link field — a different trade-off in cache behaviour and auxiliary memory.

Pseudo-code

algorithm Tarjan(graph G):
    idx = 0
    S = empty stack
    for each v in V(G):
        if v.index is undefined:
            strongConnect(v)

    function strongConnect(v):
        v.index = v.lowlink = idx
        idx = idx + 1
        S.push(v); v.onStack = true

        for each edge (v, w):
            if w.index is undefined:
                strongConnect(w)
                v.lowlink = min(v.lowlink, w.lowlink)
            else if w.onStack:
                v.lowlink = min(v.lowlink, w.index)

        if v.lowlink == v.index:
            scc = []
            repeat:
                w = S.pop(); w.onStack = false; scc.push(w)
            until w == v
            emit scc

JavaScript: cycle detection in a dependency graph

// graph: adjacency list, e.g. { A: ['B'], B: ['C'], C: ['A'], D: ['B'] }
// Returns array of SCCs in reverse topological order.
function tarjanSCC(graph) {
  const index = new Map(), lowlink = new Map(), onStack = new Set();
  const stack = [], sccs = [];
  let idx = 0;

  function strongConnect(v) {
    index.set(v, idx);
    lowlink.set(v, idx);
    idx++;
    stack.push(v);
    onStack.add(v);

    for (const w of (graph[v] || [])) {
      if (!index.has(w)) {
        strongConnect(w);
        lowlink.set(v, Math.min(lowlink.get(v), lowlink.get(w)));
      } else if (onStack.has(w)) {
        lowlink.set(v, Math.min(lowlink.get(v), index.get(w)));
      }
    }

    if (lowlink.get(v) === index.get(v)) {
      const scc = [];
      let w;
      do {
        w = stack.pop();
        onStack.delete(w);
        scc.push(w);
      } while (w !== v);
      sccs.push(scc);
    }
  }

  for (const v of Object.keys(graph)) {
    if (!index.has(v)) strongConnect(v);
  }
  return sccs;
}

// Detect circular dependencies: any SCC of size > 1 (or size 1 with self-loop) is a cycle.
function findCycles(graph) {
  return tarjanSCC(graph).filter(scc =>
    scc.length > 1 || (graph[scc[0]] || []).includes(scc[0])
  );
}

// Example usage
const deps = {
  parser:   ['lexer', 'ast'],
  lexer:    ['source'],
  ast:      ['parser'],          // ← cycle parser ↔ ast
  source:   [],
  compiler: ['parser', 'codegen'],
  codegen:  ['ast'],
};
console.log(findCycles(deps));   // [['ast', 'parser']]

Python: same algorithm, iterative to avoid recursion limit

def tarjan_scc(graph):
    index = {}; lowlink = {}; on_stack = set()
    stack = []; sccs = []
    idx = 0

    # Iterative DFS with explicit work stack: (vertex, neighbour-iterator).
    for start in graph:
        if start in index: continue
        work = [(start, iter(graph.get(start, ())))]
        index[start] = lowlink[start] = idx; idx += 1
        stack.append(start); on_stack.add(start)

        while work:
            v, it = work[-1]
            advanced = False
            for w in it:
                if w not in index:
                    index[w] = lowlink[w] = idx; idx += 1
                    stack.append(w); on_stack.add(w)
                    work.append((w, iter(graph.get(w, ()))))
                    advanced = True
                    break
                elif w in on_stack:
                    lowlink[v] = min(lowlink[v], index[w])
            if advanced: continue

            # Done with v's neighbours.
            work.pop()
            if work:
                parent, _ = work[-1]
                lowlink[parent] = min(lowlink[parent], lowlink[v])

            if lowlink[v] == index[v]:
                scc = []
                while True:
                    w = stack.pop()
                    on_stack.discard(w)
                    scc.append(w)
                    if w == v: break
                sccs.append(scc)

    return sccs

def find_cycles(graph):
    return [scc for scc in tarjan_scc(graph)
            if len(scc) > 1 or scc[0] in graph.get(scc[0], ())]

deps = {
    'parser':   ['lexer', 'ast'],
    'lexer':    ['source'],
    'ast':      ['parser'],
    'source':   [],
    'compiler': ['parser', 'codegen'],
    'codegen':  ['ast'],
}
print(find_cycles(deps))   # [['ast', 'parser']]

Variants and alternatives

  • Kosaraju-Sharir. Two-pass DFS — first records finish times on the original graph, second runs DFS on the transpose in decreasing finish-time order. Simpler to teach but ~2× slower in cache-bound implementations.
  • Gabow's path-based SCC. Replaces low-link with two stacks: one of all vertices, one of "open" SCC representatives. Same O(V+E) but with a different invariant — sometimes preferred in functional / persistent implementations.
  • Nuutila's improvement. A small optimization to Tarjan that avoids pushing vertices onto the stack if they're trivially their own SCC (no incoming or outgoing edge in the unfinished subgraph). Saves up to 25% on sparse graphs.
  • Pearce's algorithm (2005). Uses a single integer per vertex (no on-stack flag, no separate low-link storage) — the lowest auxiliary-memory linear-time SCC algorithm to date. Standard in some compiler frontends.
  • Iterative + colour-based. When recursion is forbidden (kernel code, embedded), an iterative variant with white/grey/black colouring eliminates stack growth concerns at the cost of slightly more bookkeeping.

Cost in real terms

On a graph with 10 million vertices and 50 million edges (a typical web crawl shard), Tarjan finishes in roughly 5 seconds in compiled C++. Kosaraju takes about 10 seconds for the same input — same asymptotic, double the constant due to the second DFS pass and transpose construction. The condensation graph computed afterwards has typically 1–10% as many nodes as the original; on the 2003 Web graph (~120 M nodes), the largest SCC contained about 27% of the web — the famous "central core" structure.

Common bugs and edge cases

  • Stack pop ordering. The popping loop must pop until — and including — the SCC root v. Stopping one step early leaves v on the stack and breaks the next SCC; stopping one step late steals a vertex from a later SCC. The loop's do { ... } while (w !== v) shape is non-negotiable.
  • Forgetting onStack. When updating low-link via a back edge, you must check that the target is still on the algorithm's stack — not merely visited. A cross-edge to a finished SCC is irrelevant; including it produces wrong low-links and merges unrelated components.
  • Updating low-link with the wrong field. For tree edges use min(low[v], low[w]); for back edges use min(low[v], index[w]). Swapping these (using index[w] for tree edges) silently misses some SCCs.
  • Recursion depth on long chains. A chain of 1 million linearly-dependent modules blows out JavaScript's stack at ~10k frames and Python's at 1k. Use the iterative variant or sys.setrecursionlimit — the latter only postpones the problem.
  • Mutating the graph during DFS. If graph[v] is a generator that the caller can mutate, partial iteration plus mutation reorders neighbours mid-recursion and corrupts low-links. Snapshot the adjacency list at the start.
  • Self-loops. A vertex with an edge to itself forms an SCC of size 1 that is a cycle — a special case worth flagging in cycle-detection wrappers, since size > 1 is the usual filter.

Frequently asked questions

What's the difference between Tarjan and Kosaraju?

Both find SCCs in O(V + E). Kosaraju runs two passes of DFS — once on the original graph, once on the transpose — and uses post-order finish times. Tarjan does it in a single DFS pass with index/low-link bookkeeping and a stack of unfinished vertices. Tarjan needs no transpose and is typically about 2× faster in practice; Kosaraju is conceptually simpler to teach.

Why does Tarjan track low-link values?

The low-link of vertex v is the smallest DFS index reachable from v's subtree by tree edges plus at most one back-edge to a vertex still on the stack. When low-link(v) equals index(v), v is the root of an SCC — every descendant on the stack belongs to its component. This single invariant collapses the two-pass Kosaraju approach into one.

Can Tarjan be made iterative?

Yes, and it's mandatory for graphs deeper than your language's recursion limit. The trick is to keep a manual stack of (vertex, iterator-into-neighbours) frames. JavaScript engines crash around 10,000 deep; Python defaults to 1,000. CPython's call-graph SCC analysis would die without iteration.

What is the condensation graph?

Contract every SCC to a single super-node and keep the edges between SCCs. The result is a DAG — a directed acyclic graph — by construction, because any cycle would have lived inside one SCC. Topological sort of the condensation gives a valid build order for circular dependencies treated as units.

How does Tarjan power SAT solvers?

2-SAT reduces to SCC: build the implication graph (each clause x ∨ y becomes ¬x → y and ¬y → x), find SCCs, and the formula is satisfiable iff no variable and its negation share an SCC. Tarjan does it in linear time, then a topological order over the condensation gives a witness assignment.

Is the SCC of a single vertex with no self-loop a valid SCC?

Yes. By definition an SCC is a maximal set of vertices where every pair is mutually reachable. A vertex is trivially reachable from itself in zero steps, so any singleton is its own SCC. Tarjan emits these as size-1 components — no special-case code needed.