Graph Algorithms
Kosaraju's Strongly Connected Components
Two DFS passes, reverse-order on the transpose — every SCC falls out
Kosaraju's algorithm finds strongly connected components with two DFS passes — forward on the original graph, reverse on the transpose. It runs in O(V+E) and is the most intuitive SCC algorithm.
- TimeO(V + E)
- SpaceO(V + E) (transpose)
- DFS passes2 (vs 1 for Tarjan)
- Output orderReverse topological
- Year published1978
- vs Tarjan~2× slower in practice
Interactive visualization
Press play, or step through manually. The visualization is yours to drive — try it before reading on.
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
How Kosaraju's algorithm works
A strongly connected component (SCC) of a directed graph is a maximal set of vertices where every pair is mutually reachable. Kosaraju's approach is two ordinary DFS traversals — no clever bookkeeping, no low-link values.
- First pass: run DFS on the original graph G. Record vertices in the order they finish (i.e. push each vertex onto a stack as DFS returns from it).
- Compute the transpose: reverse every edge of G. Same vertices, opposite directions. Call the result G^T.
- Second pass: pop vertices off the finish-order stack one at a time. For each unvisited vertex, run DFS in G^T from it. The DFS subtree visits exactly one SCC.
That's it. Each pass is linear in V + E. The two-pass approach is the most intuitive SCC algorithm because each step is a vanilla DFS — no extra invariants to track during traversal.
Why it works
The proof hinges on two observations.
Observation 1. SCCs are identical in G and G^T. If u and v are mutually reachable in G, they are mutually reachable in G^T (just reverse every path). So the partition into SCCs is invariant under transpose.
Observation 2. Consider two SCCs A and B with an edge from A to B in G. Then in any DFS on G, the vertex of A with the latest finish time finishes after every vertex of B. That's because either DFS reaches A first (so B is explored as A's descendant — A finishes after B), or DFS reaches B first (so B finishes before A even starts — A finishes after B).
Combine the two: pop vertices in reverse-finish order. The first popped vertex is in the "source" SCC of the condensation. Start DFS in G^T from it; in G^T the edges between SCCs are reversed, so this SCC has no outgoing edges in G^T. The DFS therefore stays within the SCC, identifying it. Remove. Repeat — the next SCC popped is the next source of the remaining condensation. Each pop yields exactly one SCC.
Worked example
Take a 7-vertex graph with three SCCs: {0, 1, 2}, {3, 4}, {5, 6}, with bridges 2 → 3 and 4 → 5.
First DFS from 0: visits 0 → 1 → 2 → 3 → 4 (back to 3) → 5 → 6 (back to 5). Finish order (latest first): 0, 1, 2, 3, 4, 5, 6.
Transpose: 0 → 2, 2 → 1, 1 → 0, 3 → 2, 4 → 3, 3 → 4, 5 → 4, 6 → 5, 5 → 6.
Second pass, pop in finish order from latest to earliest, DFS on transpose:
- Pop 0. DFS on G^T: visits 0 → 2 → 1 (back). SCC1 = {0, 1, 2}.
- Pop 1, 2 — already visited, skip.
- Pop 3. DFS on G^T: visits 3 → 4 (back). SCC2 = {3, 4}.
- Pop 4 — already visited.
- Pop 5. DFS on G^T: visits 5 → 6 (back). SCC3 = {5, 6}.
Three SCCs in O(V + E) — 7 + 9 = 16 edge touches each pass, 32 total.
Cost in real terms
On a graph with 10 million vertices and 50 million edges (typical web-crawl shard), Kosaraju finishes in roughly 10 seconds in compiled C++. Tarjan finishes in roughly 5 — the second pass on the transpose doubles cache pressure even though both are O(V+E). The condensation has 1–10% of the original nodes; the 2003 web graph famously had one SCC containing ~27% of all crawled pages.
When to use Kosaraju
- Teaching SCCs — its two-pass structure is the cleanest way to prove SCC partitioning correctness.
- Languages without explicit recursion-stack control where Tarjan's low-link bookkeeping gets messy.
- Whenever you also need the transpose for downstream work (e.g. reverse reachability) — Kosaraju builds it for free.
- Codebases where two simple DFS calls compose cleanly with existing graph utilities.
Prefer Tarjan when memory or cache traffic matters, when the graph is too large to materialise the transpose, or when 2× constants are tight.
Kosaraju vs Tarjan vs Gabow
| Kosaraju | Tarjan | Gabow / path-based | |
|---|---|---|---|
| Time complexity | O(V + E) | O(V + E) | O(V + E) |
| DFS passes | 2 | 1 | 1 |
| Needs transpose | Yes | No | No |
| Auxiliary storage | Finish-order stack + transpose adjacency | 2 ints + bool per vertex | 2 vertex stacks |
| Output order | Reverse topological | Reverse topological | Reverse topological |
| Practical speed | ~2× slower than Tarjan | Fastest typical | Comparable to Tarjan |
| Conceptual simplicity | Highest (two DFSes) | Medium (low-link invariant) | Low (two-stack invariant) |
All three give the same SCC partition. The choice is between Kosaraju's clarity, Tarjan's speed, and Gabow's stack-based invariant — pick the one that matches your team's familiarity and your performance budget.
Pseudo-code
algorithm Kosaraju(graph G):
// Pass 1: finish-order DFS on G
visited = empty set
finishOrder = empty stack
for each v in V(G):
if v not in visited:
dfs1(v)
function dfs1(v):
visited.add(v)
for each w in G.adj[v]:
if w not in visited: dfs1(w)
finishOrder.push(v)
// Build transpose
Gt = transpose of G
// Pass 2: pop in reverse-finish order, DFS on Gt
visited.clear()
SCCs = []
while finishOrder not empty:
v = finishOrder.pop()
if v not in visited:
scc = []
dfs2(v, scc)
SCCs.append(scc)
function dfs2(v, scc):
visited.add(v)
scc.append(v)
for each w in Gt.adj[v]:
if w not in visited: dfs2(w, scc)
return SCCs
JavaScript: cycle detection in a dependency graph
function kosarajuSCC(graph) {
const nodes = Object.keys(graph);
const visited = new Set();
const finishOrder = [];
function dfs1(v) {
visited.add(v);
for (const w of (graph[v] || [])) {
if (!visited.has(w)) dfs1(w);
}
finishOrder.push(v);
}
for (const v of nodes) if (!visited.has(v)) dfs1(v);
// Build transpose
const transpose = {};
for (const v of nodes) transpose[v] = [];
for (const v of nodes) for (const w of (graph[v] || [])) transpose[w].push(v);
// Pass 2: DFS on transpose, popping finishOrder
visited.clear();
const sccs = [];
function dfs2(v, scc) {
visited.add(v);
scc.push(v);
for (const w of (transpose[v] || [])) if (!visited.has(w)) dfs2(w, scc);
}
for (let i = finishOrder.length - 1; i >= 0; i--) {
const v = finishOrder[i];
if (!visited.has(v)) {
const scc = [];
dfs2(v, scc);
sccs.push(scc);
}
}
return sccs;
}
// Detect circular dependencies
function findCycles(graph) {
return kosarajuSCC(graph).filter(scc =>
scc.length > 1 || (graph[scc[0]] || []).includes(scc[0])
);
}
const deps = {
parser: ['lexer', 'ast'],
lexer: ['source'],
ast: ['parser'],
source: [],
compiler: ['parser', 'codegen'],
codegen: ['ast'],
};
console.log(findCycles(deps)); // [['parser', 'ast']]
Python: iterative version for large graphs
Recursion blows up at ~10⁴ depth in Python. For 10⁵-vertex chains you need an iterative DFS — the classic "stack of (vertex, iterator)" trick.
def kosaraju_scc(graph):
nodes = list(graph)
visited = set()
finish_order = []
# Pass 1: iterative DFS on original
for start in nodes:
if start in visited: continue
stack = [(start, iter(graph.get(start, ())))]
visited.add(start)
while stack:
v, it = stack[-1]
advanced = False
for w in it:
if w not in visited:
visited.add(w)
stack.append((w, iter(graph.get(w, ()))))
advanced = True
break
if not advanced:
finish_order.append(v)
stack.pop()
# Build transpose
transpose = {v: [] for v in nodes}
for v in nodes:
for w in graph.get(v, ()):
transpose[w].append(v)
# Pass 2: iterative DFS on transpose in reverse finish order
visited.clear()
sccs = []
for v in reversed(finish_order):
if v in visited: continue
scc = []
stack = [v]
visited.add(v)
while stack:
u = stack.pop()
scc.append(u)
for w in transpose.get(u, ()):
if w not in visited:
visited.add(w)
stack.append(w)
sccs.append(scc)
return sccs
deps = {
'parser': ['lexer', 'ast'],
'lexer': ['source'],
'ast': ['parser'],
'source': [],
'compiler': ['parser', 'codegen'],
'codegen': ['ast'],
}
print(kosaraju_scc(deps))
Variants and alternatives
- Tarjan's algorithm. Single DFS with index/low-link bookkeeping. Same O(V+E), ~2× faster in practice, no transpose. The default in most contest libraries.
- Gabow's algorithm. Single DFS with two vertex stacks, equivalent to Tarjan via a different invariant. Comparable speed; sometimes preferred for its lack of integer fields.
- Implicit transpose. When the transpose is too large to materialise, run pass 2 with an "edge iterator" that consults the original adjacency list and filters by direction. Cuts memory in half at the cost of a constant factor in time.
- Forward-Backward. A parallel SCC algorithm built on Kosaraju's two-direction idea — picks a pivot, computes forward and backward reachability simultaneously, intersects to get one SCC, recurses. Used in distributed graph processing.
- Condensation in one sweep. Once SCCs are found, contract on a single pass over the edges to build the DAG of super-vertices. SCC IDs assigned in reverse finish order are already a valid topological numbering.
Common bugs and edge cases
- Forgetting to reset visited between passes. The two DFS passes are independent. Using the first pass's visited set in pass 2 silently skips most vertices and produces empty SCCs.
- Iterating finish-order in wrong direction. The second pass pops from the top of the stack — equivalent to reverse-finish order. Iterating in finish order (oldest first) produces wrong SCCs that mix unrelated components.
- Wrong adjacency in the transpose. An edge u → v in G becomes v → u in G^T. Mixing up source and destination during transpose construction breaks pass 2 silently — every SCC becomes a singleton.
- Recursion depth. For 10⁵-vertex chains, plain recursion crashes JavaScript (~10k stack frames) and Python (1k default). Use the iterative variant or raise
sys.setrecursionlimit(a temporary fix at best). - Self-loops misclassified as non-cycles. A vertex v with edge v → v forms a singleton SCC that IS a cycle. The naive filter "SCC size > 1" misses these — explicitly check for the self-loop case.
- Disconnected graphs. Both passes must iterate over every vertex, not just reachable-from-zero. Test on disconnected graphs early; this bug doesn't show up on small fully-connected examples.
Frequently asked questions
Why does Kosaraju need two DFS passes?
The first pass orders vertices by DFS finish time on the original graph. The second pass uses that order — reversed — to start fresh DFS traversals on the transpose graph. Each traversal visits exactly one SCC because, by construction, the reverse-finish-order root cannot reach any vertex outside its SCC via transposed edges. Two passes give one SCC per traversal.
How does Kosaraju compare to Tarjan in practice?
Both run in O(V+E). Kosaraju is conceptually simpler — two ordinary DFS calls plus a transpose — while Tarjan packs everything into one DFS with low-link bookkeeping. Tarjan is typically 2× faster because Kosaraju walks every edge twice (once forward, once on the transpose). Tarjan wins on cache; Kosaraju wins on readability and ease of debugging.
Why is the second pass on the reverse graph?
A vertex v in an SCC C can reach every other vertex in C in the original graph and in the transpose. In the original graph, the last-finishing vertex of C is the root of C's DFS subtree. In the transpose graph, starting from that root finds exactly the vertices of C — because the transpose blocks edges that would have led out of C in the original graph. Reverse direction = SCC boundary becomes a wall.
What is a transpose graph?
The transpose G^T of a directed graph G has the same vertices and the same edges but with every edge reversed. If G has edge u → v, then G^T has v → u. SCCs of G and G^T are identical sets of vertices — the SCC partition is invariant under transpose. That's what makes Kosaraju work.
Does Kosaraju handle disconnected graphs?
Yes — both passes iterate over all vertices and start a fresh DFS for any unvisited one. Disconnected components become separate SCC pools naturally. Self-loops form size-1 SCCs that are cycles; isolated vertices are size-1 SCCs that aren't cycles.
What is the condensation graph?
Contract each SCC to a super-vertex, keep edges between SCCs. The result is a DAG by construction — any cycle would lie inside one SCC. Topological sort of the condensation gives a build order for circular dependencies treated as units. Kosaraju produces SCCs already in reverse topological order, so the condensation falls out for free.