Algorithms

Topological Sort

Linearize a DAG so dependencies come before dependents

Topological sort orders the vertices of a directed acyclic graph (DAG) so every directed edge u → v has u before v in the output. It's how build systems, package managers, course-prerequisite checkers, and scheduler engines compute "what to do first." Two equally good algorithms — DFS-based and Kahn's BFS-based — both run in O(V + E).

  • TimeO(V + E)
  • SpaceO(V)
  • Required propertyGraph must be a DAG (no cycles)
  • OutputA valid linear ordering (multiple may exist)
  • AlgorithmsDFS-based (finish times) or Kahn's (BFS)
  • Used inBuild systems, dependency resolution, scheduling

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

How topological sort works (Kahn's algorithm)

Kahn's algorithm is the more intuitive of the two. The idea: a vertex with no incoming edges (in-degree 0) has no prerequisites, so it can come first. Once we pick it, removing it from the graph may free up new vertices whose only remaining prerequisite was this one. Repeat until the graph is empty.

  1. Compute the in-degree of every vertex.
  2. Initialize a queue with all in-degree-0 vertices.
  3. While the queue isn't empty: dequeue a vertex, append it to the output, decrement the in-degree of each of its neighbors. If a neighbor's in-degree drops to 0, enqueue it.
  4. If the output has all V vertices, return it. Otherwise the graph has a cycle and no valid topological sort exists.

DFS-based topological sort

The DFS-based version uses finish times. Run DFS from each vertex; when DFS finishes processing a vertex (all its descendants have been visited), push it onto a stack. After the DFS completes, reverse the stack — that's the topological order.

The intuition: in DFS, a vertex finishes only after all its descendants finish. So in finish-time order, descendants finish before their ancestors. Reverse it, and ancestors come before descendants — which is exactly the topological order.

function topoSortDFS(graph) {
  const visited = new Set();
  const order = [];
  function dfs(node) {
    if (visited.has(node)) return;
    visited.add(node);
    for (const neighbor of graph[node] || []) dfs(neighbor);
    order.push(node); // finish — push when leaving
  }
  for (const node of Object.keys(graph)) dfs(node);
  return order.reverse();
}

Kahn's vs DFS-based

Kahn's algorithmDFS-based
TimeO(V + E)O(V + E)
SpaceO(V) — queue + in-degree arrayO(V) — recursion stack + visited
Cycle detectionNatural — count of processed vs totalNatural — back-edge detection during DFS
Order producedSources first — BFS-ishSinks pushed first, reversed at end
ParallelizableYes — can extract all in-degree-0 vertices simultaneouslySequential by nature
Stack overflow riskNone (iterative)Yes (deep DAGs)
Best forBuild systems wanting parallelismEducational, clean recursion

Real build systems use Kahn's because the "in-degree 0" set tells you what can run in parallel right now. DFS-based is simpler to write but doesn't expose parallelism naturally.

When to use topological sort

  • Build systems and task schedulers. Make, Bazel, Gradle, npm, Maven — all use topological sort to determine compile and link order. Parallel builds use Kahn's to identify tasks with all dependencies satisfied.
  • Course prerequisite scheduling. Each course is a vertex; "course A is a prereq for course B" is an edge from A to B. Topological sort produces a valid order to take them.
  • Spreadsheet recalculation. Cells with formulas form a DAG of dependencies. Topological sort gives the order to recalculate after a change.
  • Symbol resolution in compilers and linkers. Module imports form a DAG. Topological sort gives the linking order so each module's symbols are defined when it's loaded.
  • Pre-processing for DP on DAGs. Many DP problems on DAGs (longest path, count paths, shortest with negative weights) require processing vertices in topological order so subproblems are solved first.

JavaScript — Kahn's algorithm

function topoSortKahn(graph) {
  // graph is an adjacency list: { 'A': ['B', 'C'], 'B': ['D'], ... }
  const inDegree = new Map();
  const allNodes = new Set(Object.keys(graph));

  // Compute in-degrees
  for (const node of Object.keys(graph)) {
    if (!inDegree.has(node)) inDegree.set(node, 0);
    for (const neighbor of graph[node]) {
      allNodes.add(neighbor);
      inDegree.set(neighbor, (inDegree.get(neighbor) || 0) + 1);
    }
  }
  for (const node of allNodes) if (!inDegree.has(node)) inDegree.set(node, 0);

  // Queue all in-degree-0 vertices
  const queue = [...allNodes].filter(n => inDegree.get(n) === 0);
  const result = [];

  while (queue.length) {
    const node = queue.shift();
    result.push(node);
    for (const neighbor of graph[node] || []) {
      inDegree.set(neighbor, inDegree.get(neighbor) - 1);
      if (inDegree.get(neighbor) === 0) queue.push(neighbor);
    }
  }

  if (result.length !== allNodes.size) {
    throw new Error('Cycle detected — no valid topological order');
  }
  return result;
}

For better performance on large graphs, replace queue.shift() with a head pointer or use collections.deque in Python — shift() is O(n) in JavaScript.

Python implementation

from collections import deque, defaultdict

def topo_sort(graph):
    in_degree = defaultdict(int)
    all_nodes = set(graph.keys())
    for node in graph:
        for neighbor in graph[node]:
            in_degree[neighbor] += 1
            all_nodes.add(neighbor)
    for node in all_nodes:
        in_degree.setdefault(node, 0)

    queue = deque(n for n in all_nodes if in_degree[n] == 0)
    result = []
    while queue:
        node = queue.popleft()
        result.append(node)
        for neighbor in graph.get(node, []):
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                queue.append(neighbor)

    if len(result) != len(all_nodes):
        raise ValueError('Cycle detected')
    return result

Famous topological sort problems

Course Schedule (LeetCode 207, 210)

Given n courses and a list of prerequisites, can you complete all courses? (And: in what order?) Build the prerequisite graph and run Kahn's. If all n courses end up in the topological order, you can complete them; the order itself is your answer.

Alien Dictionary

Given a list of words sorted in some unknown alphabetical order, derive the alphabet. Compare adjacent words to find the first differing character; that gives an edge "earlier letter → later letter" in the alphabet's ordering DAG. Topological-sort the resulting graph to get the alphabet — or detect a cycle, meaning the input is inconsistent.

Parallel Build Scheduling

// Returns groups of tasks that can run in parallel, in order
function parallelBatches(graph) {
  const inDegree = new Map();
  const allNodes = new Set(Object.keys(graph));
  for (const node of Object.keys(graph)) {
    inDegree.set(node, inDegree.get(node) || 0);
    for (const neighbor of graph[node]) {
      allNodes.add(neighbor);
      inDegree.set(neighbor, (inDegree.get(neighbor) || 0) + 1);
    }
  }
  for (const node of allNodes) if (!inDegree.has(node)) inDegree.set(node, 0);

  const batches = [];
  let current = [...allNodes].filter(n => inDegree.get(n) === 0);

  while (current.length) {
    batches.push(current);
    const next = [];
    for (const node of current) {
      for (const neighbor of graph[node] || []) {
        inDegree.set(neighbor, inDegree.get(neighbor) - 1);
        if (inDegree.get(neighbor) === 0) next.push(neighbor);
      }
    }
    current = next;
  }
  return batches;
}

Each batch contains tasks that can all run simultaneously. Build systems use this to maximize parallelism — running each batch as a parallel pool of jobs.

Common bugs and edge cases

  • Forgetting to count vertices reached. If the input graph has a cycle, the algorithm doesn't crash — it just produces a partial result. Always check that the output has all V vertices; raise an error if not.
  • Missing vertices that have no edges. Vertices with in-degree 0 AND out-degree 0 might be missing from the adjacency list. Track them explicitly via a set of all known vertices.
  • Mutating in-degrees of the input. If you decrement in-degrees of a graph the caller still uses, you've trashed their data. Either copy the in-degrees or document the mutation.
  • Treating undirected graphs as DAGs. Topological sort is meaningless for undirected graphs. If the input has bidirectional edges, every pair forms a 2-cycle and no topological order exists.
  • Slow queue operations. JavaScript's arr.shift() is O(n). On a graph with 100k vertices, that's 100k × 100k = 10 billion operations vs the algorithm's O(V + E). Use a deque or maintain a head index.

Frequently asked questions

Why does topological sort require a DAG?

A cycle u → v → ... → u means u must come before v AND v before u — impossible in a linear ordering. So a directed graph with any cycle has no topological sort. The algorithms detect cycles as a side effect: if Kahn's queue empties before all vertices are processed, there's a cycle. If DFS encounters a back edge to a gray (currently-on-stack) node, there's a cycle.

How many topological orders does a DAG have?

At least one if the graph is a DAG. Often many — any pair of vertices with no path between them can appear in either order. A DAG with no edges has n! valid orderings. A DAG that's a chain has exactly one. Real DAGs land somewhere in between; the algorithms produce one valid ordering, not a unique one.

What's the difference between Kahn's algorithm and DFS-based topological sort?

Both are O(V + E). Kahn's processes vertices in BFS-ish order, popping one with in-degree 0 each iteration. DFS-based does a recursive DFS and pushes each vertex onto a stack on finish (post-order); reversing the stack gives the topological order. Kahn's is more natural for cycle detection (early termination), DFS-based produces what's often called a "natural" order from the source side.

When does topological sort fail?

Whenever the graph has a cycle. Detecting that is the algorithm's bonus output. In Kahn's, if you process fewer than V vertices before the queue empties, the remaining ones form a cycle. In DFS-based, hitting a back edge during traversal means there's a cycle and no valid topological order exists.

How do build systems use topological sort?

Each task is a vertex; an edge from A to B means "A depends on B" (or "B must run before A"). Topological sort gives a valid execution order. Make, Bazel, Gradle, and CI tools like GitHub Actions all use this. Parallel build systems compute "vertices with no remaining dependencies" — the same property Kahn's algorithm checks — to decide what can run concurrently.

What's the relationship between topological sort and DP on a DAG?

Many DP problems on a DAG (longest path, shortest path with negative weights but no cycles) require processing vertices in topological order. The topological order ensures that when you compute a vertex's value, all its predecessors are already done — that's the optimal-substructure precondition for DP.

Can you topological sort an undirected graph?

No, the concept doesn't apply. Topological sort is defined on directed graphs because the ordering follows the direction of edges. Undirected graphs have no inherent direction. If you want to "sort" an undirected graph, you might be thinking of BFS from a root (level-order) or DFS finish order — different concepts.