Algorithms
2-SAT
The satisfiability problem that turns into a graph — and gets solved in one linear pass
2-SAT decides a boolean formula made of two-literal clauses in O(n + m) time by building an implication graph, finding its strongly connected components, and checking that no variable shares a component with its own negation.
- DecisionO(n + m)
- Assignment recoveryO(n + m)
- Implication graph size2n nodes, 2m edges
- Clause width≤ 2 literals
- 3-SAT comparisonNP-complete
Interactive visualization
Press play, or step through manually. The visualization is yours to drive — try it before reading on.
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
From clauses to forced consequences
Boolean satisfiability — SAT — asks whether you can assign true/false to variables so that a formula in conjunctive normal form comes out true. In general it is the canonical NP-complete problem: Stephen Cook proved it hard in 1971, and the only known algorithms search exponentially many assignments in the worst case. But restrict every clause to at most two literals and the problem collapses into something solvable in linear time. That restricted version is 2-SAT.
The trick is reading each clause as an implication, not a disjunction. A two-literal clause (a ∨ b) says "at least one of a, b is true." Equivalently: if a is false then b must be true, and if b is false then a must be true. Two implications fall out of every clause:
(a ∨ b) ≡ (¬a ⟹ b) ∧ (¬b ⟹ a)
This is the whole idea. Because each clause forces a specific consequence rather than leaving a choice, the formula becomes a network of "if this, then that" arrows. The mathematician Melven Krom first studied this structure in 1967; Bengt Aspvall, Michael Plass, and Robert Tarjan turned it into the linear-time algorithm everyone uses today in their 1979 paper.
Building the implication graph
Create a directed graph with two nodes per variable — one for x and one for ¬x. With n variables that's 2n nodes. For each clause (u ∨ v) add the two implication edges ¬u → v and ¬v → u. With m clauses that's 2m edges.
A unit clause (x) — a single literal — is handled as (x ∨ x), which produces the self-forcing edge ¬x → x: the only way out of ¬x is into x, pinning x to true.
Now the key theorem. The formula is unsatisfiable if and only if some variable x has both x and ¬x in the same strongly connected component (SCC) of the implication graph. A strongly connected component is a maximal set of nodes that can all reach each other. If x and ¬x reach each other, then choosing x = true forces ¬x (a contradiction) and choosing x = false forces x (also a contradiction). No assignment survives.
So the algorithm is: build the graph, run a linear-time SCC algorithm, and scan each variable.
for each variable x:
if comp[x] == comp[¬x]:
return UNSATISFIABLE
Recovering a satisfying assignment
Detecting satisfiability is only half the job; often you want the actual assignment. Condense the implication graph by collapsing every SCC to a single super-node. The result is a directed acyclic graph (DAG), and any DAG has a topological order.
The rule: for each variable, set the literal whose component appears later in topological order to true. Intuitively, a literal that sits "downstream" of all its implications can be set true without forcing anything upstream into a contradiction. There is a clean symmetry guaranteeing this works — the implication graph has a "skew" property: if there's a path a → b, then there's also a path ¬b → ¬a (the contrapositive). That symmetry ensures the topological-order rule never picks both x and ¬x as true.
A delightful implementation detail: Tarjan's SCC algorithm numbers components in reverse topological order as a byproduct of its single DFS. So "pick the later component" becomes the one-liner x is true ⟺ comp[x] < comp[¬x]. No separate topological sort needed.
When 2-SAT is the right tool
- Every constraint is binary either/or. Each entity has exactly two states, and constraints relate pairs of states. Two-shift scheduling, two-sided seating, on/off device placement.
- You need a yes/no feasibility answer fast. Linear time means you can re-check millions of clauses inside a tight loop or a larger optimizer.
- You also want a witness. Unlike a pure decision oracle, the SCC method hands you a concrete assignment at no extra asymptotic cost.
- The constraints may grow at runtime. Implication edges are cheap to add; many incremental or parametric-search algorithms wrap a 2-SAT feasibility check in a binary search over a threshold.
If any clause has three or more literals, 2-SAT does not apply — you are in NP-complete territory and need a general DPLL or CDCL SAT solver. If your problem is "satisfy as many clauses as possible" rather than "satisfy all of them," that's MAX-2-SAT, which is also NP-hard despite the two-literal restriction. The polynomial magic is specifically about the all-or-nothing decision.
2-SAT vs neighboring problems
| 2-SAT | 3-SAT | MAX-2-SAT | Horn-SAT | 2-coloring | |
|---|---|---|---|---|---|
| Clause width | ≤ 2 literals | ≤ 3 literals | ≤ 2 literals | ≤ 1 positive literal | edge = clause |
| Goal | satisfy all | satisfy all | satisfy most | satisfy all | satisfy all |
| Complexity | O(n + m) | NP-complete | NP-hard | O(n + m) | O(n + m) |
| Core technique | implication graph + SCC | backtracking search | approximation / ILP | unit propagation | BFS / DFS bipartite check |
| Returns witness? | yes, free | yes if solver succeeds | best-effort | yes, free | yes, free |
| Worst-case time | linear, guaranteed | exponential | exponential to optimal | linear, guaranteed | linear, guaranteed |
The headline: 2-SAT and Horn-SAT are the two famous polynomial fragments of SAT, and both win by avoiding search entirely — 2-SAT through SCCs, Horn-SAT through unit propagation. Bump 2-SAT up by one literal (3-SAT) or change "all" to "most" (MAX-2-SAT) and you fall off the polynomial cliff. 2-coloring is essentially the special case where every clause encodes "these two must differ."
What the numbers actually say
- 2n nodes and 2m edges. A formula with 1 million variables and 5 million clauses builds a graph of 2 million nodes and 10 million edges — and the entire decision runs in a single DFS over that graph, typically tens of milliseconds in optimized C++.
- One DFS, not two. Tarjan's algorithm computes SCCs in a single depth-first traversal; Kosaraju's needs two passes plus the transpose graph. Both are O(n + m), but Tarjan halves the constant factor and skips materializing the reversed graph.
- Zero extra work for the witness. Assignment recovery is one linear scan comparing
comp[x]againstcomp[¬x]— no separate solve, no backtracking. - Parametric search adds only a log factor. Wrapping 2-SAT in a binary search over a continuous threshold (common in computational geometry, e.g. maximizing label size) costs O((n + m) log R) for R distinct candidate thresholds.
- The phase transition sits near m/n ≈ 1. Random 2-SAT formulas flip from almost-always-satisfiable to almost-always-unsatisfiable as the clause-to-variable ratio crosses 1 — a sharp threshold proven by Chvátal and Reed in 1992.
JavaScript implementation
This implements the full pipeline with iterative Tarjan SCC (recursion would overflow on large inputs). Variables are 0 … n-1; literal x is encoded as node 2x and ¬x as node 2x+1.
class TwoSat {
constructor(n) {
this.n = n;
this.adj = Array.from({ length: 2 * n }, () => []);
}
node(x, val) { return 2 * x + (val ? 0 : 1); } // x→2x, ¬x→2x+1
// Add clause (x is xVal) OR (y is yVal)
addClause(x, xVal, y, yVal) {
const a = this.node(x, xVal), b = this.node(y, yVal);
this.adj[a ^ 1].push(b); // ¬a ⟹ b
this.adj[b ^ 1].push(a); // ¬b ⟹ a
}
solve() {
const N = 2 * this.n;
const comp = new Int32Array(N).fill(-1);
const low = new Int32Array(N), num = new Int32Array(N).fill(-1);
const onStk = new Uint8Array(N), stk = [];
let idx = 0, sccCount = 0;
// Iterative Tarjan
for (let s = 0; s < N; s++) {
if (num[s] !== -1) continue;
const call = [[s, 0]];
while (call.length) {
const frame = call[call.length - 1];
const [u, pi] = frame;
if (pi === 0) { num[u] = low[u] = idx++; stk.push(u); onStk[u] = 1; }
if (pi < this.adj[u].length) {
frame[1]++;
const v = this.adj[u][pi];
if (num[v] === -1) call.push([v, 0]);
else if (onStk[v]) low[u] = Math.min(low[u], num[v]);
} else {
if (low[u] === num[u]) {
while (true) {
const w = stk.pop(); onStk[w] = 0; comp[w] = sccCount;
if (w === u) break;
}
sccCount++;
}
call.pop();
if (call.length) {
const p = call[call.length - 1][0];
low[p] = Math.min(low[p], low[u]);
}
}
}
}
// Tarjan numbers SCCs in reverse topological order.
const assign = new Array(this.n);
for (let x = 0; x < this.n; x++) {
if (comp[2 * x] === comp[2 * x + 1]) return null; // UNSAT
assign[x] = comp[2 * x] < comp[2 * x + 1];
}
return assign;
}
}
// (x0 ∨ x1) ∧ (¬x0 ∨ x2) ∧ (¬x1 ∨ ¬x2) ∧ (x0 ∨ ¬x2)
const ts = new TwoSat(3);
ts.addClause(0, true, 1, true);
ts.addClause(0, false, 2, true);
ts.addClause(1, false, 2, false);
ts.addClause(0, true, 2, false);
console.log(ts.solve()); // [ true, false, true ] — a satisfying assignment
Two details worth flagging. First, a ^ 1 flips the lowest bit, turning a literal node into its negation in constant time — that's why the even/odd encoding is worth it. Second, the comparison comp[2x] < comp[2x+1] only works because Tarjan emits components in reverse topological order; with Kosaraju you'd reverse the test.
Python implementation
Same algorithm, but using Kosaraju's two-pass SCC for variety — it's arguably easier to read and makes the "reverse topological order" explicit.
import sys
from collections import defaultdict
class TwoSat:
def __init__(self, n):
self.n = n
self.adj = defaultdict(list) # implication graph
self.radj = defaultdict(list) # transpose
def _lit(self, x, val): # x -> 2x, ¬x -> 2x+1
return 2 * x + (0 if val else 1)
def add_clause(self, x, xv, y, yv):
a, b = self._lit(x, xv), self._lit(y, yv)
self.adj[a ^ 1].append(b); self.radj[b].append(a ^ 1) # ¬a ⟹ b
self.adj[b ^ 1].append(a); self.radj[a].append(b ^ 1) # ¬b ⟹ a
def solve(self):
N = 2 * self.n
visited = [False] * N
order = []
# Pass 1: post-order on G (iterative to avoid recursion limit)
for s in range(N):
if visited[s]:
continue
stack = [(s, iter(self.adj[s]))]
visited[s] = True
while stack:
node, it = stack[-1]
for w in it:
if not visited[w]:
visited[w] = True
stack.append((w, iter(self.adj[w])))
break
else:
order.append(node)
stack.pop()
# Pass 2: assign components on Gᵀ in reverse finish order
comp = [-1] * N
c = 0
for s in reversed(order):
if comp[s] != -1:
continue
stack = [s]
comp[s] = c
while stack:
node = stack.pop()
for w in self.radj[node]:
if comp[w] == -1:
comp[w] = c
stack.append(w)
c += 1
# comp numbers are in (forward) topological order here.
assign = [False] * self.n
for x in range(self.n):
if comp[2 * x] == comp[2 * x + 1]:
return None # UNSAT
# later-in-topo-order literal is true; Kosaraju gives forward order,
# so the *larger* component number is the later one.
assign[x] = comp[2 * x] > comp[2 * x + 1]
return assign
ts = TwoSat(3)
ts.add_clause(0, True, 1, True)
ts.add_clause(0, False, 2, True)
ts.add_clause(1, False, 2, False)
ts.add_clause(0, True, 2, False)
print(ts.solve()) # a satisfying assignment, or None if UNSAT
Note the flipped comparison versus the JavaScript version: Kosaraju's second pass numbers components in forward topological order, so the later literal has the larger component index. Get this direction wrong and you'll still detect SAT/UNSAT correctly but hand back an assignment that violates a clause.
Variants and relatives worth knowing
Implicit / on-the-fly 2-SAT. When clauses arise from geometric constraints (e.g. "labels A and B can't both face right"), you generate implication edges lazily instead of materializing all 2m of them. The label-placement problem of Formann and Wagner (1991) is the classic example.
Parametric 2-SAT (binary search wrapper). Maximize a parameter subject to feasibility by binary-searching the parameter and running a 2-SAT check at each step. Used to maximize map-label font size or minimize a conflict radius — feasibility is monotone in the parameter, so binary search applies.
Random-walk 2-SAT (Papadimitriou's algorithm). Start with a random assignment; while a clause is unsatisfied, flip a random one of its two variables. This finishes in expected O(n²) flips — slower than the SCC method but the conceptual ancestor of WalkSAT for general SAT.
Horn-SAT. Another polynomial SAT fragment: clauses with at most one positive literal. Solved by unit propagation rather than SCCs. 2-SAT (the bijunctive case) and Horn-SAT are two of the six tractable classes in Schaefer's dichotomy theorem — the others being dual-Horn, affine (XOR-SAT), and the 0-valid and 1-valid families.
QBF and beyond. Plain (existential) 2-SAT is NL-complete. Remarkably, adding quantifier alternation does not break the structure as long as clauses stay width-2: Aspvall, Plass, and Tarjan's 1979 extension evaluates a fully quantified 2-CNF formula in the same linear time via the implication graph. Widen the clauses instead — general QBF over arbitrary CNF — and you jump to PSPACE-complete. The fragility is about clause width, not quantifiers.
Common bugs and edge cases
- Adding the implication in the wrong direction. The clause
(a ∨ b)gives¬a → b, nota → b. The edge starts at the negation of one literal. Inverting it silently produces wrong answers on satisfiable inputs. - Forgetting to add both implications. Each clause yields two edges (
¬a → band¬b → a). Adding only one breaks the contrapositive symmetry the assignment proof relies on. - Comparing component numbers the wrong way. Tarjan emits reverse topological order (use
comp[x] < comp[¬x]); Kosaraju emits forward order (use>). Mixing them up yields an assignment that fails a clause even though the formula is satisfiable. - Recursive DFS stack overflow. A naïve recursive Tarjan blows the call stack at a few tens of thousands of variables. Use an explicit stack for any input that might be large.
- Mishandling unit clauses. A single-literal clause
(x)must add the self-edge¬x → x. Skipping it lets the solver leavexunconstrained. - Treating MAX-2-SAT like 2-SAT. "Satisfy as many clauses as possible" is NP-hard even with two-literal clauses. The polynomial result is strictly about satisfying all of them.
Frequently asked questions
Why is 2-SAT solvable in polynomial time when general SAT is NP-complete?
A two-literal clause (a ∨ b) is logically equivalent to a pair of implications: ¬a ⟹ b and ¬b ⟹ a. That gives 2-SAT a graph structure with only forced consequences, which strongly connected components resolve in linear time. A three-literal clause (a ∨ b ∨ c) has no such two-way implication — knowing one literal is false leaves a disjunction of two, not a single forced value — so 3-SAT loses the structure and becomes NP-complete.
How does the implication graph decide satisfiability?
Build a node for every literal, both x and ¬x, then add the two implication edges per clause. Compute the strongly connected components. The formula is satisfiable if and only if no variable x has x and ¬x in the same component. If they share a component, x implies ¬x and ¬x implies x, which is a contradiction — no assignment can satisfy the formula.
How do you recover an actual satisfying assignment?
Condense the graph into its SCC DAG and process components in reverse topological order — which both Tarjan's and Kosaraju's algorithms hand you for free. For each variable, set the literal whose component comes later in topological order to true. Because Tarjan numbers SCCs in reverse topological order, the rule simplifies to: x is true when comp[x] < comp[¬x].
What is the time and space complexity of 2-SAT?
O(n + m) time and O(n + m) space, where n is the number of variables and m is the number of clauses. The implication graph has 2n nodes and 2m edges; the SCC pass (Tarjan or Kosaraju) is linear in nodes plus edges; assignment recovery is a single linear scan. This is asymptotically optimal — you must at least read the input.
What real problems reduce to 2-SAT?
Anything expressible as binary either/or constraints: scheduling two-shift assignments, 2-coloring with forbidden pairs, the wedding-seating problem where guests must sit on one of two sides, conflict-free placement of axis-aligned labels on a map, and feasibility checks inside larger solvers. Many constraint problems that look harder collapse to 2-SAT once each variable has only two states.
Is 2-SAT the same as solving with a general DPLL SAT solver?
No. A DPLL solver would still work on 2-SAT, but it relies on backtracking search that is exponential in the worst case for general formulas. The SCC method exploits the special two-literal structure to guarantee linear time with no search at all. Use the implication-graph algorithm whenever every clause has at most two literals.