Algorithms
PageRank
The steady state of a random surfer who occasionally teleports
PageRank ranks nodes in a graph by the steady-state probability of a random surfer landing on them. It made Google in 1998 and now powers citation analysis, fraud detection, and recommendation systems. With damping factor 0.85 it converges to four-decimal accuracy in about 50 iterations on graphs with billions of edges.
- Time per iterationO(V + E)
- Iterations to converge~50
- Damping factor (Google)0.85
- Underlying mathsStationary distribution of Markov chain
- First publishedBrin & Page 1998
Interactive visualization
Press play, or step through manually. The visualization is yours to drive — try it before reading on.
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
How PageRank works
Imagine a person clicking links at random on the web. From any page they pick one outgoing link uniformly and follow it. Once in a while — with probability 1 − d — they get bored and teleport to a random page. After a long time, what fraction of their clicks land on each page? That fraction is the page's PageRank.
The trick is that this steady-state distribution rewards being linked-to from important pages. A link from a single high-PageRank page beats a thousand links from obscure ones. The whole web becomes a system of mutual endorsements, recursively defined: your importance is the sum of the importances of pages that link to you, weighted by how generously they spread their endorsements.
Mathematically, PageRank is the leading eigenvector of the Google matrix:
PR(v) = (1 - d)/n + d · Σ_{u in In(v)} PR(u) / outDegree(u)
where d is the damping factor (default 0.85) and n is the total number of nodes. The recursion is solved by power iteration: start with PR = 1/n for everyone, plug into the right-hand side, get a new PR, repeat. The Perron-Frobenius theorem guarantees convergence to a unique fixed point as long as the graph is irreducible and aperiodic — which the teleport term enforces by construction.
When PageRank is the right tool
- You have a graph where edges represent some kind of endorsement, citation, or trust.
- You want a single global score per node, not a query-specific one.
- The graph is large enough that explicit eigendecomposition (O(V³)) is impossible but power iteration (O(V + E) per step) is feasible.
- The graph isn't pathological — it has at least some cycles and isn't a tree, so random-walk equilibrium is a meaningful concept.
For query-specific ranking on a small subgraph, HITS is more appropriate. For ranking by direct similarity rather than by endorsement chains, try a simpler centrality measure (degree, betweenness) — and remember that DFS reachability already answers many questions PageRank gets misapplied to.
PageRank vs HITS vs other centralities
| PageRank | HITS | Degree centrality | Betweenness | |
|---|---|---|---|---|
| Output per node | 1 score | 2 scores (hub, authority) | 1 score | 1 score |
| Query-dependent? | No | Yes (subgraph) | No | No |
| Time | O((V+E) · iters) | O((V+E) · iters) | O(V + E) | O(V · E) |
| Considers global structure | Yes (random walk) | Yes (within subgraph) | No (local only) | Yes (all paths) |
| Resistance to spam | Medium (link farms) | Low (TKC effect) | None | High |
| Famous deployment | Google search 1998 | IBM CLEVER 1998 | — | Social network analysis |
| Convergence guarantee | Yes (Perron-Frobenius) | Yes (singular vectors) | Closed form | Closed form |
The TKC ("tightly knit community") effect that hurts HITS is exactly what teleport in PageRank prevents: a clique of pages can't bootstrap each other's scores indefinitely because some probability mass always leaks back out to the rest of the graph.
Pseudo-code
function pagerank(graph, d=0.85, tol=1e-8, maxIter=100):
n = number of nodes
rank = { v: 1/n for each v }
for iter in 1..maxIter:
newRank = { v: (1-d)/n for each v }
for each node u with outDegree(u) > 0:
share = d * rank[u] / outDegree(u)
for each w in outNeighbours(u):
newRank[w] += share
# Handle dangling nodes by spreading their mass uniformly.
dangling = sum(rank[u] for u with outDegree(u) == 0)
for each v: newRank[v] += d * dangling / n
if max(|newRank[v] - rank[v]|) < tol: return newRank
rank = newRank
return rank
JavaScript: power iteration on a small graph
function pagerank(graph, d = 0.85, tol = 1e-8, maxIter = 100) {
const nodes = Object.keys(graph);
const n = nodes.length;
const out = Object.fromEntries(nodes.map(v => [v, (graph[v] || []).length]));
let rank = Object.fromEntries(nodes.map(v => [v, 1 / n]));
for (let iter = 0; iter < maxIter; iter++) {
const next = Object.fromEntries(nodes.map(v => [v, (1 - d) / n]));
// Spread each node's rank along its out-edges.
for (const u of nodes) {
if (out[u] === 0) continue;
const share = d * rank[u] / out[u];
for (const w of graph[u]) next[w] += share;
}
// Dangling-node correction: spread their mass uniformly.
const dangling = nodes.reduce((s, v) => out[v] === 0 ? s + rank[v] : s, 0);
for (const v of nodes) next[v] += d * dangling / n;
// Check for convergence.
let delta = 0;
for (const v of nodes) delta = Math.max(delta, Math.abs(next[v] - rank[v]));
rank = next;
if (delta < tol) return { rank, iterations: iter + 1 };
}
return { rank, iterations: maxIter };
}
// Tiny example: A → B, A → C, B → C, C → A (a 3-page mini-web).
const g = { A: ['B', 'C'], B: ['C'], C: ['A'] };
const { rank, iterations } = pagerank(g);
console.log(iterations, rank);
// => ~30 iterations, rank ≈ { A: 0.388, B: 0.214, C: 0.397 }
Python: same, with numpy for matrix form
import numpy as np
def pagerank(graph, d=0.85, tol=1e-8, max_iter=100):
nodes = list(graph)
idx = {v: i for i, v in enumerate(nodes)}
n = len(nodes)
out = np.array([len(graph[v]) for v in nodes], dtype=float)
rank = np.full(n, 1.0 / n)
for it in range(max_iter):
nxt = np.full(n, (1 - d) / n)
for i, u in enumerate(nodes):
if out[i] == 0: continue
share = d * rank[i] / out[i]
for w in graph[u]:
nxt[idx[w]] += share
# Dangling correction.
dangling = rank[out == 0].sum()
nxt += d * dangling / n
if np.max(np.abs(nxt - rank)) < tol:
return dict(zip(nodes, nxt)), it + 1
rank = nxt
return dict(zip(nodes, rank)), max_iter
g = {'A': ['B', 'C'], 'B': ['C'], 'C': ['A']}
ranks, it = pagerank(g)
print(it, ranks)
# => ~30, {'A': 0.388, 'B': 0.214, 'C': 0.397}
Variants and extensions
- Damping-factor tuning. Brin and Page's original 0.85 is the field default; some applications (citation analysis, biology) use values from 0.5 to 0.95. Lower damping converges faster and produces flatter rankings; higher damping rewards link structure more aggressively but converges more slowly.
- Personalized PageRank. Replace the uniform teleport vector with one biased toward a user's interests. Twitter's Who To Follow recommendation system has used personalized PageRank as a core component. Computing it from scratch per user is expensive; in practice it's approximated with bookmark-coloured Monte Carlo or precomputed bases.
- Topic-sensitive PageRank (Haveliwala 2002). Precompute one PageRank vector per coarse topic (sports, technology, politics, …) using teleport vectors restricted to that topic. At query time, blend the precomputed vectors by query-topic affinities — the cheapest way to get topical rankings without per-user recomputation.
- TrustRank. Anti-spam variant that seeds the teleport vector from a small set of hand-vetted "trusted" pages. Spam pages can't accumulate trust because their endorsements come from other untrusted pages.
- SimRank. Different problem: similarity rather than importance. "Two nodes are similar if they're linked to by similar nodes." Same iterative power-method flavour as PageRank.
- Lazy random walk / random walk with restart. The probability of staying in place is positive; equivalent to PageRank in the limit but useful for short random-walk approximations on huge graphs.
- GPU / distributed PageRank. Each iteration is a sparse matrix-vector product, embarrassingly parallel by output node. The MapReduce paper used PageRank as a canonical example: emit (target, contribution) pairs from each source, sum by target.
Cost in real terms
Each iteration is a single sparse matrix-vector multiply: O(V + E). For the 2003 web graph (~120 M nodes, ~1 B edges) one iteration runs in about 5 seconds on a single modern machine; converging to four decimals takes ~50 iterations, so roughly 4 minutes total. At Google's 2008 scale (~1 T pages), 50 iterations across MapReduce shards still completed within a few hours of cluster time. The geometric convergence rate is exactly the damping factor: at d = 0.85, error shrinks by 0.85 per iteration, giving 10-8 tolerance in about 113 iterations — but rank order stabilizes far earlier.
Common bugs and edge cases
- Dangling nodes. A node with no outgoing edges leaks probability mass every iteration. Without correction, the ranks no longer sum to 1 and the spectral guarantees fail. The standard fix is to treat dangling nodes as if they linked uniformly to all nodes — redistribute their mass through the teleport step.
- Disconnected graphs. If the graph has two components with no path between them, the random surfer can never cross — but the teleport term saves the day, since it's added uniformly. Forgetting the teleport (or setting d = 1) breaks the algorithm completely on disconnected graphs.
- Self-loops counted in out-degree. A page that links to itself shouldn't count that as an outgoing link for ranking purposes — otherwise self-loops let spam pages keep all their own rank. Strip self-loops during preprocessing.
- Floating-point underflow on huge graphs. With 109 nodes, 1/n is ~10-9; products with damping factor several iterations deep can underflow without care. Use double precision and avoid storing intermediate (1/n)k products.
- Non-normalized teleport vector. The teleport vector must sum to 1. If you build a personalized one and accidentally double-count or miss nodes, you bias the entire stationary distribution by the same factor.
- Iteration count too low. Stopping at 10 iterations gives ranks correct to maybe 1 decimal — fine for top-N display, wrong for downstream models that consume the score directly. Always check convergence with a tolerance, not a fixed iteration count, when accuracy matters.
- Confusing in-degree with PageRank. A page with one link from a high-PageRank source can outrank a page with a thousand links from obscure sources. PageRank is recursive and weighted, not a counting metric.
Frequently asked questions
What damping factor does Google use?
Brin and Page's original 1998 paper used d = 0.85 and that has stuck as the canonical value across the literature and most production systems. Smaller d makes ranks closer to a uniform distribution and accelerates convergence; larger d emphasizes the link structure but slows convergence and inflates the influence of dense subgraphs. 0.85 is the empirical sweet spot.
Why does PageRank converge in about 50 iterations?
Power iteration converges geometrically with rate equal to the second-largest eigenvalue, which for the Google matrix is exactly the damping factor d. With d = 0.85, the error shrinks by 0.85 per iteration. To go from initial error 1 to tolerance 10^-8, you need ~log(10^-8) / log(0.85) ≈ 113 iterations; in practice 50 is enough for ranking, since rank order stabilizes long before numerical convergence.
What's a dangling node and why does it matter?
A node with no outgoing links — a PDF, a leaf page, an orphan in any graph. Without special handling its share of probability mass disappears every iteration and the ranks no longer sum to 1. The fix is to treat each dangling node as if it linked to every node uniformly: redistribute its mass through a teleport step, just like the random-jump term.
How is PageRank different from HITS?
Both are 1998 link-analysis algorithms. PageRank assigns one score per node from a global random walk on the entire graph and is query-independent. HITS computes two scores per node — hub and authority — on the subgraph induced by a query, so it's query-dependent and slower per query but more topical. Google chose PageRank; HITS lives on in academic citation analysis.
Can PageRank be personalized?
Yes. Replace the uniform teleport vector (1/n for every node) with a sparse vector concentrated on the user's interests — pages they've bookmarked, topics they've searched. The result is personalized PageRank: ranks biased toward the user. Topic-sensitive PageRank pre-computes a small bank of these for canonical topics and blends them at query time.
Is PageRank still used today?
In its pure form, no — modern Google ranking is a learned model with hundreds of features. But link-graph signals derived from PageRank-style random walks remain inputs to that model. Outside web search, PageRank thrives: Twitter's Who To Follow, biology's protein-interaction networks, Wikipedia's importance scores, and many fraud detection systems all run PageRank on their own graphs.