Algorithms

Big O Notation

How algorithm cost scales with input size

Big O notation describes how an algorithm's running time or memory usage grows as the input gets larger. Saying "binary search is O(log n)" means doubling the input adds one extra comparison, not double the work — that single insight separates algorithms that scale to billions from algorithms that don't.

  • Most common ratesO(1), O(log n), O(n), O(n log n), O(n²), O(2ⁿ)
  • Focuses onWorst case (typically)
  • IgnoresConstants and lower-order terms
  • Used forTime and space complexity
  • Practical purposePredicting scalability before running code

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

How to read big O

Pick the dominant term, drop the constants. That's it.

If your algorithm runs in 3n² + 5n + 100 operations:

  • The term grows fastest as n increases — at n = 1,000, it dwarfs the others.
  • The constants (3, 5, 100) are implementation details — irrelevant at scale.
  • So the algorithm is O(n²).

This is asymptotic analysis: we care about how the algorithm behaves as n approaches infinity, not its exact behavior at small n. The whole point is to predict scalability before writing benchmark code — and to communicate tradeoffs in a single symbol.

Common growth rates, ranked

NotationNamen=10n=1,000n=1,000,000Example algorithm
O(1)Constant111Hash table lookup, array index
O(log n)Logarithmic31020Binary search, balanced BST
O(n)Linear101,0001,000,000Linear search, sum of array
O(n log n)Linearithmic3310,00020,000,000Merge sort, heap sort
O(n²)Quadratic1001,000,00010¹²Bubble sort, naive nested loop
O(2ⁿ)Exponential1,02410³⁰¹impossibleRecursive Fibonacci, brute-force subsets
O(n!)Factorial3.6MimpossibleimpossibleBrute-force traveling salesman

The crucial gap is between O(n²) and O(n log n). At a million items, O(n log n) finishes in 20 million operations (microseconds on modern hardware); O(n²) needs 10¹² operations (hours). That's the difference between an algorithm that scales to production data and one that doesn't.

The crucial gap above n²: anything exponential. O(2ⁿ) with n = 100 has more operations than there are atoms in the observable universe. If your algorithm is O(2ⁿ), you cannot scale it — you must find a polynomial alternative or accept that the input has to stay tiny.

Why we ignore constants and lower-order terms

Imagine two sorting algorithms:

  • Algorithm A: 100 × n operations
  • Algorithm B: 0.5 × n² operations

At n = 50, Algorithm A does 5,000 ops; Algorithm B does 1,250. B looks faster — and would be, on small inputs.

At n = 1,000, A does 100,000; B does 500,000. A wins.

At n = 1,000,000, A does 100 million; B does 500 billion. A is 5,000× faster.

Big O catches this — A is O(n), B is O(n²). The notation tells you the asymptotic winner, even when small-n behavior would mislead you. That's its job: predict scaling, not benchmark behavior on a specific input.

Big O vs theta vs omega

Three sibling notations describe different bounds:

  • O(f(n)) — "grows no faster than f(n)." Upper bound. Saying an algorithm is O(n²) is consistent with it actually being O(n) — the bound just isn't tight.
  • Ω(f(n)) — "grows at least as fast as f(n)." Lower bound. Useful for proving "no algorithm can solve this faster than..."
  • Θ(f(n)) — "grows exactly like f(n)." Both upper and lower bound — the tight one.

In casual usage, when programmers say "binary search is O(log n)," they almost always mean Θ(log n) — the tight bound. The looser O notation just dominates everyday conversation because it's easier to reason about and write.

How to derive big O for your code

A few rules of thumb:

  1. Sequential operations add — but you keep only the dominant term: O(n) + O(log n) = O(n).
  2. Nested loops multiply — two nested loops over the same array are O(n²); three nested loops are O(n³).
  3. Halving reduction is logarithmic — if each step cuts the problem in half (binary search, balanced BST traversal), it's O(log n).
  4. Recursion: count calls × work per call — merge sort has O(n) calls per level and log n levels, total O(n log n).
  5. Drop constantsO(2n) is just O(n). O(100) is just O(1).

When big O misleads you

The notation is asymptotic — it cares about behavior as n approaches infinity. Real programs care about behavior at the n you actually have. Big O misleads when:

  • Constant factors dominate at your input size. An O(n log n) algorithm with hidden constants of 1,000 can be slower than an O(n²) algorithm with constants of 1, until n exceeds some crossover point.
  • Cache effects warp the model. Two algorithms with the same big O can differ by 10× because of memory layout. Quicksort beats merge sort on most real hardware despite identical O(n log n) — purely because of cache locality.
  • Parallelism changes the picture. Big O assumes a sequential machine. On a GPU with thousands of cores, "O(n log n) sort" and "O(log² n) parallel sort" tell completely different stories.
  • Average-case differs from worst-case. Hash table lookups are O(1) average, O(n) worst-case. Whether the worst case ever fires depends on your hash function and your input distribution, not just on the bound.

The right mental model: big O tells you the shape of the curve. Benchmarks tell you where you are on it. You need both.

Frequently asked questions

What does the "O" in big O stand for?

"Order of." It comes from the German word "Ordnung," used by Paul Bachmann in 1894 and popularized by Donald Knuth. It means "of the order of" — i.e., on the same growth scale as.

Why do we ignore constants in big O?

Because they're implementation-dependent and irrelevant at scale. An O(n²) algorithm and an O(n) algorithm differ asymptotically — at large enough n, the linear one wins regardless of constant factors. Constants matter for engineering decisions on small inputs, but big O is about the shape of growth, not the absolute number.

What's the difference between O, Θ, and Ω?

O(f(n)) is an upper bound — "grows no faster than f(n)." Ω(f(n)) is a lower bound — "grows at least as fast as f(n)." Θ(f(n)) is both — "grows exactly like f(n)." In practice, most people say "O" when they technically mean "Θ," because the upper bound usually IS the tight bound.

Does big O always describe the worst case?

Not by definition — big O is just an upper bound. But by convention, when people say "binary search is O(log n)" without qualifying, they mean worst case. Average case (different from worst when inputs vary) and amortized cost (averaged over a sequence of operations) are separate analyses with their own notation.

When does big O lie?

When constant factors dominate at the input sizes you actually run on. An O(n log n) algorithm with huge constants can be slower than an O(n²) algorithm for n < 10,000. Cache effects, branch prediction, and memory layout also break the model. Big O is a guide, not a benchmark — measure on real data when performance actually matters.