What does the "O" in big O stand for?

"Order of." It comes from the German word "Ordnung," used by Paul Bachmann in 1894 and popularized by Donald Knuth. It means "of the order of" — i.e., on the same growth scale as.

Why do we ignore constants in big O?

Because they're implementation-dependent and irrelevant at scale. An O(n²) algorithm and an O(n) algorithm differ asymptotically — at large enough n, the linear one wins regardless of constant factors. Constants matter for engineering decisions on small inputs, but big O is about the shape of growth, not the absolute number.

What's the difference between O, Θ, and Ω?

O(f(n)) is an upper bound — "grows no faster than f(n)." Ω(f(n)) is a lower bound — "grows at least as fast as f(n)." Θ(f(n)) is both — "grows exactly like f(n)." In practice, most people say "O" when they technically mean "Θ," because the upper bound usually IS the tight bound.

Does big O always describe the worst case?

Not by definition — big O is just an upper bound. But by convention, when people say "binary search is O(log n)" without qualifying, they mean worst case. Average case (different from worst when inputs vary) and amortized cost (averaged over a sequence of operations) are separate analyses with their own notation.

When constant factors dominate at the input sizes you actually run on. An O(n log n) algorithm with huge constants can be slower than an O(n²) algorithm for n < 10,000. Cache effects, branch prediction, and memory layout also break the model. Big O is a guide, not a benchmark — measure on real data when performance actually matters.

Big O Notation — How It Works, Complexity & Code

How to read big O

Pick the dominant term, drop the constants. That's it.

If your algorithm runs in 3n² + 5n + 100 operations:

The n² term grows fastest as n increases — at n = 1,000, it dwarfs the others.
The constants (3, 5, 100) are implementation details — irrelevant at scale.
So the algorithm is O(n²).

This is asymptotic analysis: we care about how the algorithm behaves as n approaches infinity, not its exact behavior at small n. The whole point is to predict scalability before writing benchmark code — and to communicate tradeoffs in a single symbol.

Common growth rates, ranked

Notation	Name	n=10	n=1,000	n=1,000,000	Example algorithm
O(1)	Constant	1	1	1	Hash table lookup, array index
O(log n)	Logarithmic	3	10	20	Binary search, balanced BST
O(n)	Linear	10	1,000	1,000,000	Linear search, sum of array
O(n log n)	Linearithmic	33	10,000	20,000,000	Merge sort, heap sort
O(n²)	Quadratic	100	1,000,000	10¹²	Bubble sort, naive nested loop
O(2ⁿ)	Exponential	1,024	10³⁰¹	impossible	Recursive Fibonacci, brute-force subsets
O(n!)	Factorial	3.6M	impossible	impossible	Brute-force traveling salesman

The crucial gap is between O(n²) and O(n log n). At a million items, O(n log n) finishes in 20 million operations (microseconds on modern hardware); O(n²) needs 10¹² operations (hours). That's the difference between an algorithm that scales to production data and one that doesn't.

The crucial gap above n²: anything exponential. O(2ⁿ) with n = 100 has more operations than there are atoms in the observable universe. If your algorithm is O(2ⁿ), you cannot scale it — you must find a polynomial alternative or accept that the input has to stay tiny.

Why we ignore constants and lower-order terms

Imagine two sorting algorithms:

Algorithm A: 100 × n operations
Algorithm B: 0.5 × n² operations

At n = 50, Algorithm A does 5,000 ops; Algorithm B does 1,250. B looks faster — and would be, on small inputs.

At n = 1,000, A does 100,000; B does 500,000. A wins.

At n = 1,000,000, A does 100 million; B does 500 billion. A is 5,000× faster.

Big O catches this — A is O(n), B is O(n²). The notation tells you the asymptotic winner, even when small-n behavior would mislead you. That's its job: predict scaling, not benchmark behavior on a specific input.

Big O vs theta vs omega

Three sibling notations describe different bounds:

O(f(n)) — "grows no faster than f(n)." Upper bound. Saying an algorithm is O(n²) is consistent with it actually being O(n) — the bound just isn't tight.
Ω(f(n)) — "grows at least as fast as f(n)." Lower bound. Useful for proving "no algorithm can solve this faster than..."
Θ(f(n)) — "grows exactly like f(n)." Both upper and lower bound — the tight one.

In casual usage, when programmers say "binary search is O(log n)," they almost always mean Θ(log n) — the tight bound. The looser O notation just dominates everyday conversation because it's easier to reason about and write.

How to derive big O for your code

A few rules of thumb:

Sequential operations add — but you keep only the dominant term: O(n) + O(log n) = O(n).
Nested loops multiply — two nested loops over the same array are O(n²); three nested loops are O(n³).
Halving reduction is logarithmic — if each step cuts the problem in half (binary search, balanced BST traversal), it's O(log n).
Recursion: count calls × work per call — merge sort has O(n) calls per level and log n levels, total O(n log n).
Drop constants — O(2n) is just O(n). O(100) is just O(1).

When big O misleads you

The notation is asymptotic — it cares about behavior as n approaches infinity. Real programs care about behavior at the n you actually have. Big O misleads when:

Constant factors dominate at your input size. An O(n log n) algorithm with hidden constants of 1,000 can be slower than an O(n²) algorithm with constants of 1, until n exceeds some crossover point.
Cache effects warp the model. Two algorithms with the same big O can differ by 10× because of memory layout. Quicksort beats merge sort on most real hardware despite identical O(n log n) — purely because of cache locality.
Parallelism changes the picture. Big O assumes a sequential machine. On a GPU with thousands of cores, "O(n log n) sort" and "O(log² n) parallel sort" tell completely different stories.
Average-case differs from worst-case. Hash table lookups are O(1) average, O(n) worst-case. Whether the worst case ever fires depends on your hash function and your input distribution, not just on the bound.

The right mental model: big O tells you the shape of the curve. Benchmarks tell you where you are on it. You need both.

Big O Notation

Interactive visualization

Watch the 60-second explainer

How to read big O

Common growth rates, ranked

Why we ignore constants and lower-order terms

Big O vs theta vs omega

How to derive big O for your code

When big O misleads you

Frequently asked questions

Interactive visualization

Watch the 60-second explainer

How to read big O

Common growth rates, ranked

Why we ignore constants and lower-order terms

Big O vs theta vs omega

How to derive big O for your code

When big O misleads you

Frequently asked questions

Related concepts