Why is quickselect O(n) when quicksort is O(n log n)?

Quicksort recurses into both partitions; quickselect recurses into only the side that contains the kth element. With balanced partitions, the work series is n + n/2 + n/4 + ... = 2n. Quicksort processes both halves at each level, totaling n at every level for log n levels — hence n log n.

What is the median-of-medians algorithm?

A pivot-selection strategy that guarantees the chosen pivot lies in the middle 30-70% of the array. It partitions the array into groups of 5, finds each group's median, and recursively selects the median of those medians. This makes quickselect O(n) worst-case (not just expected) at the cost of higher constants.

Should I use a heap or quickselect for top-k?

Use a min-heap of size k for streaming data or k « n (e.g., top-10 from a billion-row stream). Time is O(n log k), space O(k). Use quickselect when you have the full array in memory and k is comparable to n — it runs in O(n) regardless of k, but rearranges the input in place.

What is the worst-case time of plain quickselect?

O(n²), achieved when every chosen pivot is the smallest or largest remaining element. With random or median-of-three pivots this is astronomically unlikely on real data, but adversarial inputs can trigger it. Median-of-medians eliminates the worst case at constant-factor cost.

Does quickselect modify the input array?

Yes. Like quicksort, it partitions in place. After it returns, the array is partially sorted: elements left of the kth are all ≤ it, elements right are all ≥ it, but neither side is sorted internally. If you need to preserve the input, copy it first.

How is quickselect used in real systems?

C++'s std::nth_element and Python's heapq.nlargest both use quickselect (introselect: quickselect with a heapsort fallback like introsort). Database query optimizers use it for percentile aggregations. NumPy's np.partition exposes it directly. The k-th order statistic is a primitive that shows up everywhere.

Quickselect — Find the kth Element in O(n)

How quickselect works

Quickselect is quicksort with one branch pruned. Quicksort partitions, then recurses into both halves. Quickselect partitions, then recurses into only the half that contains the k-th element. The other half is ignored entirely.

The algorithm:

Pick a pivot (last element, random, or median-of-three).
Partition: rearrange so all elements < pivot are on the left, ≥ pivot on the right. Let p = the pivot's final index.
If p == k: the pivot is the k-th element. Return it.
If p > k: the k-th must be on the left. Recurse on arr[lo..p-1].
If p < k: the k-th must be on the right. Recurse on arr[p+1..hi].

Why this is O(n): with a balanced pivot, work at each level halves. The sum n + n/2 + n/4 + ... + 1 converges to 2n. Quicksort, recursing into both halves, does n work at every level for log n levels.

Trace: find the 3rd smallest in [7, 2, 9, 4, 5, 1, 8, 6]

Indices are 0-based, k = 2 (looking for the element that ends up at index 2 in sorted order — the 3rd smallest). Sorted version is [1, 2, 4, 5, 6, 7, 8, 9], so the answer is 4.

Initial: [7, 2, 9, 4, 5, 1, 8, 6]   k = 2

Iter 1: pivot = arr[7] = 6 (Lomuto partition with last element)
  Walk through [7, 2, 9, 4, 5, 1, 8]; place < 6 on left.
  After partition: [2, 4, 5, 1, 6, 9, 8, 7]
                            ^pivot at index 4
  p = 4 > k = 2 → recurse left [2, 4, 5, 1]

Iter 2: pivot = arr[3] = 1
  After partition: [1, 4, 5, 2]
                    ^pivot at index 0
  p = 0 < k = 2 → recurse right [4, 5, 2] (indices 1..3)

Iter 3: pivot = arr[3] = 2
  After partition: [2, 4, 5] in indices 1..3
                    ^pivot at index 1
  p = 1 < k = 2 → recurse right [4, 5] (indices 2..3)

Iter 4: pivot = arr[3] = 5
  After partition: [4, 5] in indices 2..3
                    ^pivot at index 3
  p = 3 > k = 2 → recurse left [4] (index 2..2)

Iter 5: single element at index 2 → return arr[2] = 4 ✓

Five partition passes, total work roughly n + n/2 + n/4 + n/8 + 1 = 2n. The other half of the array (originally indices 4-7) was touched once during partition and then ignored.

When to use quickselect

Median or percentile from a static array. The 50th percentile is k = n/2; the 95th percentile is k = 0.95n. Quickselect computes either in O(n) instead of sorting the whole array in O(n log n).
Top-k when k is comparable to n. For top-1000 from a million elements, quickselect at O(n) beats heap-based top-k at O(n log k) ≈ 10n.
You're allowed to mutate the input. Quickselect partitions in place; the array ends up partially sorted around the k-th element.
The data fits in memory. Quickselect needs random access. For streaming data, a min-heap of size k is the better fit.

If your data is sorted, just index directly — k-th element is arr[k]. If your data is in a sorted data structure (skip list, balanced BST), use that structure's order-statistic operation.

Quickselect vs alternatives for "find the k-th"

	Quickselect	Sort + index	Min-heap (size k)	Median-of-medians
Time (avg)	O(n)	O(n log n)	O(n log k)	O(n)
Time (worst)	O(n²)	O(n log n)	O(n log k)	O(n)
Space	O(log n)	O(1) or O(n)	O(k)	O(log n)
Streaming-friendly	No	No	Yes	No
Modifies input	Yes	Yes (if in-place)	No	Yes
Returns k items?	Just the boundary	All sorted	The k smallest	Just the boundary
Constant factor	Low (~3-4)	~10	~5	~10-25
Best for	k-th element, mutate OK	Need full sorted output	Streaming top-k	Adversarial inputs

For typical use cases (median of an in-memory array), plain quickselect with a random pivot is the right answer. For top-k streaming, the heap. For adversarial guarantees (cryptography, real-time systems), median-of-medians.

What "linear" means in nanoseconds

Quickselect's inner partition step touches each element once and runs at near-memory-bandwidth speed — about 2-4 ns per element on modern hardware:

n	Quickselect (avg)	Sort + index	Speedup
1,000	~5 µs	~50 µs	10×
10,000	~50 µs	~700 µs	14×
100,000	~500 µs	~9 ms	18×
1,000,000	~5 ms	~120 ms	24×
10,000,000	~50 ms	~1.5 s	30×

The speedup grows with n because sort is O(n log n) and quickselect is O(n) — the log n factor matters. At n = 10⁷, log₂ n ≈ 23, and the constant gap closes the 30× difference. Practical numbers from std::nth_element vs std::sort on random integers match this pattern within ~20%.

JavaScript implementation

// Find the k-th smallest element (0-indexed) in arr.
// Mutates arr. Returns the value.
function quickselect(arr, k, lo = 0, hi = arr.length - 1) {
  while (lo < hi) {
    // Median-of-three pivot for stability against sorted/reverse input
    const mid = (lo + hi) >>> 1;
    if (arr[lo] > arr[mid]) [arr[lo], arr[mid]] = [arr[mid], arr[lo]];
    if (arr[lo] > arr[hi])  [arr[lo], arr[hi]]  = [arr[hi], arr[lo]];
    if (arr[mid] > arr[hi]) [arr[mid], arr[hi]] = [arr[hi], arr[mid]];
    [arr[mid], arr[hi - 1]] = [arr[hi - 1], arr[mid]];
    const pivot = arr[hi - 1];

    let i = lo, j = hi - 1;
    while (true) {
      while (arr[++i] < pivot) {}
      while (arr[--j] > pivot) {}
      if (i >= j) break;
      [arr[i], arr[j]] = [arr[j], arr[i]];
    }
    [arr[i], arr[hi - 1]] = [arr[hi - 1], arr[i]];

    if (i === k) return arr[i];
    if (i < k) lo = i + 1;
    else hi = i - 1;
  }
  return arr[lo];
}

// Examples
const a = [7, 2, 9, 4, 5, 1, 8, 6];
quickselect([...a], 0);    // 1 (smallest)
quickselect([...a], 3);    // 5 (4th smallest = median for n=8 → lower)
quickselect([...a], 7);    // 9 (largest)

The loop is iterative — no recursion, so no stack overflow on adversarial inputs. After the call returns, the array is partially sorted: arr[0..k-1] all ≤ result, arr[k+1..n-1] all ≥ result.

Famous problem: k-th largest element

// LeetCode 215: find the k-th largest in an unsorted array.
// k-th largest = (n-k)-th smallest in 0-indexed terms.
function findKthLargest(nums, k) {
  return quickselect(nums.slice(), nums.length - k);
}

findKthLargest([3, 2, 1, 5, 6, 4], 2);  // 5 (2nd largest)
findKthLargest([3, 2, 3, 1, 2, 4, 5, 5, 6], 4);  // 4 (4th largest)

The trick is the index translation: in a sorted array of length n, the k-th largest sits at index n - k. Quickselect with that target gives you the answer in O(n) expected time.

Famous problem: median of two sorted arrays

Quickselect is overkill if both arrays are already sorted (there's an O(log(min(m, n))) algorithm), but on a single unsorted array, finding the median is the canonical use case:

function median(arr) {
  const a = arr.slice();
  const n = a.length;
  if (n & 1) {
    return quickselect(a, n >> 1);  // odd n: middle element
  }
  // Even n: average of n/2 - 1 and n/2 elements
  const lo = quickselect(a, (n >> 1) - 1);
  const hi = quickselect(a, n >> 1);  // a is partitioned, so quickselect on right half
  return (lo + hi) / 2;
}

Python implementation

import random

def quickselect(arr, k, lo=0, hi=None):
    if hi is None:
        hi = len(arr) - 1
    while lo < hi:
        # Random pivot — avoids worst case on sorted input
        pivot_idx = random.randint(lo, hi)
        arr[pivot_idx], arr[hi] = arr[hi], arr[pivot_idx]
        pivot = arr[hi]

        # Lomuto partition
        store = lo
        for i in range(lo, hi):
            if arr[i] < pivot:
                arr[i], arr[store] = arr[store], arr[i]
                store += 1
        arr[store], arr[hi] = arr[hi], arr[store]

        if store == k:
            return arr[store]
        elif store < k:
            lo = store + 1
        else:
            hi = store - 1
    return arr[lo]

# Production: just use the standard library
import heapq
heapq.nlargest(3, [3, 2, 1, 5, 6, 4])   # [6, 5, 4]
heapq.nsmallest(2, [3, 2, 1, 5, 6, 4])  # [1, 2]

# numpy.partition is the most direct equivalent of quickselect
import numpy as np
a = np.array([7, 2, 9, 4, 5, 1, 8, 6])
np.partition(a, 2)[2]  # 3rd smallest = 4

Median-of-medians: O(n) worst case

Plain quickselect has O(n²) worst case when pivots are pathologically bad. Median-of-medians (Blum-Floyd-Pratt-Rivest-Tarjan, 1973) eliminates this by choosing a pivot that is provably "good enough":

def median_of_medians_select(arr, k):
    if len(arr) <= 5:
        return sorted(arr)[k]

    # 1. Split into groups of 5
    chunks = [arr[i:i+5] for i in range(0, len(arr), 5)]
    # 2. Sort each group, take its median
    medians = [sorted(c)[len(c) // 2] for c in chunks]
    # 3. Recursively find the median of medians
    pivot = median_of_medians_select(medians, len(medians) // 2)

    # 4. Partition by that pivot
    lows  = [x for x in arr if x < pivot]
    highs = [x for x in arr if x > pivot]
    pivot_count = len(arr) - len(lows) - len(highs)

    if k < len(lows):
        return median_of_medians_select(lows, k)
    elif k < len(lows) + pivot_count:
        return pivot
    else:
        return median_of_medians_select(highs, k - len(lows) - pivot_count)

The key claim: the median-of-medians pivot is greater than at least 30% of the elements and less than at least 30%. So the recursive call shrinks the input to at most 70% of its size. The recurrence T(n) ≤ T(n/5) + T(7n/10) + O(n) solves to T(n) = O(n).

The constants are bigger than plain quickselect (~10-25× the work for the same n), so use it only when you need the worst-case guarantee. C++'s std::nth_element uses introselect: random-pivot quickselect with median-of-medians as fallback after too many bad pivots.

Common bugs and edge cases

Off-by-one with k. "k-th smallest" usually means 1-indexed (k=1 → minimum), but the array index is 0-based. Pick a convention and be ruthless about it; mixing 0-indexed and 1-indexed in the same codebase produces silent bugs.
Using last-element pivot on sorted input. Plain Lomuto partition with last-element pivot is O(n²) on sorted or reverse-sorted arrays — every pivot is the maximum or minimum of its subarray. Always randomize or use median-of-three.
Forgetting to mutate-or-copy. Quickselect rearranges the array. If you'll need the original later, slice/copy first or your "input" will silently become the partial sort.
Equal-element pile-up. If the array contains many duplicates and the pivot is one of them, simple two-way partition can degrade to O(n²). Use three-way (Dutch national flag) partitioning to handle duplicates in linear time.
Recursive blowup on adversarial inputs. Plain quickselect on n=10⁵ with worst-case pivots can recurse 10⁵ deep — exceeds Python's recursion limit and JS's stack. Use the iterative form or median-of-medians.
Treating quickselect output as sorted. The k-th element is in place, but the array is not sorted. arr[k+1] is some element ≥ arr[k], not the (k+1)-th smallest.

Quickselect

Interactive visualization

Watch the 60-second explainer

How quickselect works

Trace: find the 3rd smallest in [7, 2, 9, 4, 5, 1, 8, 6]

When to use quickselect

Quickselect vs alternatives for "find the k-th"

What "linear" means in nanoseconds

JavaScript implementation

Famous problem: k-th largest element

Famous problem: median of two sorted arrays

Python implementation

Median-of-medians: O(n) worst case

Common bugs and edge cases

Frequently asked questions

Interactive visualization

Watch the 60-second explainer

How quickselect works

Trace: find the 3rd smallest in [7, 2, 9, 4, 5, 1, 8, 6]

When to use quickselect

Quickselect vs alternatives for "find the k-th"

What "linear" means in nanoseconds

JavaScript implementation

Famous problem: k-th largest element

Famous problem: median of two sorted arrays

Python implementation

Median-of-medians: O(n) worst case

Common bugs and edge cases

Frequently asked questions

Related concepts