Algorithms

Quickselect

Find the k-th element in linear time without sorting the whole array

Quickselect finds the k-th smallest element in an unsorted array in O(n) expected time without sorting the whole array. It's quicksort's partition step run on only one side — and the basis of every fast median, top-k, and percentile algorithm.

  • Time (best, average)O(n)
  • Time (worst, plain)O(n²)
  • Time (median-of-medians)O(n) guaranteed
  • SpaceO(log n) stack
  • In-placeYes (modifies input)
  • StableNo

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

How quickselect works

Quickselect is quicksort with one branch pruned. Quicksort partitions, then recurses into both halves. Quickselect partitions, then recurses into only the half that contains the k-th element. The other half is ignored entirely.

The algorithm:

  1. Pick a pivot (last element, random, or median-of-three).
  2. Partition: rearrange so all elements < pivot are on the left, ≥ pivot on the right. Let p = the pivot's final index.
  3. If p == k: the pivot is the k-th element. Return it.
  4. If p > k: the k-th must be on the left. Recurse on arr[lo..p-1].
  5. If p < k: the k-th must be on the right. Recurse on arr[p+1..hi].

Why this is O(n): with a balanced pivot, work at each level halves. The sum n + n/2 + n/4 + ... + 1 converges to 2n. Quicksort, recursing into both halves, does n work at every level for log n levels.

Trace: find the 3rd smallest in [7, 2, 9, 4, 5, 1, 8, 6]

Indices are 0-based, k = 2 (looking for the element that ends up at index 2 in sorted order — the 3rd smallest). Sorted version is [1, 2, 4, 5, 6, 7, 8, 9], so the answer is 4.

Initial: [7, 2, 9, 4, 5, 1, 8, 6]   k = 2

Iter 1: pivot = arr[7] = 6 (Lomuto partition with last element)
  Walk through [7, 2, 9, 4, 5, 1, 8]; place < 6 on left.
  After partition: [2, 4, 5, 1, 6, 9, 8, 7]
                            ^pivot at index 4
  p = 4 > k = 2 → recurse left [2, 4, 5, 1]

Iter 2: pivot = arr[3] = 1
  After partition: [1, 4, 5, 2]
                    ^pivot at index 0
  p = 0 < k = 2 → recurse right [4, 5, 2] (indices 1..3)

Iter 3: pivot = arr[3] = 2
  After partition: [2, 4, 5] in indices 1..3
                    ^pivot at index 1
  p = 1 < k = 2 → recurse right [4, 5] (indices 2..3)

Iter 4: pivot = arr[3] = 5
  After partition: [4, 5] in indices 2..3
                    ^pivot at index 3
  p = 3 > k = 2 → recurse left [4] (index 2..2)

Iter 5: single element at index 2 → return arr[2] = 4 ✓

Five partition passes, total work roughly n + n/2 + n/4 + n/8 + 1 = 2n. The other half of the array (originally indices 4-7) was touched once during partition and then ignored.

When to use quickselect

  • Median or percentile from a static array. The 50th percentile is k = n/2; the 95th percentile is k = 0.95n. Quickselect computes either in O(n) instead of sorting the whole array in O(n log n).
  • Top-k when k is comparable to n. For top-1000 from a million elements, quickselect at O(n) beats heap-based top-k at O(n log k) ≈ 10n.
  • You're allowed to mutate the input. Quickselect partitions in place; the array ends up partially sorted around the k-th element.
  • The data fits in memory. Quickselect needs random access. For streaming data, a min-heap of size k is the better fit.

If your data is sorted, just index directly — k-th element is arr[k]. If your data is in a sorted data structure (skip list, balanced BST), use that structure's order-statistic operation.

Quickselect vs alternatives for "find the k-th"

QuickselectSort + indexMin-heap (size k)Median-of-medians
Time (avg)O(n)O(n log n)O(n log k)O(n)
Time (worst)O(n²)O(n log n)O(n log k)O(n)
SpaceO(log n)O(1) or O(n)O(k)O(log n)
Streaming-friendlyNoNoYesNo
Modifies inputYesYes (if in-place)NoYes
Returns k items?Just the boundaryAll sortedThe k smallestJust the boundary
Constant factorLow (~3-4)~10~5~10-25
Best fork-th element, mutate OKNeed full sorted outputStreaming top-kAdversarial inputs

For typical use cases (median of an in-memory array), plain quickselect with a random pivot is the right answer. For top-k streaming, the heap. For adversarial guarantees (cryptography, real-time systems), median-of-medians.

What "linear" means in nanoseconds

Quickselect's inner partition step touches each element once and runs at near-memory-bandwidth speed — about 2-4 ns per element on modern hardware:

nQuickselect (avg)Sort + indexSpeedup
1,000~5 µs~50 µs10×
10,000~50 µs~700 µs14×
100,000~500 µs~9 ms18×
1,000,000~5 ms~120 ms24×
10,000,000~50 ms~1.5 s30×

The speedup grows with n because sort is O(n log n) and quickselect is O(n) — the log n factor matters. At n = 10⁷, log₂ n ≈ 23, and the constant gap closes the 30× difference. Practical numbers from std::nth_element vs std::sort on random integers match this pattern within ~20%.

JavaScript implementation

// Find the k-th smallest element (0-indexed) in arr.
// Mutates arr. Returns the value.
function quickselect(arr, k, lo = 0, hi = arr.length - 1) {
  while (lo < hi) {
    // Median-of-three pivot for stability against sorted/reverse input
    const mid = (lo + hi) >>> 1;
    if (arr[lo] > arr[mid]) [arr[lo], arr[mid]] = [arr[mid], arr[lo]];
    if (arr[lo] > arr[hi])  [arr[lo], arr[hi]]  = [arr[hi], arr[lo]];
    if (arr[mid] > arr[hi]) [arr[mid], arr[hi]] = [arr[hi], arr[mid]];
    [arr[mid], arr[hi - 1]] = [arr[hi - 1], arr[mid]];
    const pivot = arr[hi - 1];

    let i = lo, j = hi - 1;
    while (true) {
      while (arr[++i] < pivot) {}
      while (arr[--j] > pivot) {}
      if (i >= j) break;
      [arr[i], arr[j]] = [arr[j], arr[i]];
    }
    [arr[i], arr[hi - 1]] = [arr[hi - 1], arr[i]];

    if (i === k) return arr[i];
    if (i < k) lo = i + 1;
    else hi = i - 1;
  }
  return arr[lo];
}

// Examples
const a = [7, 2, 9, 4, 5, 1, 8, 6];
quickselect([...a], 0);    // 1 (smallest)
quickselect([...a], 3);    // 5 (4th smallest = median for n=8 → lower)
quickselect([...a], 7);    // 9 (largest)

The loop is iterative — no recursion, so no stack overflow on adversarial inputs. After the call returns, the array is partially sorted: arr[0..k-1] all ≤ result, arr[k+1..n-1] all ≥ result.

Famous problem: k-th largest element

// LeetCode 215: find the k-th largest in an unsorted array.
// k-th largest = (n-k)-th smallest in 0-indexed terms.
function findKthLargest(nums, k) {
  return quickselect(nums.slice(), nums.length - k);
}

findKthLargest([3, 2, 1, 5, 6, 4], 2);  // 5 (2nd largest)
findKthLargest([3, 2, 3, 1, 2, 4, 5, 5, 6], 4);  // 4 (4th largest)

The trick is the index translation: in a sorted array of length n, the k-th largest sits at index n - k. Quickselect with that target gives you the answer in O(n) expected time.

Famous problem: median of two sorted arrays

Quickselect is overkill if both arrays are already sorted (there's an O(log(min(m, n))) algorithm), but on a single unsorted array, finding the median is the canonical use case:

function median(arr) {
  const a = arr.slice();
  const n = a.length;
  if (n & 1) {
    return quickselect(a, n >> 1);  // odd n: middle element
  }
  // Even n: average of n/2 - 1 and n/2 elements
  const lo = quickselect(a, (n >> 1) - 1);
  const hi = quickselect(a, n >> 1);  // a is partitioned, so quickselect on right half
  return (lo + hi) / 2;
}

Python implementation

import random

def quickselect(arr, k, lo=0, hi=None):
    if hi is None:
        hi = len(arr) - 1
    while lo < hi:
        # Random pivot — avoids worst case on sorted input
        pivot_idx = random.randint(lo, hi)
        arr[pivot_idx], arr[hi] = arr[hi], arr[pivot_idx]
        pivot = arr[hi]

        # Lomuto partition
        store = lo
        for i in range(lo, hi):
            if arr[i] < pivot:
                arr[i], arr[store] = arr[store], arr[i]
                store += 1
        arr[store], arr[hi] = arr[hi], arr[store]

        if store == k:
            return arr[store]
        elif store < k:
            lo = store + 1
        else:
            hi = store - 1
    return arr[lo]

# Production: just use the standard library
import heapq
heapq.nlargest(3, [3, 2, 1, 5, 6, 4])   # [6, 5, 4]
heapq.nsmallest(2, [3, 2, 1, 5, 6, 4])  # [1, 2]

# numpy.partition is the most direct equivalent of quickselect
import numpy as np
a = np.array([7, 2, 9, 4, 5, 1, 8, 6])
np.partition(a, 2)[2]  # 3rd smallest = 4

Median-of-medians: O(n) worst case

Plain quickselect has O(n²) worst case when pivots are pathologically bad. Median-of-medians (Blum-Floyd-Pratt-Rivest-Tarjan, 1973) eliminates this by choosing a pivot that is provably "good enough":

def median_of_medians_select(arr, k):
    if len(arr) <= 5:
        return sorted(arr)[k]

    # 1. Split into groups of 5
    chunks = [arr[i:i+5] for i in range(0, len(arr), 5)]
    # 2. Sort each group, take its median
    medians = [sorted(c)[len(c) // 2] for c in chunks]
    # 3. Recursively find the median of medians
    pivot = median_of_medians_select(medians, len(medians) // 2)

    # 4. Partition by that pivot
    lows  = [x for x in arr if x < pivot]
    highs = [x for x in arr if x > pivot]
    pivot_count = len(arr) - len(lows) - len(highs)

    if k < len(lows):
        return median_of_medians_select(lows, k)
    elif k < len(lows) + pivot_count:
        return pivot
    else:
        return median_of_medians_select(highs, k - len(lows) - pivot_count)

The key claim: the median-of-medians pivot is greater than at least 30% of the elements and less than at least 30%. So the recursive call shrinks the input to at most 70% of its size. The recurrence T(n) ≤ T(n/5) + T(7n/10) + O(n) solves to T(n) = O(n).

The constants are bigger than plain quickselect (~10-25× the work for the same n), so use it only when you need the worst-case guarantee. C++'s std::nth_element uses introselect: random-pivot quickselect with median-of-medians as fallback after too many bad pivots.

Common bugs and edge cases

  • Off-by-one with k. "k-th smallest" usually means 1-indexed (k=1 → minimum), but the array index is 0-based. Pick a convention and be ruthless about it; mixing 0-indexed and 1-indexed in the same codebase produces silent bugs.
  • Using last-element pivot on sorted input. Plain Lomuto partition with last-element pivot is O(n²) on sorted or reverse-sorted arrays — every pivot is the maximum or minimum of its subarray. Always randomize or use median-of-three.
  • Forgetting to mutate-or-copy. Quickselect rearranges the array. If you'll need the original later, slice/copy first or your "input" will silently become the partial sort.
  • Equal-element pile-up. If the array contains many duplicates and the pivot is one of them, simple two-way partition can degrade to O(n²). Use three-way (Dutch national flag) partitioning to handle duplicates in linear time.
  • Recursive blowup on adversarial inputs. Plain quickselect on n=10⁵ with worst-case pivots can recurse 10⁵ deep — exceeds Python's recursion limit and JS's stack. Use the iterative form or median-of-medians.
  • Treating quickselect output as sorted. The k-th element is in place, but the array is not sorted. arr[k+1] is some element ≥ arr[k], not the (k+1)-th smallest.

Frequently asked questions

Why is quickselect O(n) when quicksort is O(n log n)?

Quicksort recurses into both partitions; quickselect recurses into only the side that contains the kth element. With balanced partitions, the work series is n + n/2 + n/4 + ... = 2n. Quicksort processes both halves at each level, totaling n at every level for log n levels — hence n log n.

What is the median-of-medians algorithm?

A pivot-selection strategy that guarantees the chosen pivot lies in the middle 30-70% of the array. It partitions the array into groups of 5, finds each group's median, and recursively selects the median of those medians. This makes quickselect O(n) worst-case (not just expected) at the cost of higher constants.

Should I use a heap or quickselect for top-k?

Use a min-heap of size k for streaming data or k « n (e.g., top-10 from a billion-row stream). Time is O(n log k), space O(k). Use quickselect when you have the full array in memory and k is comparable to n — it runs in O(n) regardless of k, but rearranges the input in place.

What is the worst-case time of plain quickselect?

O(n²), achieved when every chosen pivot is the smallest or largest remaining element. With random or median-of-three pivots this is astronomically unlikely on real data, but adversarial inputs can trigger it. Median-of-medians eliminates the worst case at constant-factor cost.

Does quickselect modify the input array?

Yes. Like quicksort, it partitions in place. After it returns, the array is partially sorted: elements left of the kth are all ≤ it, elements right are all ≥ it, but neither side is sorted internally. If you need to preserve the input, copy it first.

How is quickselect used in real systems?

C++'s std::nth_element and Python's heapq.nlargest both use quickselect (introselect: quickselect with a heapsort fallback like introsort). Database query optimizers use it for percentile aggregations. NumPy's np.partition exposes it directly. The k-th order statistic is a primitive that shows up everywhere.