Algorithms
Quickselect
Find the k-th element in linear time without sorting the whole array
Quickselect finds the k-th smallest element in an unsorted array in O(n) expected time without sorting the whole array. It's quicksort's partition step run on only one side — and the basis of every fast median, top-k, and percentile algorithm.
- Time (best, average)O(n)
- Time (worst, plain)O(n²)
- Time (median-of-medians)O(n) guaranteed
- SpaceO(log n) stack
- In-placeYes (modifies input)
- StableNo
Interactive visualization
Press play, or step through manually. The visualization is yours to drive — try it before reading on.
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
How quickselect works
Quickselect is quicksort with one branch pruned. Quicksort partitions, then recurses into both halves. Quickselect partitions, then recurses into only the half that contains the k-th element. The other half is ignored entirely.
The algorithm:
- Pick a pivot (last element, random, or median-of-three).
- Partition: rearrange so all elements < pivot are on the left, ≥ pivot on the right. Let
p= the pivot's final index. - If
p == k: the pivot is the k-th element. Return it. - If
p > k: the k-th must be on the left. Recurse onarr[lo..p-1]. - If
p < k: the k-th must be on the right. Recurse onarr[p+1..hi].
Why this is O(n): with a balanced pivot, work at each level halves. The sum n + n/2 + n/4 + ... + 1 converges to 2n. Quicksort, recursing into both halves, does n work at every level for log n levels.
Trace: find the 3rd smallest in [7, 2, 9, 4, 5, 1, 8, 6]
Indices are 0-based, k = 2 (looking for the element that ends up at index 2 in sorted order — the 3rd smallest). Sorted version is [1, 2, 4, 5, 6, 7, 8, 9], so the answer is 4.
Initial: [7, 2, 9, 4, 5, 1, 8, 6] k = 2
Iter 1: pivot = arr[7] = 6 (Lomuto partition with last element)
Walk through [7, 2, 9, 4, 5, 1, 8]; place < 6 on left.
After partition: [2, 4, 5, 1, 6, 9, 8, 7]
^pivot at index 4
p = 4 > k = 2 → recurse left [2, 4, 5, 1]
Iter 2: pivot = arr[3] = 1
After partition: [1, 4, 5, 2]
^pivot at index 0
p = 0 < k = 2 → recurse right [4, 5, 2] (indices 1..3)
Iter 3: pivot = arr[3] = 2
After partition: [2, 4, 5] in indices 1..3
^pivot at index 1
p = 1 < k = 2 → recurse right [4, 5] (indices 2..3)
Iter 4: pivot = arr[3] = 5
After partition: [4, 5] in indices 2..3
^pivot at index 3
p = 3 > k = 2 → recurse left [4] (index 2..2)
Iter 5: single element at index 2 → return arr[2] = 4 ✓
Five partition passes, total work roughly n + n/2 + n/4 + n/8 + 1 = 2n. The other half of the array (originally indices 4-7) was touched once during partition and then ignored.
When to use quickselect
- Median or percentile from a static array. The 50th percentile is k = n/2; the 95th percentile is k = 0.95n. Quickselect computes either in O(n) instead of sorting the whole array in O(n log n).
- Top-k when k is comparable to n. For top-1000 from a million elements, quickselect at O(n) beats heap-based top-k at O(n log k) ≈ 10n.
- You're allowed to mutate the input. Quickselect partitions in place; the array ends up partially sorted around the k-th element.
- The data fits in memory. Quickselect needs random access. For streaming data, a min-heap of size k is the better fit.
If your data is sorted, just index directly — k-th element is arr[k]. If your data is in a sorted data structure (skip list, balanced BST), use that structure's order-statistic operation.
Quickselect vs alternatives for "find the k-th"
| Quickselect | Sort + index | Min-heap (size k) | Median-of-medians | |
|---|---|---|---|---|
| Time (avg) | O(n) | O(n log n) | O(n log k) | O(n) |
| Time (worst) | O(n²) | O(n log n) | O(n log k) | O(n) |
| Space | O(log n) | O(1) or O(n) | O(k) | O(log n) |
| Streaming-friendly | No | No | Yes | No |
| Modifies input | Yes | Yes (if in-place) | No | Yes |
| Returns k items? | Just the boundary | All sorted | The k smallest | Just the boundary |
| Constant factor | Low (~3-4) | ~10 | ~5 | ~10-25 |
| Best for | k-th element, mutate OK | Need full sorted output | Streaming top-k | Adversarial inputs |
For typical use cases (median of an in-memory array), plain quickselect with a random pivot is the right answer. For top-k streaming, the heap. For adversarial guarantees (cryptography, real-time systems), median-of-medians.
What "linear" means in nanoseconds
Quickselect's inner partition step touches each element once and runs at near-memory-bandwidth speed — about 2-4 ns per element on modern hardware:
| n | Quickselect (avg) | Sort + index | Speedup |
|---|---|---|---|
| 1,000 | ~5 µs | ~50 µs | 10× |
| 10,000 | ~50 µs | ~700 µs | 14× |
| 100,000 | ~500 µs | ~9 ms | 18× |
| 1,000,000 | ~5 ms | ~120 ms | 24× |
| 10,000,000 | ~50 ms | ~1.5 s | 30× |
The speedup grows with n because sort is O(n log n) and quickselect is O(n) — the log n factor matters. At n = 10⁷, log₂ n ≈ 23, and the constant gap closes the 30× difference. Practical numbers from std::nth_element vs std::sort on random integers match this pattern within ~20%.
JavaScript implementation
// Find the k-th smallest element (0-indexed) in arr.
// Mutates arr. Returns the value.
function quickselect(arr, k, lo = 0, hi = arr.length - 1) {
while (lo < hi) {
// Median-of-three pivot for stability against sorted/reverse input
const mid = (lo + hi) >>> 1;
if (arr[lo] > arr[mid]) [arr[lo], arr[mid]] = [arr[mid], arr[lo]];
if (arr[lo] > arr[hi]) [arr[lo], arr[hi]] = [arr[hi], arr[lo]];
if (arr[mid] > arr[hi]) [arr[mid], arr[hi]] = [arr[hi], arr[mid]];
[arr[mid], arr[hi - 1]] = [arr[hi - 1], arr[mid]];
const pivot = arr[hi - 1];
let i = lo, j = hi - 1;
while (true) {
while (arr[++i] < pivot) {}
while (arr[--j] > pivot) {}
if (i >= j) break;
[arr[i], arr[j]] = [arr[j], arr[i]];
}
[arr[i], arr[hi - 1]] = [arr[hi - 1], arr[i]];
if (i === k) return arr[i];
if (i < k) lo = i + 1;
else hi = i - 1;
}
return arr[lo];
}
// Examples
const a = [7, 2, 9, 4, 5, 1, 8, 6];
quickselect([...a], 0); // 1 (smallest)
quickselect([...a], 3); // 5 (4th smallest = median for n=8 → lower)
quickselect([...a], 7); // 9 (largest)
The loop is iterative — no recursion, so no stack overflow on adversarial inputs. After the call returns, the array is partially sorted: arr[0..k-1] all ≤ result, arr[k+1..n-1] all ≥ result.
Famous problem: k-th largest element
// LeetCode 215: find the k-th largest in an unsorted array.
// k-th largest = (n-k)-th smallest in 0-indexed terms.
function findKthLargest(nums, k) {
return quickselect(nums.slice(), nums.length - k);
}
findKthLargest([3, 2, 1, 5, 6, 4], 2); // 5 (2nd largest)
findKthLargest([3, 2, 3, 1, 2, 4, 5, 5, 6], 4); // 4 (4th largest)
The trick is the index translation: in a sorted array of length n, the k-th largest sits at index n - k. Quickselect with that target gives you the answer in O(n) expected time.
Famous problem: median of two sorted arrays
Quickselect is overkill if both arrays are already sorted (there's an O(log(min(m, n))) algorithm), but on a single unsorted array, finding the median is the canonical use case:
function median(arr) {
const a = arr.slice();
const n = a.length;
if (n & 1) {
return quickselect(a, n >> 1); // odd n: middle element
}
// Even n: average of n/2 - 1 and n/2 elements
const lo = quickselect(a, (n >> 1) - 1);
const hi = quickselect(a, n >> 1); // a is partitioned, so quickselect on right half
return (lo + hi) / 2;
}
Python implementation
import random
def quickselect(arr, k, lo=0, hi=None):
if hi is None:
hi = len(arr) - 1
while lo < hi:
# Random pivot — avoids worst case on sorted input
pivot_idx = random.randint(lo, hi)
arr[pivot_idx], arr[hi] = arr[hi], arr[pivot_idx]
pivot = arr[hi]
# Lomuto partition
store = lo
for i in range(lo, hi):
if arr[i] < pivot:
arr[i], arr[store] = arr[store], arr[i]
store += 1
arr[store], arr[hi] = arr[hi], arr[store]
if store == k:
return arr[store]
elif store < k:
lo = store + 1
else:
hi = store - 1
return arr[lo]
# Production: just use the standard library
import heapq
heapq.nlargest(3, [3, 2, 1, 5, 6, 4]) # [6, 5, 4]
heapq.nsmallest(2, [3, 2, 1, 5, 6, 4]) # [1, 2]
# numpy.partition is the most direct equivalent of quickselect
import numpy as np
a = np.array([7, 2, 9, 4, 5, 1, 8, 6])
np.partition(a, 2)[2] # 3rd smallest = 4
Median-of-medians: O(n) worst case
Plain quickselect has O(n²) worst case when pivots are pathologically bad. Median-of-medians (Blum-Floyd-Pratt-Rivest-Tarjan, 1973) eliminates this by choosing a pivot that is provably "good enough":
def median_of_medians_select(arr, k):
if len(arr) <= 5:
return sorted(arr)[k]
# 1. Split into groups of 5
chunks = [arr[i:i+5] for i in range(0, len(arr), 5)]
# 2. Sort each group, take its median
medians = [sorted(c)[len(c) // 2] for c in chunks]
# 3. Recursively find the median of medians
pivot = median_of_medians_select(medians, len(medians) // 2)
# 4. Partition by that pivot
lows = [x for x in arr if x < pivot]
highs = [x for x in arr if x > pivot]
pivot_count = len(arr) - len(lows) - len(highs)
if k < len(lows):
return median_of_medians_select(lows, k)
elif k < len(lows) + pivot_count:
return pivot
else:
return median_of_medians_select(highs, k - len(lows) - pivot_count)
The key claim: the median-of-medians pivot is greater than at least 30% of the elements and less than at least 30%. So the recursive call shrinks the input to at most 70% of its size. The recurrence T(n) ≤ T(n/5) + T(7n/10) + O(n) solves to T(n) = O(n).
The constants are bigger than plain quickselect (~10-25× the work for the same n), so use it only when you need the worst-case guarantee. C++'s std::nth_element uses introselect: random-pivot quickselect with median-of-medians as fallback after too many bad pivots.
Common bugs and edge cases
- Off-by-one with k. "k-th smallest" usually means 1-indexed (k=1 → minimum), but the array index is 0-based. Pick a convention and be ruthless about it; mixing 0-indexed and 1-indexed in the same codebase produces silent bugs.
- Using last-element pivot on sorted input. Plain Lomuto partition with last-element pivot is O(n²) on sorted or reverse-sorted arrays — every pivot is the maximum or minimum of its subarray. Always randomize or use median-of-three.
- Forgetting to mutate-or-copy. Quickselect rearranges the array. If you'll need the original later, slice/copy first or your "input" will silently become the partial sort.
- Equal-element pile-up. If the array contains many duplicates and the pivot is one of them, simple two-way partition can degrade to O(n²). Use three-way (Dutch national flag) partitioning to handle duplicates in linear time.
- Recursive blowup on adversarial inputs. Plain quickselect on n=10⁵ with worst-case pivots can recurse 10⁵ deep — exceeds Python's recursion limit and JS's stack. Use the iterative form or median-of-medians.
- Treating quickselect output as sorted. The k-th element is in place, but the array is not sorted.
arr[k+1]is some element ≥arr[k], not the (k+1)-th smallest.
Frequently asked questions
Why is quickselect O(n) when quicksort is O(n log n)?
Quicksort recurses into both partitions; quickselect recurses into only the side that contains the kth element. With balanced partitions, the work series is n + n/2 + n/4 + ... = 2n. Quicksort processes both halves at each level, totaling n at every level for log n levels — hence n log n.
What is the median-of-medians algorithm?
A pivot-selection strategy that guarantees the chosen pivot lies in the middle 30-70% of the array. It partitions the array into groups of 5, finds each group's median, and recursively selects the median of those medians. This makes quickselect O(n) worst-case (not just expected) at the cost of higher constants.
Should I use a heap or quickselect for top-k?
Use a min-heap of size k for streaming data or k « n (e.g., top-10 from a billion-row stream). Time is O(n log k), space O(k). Use quickselect when you have the full array in memory and k is comparable to n — it runs in O(n) regardless of k, but rearranges the input in place.
What is the worst-case time of plain quickselect?
O(n²), achieved when every chosen pivot is the smallest or largest remaining element. With random or median-of-three pivots this is astronomically unlikely on real data, but adversarial inputs can trigger it. Median-of-medians eliminates the worst case at constant-factor cost.
Does quickselect modify the input array?
Yes. Like quicksort, it partitions in place. After it returns, the array is partially sorted: elements left of the kth are all ≤ it, elements right are all ≥ it, but neither side is sorted internally. If you need to preserve the input, copy it first.
How is quickselect used in real systems?
C++'s std::nth_element and Python's heapq.nlargest both use quickselect (introselect: quickselect with a heapsort fallback like introsort). Database query optimizers use it for percentile aggregations. NumPy's np.partition exposes it directly. The k-th order statistic is a primitive that shows up everywhere.