Quickselect
Quickselect is a Technique-Based topic: a selection algorithm that finds the k-th order statistic (k-th smallest or largest element) in an unsorted array. LeetCode problem count: ~50–100 problems. Depth level: core algorithms, main patterns, key complexities.
Core Concepts
Order statistic — the k-th smallest element in a collection when sorted; the 1st order statistic is the minimum, the n-th is the maximum. "k-th largest" maps to the (n − k + 1)-th smallest.
Selection problem — find the k-th order statistic without fully sorting; solvable in expected O(n) time vs. O(n log n) for sort.
Partition — rearranges an array subrange so all elements less than a pivot come before it and all elements greater come after it; returns the pivot's final index. The pivot is now in its sorted position permanently.
k-th order statistic (0-indexed) — after partitioning, if the pivot lands at index p: if p == k, return pivot; if p > k, recurse left; if p < k, recurse right. Only one branch is ever explored.
Expected O(n) time — with random pivot selection, expected work is n + n/2 + n/4 + … = 2n. Worst case O(n²) when pivots are always extreme (already-sorted input with first/last pivot).
Partial sort — returning the first k elements in sorted order; quickselect finds position k, then the left side is unsorted but all ≤ pivot; a second sort on k elements costs O(k log k). Total: O(n + k log k).
In-place selection — quickselect mutates the input array; the result is the value at index k after the algorithm finishes, not a separate output structure.
Introselect — hybrid: quickselect with random pivot until recursion depth exceeds a threshold, then falls back to median-of-medians to guarantee O(n) worst case. Used in C++ std::nth_element.
Types / Variants
| Variant | What it is | When to use | Key property |
|---|---|---|---|
| Lomuto partition | Pivot = last element; single forward scan with swap counter | Simple implementation; teaching / interviews | Unstable; O(n²) on sorted input with fixed pivot |
| Hoare partition | Two pointers moving inward; swaps crossing pairs | Slightly fewer swaps on average than Lomuto | Pivot ends up in the middle zone, not at exact final index |
| 3-way (DNF) partition | Splits into <, =, > regions using Dutch National Flag | Arrays with many duplicate values | Degrades to O(n) on all-equal inputs; avoids O(n²) duplicates trap |
| Randomised pivot | Random index selection before partitioning | Default choice for interviews and production | Expected O(n); defeats adversarial sorted inputs |
| Median-of-medians | Guaranteed O(n) worst case via recursive pivot selection | When worst-case guarantee is required (rare in interviews) | High constant factor; rarely faster in practice than randomised |
| Heap-based selection | Push all into a min/max heap or maintain a size-k heap | Online stream, repeated queries, or when input must not be mutated | O(n log k); simpler code; preferred when k << n and mutation is forbidden |
Key Algorithms
Lomuto Partition
Scans left to right; maintains a boundary where everything left of it is ≤ pivot (pivot = last element). Returns final pivot index.
Time: O(n) per call. Space: O(1).
Key insight: the scan variable j walks forward unconditionally; i advances only on a swap, tracking the boundary.
# pivot = arr[hi]; i tracks boundary of ≤ pivot region
if arr[j] <= pivot: i += 1; arr[i], arr[j] = arr[j], arr[i]
arr[i+1], arr[hi] = arr[hi], arr[i+1]; return i + 1
Hoare Partition
Two pointers start at opposite ends and move inward, swapping elements that are on the wrong side. Does not place pivot at its final index — returns a split point.
Time: O(n) per call. Space: O(1).
Key insight: the return value p means arr[lo..p] are all ≤ pivot and arr[p+1..hi] are all ≥ pivot, but p is not necessarily the pivot's sorted index. Recurse on [lo, p] and [p+1, hi], not [lo, p-1] and [p+1, hi].
3-Way Partition (Dutch National Flag)
Partitions into three contiguous regions: < pivot, == pivot, > pivot. Uses three pointers: lo (< boundary), mid (current), hi (> boundary).
Time: O(n) per call. Space: O(1).
Key insight: when all elements equal the pivot, the loop terminates in one pass with the entire array in the == region — O(n) total instead of O(n²).
lo, mid, hi = left, left, right
while mid <= hi:
if arr[mid] < pivot: arr[lo],arr[mid]=arr[mid],arr[lo]; lo+=1; mid+=1
elif arr[mid] > pivot: arr[mid],arr[hi]=arr[hi],arr[mid]; hi-=1
else: mid+=1
Quickselect (Core Loop)
Repeatedly partitions and recurses into only the subarray containing index k.
Time: expected O(n) with random pivot; O(n²) worst case. Space: O(log n) call stack expected, O(n) worst.
Key insight: unlike quicksort, only one branch is ever explored after each partition, giving the geometric series convergence to O(n).
if lo == hi: return arr[lo]
p = partition(arr, lo, hi) # random pivot swap before partitioning
if p == k: return arr[p]
elif p > k: return quickselect(arr, lo, p-1, k)
else: return quickselect(arr, p+1, hi, k)
Median-of-Medians
Guarantees O(n) worst-case by choosing a pivot that is at least in the 30th–70th percentile.
Time: O(n) worst case. Space: O(n) due to auxiliary group arrays (or O(log n) with in-place grouping).
Key insight: split into groups of 5, find each group's median by sorting (O(1) per group since size is constant), then recursively find the median of those medians. This pivot guarantees at least 3n/10 elements on each side.
The high constant factor (~10×) makes it slower than randomised quickselect in practice; use only when adversarial inputs or hard worst-case guarantees are required.
Heap-Based Selection
Build a max-heap of the first k elements, then scan the remaining n − k elements, replacing the heap root when a smaller element is found. Final heap root = k-th smallest.
Time: O(n log k). Space: O(k).
Key insight: maintaining a bounded heap is simpler, never mutates the input, and is O(n log k) — preferable when k is small (log k ≈ constant) or queries are online.
import heapq
heap = [-x for x in arr[:k]]
heapq.heapify(heap) # max-heap of size k via negation
for x in arr[k:]:
if x < -heap[0]: heapq.heapreplace(heap, -x)
return -heap[0]
See heap-priority-queue.md for full heap coverage.
nth_element (Library Shortcut)
Python does not have std::nth_element. Use sorted() or implement quickselect manually. In interviews, always implement quickselect from scratch unless told otherwise.
Advanced Techniques
Weighted k-th Smallest
Problem class: each element has a weight (frequency/count); find the element at cumulative weight position k.
Core mechanism: binary search on the value range or sorted candidates; for each candidate value v, count total weight of elements ≤ v in O(n); binary search over this monotone function.
When to prefer over quickselect: when the input is given as a frequency table or the "array" is implicitly defined (matrix, range), and explicit expansion is prohibitive.
Complexity: O(n log(max − min)) with binary search on value range.
Selection in a Row-and-Column Sorted Matrix
Problem class: find k-th smallest in an n × n matrix where rows and columns are sorted (LeetCode 378).
Core mechanism: binary search on value range [matrix[0][0], matrix[n-1][n-1]]; count elements ≤ mid using a staircase walk in O(n); narrow range until lo == hi.
When to prefer over quickselect: input is not a flat array; extracting to a flat array costs O(n²) space.
Complexity: O(n log(max − min)).
See binary-search.md for binary-search-on-answer pattern.
Introselect / Fallback Strategy
Problem class: production-grade selection where adversarial inputs exist.
Core mechanism: run randomised quickselect; track recursion depth; if depth exceeds 2 log n, switch to median-of-medians for the remaining subproblem.
When to prefer over pure quickselect: the input may be adversarial (e.g., competitive programming, security-sensitive contexts). In interviews, random pivot alone suffices.
Complexity: O(n) worst case.
Problem Taxonomy
By Problem Category
| Problem Type | What it asks | Key technique |
|---|---|---|
| k-th smallest/largest in unsorted array | Single selection query, mutation allowed | Quickselect with random pivot |
| k-th smallest/largest, no mutation | Same, but input must stay intact | Copy + quickselect, or heap |
| k closest points to origin | Find k elements minimising some metric | Quickselect on metric values (partial sort) |
| Top-k frequent elements | k elements with highest frequency | Build frequency map, then quickselect on frequencies |
| Median of unsorted array | 50th percentile statistic | Quickselect for index n//2 (and n//2−1 for even n) |
| k-th largest in a stream | Online / append-only input | Min-heap of size k; O(log k) per insert |
| k-th smallest in sorted matrix | Grid with row/column sorted order | Binary search on value range with count |
| Partial sort (return sorted top-k) | k smallest/largest in sorted order | Quickselect to position k, then sort left partition |
| k-th smallest pair sum / distance | Selection over implicit sorted list | Binary search on answer + counting function |
Problem Signal → Technique
| Signal in problem statement | Likely technique |
|---|---|
| "find k-th largest / smallest" | Quickselect or min-heap of size k |
| "return k closest / k nearest" | Quickselect on distance metric |
| "top k frequent" | Frequency map + quickselect or bucket sort |
| "median of unsorted" | Quickselect at index n//2 |
| "stream" or "online" with k | Heap of size k, not quickselect |
| "no extra space" | In-place quickselect |
| "k-th smallest in matrix" | Binary search on value + row staircase count |
| "without sorting" | Quickselect (expected O(n)) |
| "worst-case O(n)" | Median-of-medians or heap |
Common Patterns
| Pattern | When it applies | Mechanism |
|---|---|---|
| Random pivot swap before partition | All quickselect implementations | swap(arr, random(lo, hi), hi) before Lomuto; eliminates sorted-input worst case |
| k-th largest → k-th smallest mapping | Problem gives "k-th largest" | Use index n - k (0-indexed) for k-th smallest equivalent |
| Partial sort via quickselect | Need sorted top-k, not just the boundary | Quickselect to find position k, then sort arr[:k] |
| Heap of size k for streaming | Data arrives one element at a time | Maintain max-heap (k-th smallest) or min-heap (k-th largest) of exactly k elements |
| Binary search on answer with count | Implicit or non-flat input | lo, hi = min_val, max_val; count elements ≤ mid; shrink range |
| 3-way partition guard | Input has many duplicates | Replace standard 2-way partition; prevents O(n²) on all-equal subarrays |
| Iterative quickselect | Avoid recursion stack overflow | Replace tail recursion: update lo or hi based on pivot position, loop |
Edge Cases & Pitfalls
-
k out of range. If k < 1 or k > n, the algorithm accesses an invalid index or returns a wrong element. Always validate k before entering the selection loop.
-
All identical elements with 2-way partition. Lomuto partition places all equal elements on one side, causing O(n²) behaviour on uniform arrays. Fix: use 3-way (DNF) partition when duplicates are common, or at minimum check if all elements are equal before partitioning.
-
k-th largest vs k-th smallest index confusion. "k-th largest" in a 1-indexed problem maps to index
n - kin 0-indexed quickselect. Off-by-one here returns the wrong element entirely. Confirm the convention at the top of implementation. -
Hoare partition recurse bounds. After Hoare partition returns split point p, the left subproblem must include p: recurse on
[lo, p]not[lo, p-1]. Using p-1 can skip the pivot-side element and produce an infinite loop or wrong answer. -
Mutating caller's array. Quickselect rearranges the input in-place. If the caller expects the original order preserved, copy before calling. Forgetting this introduces subtle bugs in problems that reuse the array.
-
Random pivot with fixed seed. Using
random.randintwithout seeding is fine for most LeetCode; but in tests that check determinism, the result is correct but undebuggable. Keep the randomisation but add a fixed seed during debugging. -
Single-element subarray infinite recursion. When
lo == hi, return immediately. Without this base case, partition is called on a 1-element range and may loop or produce an out-of-bounds index. -
Median of even-length array. For even n, the median is typically the average of the two middle values (indices n//2 - 1 and n//2). Quickselect must be called twice (or once with 3-way partition that identifies the exact range). Returning only one index produces a wrong median.
-
Integer overflow when computing average for median.
(a + b) // 2can overflow in other languages; not an issue in Python (arbitrary precision integers), but the two-call quickselect approach avoids needing division at all.
Implementation Notes
- Python's
random.randint(lo, hi)is inclusive on both ends; use it to pick a random pivot index within[lo, hi]. - Python has no built-in
nth_element;heapq.nsmallest(k, arr)returns sorted top-k in O(n log k) and is acceptable when k is small, but not when O(n) is required. heapqis a min-heap only; simulate a max-heap by negating values.- Python's default recursion limit is 1000; for arrays of size > 500 with adversarial inputs, the call stack may overflow. Use iterative quickselect or
sys.setrecursionlimit. random.shufflebefore calling a non-randomised quickselect is equivalent to random pivot but shuffles the entire array — it's O(n) and acceptable for small n but changes the array order globally.- When using Lomuto partition, the pivot swap must happen before entering the scan loop, not after — placing it after the loop corrupts the scan.
Cross-Topic Relations
| Related topic | Connection |
|---|---|
| sorting.md | Quickselect derives from quicksort; same partition logic, but recurses into only one branch. Partial sort uses quickselect + sort on the smaller partition. |
| heap-priority-queue.md | Heap-based selection (size-k heap) is the main alternative; preferred for streaming, small k, or when mutation is forbidden. |
| divide-and-conquer.md | Quickselect is a divide-and-conquer algorithm; median-of-medians applies recursive sub-problem decomposition. |
| binary-search.md | Binary search on value range replaces quickselect when the input is non-flat (sorted matrix, frequency table). |
| arrays.md | All quickselect variants operate on arrays; in-place mutation and index arithmetic are array-specific concerns. |
| two-pointers.md | 3-way (DNF) partition uses a two-pointer / three-pointer technique internally. |
Interview Reference
Must-Know Problems
Core Selection
- Kth Largest Element in an Array (LC 215) — canonical quickselect problem; implement both quickselect and heap approaches
- Kth Largest Element in a Stream (LC 703) — heap of size k; online variant
- Find the Kth Smallest Element in a Sorted Matrix (LC 378) — binary search on value range
- K Closest Points to Origin (LC 973) — quickselect on squared distance
Top-K Variants
- Top K Frequent Elements (LC 347) — frequency map + quickselect or bucket sort
- Top K Frequent Words (LC 692) — same pattern with lexicographic tiebreak
- Sort Colors (LC 75) — 3-way DNF partition; foundational for understanding duplicate handling
Median / Percentile
- Find Median from Data Stream (LC 295) — two-heap approach; not quickselect but tests same intuition
- Wiggle Sort II (LC 324) — median via quickselect then 3-way partition for placement
Harder Applications
- K-th Smallest Prime Fraction (LC 786) — binary search on answer with counting
- Find K-th Smallest Pair Distance (LC 719) — binary search on answer + sliding window count
- K-th Smallest in Multiplication Table (LC 668) — binary search on value + row-wise count
Clarification Questions to Ask
- "Is the array 0-indexed or 1-indexed for k?" — determines the index used in quickselect.
- "Can the array be modified?" — determines whether to copy before running in-place quickselect.
- "Are there duplicate values?" — if yes, confirm the partition strategy handles them (3-way partition).
- "Is input given as a stream or all at once?" — stream → heap; batch → quickselect.
- "For 'median', what is the definition when n is even?" — average of two middles, lower middle, or upper middle.
- "What are the constraints on k?" — 1 ≤ k ≤ n is typical but must be confirmed.
Common Interview Mistakes
- Implementing quicksort instead of quickselect — recursing into both branches instead of one, giving O(n log n) instead of O(n).
- Using a fixed pivot (first or last element) without randomisation — interviewers will ask about sorted input performance; answer must address O(n²) worst case.
- Forgetting the base case
if lo == hi: return arr[lo]— causes infinite recursion or index-out-of-range on single-element subarrays. - Off-by-one in k-th largest to k-th smallest conversion — using
n - kvsn - k + 1depending on 0- vs 1-indexing of k; derive it explicitly rather than guessing. - Assuming 2-way partition is sufficient for all inputs — failing to mention 3-way partition when asked about duplicate-heavy inputs shows incomplete understanding.
- Claiming O(n) without qualifying "expected" — median-of-medians is O(n) worst case; standard quickselect is O(n) expected. Conflating these is a red flag.
- Not knowing the heap alternative — interviewers frequently ask "what if you cannot modify the input?" or "what if data arrives in a stream?"; missing the heap answer loses points.
Typical Follow-Up Escalations
- "Can you do it in O(n) worst case?" → Switch to median-of-medians; explain the group-of-5 pivot construction and the guaranteed 30/70 split.
- "What if the array arrives as a stream?" → Use a min-heap of size k; insert each new element and pop root if heap exceeds k; O(log k) per element.
- "What if there are many duplicates?" → Replace 2-way partition with 3-way (DNF) partition; the == region collapses all equal-to-pivot elements in one pass.
- "Can you avoid modifying the input?" → Either copy the array first (O(n) space) or use the heap approach (O(k) space).
- "Now return the k smallest in sorted order." → After quickselect places the boundary at index k, sort
arr[:k]; total O(n + k log k). - "What if the input is a matrix?" → Binary search on value range [min, max]; count elements ≤ mid using a staircase walk; O(n log(max − min)).
Read Next
Start reading to get personalized recommendations
Explore Unread
Great job! You've read all available articles