Ordered Set
Core Concepts
Ordered Set — a collection that maintains elements in sorted order and supports O(log n) insert, delete, and lookup, plus floor/ceiling/rank/select queries not available in a hash set.
SortedList — Python's sortedcontainers.SortedList; a sorted array of arrays providing O(log n) add/discard and O(log n) rank/select via internal bisect. The practical ordered-set implementation in Python interviews.
Multiset — an ordered set that allows duplicate elements; Python's SortedList behaves as a multiset by default (add does not deduplicate).
Ordered Map / Sorted Dict — an ordered set where each key carries a value; Python's SortedDict from sortedcontainers. Provides the same floor/ceiling/rank operations on keys while storing associated data.
Floor query — given k, find the largest element ≤ k currently in the set. Undefined when the set is empty or all elements exceed k.
Ceiling query — given k, find the smallest element ≥ k currently in the set. Undefined when the set is empty or all elements are less than k.
Predecessor — the largest element strictly less than k; equivalent to floor(k−1) for integers.
Successor — the smallest element strictly greater than k; equivalent to ceiling(k+1) for integers.
Rank query — given a value v, return how many elements in the set are strictly less than v (0-indexed position). Also called bisect_left index.
Select query — given an index i, return the element at position i in sorted order (0-indexed). The inverse of rank.
Order-Statistics Tree — a balanced BST augmented with subtree sizes at each node, enabling O(log n) rank and select. Not directly available in Python stdlib; simulated via SortedList or a Fenwick tree over compressed coordinates.
Coordinate Compression — mapping arbitrary values to dense integer indices [0, n−1] before storing in a Fenwick tree, enabling the tree to answer rank/select queries over large value domains in O(log n) time and O(n) space.
Balanced BST — the underlying structure of ordered sets in most languages (AVL, Red-Black). Python's sortedcontainers.SortedList uses a B-tree-like array-of-arrays for cache efficiency rather than pointer-based nodes.
Structural Invariants
| Property | What it guarantees | What breaks if violated |
|---|---|---|
| Sorted order maintained after every insert/delete | Floor, ceiling, rank, select answers are correct | Wrong query answers; binary search produces incorrect indices |
| No duplicates (set semantics) | rank(v) == select(rank(v)) round-trips correctly | Rank and select become inconsistent |
| Multiset: all copies tracked | Count queries and sliding-window counts are accurate | Under-counting; premature "empty" on remove |
| Rank = bisect_left index | Rank 0 means v is less than all elements | Off-by-one in range counts and kth-element retrieval |
Internal Representation
SortedList (sortedcontainers) — stores elements in a list of sorted sublists, each of size ~1000. All operations use bisect internally. Access by index is O(log n) (not O(1)) because it must locate the sublist first.
Bisect over plain list — using Python's bisect module on a regular list. Insert is O(n) due to shifting; suitable only when n ≤ 10³ or inserts are rare (read-heavy workloads).
Fenwick Tree (BIT) over compressed coordinates — compress all possible values to [0, m−1], then use a BIT where index i represents value compressed[i]. update(i, +1) = insert, update(i, −1) = delete, prefix_sum(i) = rank. Select requires binary search on the BIT. O(log m) per operation, O(m) space.
Segment Tree over compressed coordinates — same idea as BIT but supports range min/max queries in addition to rank/count. Use when floor/ceiling across a dynamic value range is needed alongside counts.
Pointer-based balanced BST — not available in Python stdlib. C++ std::set/std::multiset and Java TreeSet use Red-Black trees. In Python, simulate with SortedList or implement a treap/AVL only if the problem explicitly requires a custom augmentation not achievable otherwise.
Types / Variants
| Variant | What it is | When to use | Key property |
|---|---|---|---|
| Ordered Set | Unique sorted elements | Deduplication + sorted order + range queries | No duplicates; standard floor/ceiling API |
| Ordered Multiset | Sorted elements with duplicates | Sliding-window medians, frequency tracking | SortedList default behavior; count by bisect_right − bisect_left |
| Ordered Map / Sorted Dict | Key-value store sorted by key | Associating data with sorted keys (e.g., intervals by start) | Floor/ceiling on keys; O(log n) key access |
| Order-Statistics Tree | BST augmented with subtree sizes | Rank/select queries required simultaneously with insert/delete | O(log n) rank + select; not in stdlib |
| BIT-based ordered set | Fenwick tree over value space | Offline problems with known value universe; count inversions | O(log m) ops; cannot retrieve actual element values directly |
Topic-Specific Operations
Insert
Add an element to the set while maintaining sorted order. In SortedList: sl.add(v) — O(log n). For a set (no duplicates): check v not in sl first or use SortedList with manual deduplication via bisect_left.
Delete
Remove one occurrence of an element. sl.remove(v) raises ValueError if not present; sl.discard(v) is safe. For multisets, remove removes exactly one copy. O(log n).
Floor (largest ≤ k)
i = sl.bisect_right(k) - 1
return sl[i] if i >= 0 else None
bisect_right(k) gives the insertion point after all copies of k; subtract 1 to get the last element ≤ k.
Ceiling (smallest ≥ k)
i = sl.bisect_left(k)
return sl[i] if i < len(sl) else None
bisect_left(k) is the index of the first element ≥ k.
Predecessor (largest < k)
i = sl.bisect_left(k) - 1
return sl[i] if i >= 0 else None
Successor (smallest > k)
i = sl.bisect_right(k)
return sl[i] if i < len(sl) else None
Rank (count of elements < v)
sl.bisect_left(v) returns exactly this count. O(log n). For count of elements ≤ v, use sl.bisect_right(v).
Select (element at position i)
sl[i] — O(log n) in SortedList (not O(1)). Negative indices work: sl[-1] is the maximum, sl[0] is the minimum.
Range Count (elements in [lo, hi])
sl.bisect_right(hi) - sl.bisect_left(lo)
O(log n). Returns the count of elements x where lo ≤ x ≤ hi.
Kth Smallest
sl[k] directly (0-indexed). For kth smallest ≥ 1-indexed, use sl[k-1].
Key Algorithms
Sliding Window with Ordered Set
Maintain an ordered multiset over a window of size k; answer queries (median, min, max, kth element) for each window position. Add the incoming element, remove the outgoing element, then query. O(n log n) total.
For median specifically: maintain two SortedList halves (lower half max-indexed, upper half min-indexed) and rebalance after each add/remove to keep sizes equal (or differing by 1).
Two-Heap Median vs. Ordered Set Median
Two heaps achieve O(log n) per operation but cannot answer floor/ceiling or range queries. Use SortedList when you need more than the median — it generalises at the same asymptotic cost.
Count Inversions via Ordered Set / BIT
Process elements right-to-left; for each element v, the count of elements already inserted that are less than v equals rank(v). Accumulate these counts. O(n log n).
See segment-tree.md and prefix-sum.md for BIT-based inversion counting details.
Nearest Smaller / Larger Element (Dynamic)
Maintain an ordered set of currently active elements. For a query value v, predecessor(v) gives the nearest smaller and successor(v) gives the nearest larger in the active set. Unlike a monotonic stack, this supports arbitrary insertions and deletions. O(log n) per operation.
Interval Overlap Detection (Sweep)
Store interval start points in a SortedDict keyed by start time. For a new interval [s, e], check if the predecessor's end overlaps s, and if the successor's start is within e. O(log n) per interval.
Dynamic Rank Query (Online)
For an online stream of insertions and rank queries, use SortedList. Insert v → O(log n); rank(v) = sl.bisect_left(v) → O(log n). For offline with known value set, a BIT over compressed coordinates is faster in practice (smaller constants).
Kth Smallest in Matrix / Multiple Sorted Arrays
Use a min-heap to merge and advance, or binary search on value with ordered-set rank as feasibility check. The ordered set answers "how many values ≤ mid?" in the structure in O(log n), making binary search viable.
See heap-priority-queue.md for the merge-k-sorted-lists pattern.
Advanced Techniques
Order-Statistics Tree via BIT + Coordinate Compression
Solves problems requiring simultaneous dynamic insert/delete and rank/select on arbitrary values, without needing a custom BST.
Mechanism: compress all values (known offline) to indices [0, m−1]; maintain a BIT over these indices. update(compress[v], +1) for insert, −1 for delete. Rank = prefix_sum(compress[v] − 1). Select = binary search for the index where prefix sum first reaches k.
When to use over SortedList: when m (the value universe size) is small but n (operations) is large and you need guaranteed O(log m) with minimal constant. Also when the problem involves counting, not retrieving actual values. If you need actual values or the universe is unknown, prefer SortedList.
Complexity: O(log m) per operation, O(m) space.
Two-SortedList Median Maintenance
Maintains a running median over a dynamic multiset by splitting into lower half and upper half.
Mechanism: keep lo (max is lo[-1]) and hi (min is hi[0]). On insert: add to lo if v ≤ lo[-1] else to hi; rebalance so |len(lo) − len(hi)| ≤ 1 by moving lo[-1] or hi[0]. Median = lo[-1] (odd total) or average of lo[-1] and hi[0] (even).
When to use over two-heap: when you also need floor/ceiling/rank queries on the same data, or when you need to delete arbitrary elements (heap deletion is O(n) without augmentation; SortedList.remove is O(log n)).
Complexity: O(log n) per insert/delete/median query.
Offline Range Frequency with Sorted Indices
For each value v, store sorted list of its occurrence indices. To answer "how many times does v appear in [l, r]?", binary search in indices[v] for bisect_right(r) − bisect_left(l).
When to use over segment tree: fully offline, queries are value-specific (not range-of-values), and the value universe is sparse. Saves building a segment tree over a large range.
Complexity: O(n log n) preprocessing, O(log n) per query.
Problem Taxonomy
By Problem Category
| Category | What it asks | Key technique |
|---|---|---|
| Dynamic median | Maintain median as elements arrive/depart | Two-SortedList split or two-heap |
| Kth smallest (dynamic) | After each insertion, return kth element | SortedList[k] directly |
| Count elements in range | How many current elements satisfy lo ≤ x ≤ hi | bisect_right(hi) − bisect_left(lo) on SortedList |
| Nearest value query | Find floor/ceiling/predecessor/successor dynamically | SortedList bisect operations |
| Count inversions | Pairs (i, j) with i < j and a[i] > a[j] | BIT rank or SortedList rank from right |
| Range frequency (offline) | Count occurrences of v in index range [l, r] | Sorted index lists + bisect |
| Sliding window order statistics | Min, max, median, kth in a moving window | SortedList add/remove + index |
| Interval scheduling / overlap | Insert interval, detect or count overlaps | SortedDict by start; predecessor/successor check |
| Smallest available number | Track gaps in a used-number set | Ordered set of free numbers; pop minimum |
Problem Signal → Technique
| Signal in problem | Likely technique |
|---|---|
| "Find the k-th smallest after each insertion" | SortedList, index directly |
| "Sliding window of size k, find median" | Two-SortedList halves or two-heap |
| "Count pairs (i < j) where a[i] > a[j]" | BIT rank or SortedList rank right-to-left |
| "Find smallest number ≥ x not yet used" | Ordered set of available numbers, ceiling query |
| "How many elements are in range [l, r] at any point" | SortedList range count |
| "Insert / delete / find predecessor or successor" | SortedList bisect operations |
| "Online stream, answer rank or select queries" | SortedList (online) or BIT+compress (offline) |
| "Minimum difference between any two elements" | SortedList; check sl[i+1] − sl[i] after each insert |
| "Avoid scheduling conflicts / detect overlapping intervals" | SortedDict by interval start; floor/ceiling check |
Common Patterns
| Pattern | When it applies | Mechanism |
|---|---|---|
| Add-query-remove per window | Sliding window order statistics | sl.add(incoming); query; sl.remove(outgoing) |
| Bisect-right − 1 for floor | Floor query | Index into SortedList with bisect_right(k) − 1 |
| Bisect-left for ceiling | Ceiling query | Index into SortedList with bisect_left(k) |
| Split into lo / hi halves | Median maintenance | Two SortedLists; rebalance after each mutation |
| Right-to-left rank accumulation | Inversion count | For each element, rank in current set = inversions it forms |
| Sorted index lists per value | Offline range frequency | Group occurrence indices by value; binary search per query |
| Ordered set of free slots | "Smallest unused" problems | Initialise set with all valid values; pop minimum (ceiling of 0 or 1) |
| Floor/ceiling on interval endpoints | Overlap detection | Predecessor endpoint overlaps new interval start; successor start within new interval end |
Edge Cases & Pitfalls
-
Querying floor/ceiling on an empty set —
bisect_right(k) − 1returns−1when the set is empty; always guard withif len(sl) == 0orif i >= 0before indexing. Failing to do so causes anIndexError. -
Removing an element not present in a multiset —
sl.remove(v)raisesValueError. Usesl.discard(v)for safety, or verify existence withv in sl(O(log n)) before removal. In sliding-window problems, ensure the window boundary element matches what was added. -
Off-by-one in floor vs. predecessor — floor returns the largest element ≤ k (includes k itself); predecessor returns the largest element strictly < k. Confusing the two leads to incorrect answers when k is present in the set. Use
bisect_rightfor floor (includes k),bisect_left − 1for predecessor (excludes k). -
Off-by-one in rank —
bisect_left(v)= count of elements strictly < v (0-indexed rank).bisect_right(v)= count of elements ≤ v. Wrong choice flips the answer for range count and inversion count. -
SortedListindex access is O(log n), not O(1) — do not callsl[i]inside a loop expecting O(1); the loop becomes O(n log n) per element. For repeated indexed access, convert to a list first if the set is static. -
Integer deduplication when multiset semantics are needed — if the problem has duplicate values and you use a plain set or a SortedList with manual deduplication, you lose count information. Use
SortedListdirectly (it is a multiset) or maintain aCounteralongside. -
BIT coordinate compression with unknown universe — BIT-based ordered sets require all values to be known upfront for compression. If queries arrive online with new values, BIT is not applicable without dynamic extension; use
SortedListinstead. -
Two-half median rebalancing direction — after inserting into the wrong half, rebalance before reading the median. Reading before rebalance gives a stale or off-by-one median value.
-
Minimum across the whole set vs. minimum of a range —
sl[0]is the global minimum; for the minimum element ≥ lo, usesl[sl.bisect_left(lo)](ceiling). Conflating the two is a common error. -
inoperator on SortedList —v in slis O(log n) (binary search), not O(1) like a hash set. This is fine for occasional membership checks but avoid it inside tight inner loops.
Implementation Notes
sortedcontainers.SortedListis not in the Python standard library; on LeetCode it is pre-installed and importable asfrom sortedcontainers import SortedList. Confirm availability in contest environments before relying on it.SortedList.addnever deduplicates — it is always a multiset insert. To enforce set semantics (no duplicates), checksl.count(v) == 0or usebisect_left/bisect_rightequality before adding.SortedList.__contains__(v in sl) runs in O(log n);SortedList.count(v)runs in O(log n + count) and should not be used in a tight loop for large repeated values.- Python's
bisectmodule operates on plain lists with O(n) insert/delete; use it only for offline-built static sorted arrays or when n ≤ 2 × 10³. SortedList.irange(lo, hi)returns an iterator over elements in [lo, hi] without materialising a list; prefer it for iteration, but count queries are faster withbisect_right − bisect_left.- For BIT-based ordered sets, use 1-indexed BIT to avoid the off-by-one on prefix sums; the standard BIT template assumes 1-indexing.
- Python's recursion limit (default 1000) is irrelevant for
SortedList-based solutions (iterative internally), but matters if you implement a custom BST recursively — usesys.setrecursionlimitor convert to iterative. SortedDict(fromsortedcontainers) provideskeys()as aSortedKeysView; floor/ceiling on keys requiresbisect_left/bisect_rightonsd.keys()then indexingsd.peekitem(i).
Cross-Topic Relations
| Related topic | Specific connection |
|---|---|
| Binary Search | Floor/ceiling operations are bisect operations; bisect_left/bisect_right are the primitive operations underlying all ordered-set queries. See binary-search.md. |
| Heap / Priority Queue | Heaps give O(1) min/max and O(log n) push/pop but cannot answer floor/ceiling/rank/select; ordered set is the generalisation when those queries are needed. See heap-priority-queue.md. |
| Segment Tree | A segment tree over a value range answers the same rank/count queries as an ordered set with O(log m) per operation; use when range aggregate queries (sum, min, max) are also needed. See segment-tree.md. |
| Prefix Sum / BIT | A BIT over compressed coordinates implements dynamic rank/select — the core of an order-statistics tree. See prefix-sum.md. |
| Sliding Window | Ordered set is the data structure that enables order-statistic queries (median, kth) within a sliding window; the window logic is the technique, the ordered set is the structure. See sliding-window.md. |
| Data Stream | Many data-stream problems (running median, kth largest) require a dynamic ordered structure; ordered set or two-heap are the canonical solutions. See data-stream.md. |
| Sorting | Offline problems can sort input and use bisect on a static array instead of a dynamic ordered set — always prefer this when no dynamic insertions/deletions occur. See sorting.md. |
| Binary Tree / BST | The logical model behind an ordered set is a balanced BST. See binary-tree.md for tree structure fundamentals. |
Interview Reference
Must-Know Problems
Floor / Ceiling / Predecessor / Successor
- LeetCode 220 — Contains Duplicate III (window-bounded floor/ceiling check)
- LeetCode 352 — Data Stream as Disjoint Intervals (merge intervals using sorted structure)
- LeetCode 855 — Exam Room (find seat maximising distance using predecessor/successor)
Dynamic Kth Element / Rank
- LeetCode 295 — Find Median from Data Stream (two-heap or two-SortedList)
- LeetCode 480 — Sliding Window Median (ordered multiset + sliding window)
- LeetCode 703 — Kth Largest Element in a Stream (heap; ordered set generalises this)
- LeetCode 1825 — Finding MK Average (three-SortedList split for complex windowed stat)
Count / Range Queries
- LeetCode 315 — Count of Smaller Numbers After Self (BIT rank or SortedList rank)
- LeetCode 327 — Count of Range Sum (BIT or merge sort; ordered set rank approach)
- LeetCode 493 — Reverse Pairs (rank query variant)
- LeetCode 2250 — Count Number of Rectangles Containing Each Point (offline + bisect)
Interval / Scheduling
- LeetCode 715 — Range Module (dynamic interval set)
- LeetCode 732 — My Calendar III (sweep line with SortedDict)
- LeetCode 1353 — Maximum Number of Events That Can Be Attended (greedy + ordered set of end times)
Smallest Available / Gap
- LeetCode 1887 — Reduction Operations to Make the Array Elements Equal (ordered set of distinct values)
- LeetCode 2336 — Smallest Number in Infinite Set (ordered set of reclaimed numbers + threshold pointer)
Clarification Questions to Ask
- Are values guaranteed to be integers, or can they be floats? (Affects BIT coordinate compression feasibility.)
- Can there be duplicate values? (Determines whether set or multiset semantics are needed.)
- Is the value universe bounded and known upfront, or can arbitrary values arrive online? (Determines BIT vs. SortedList.)
- Are queries online (each answer depends on previous) or offline (all queries known upfront)? (Offline allows sorting + static bisect, which is simpler.)
- What is n? (n ≤ 10³ allows O(n²); n ≤ 10⁵ requires O(n log n); n ≤ 10⁶ requires O(n log n) with small constants.)
- Is deletion required, or only insertion? (Insertion-only allows simpler structures.)
- Is the problem asking for the kth smallest overall, or within a range/window?
Common Interview Mistakes
- Using a sorted list (plain Python list +
bisect.insort) for dynamic insert/delete —insortis O(log n) for the binary search but O(n) for the list shift, making the solution O(n²) overall. - Forgetting that
SortedListis a multiset: callingsl.remove(v)when v was added multiple times removes only one copy; the others remain. When the intent is "remove all copies", iterate or use a separate count. - Implementing floor as
bisect_left(k) − 1— this is predecessor (largest < k), not floor (largest ≤ k); usebisect_right(k) − 1for floor. - Calling
sl[sl.bisect_left(k)]without bounds-checking when k is larger than all elements —bisect_leftreturnslen(sl), causingIndexError. - For median maintenance with two halves, forgetting to rebalance before reading the median — reading
lo[-1]whenlohas 0 elements causes an error. - Choosing BIT+compression for an online problem where values are not known upfront — this silently produces wrong answers when an unseen value arrives.
- Confusing rank (position of v, 0-indexed count of elements < v) with select (element at position i) — implementing both as the same call inverts the query semantics.
Typical Follow-Up Escalations
- "Now also support range-sum queries" → augment with a segment tree or BIT that stores sums, not just counts; ordered set alone cannot answer this.
- "Now do it in O(1) amortised per query" → not generally achievable for dynamic ordered sets; clarify the constraint — often the interviewer means O(1) amortised for a restricted operation (e.g., pop minimum from a pre-sorted list).
- "Now handle deletions as well as insertions" → confirm that
SortedList.remove/discardhandles this; if using BIT, decrement the index count by 1. - "What if n is 10^7?" → BIT over compressed coordinates with O(log m) is preferred over SortedList due to smaller constants; or use a merge-sort offline approach if all operations are known upfront.
- "Can you do it without
sortedcontainers?" → use BIT + coordinate compression for rank/select, or implement a treap/skip list. In practice, describe the approach and note thatsortedcontainersis the idiomatic Python answer. - "Now return the count of elements in every window, not just one window" → offline sweep: sort queries by window end, use a BIT to answer range-count at each step.
Read Next
Start reading to get personalized recommendations
Explore Unread
Great job! You've read all available articles