Sweep Line
Topic type: Technique-Based
LeetCode tag size: ~60–80 problems (interval, geometry, scheduling sub-tags combined)
Applies sections: Core Concepts, Structural Invariants (ordering constraint), Types / Variants, Key Algorithms, Advanced Techniques, Problem Taxonomy, Common Patterns, Edge Cases & Pitfalls, Implementation Notes, Cross-Topic Relations, Interview Reference
Core Concepts
Sweep line — an imaginary vertical (or horizontal) line that moves across the coordinate axis, processing discrete events in order; the algorithm maintains an active set of objects currently intersected by the line.
Event — a point on the axis where the set of active objects changes (an interval starts, ends, a rectangle edge appears, etc.). All algorithm logic executes at event points, not between them.
Event queue — the sorted list of all events; sorting order (and tie-breaking) is the single most important correctness decision in any sweep-line algorithm.
Active set — the data structure holding objects currently "open" (intersected by the line). The choice of active set—sorted list, heap, counter—determines both complexity and what queries are answerable at each event.
Open / close events — standard encoding: an interval [l, r] generates two events: (l, 0, id) (open) and (r, 1, id) (close), or (l, +1) / (r+1, -1) in difference-array form. The encoding drives tie-breaking.
Coordinate compression — when event coordinates are large but sparse, map them to dense indices 0…k−1 before building any array or segment tree over coordinates. Preserves order, reduces space.
Difference array sweep — instead of explicitly maintaining an active set, accumulate +1 at the start and −1 just after the end of each interval, then take a prefix sum. Answers "count of overlapping intervals at each point" in O(n + range) time.
Structural Invariants
| Invariant | What it guarantees | What breaks if violated |
|---|---|---|
| Events are processed in non-decreasing order of coordinate | Active set is always consistent with "line is at position x" | Processing out of order means objects may be added/removed at wrong positions, producing incorrect active sets |
| Tie-breaking is consistent with the query semantics | Whether a point exactly on an endpoint is counted as overlap-or-not matches the problem's definition | Off-by-one in overlap count (e.g., meeting rooms: person leaves at 10, new person arrives at 10 — same room or different?) |
| Every open event has a matching close event | Active set never leaks objects | Count drifts upward; correctness of "max simultaneous" or area queries is lost |
| Coordinate compression preserves strict order | Relative position of events is unchanged after mapping | Two distinct events may collapse to the same index, causing incorrect overlap or area computation |
Types / Variants
| Variant | What it is | When to use | Key difference |
|---|---|---|---|
| Difference array sweep | Encode intervals as ±1 deltas on an array; prefix-sum for point counts | Static intervals, offline, range fits in memory | No active set needed; O(n + range); cannot answer "which intervals overlap here" |
| Event-sort + counter | Sort open/close events; maintain a running integer count | Counting max simultaneous intervals (meeting rooms, max overlap) | Counter replaces full active set; O(n log n) |
| Event-sort + min-heap | Sort open events; heap holds close times; pop heap when line passes close time | Minimum resources needed (e.g., minimum meeting rooms with greedy assignment) | Heap gives the earliest-finishing active interval; O(n log n) |
| Event-sort + sorted active set | Sort events; maintain a sorted structure (BST/SortedList) of active objects | Line segment intersection, skyline, rectangle area union | Supports predecessor/successor queries on active set; O(n log n) |
| Coordinate-compressed + segment tree | Compress x-coordinates; segment tree answers range queries as line sweeps vertically | Rectangle union area, stabbing queries over large coordinate ranges | Handles non-unit-length segments; O(n log n) with lazy propagation |
Key Algorithms
Interval Overlap Detection (any two intervals overlap)
Sort intervals by start; if the start of the next interval is less than the end of the previous maximum-end seen so far, an overlap exists. O(n log n) time, O(1) space.
intervals.sort()
max_end = intervals[0][1]
for s, e in intervals[1:]:
if s < max_end: return True # overlap found
max_end = max(max_end, e)
Merge Overlapping Intervals
Sort by start; greedily extend the current merged interval when the next interval starts before the current end. O(n log n) time, O(n) space.
intervals.sort()
merged = [intervals[0]]
for s, e in intervals[1:]:
if s <= merged[-1][1]: merged[-1][1] = max(merged[-1][1], e)
else: merged.append([s, e])
Maximum Simultaneous Overlap (Meeting Rooms II — min rooms needed)
Create two events per interval: +1 at start, −1 at end. Sort all events; process in order tracking a running count; the maximum count reached is the answer. Tie-break: process ends before starts if same coordinate means the room is freed before re-used. O(n log n) time, O(n) space.
events = [(s, 1) for s,e in intervals] + [(e, -1) for s,e in intervals]
events.sort(key=lambda x: (x[0], x[1])) # -1 before +1 at same point
Minimum Meeting Rooms (Greedy Heap Variant)
Sort meetings by start time. Use a min-heap of end times representing occupied rooms. For each meeting, if the earliest-ending room finishes before this meeting starts, reuse it (pop); otherwise allocate a new room (push). Heap size at end = rooms needed. O(n log n) time, O(n) space.
import heapq
intervals.sort()
heap = []
for s, e in intervals:
if heap and heap[0] <= s: heapq.heapreplace(heap, e)
else: heapq.heappush(heap, e)
Skyline Problem
Treat each building [l, r, h] as two events: (l, -h) (enter, negative for sort priority) and (r, +h) (leave). Sweep left to right; maintain a max-heap of active heights; emit a key point whenever the current maximum height changes. O(n log n) time, O(n) space. The "lazy removal" approach uses a max-heap and only removes heights when they become the current maximum during a query.
Key insight: entering events use negative height so that at the same x-coordinate, entering a tall building is processed before leaving a shorter one, preventing spurious key points.
from sortedcontainers import SortedList
# active heights tracked; when max changes, record (x, new_max)
Rectangle Union Area (Coordinate Compression + Segment Tree)
Compress all x-coordinates. Sweep vertically through horizontal edges (bottom = add, top = remove). A segment tree over compressed x-intervals tracks the total "covered length" at each sweep step; multiply by vertical gap to accumulate area. O(n log n) time, O(n) space.
The segment tree node stores: count (how many rectangles fully cover this segment) and covered (total covered length in this segment's x-range). This is a non-standard lazy segment tree where count > 0 means fully covered regardless of children.
Difference Array Sweep (Static Point Queries)
For each interval [l, r], do diff[l] += 1 and diff[r+1] -= 1. Take prefix sum to get the count of intervals covering each point. O(n + range) time, O(range) space. Use coordinate compression to reduce range to O(n).
diff = [0] * (max_coord + 2)
for l, r in intervals: diff[l] += 1; diff[r+1] -= 1
counts = list(itertools.accumulate(diff))
Line Segment Intersection (Shamos-Hoey — Existence Check)
Sort endpoints; maintain a sorted active set of segments ordered by their y-value at the current sweep line x. At each left endpoint, insert the segment and check neighbors for intersection. At each right endpoint, check the new neighbors of the segment being removed. O(n log n) time, O(n) space.
Key insight: if an intersection exists anywhere, it must be detected between two segments that become neighbors in the active set before their crossing point.
Advanced Techniques
Lazy Deletion from Active Set (Heap or Sorted Structure)
What it solves: Removing a specific interval from a heap is O(n); the lazy approach avoids this by leaving stale entries and skipping them when they surface.
Mechanism: Mark removed intervals in a "to-delete" set. When processing the top of the heap, pop-and-discard until the top is not marked for deletion.
When to use over simpler alternatives: Use when intervals finish non-chronologically relative to the sweep (e.g., intervals overlap by ID, not just by end time) and you need a heap but can't afford O(n) deletion. For problems where intervals finish in sorted order, a simple heap without lazy deletion suffices.
Complexity: O(n log n) amortized; each interval is pushed and popped at most once.
Coordinate Compression Before Sweep
What it solves: Sweep over large sparse coordinate ranges (up to 10^9) without O(range) arrays.
Mechanism: Collect all event x-values (and x±1 if half-open intervals need boundary distinction), sort and deduplicate into a list xs. Map each coordinate to its index in xs. Build diff arrays or segment trees over indices 0…len(xs)−1.
When to use over simpler alternatives: Use when coordinates exceed 10^5–10^6 and a difference array would be too large. For small ranges (≤ 10^5), direct indexing is simpler and faster in practice.
Complexity: O(n log n) for compression; downstream structures inherit their own complexity unchanged.
Segment Tree with Cover Count (Rectangle Area / Interval Cover Queries)
What it solves: "What is the total length of the x-axis currently covered by at least one active interval?" — needed for rectangle union area.
Mechanism: Each node stores cnt (number of times this exact segment is fully covered) and covered (total covered length in its range). When cnt > 0, covered = segment_length regardless of children. When cnt == 0, covered = left.covered + right.covered. No lazy propagation to children; cnt is a local count updated by range add/subtract operations.
When to use over simpler alternatives: Use only when the "covered length" must be queried dynamically as intervals are added/removed. For static queries, a prefix sum on a sorted event list suffices.
Complexity: O(log n) per update/query after O(n) build; O(n log n) overall for n rectangles.
Problem Taxonomy
By Problem Category
| Problem type | What it asks | Key technique |
|---|---|---|
| Interval merge | Combine all overlapping intervals into non-overlapping ones | Sort by start + greedy extend |
| Max simultaneous overlap | How many intervals overlap at the busiest point | Event sort + running counter |
| Resource allocation | Minimum resources to handle all intervals without conflict | Sort + min-heap of end times |
| Interval scheduling | Maximum non-overlapping intervals to keep | Greedy by earliest end time (not sweep line; see greedy) |
| Skyline / silhouette | Outer contour of overlapping rectangles | Event sort + max-heap or sorted active set |
| Rectangle union area | Total area covered by union of axis-aligned rectangles | Coordinate compression + segment tree sweep |
| Point stabbing | How many intervals contain a query point | Difference array or offline sort |
| Calendar / booking | Can a new event be booked without k-fold overlap | Difference array + binary search, or segment tree |
| Line segment intersection | Do any two segments intersect? | Shamos-Hoey sweep |
| Falling intervals / coverage gaps | Detect uncovered points in a range | Event sort + gap detection |
Problem Signal → Technique
| Signal in problem | Likely technique |
|---|---|
| "minimum number of rooms / machines / workers" | Sort + min-heap of end times |
| "maximum number of overlapping intervals at any point" | Event sort + counter (two-event encoding) |
| "merge all overlapping intervals" | Sort by start + greedy merge |
| "total area covered / union of rectangles" | Coordinate compression + segment tree sweep |
| "remove minimum intervals to make non-overlapping" | Greedy by earliest end (interval scheduling maximization) |
| "find all intersection points" / "do segments intersect" | Sweep line + sorted active set |
| "how many events are active at time t" | Difference array or event sort |
| "k-th free / occupied slot" | Binary search on prefix-sum difference array |
| "book event, no triple booking" | Difference array with threshold check |
| coordinates up to 10^9 with interval queries | Coordinate compression before sweep |
Common Patterns
| Pattern | When it applies | Mechanism |
|---|---|---|
| Two-event encoding | Any interval problem needing a running count | Emit (pos, +1) and (pos, -1); sort; running sum |
| Sort-once, sweep-once | Problems where a single left-to-right pass answers the query | Sort events O(n log n), then O(n) pass with O(1) or O(log n) active-set ops |
| Heap as active set | When you only need the minimum or maximum of active intervals (not arbitrary lookup) | Min-heap of end times; pop when line passes the top |
| SortedList as active set | When you need predecessor/successor of active intervals (skyline, segment intersection) | from sortedcontainers import SortedList; insert/remove O(log n) |
| Difference array + prefix sum | Static offline queries: "count at point x" | One pass to fill diff, one prefix sum pass, one lookup |
| Coordinate compression wrapper | Large sparse coordinates | Map coords → dense indices; all downstream structures use indices |
| Tie-break by event type | Endpoints of different intervals share the same coordinate | Sort key = (coord, event_type_int) where type_int encodes priority |
Edge Cases & Pitfalls
-
Tie-breaking at shared endpoints: Two intervals sharing an endpoint (e.g., [1,3] and [3,5]) may or may not overlap depending on the problem definition (closed vs. half-open intervals). The fix: encode the event key as
(coord, type)wheretypeis 0 for "leave" and 1 for "enter" (so leaves process before enters at the same point, making intervals non-overlapping). Reverse the order if the problem treats touching intervals as overlapping. -
Off-by-one in difference array: Using
diff[r] -= 1vs.diff[r+1] -= 1depends on whether intervals are inclusive or exclusive atr. Using the wrong boundary shifts the count for the endpoint itself, causing incorrect results on boundary queries. -
Forgetting to clear the active set after processing: In problems with multiple test cases or sweeps, a leftover active set from a previous sweep produces phantom active objects. Always reinitialise between sweeps.
-
Heap "stale entry" bug: Deleting an interval by marking it and forgetting to drain stale entries before reading the heap top gives incorrect minimum/maximum. Always drain stale entries before trusting the heap top.
-
Coordinate compression collapsing adjacent events: If you compress only exact event coordinates without also including
coord+1orcoord-1for boundary-exclusive events, two logically distinct positions (e.g., "just before x=5" and "at x=5") may map to the same compressed index. The fix: for half-open intervals [l, r), include both l and r in the compression set; for closed intervals [l, r], include l, r, and r+1. -
Rectangle area double-counting: In the segment tree sweep, failing to subtract the segment length when a rectangle's top edge is processed (close event) before the next bottom edge (open event) can double-count horizontal strips. Ensure close events are processed before open events at the same y-coordinate.
-
Integer overflow in area computation: Rectangle coordinates up to 10^9 mean area can reach 10^18, exceeding 32-bit integers. Python handles arbitrary precision natively, but in Java/C++ this requires
long. -
Unsorted active set giving wrong predecessor/successor: The skyline algorithm requires the active set to be sorted by height at all times. Using an unsorted list and scanning for maximum height degrades to O(n^2) and is a common interview mistake.
Implementation Notes
- Python's
heapqis a min-heap only. For a max-heap of heights (skyline), push negative heights:heapq.heappush(heap, -h). from sortedcontainers import SortedListis available in LeetCode's Python environment and provides O(log n) add/remove/bisect. It is the practical choice for any active set requiring order queries.- Python's
bisectmodule (bisect_left,bisect_right) implements coordinate compression lookups in O(log n) aftersorted(set(coords)). - Recursion depth for segment trees over n ≤ 2×10^5 events stays well under Python's default 1000-limit for iterative implementations; recursive segment trees on larger inputs may require
sys.setrecursionlimit. - When building a difference array over compressed coordinates, the compressed array has length
len(xs), notmax_coord + 2; index arithmetic must use the compressed index, not the original coordinate. itertools.accumulatecomputes prefix sums in one line with no explicit loop; prefer it over manual accumulation for clarity.
Cross-Topic Relations
| Related topic | Connection |
|---|---|
| Intervals | Sweep line is the primary algorithmic technique for almost all interval problems; the two topics are tightly coupled |
| Sorting | Every sweep line algorithm begins with a sort; correctness depends entirely on sort key and tie-breaking |
| Heap (Priority Queue) | Used as the active set when only the min/max of active objects is needed |
| Segment Tree | Enables O(log n) range cover queries during the sweep (rectangle union, stabbing counts) |
| Greedy | Interval scheduling maximization (remove fewest intervals) is a greedy problem that shares event-sort setup with sweep line but does not maintain an active set |
| Coordinate Compression | Prerequisite technique when coordinates are large and sparse; always applied before building arrays or trees over coordinate space |
| Difference Array | A degenerate sweep line with no active set; handles static offline interval-count queries without sorted containers |
See intervals.md for interval-specific patterns (insert interval, interval list intersections) that do not require a full sweep.
Interview Reference
Must-Know Problems
Interval Merge / Overlap (Easy–Medium)
- Merge Intervals (LC 56)
- Insert Interval (LC 57)
- Non-overlapping Intervals (LC 435)
- Interval List Intersections (LC 986)
Meeting Rooms / Resource Counting (Medium)
- Meeting Rooms (LC 252) — can one person attend all?
- Meeting Rooms II (LC 253) — minimum rooms needed
- Car Pooling (LC 1094) — capacity sweep
- My Calendar I / II / III (LC 729, 731, 732) — booking with k-fold overlap constraint
Skyline / Geometry (Hard)
- The Skyline Problem (LC 218)
- Rectangle Area II / Union (LC 850)
- Falling Squares (LC 699)
Advanced Sweep (Hard)
- Count of Rectangles Containing Each Point (LC 2250)
- Number of Flowers in Full Bloom (LC 2251)
- Maximum Number of Events That Can Be Attended II (LC 1751)
Clarification Questions to Ask
- Are interval endpoints inclusive or exclusive? (Determines tie-breaking in event sort.)
- Can intervals be empty (start == end)?
- Are there duplicate intervals?
- Is the problem online (events arrive one at a time) or offline (all intervals given upfront)? Online problems cannot use difference arrays.
- What should be returned when there are zero intervals?
- For geometry problems: can rectangle edges coincide? Are coordinates integers or floats?
Common Interview Mistakes
- Sorting by start time only and forgetting to sort by end time as a tiebreaker when starts are equal, causing incorrect merge results.
- Using
diff[r] -= 1(exclusive right) when the problem specifies closed intervals[l, r], shifting the coverage count for the right endpoint. - Building the heap variant of meeting rooms but sorting by end time instead of start time, which produces incorrect greedy assignments.
- Forgetting that
heapqin Python is a min-heap: pushing heights directly and calling it a max-heap of heights produces wrong skyline results. - Not handling the final "flush" of the active set in skyline: after all events are processed, the ground-level (height 0) key point may not be emitted.
- Using a plain list +
max()as the active set in the skyline problem, producing an O(n^2) solution that times out. - Applying a difference array to an online booking problem where intervals are added dynamically and queries must return immediately; difference arrays require a rebuild pass and cannot answer online queries efficiently.
Typical Follow-Up Escalations
- "Now the events arrive in a stream (online)." → Replace difference array with a dynamic segment tree or a sorted list with binary search; difference arrays are offline-only.
- "Now find the exact time ranges where k or more events overlap (not just the count)." → Track transitions in the counter and emit ranges when count crosses k.
- "Now intervals have weights; find the maximum weight of simultaneously active intervals." → Replace counter with a max-heap keyed by weight and end time; similar sweep structure.
- "Coordinates can be up to 10^9." → Apply coordinate compression before building any array-based structure; expected answer: O(n log n) with compressed arrays.
- "Can you do it in O(n) space?" → For overlap counting: difference array (offline) or a counter-only sweep; for skyline: cannot avoid O(n) active set.
Read Next
Start reading to get personalized recommendations
Explore Unread
Great job! You've read all available articles