Strongly Connected Component (SCC)
Core Concepts
Strongly Connected Component (SCC) — a maximal set of vertices in a directed graph such that every vertex is reachable from every other vertex within the set. "Maximal" means you cannot add another vertex and preserve the property.
Weakly Connected Component — the same set of vertices would be connected if all edges were made undirected. Distinct from SCC; a graph can be weakly connected but have many SCCs.
Condensation DAG — the directed acyclic graph formed by contracting each SCC to a single node and keeping inter-SCC edges. Every directed graph has a unique condensation DAG. It is always acyclic because any cycle in the condensation would merge the involved SCCs.
Transitive closure — vertex u can reach vertex v iff there is a path u → v in the original graph. Within an SCC all pairs are mutually reachable. Between SCCs, reachability is determined by the condensation DAG.
Discovery time / finish time — DFS timestamps. In Kosaraju's, finish order on the original graph determines the processing order on the reversed graph. Finish time is the clock tick when DFS fully returns from a node.
Low-link value — used in Tarjan's algorithm. For a vertex v, low[v] is the minimum discovery time reachable from the subtree rooted at v using at most one back/cross edge. If low[v] == disc[v], v is the root of an SCC.
Stack invariant (Tarjan's) — a vertex remains on the stack while it is potentially part of an SCC being explored. A vertex is popped with its entire SCC only when low[v] == disc[v] is confirmed.
Source SCC / Sink SCC — in the condensation DAG, a source SCC has in-degree 0 (no incoming inter-SCC edges) and a sink SCC has out-degree 0. The number of edges needed to make a DAG strongly connected equals max(sources, sinks) (except for the trivial single-node case).
Types / Variants
| Algorithm | Passes | Core mechanism | Space overhead | Practical preference |
|---|---|---|---|---|
| Kosaraju's | 2 DFS passes | First DFS on G records finish order; second DFS on G^T in reverse finish order | Reverse graph G^T must be built explicitly; O(V+E) extra | Easier to implement correctly; preferred when clarity matters |
| Tarjan's | 1 DFS pass | Tracks disc/low arrays + explicit stack; pops SCC when low[v]==disc[v] | No reverse graph; O(V) stack + arrays | Preferred in competitive programming; single pass, cache-friendly |
| Path-based (Pearce's) | 1 DFS pass | Uses two stacks: one for candidates, one to track SCC boundaries | Similar to Tarjan's; slightly different stack logic | Rare in interviews; Tarjan's is more widely known |
Key Algorithms
Kosaraju's Algorithm
Finds all SCCs in O(V + E) time, O(V + E) space by exploiting the property that the SCCs of G and G^T (reversed graph) are identical sets, but reachability between them is reversed.
Key insight: The vertex with the largest finish time in G belongs to a source SCC in the condensation. Running DFS from it on G^T visits exactly that SCC (because inter-SCC edges are reversed, blocking leakage into other SCCs).
Step 1 — DFS on G, push each vertex onto a stack upon finishing:
def dfs1(u, visited, stack, adj):
visited.add(u)
for v in adj[u]:
if v not in visited: dfs1(v, visited, stack, adj)
stack.append(u)
Step 2 — DFS on G^T in reverse-finish order (pop from stack), each DFS tree is one SCC. Time O(V+E), space O(V+E) for G^T.
Tarjan's Algorithm
Finds all SCCs in a single DFS pass in O(V + E) time, O(V) space (excluding adjacency list).
Key insight: When DFS fully returns to a vertex v and low[v] == disc[v], v is the root of an SCC — everything on the stack above v belongs to that SCC.
Low-link update rules:
- Tree edge to child w (after DFS returns):
low[v] = min(low[v], low[w]) - Back edge to ancestor w already on stack:
low[v] = min(low[v], disc[w]) - Cross edge to w NOT on stack (already in a completed SCC): do not update low[v]
# Core update inside DFS after visiting neighbor w:
if w not in disc: # tree edge
dfs(w); low[v] = min(low[v], low[w])
elif w in on_stack: # back/cross to same SCC candidate
low[v] = min(low[v], disc[w])
The on_stack set (separate from the visited set) is critical — without it, cross-edges to already-completed SCCs would incorrectly lower low-link values.
SCC extraction — after the loop over neighbors, if low[v] == disc[v], pop the stack until v is popped; all popped vertices form one SCC.
Condensation DAG Construction
After finding SCCs (by either algorithm), assign each vertex a component ID. Build the condensation by scanning all edges (u, v): if comp[u] != comp[v], add edge comp[u] → comp[v] to the condensation. Use a set to deduplicate edges. O(V + E) time.
# After SCC labeling:
dag_edges = set()
for u in range(n):
for v in adj[u]:
if comp[u] != comp[v]: dag_edges.add((comp[u], comp[v]))
The resulting DAG supports topological-order DP for reachability and min-cost path queries across SCCs.
Advanced Techniques
2-SAT (Two-Variable Boolean Satisfiability)
Problem class: Given a CNF formula where every clause has exactly 2 literals, determine if a satisfying assignment exists and find one.
Core mechanism: Model each variable x as two nodes (x and ¬x). Each clause (a ∨ b) becomes two implication edges: ¬a → b and ¬b → a. Run SCC on the implication graph. The formula is satisfiable iff no variable x and its negation ¬x are in the same SCC. If satisfiable, assign truth values using condensation DAG topological order: x = true iff x appears in a later SCC than ¬x in topological order.
When to use over simpler alternatives: Only when the problem has boolean constraints that can be expressed as implications between pairs of literals. General SAT is NP-hard; 2-SAT is linear precisely because of the implication structure that SCCs capture.
Complexity: O(V + E) where V = 2n variables, E = 2m clauses.
See graph.md for implication graph construction details.
Minimum Edges to Make Graph Strongly Connected
Problem class: Given a directed graph (possibly a DAG or partially connected), find the minimum number of directed edges to add so the entire graph becomes strongly connected.
Core mechanism: Condense to DAG. Count source SCCs (in-degree 0 in DAG) = s, and sink SCCs (out-degree 0 in DAG) = t. Answer = max(s, t). Special case: if the original graph is already one SCC, answer = 0. For a single-node graph, answer = 0.
When to use: Only after condensation. Do not try to compute this on the raw graph — it is only tractable on the condensation DAG.
Complexity: O(V + E) for condensation + O(V) to compute in/out-degrees.
Reachability Queries via Condensation + Bitset DP
Problem class: Given a directed graph, answer multiple "can u reach v?" queries offline.
Core mechanism: After condensation (k SCCs), run topological DP on the condensation DAG where each node stores a bitset of all reachable SCC IDs. Merge child bitsets into parents. Each query reduces to a bitset lookup.
When to use over BFS/DFS per query: Only when k is small (≤ a few thousand) and queries are numerous. BFS/DFS per query is O(V+E); bitset DP preprocessing is O(k²/64 * k) but answers each query in O(1).
Complexity: Preprocessing O(k² / 64) with bitset optimizations; query O(1).
Problem Taxonomy
By Problem Category
| Problem category | What it asks | Key technique |
|---|---|---|
| Count SCCs | How many strongly connected components | Kosaraju's or Tarjan's directly |
| Detect strong connectivity | Is the whole graph one SCC? | Run SCC, check count == 1 |
| Minimum edges to strongly connect | Fewest edges to add | Condense → count sources and sinks |
| Boolean satisfiability | 2-literal clauses satisfiable? | 2-SAT via implication graph SCC |
| Critical connections | Edges/nodes whose removal disconnects SCCs | Tarjan's bridges/articulation points on condensation |
| Reachability in cycles | Can u always reach v even with cycles? | Condense then DAG reachability |
| Cycle detection + grouping | Find all cycles and which vertices share them | SCC (each non-trivial SCC is a cycle group) |
| DAG reduction | Condense redundant cycle structure before DP | Condensation then DP on DAG |
Problem Signal → Technique
| Signal in problem statement | Likely technique |
|---|---|
| "strongly connected", "every node reachable from every other" | Direct SCC (Kosaraju's or Tarjan's) |
| "minimum edges to make strongly connected" | Condensation → max(sources, sinks) |
| "at most one of x, y can be true" / boolean implications | 2-SAT |
| "if A then B" constraints over boolean variables | 2-SAT implication graph |
| Directed graph + "can always reach" + cycles present | Condense first, then reachability on DAG |
| "critical road" / "bridge" in directed graph context | Tarjan's with bridge detection |
| Directed graph + topological sort fails (has cycle) | Condense to DAG, then topological sort on condensation |
| "groups of nodes that form cycles" | SCC as cycle-group identification |
Common Patterns
| Pattern | When it applies | Mechanism |
|---|---|---|
| Condense-then-solve | Problem involves reachability or ordering on a directed graph with cycles | Run SCC → build condensation DAG → solve on DAG (DP, topo-sort, source/sink counting) |
| Reverse-graph finish order | Using Kosaraju's | Build adjacency list for G^T while reading input; first DFS push order gives processing order for second pass |
| On-stack tracking in Tarjan's | Implementing Tarjan's correctly | Maintain a boolean on_stack[] array in addition to disc[]/low[]; only use disc[w] to update low if w is currently on the stack |
| Implication graph construction | 2-SAT setup | For n variables, use 2n nodes: node i for x_i true, node i+n for x_i false; clause encoding is always two directed edges |
| Iterative DFS for large graphs | Avoiding Python recursion limit | Use an explicit stack with (node, iterator) tuples; finish-time push happens when the iterator is exhausted |
| Component ID assignment | After Tarjan's | SCCs are popped in reverse topological order of the condensation — component IDs assigned in pop order are naturally in reverse topological order |
Edge Cases & Pitfalls
-
Single-node graph: A single vertex with no self-loop is its own SCC of size 1. Algorithms handle this correctly, but "minimum edges to strongly connect" must return 0 for a single node — verify the special case before applying the
max(sources, sinks)formula. -
Self-loops: A self-loop
u → udoes not make u part of a non-trivial SCC with other nodes. In Tarjan's, the back edgeu → uupdateslow[u] = min(low[u], disc[u])which is a no-op (already equal), so correctness is preserved; no special-casing needed. -
Disconnected graph: Both Kosaraju's and Tarjan's must be called from every unvisited vertex (outer loop over all vertices), not just from vertex 0. Missing this produces incomplete SCC decomposition.
-
Cross-edge vs back-edge confusion in Tarjan's: Updating
low[v]withdisc[w]for a cross-edge to a completed SCC (w not on stack) is wrong — it pulls a vertex into an already-closed SCC. Theon_stackcheck is non-negotiable. -
Using
low[w]instead ofdisc[w]for back edges: When w is an ancestor on the stack (back edge), updatelow[v] = min(low[v], disc[w]), notlow[w]. Usinglow[w]is the "wrong" Tarjan variant that still computes SCCs correctly but conflates it with bridge-finding logic; mixing them causes bugs when adapting for bridges. -
Kosaraju's reverse-graph construction: Must reverse edge direction, not just process the adjacency list backwards. If using adjacency list
radj, the edgeu → vin G becomesv → uin radj. A common mistake is building radj asradj[u].append(v)(same direction). -
2-SAT: same SCC for x and ¬x means UNSAT: Check
comp[i] == comp[i + n]for all i before extracting an assignment. Extracting an assignment when UNSAT gives a meaningless result. -
Condensation DAG self-loops: When building the condensation, an edge
u → vwherecomp[u] == comp[v]is an intra-SCC edge and must be filtered out (use the set deduplication pattern). Leaving them in creates self-loops that break topological sort. -
Recursion depth in Python: Graphs with V up to 10^5 will hit Python's default recursion limit (1000) with recursive DFS. Always use iterative DFS or set
sys.setrecursionlimitbefore the interview.
Implementation Notes
- Python's
sys.setrecursionlimitmust be set explicitly for graph DFS; the default 1000 is too low for contest-sized inputs. An iterative implementation is safer and often required. collections.defaultdict(list)is idiomatic for adjacency lists in Python; avoids KeyError on isolated vertices.- In Tarjan's, the
on_stackmembership check should use a separatesetor boolean array — usingstack.count(w) > 0is O(V) per check and causes TLE. - When assigning component IDs in Tarjan's, SCCs are emitted in reverse topological order of the condensation; if you need topological order, either reverse the IDs or use
num_sccs - 1 - idas the topological index. - For 2-SAT with n variables, allocate nodes 0..n-1 for the positive literals and n..2n-1 for the negative literals; the negation of literal i is
i ^ n(XOR) only when using this exact layout. - Python
heapqis not needed for SCC algorithms; no priority queue is involved. All operations are DFS-based and run in O(V+E).
Cross-Topic Relations
| Related topic | Connection |
|---|---|
| Graph (DFS/BFS) | SCC algorithms are DFS-based; understanding DFS timestamps (discovery/finish) is a prerequisite |
| Topological Sort | Condensation DAG is a prerequisite for topological sort on graphs with cycles; SCCs reduce cyclic graphs to DAGs |
| Bridges & Articulation Points | Both use DFS low-link values; Tarjan's bridge algorithm shares the disc/low framework but applies to undirected graphs and does not use an on-stack set |
| 2-SAT | Directly reduces to SCC on an implication graph; SCC is the core subroutine |
| Dynamic Programming on DAGs | Condensation produces the DAG on which DP (reachability, longest path, min cost) is applied |
| Union-Find (DSU) | Handles undirected connectivity; SCC handles directed strong connectivity — distinct problems, same "grouping" intuition |
| Biconnected Components | Analogous structure for undirected graphs; shares low-link insight but differs in definition and extraction |
See graph.md for DFS fundamentals, topological sort, and bridge detection.
Interview Reference
Must-Know Problems
Direct SCC / Connectivity
- Number of Provinces (LC 547) — undirected connected components (warm-up; use for directed analog)
- Course Schedule II (LC 210) — topological sort; precursor to condensation logic
- Largest Color Value in a Directed Graph (LC 1857) — SCC condensation + DP on DAG
- Find Eventual States (LC 457) — identify sink SCCs in a directed graph
Minimum Edges / Restructuring
- Reorder Routes to Make All Paths Lead to the City Zero (LC 1466) — directed reachability restructuring
- Minimum Number of Days to Disconnect Island (LC 1568) — structural connectivity (analogous reasoning)
2-SAT
- (No LeetCode problem is explicitly tagged 2-SAT, but appears in hard graph problems involving boolean constraints and implications between pairs)
Cycle Detection + SCC Grouping
- Course Schedule (LC 207) — directed cycle detection
- Redundant Connection II (LC 685) — directed graph; finding the edge creating a cycle that breaks tree structure
Hard / Combined
- Critical Connections in a Network (LC 1192) — Tarjan's bridges (shares low-link with SCC)
- Remove Max Number of Edges to Keep Graph Fully Traversable (LC 1579) — connectivity with constraints
Clarification Questions to Ask
- "Is the graph directed or undirected?" — SCC only applies to directed graphs; undirected uses connected components.
- "Can there be self-loops or multiple edges between the same pair of vertices?" — affects SCC count and condensation deduplication.
- "What is the size of the graph (V and E)?" — determines whether recursive DFS is feasible or iterative is required.
- "Are vertex labels 0-indexed or 1-indexed?" — common source of off-by-one bugs in component ID assignment.
- "For 2-SAT: what is the exact form of each constraint?" — must confirm each constraint is expressible as a disjunction of exactly 2 literals.
Common Interview Mistakes
- Implementing Tarjan's without the
on_stackset, causing cross-edges to completed SCCs to incorrectly lower low-link values and merge distinct SCCs. - Using recursive DFS in Python on graphs with thousands of nodes without adjusting the recursion limit, causing a
RecursionErrormid-execution. - In Kosaraju's, building the reversed graph with the same edge direction as the original (reversing the iteration order but not the edge direction).
- Applying the
max(sources, sinks)formula without checking the special case where the graph is already strongly connected (answer should be 0, not max(0,0) = 0, but forgetting leads to off-by-one when the condensation has exactly 1 node). - In 2-SAT, forgetting to check
comp[i] == comp[i+n]before extracting an assignment, producing a bogus "satisfying" assignment for an UNSAT instance. - Building the condensation with self-loops (intra-SCC edges) and then attempting topological sort, which fails or produces incorrect order.
Typical Follow-Up Escalations
- "Now find the minimum number of edges to add to make the graph strongly connected." — Condense to DAG, count source SCCs (in-degree 0) and sink SCCs (out-degree 0), answer is
max(sources, sinks). - "Now answer reachability queries online after the graph is built." — Condense, then run DFS/BFS on the condensation DAG per query, or precompute reachability with bitset DP if the condensation is small.
- "The graph has up to 10^5 nodes — can your recursive solution handle it?" — Switch to iterative DFS with an explicit stack storing
(node, neighbor_iterator)tuples. - "What if the graph is undirected?" — SCCs don't apply; use Union-Find or BFS/DFS connected components instead. Bridges and articulation points use a related low-link approach.
- "Extend 2-SAT to 3-SAT." — 3-SAT is NP-hard; the implication-graph reduction does not generalize. Explain why: 3-SAT clauses cannot be decomposed into a polynomial-time graph structure.
Read Next
Start reading to get personalized recommendations
Explore Unread
Great job! You've read all available articles