Graph Theory
Core Concepts
Graph — a set of vertices (nodes) V and edges E connecting pairs of vertices; the fundamental model for pairwise relationships.
Directed graph (digraph) — edges have orientation; edge (u, v) ≠ (v, u). Undirected graphs have bidirectional edges implicitly.
Weighted graph — each edge carries a numeric cost; unweighted graphs treat all edges as cost 1.
Degree — number of edges incident to a vertex. Directed graphs split into in-degree (incoming) and out-degree (outgoing); sum(in-degrees) = sum(out-degrees) = |E|.
Path — sequence of vertices where consecutive pairs share an edge; length = number of edges traversed.
Cycle — a path where the first and last vertices are the same. A graph with no cycles is acyclic.
Connected component — maximal subset of vertices mutually reachable in an undirected graph. A directed graph has strongly connected components (SCCs) where every vertex can reach every other within the component.
DAG (Directed Acyclic Graph) — directed graph with no cycles. Models dependency relationships; admits topological ordering.
Bipartite graph — vertices can be 2-coloured such that every edge crosses between colours; contains no odd-length cycles.
Spanning tree — a subgraph that connects all n vertices using exactly n−1 edges with no cycles.
Articulation point — a vertex whose removal disconnects the graph. Bridge — an edge whose removal disconnects the graph.
Eulerian path — a path visiting every edge exactly once. Exists in undirected graphs when exactly 0 or 2 vertices have odd degree. Eulerian circuit — Eulerian path that returns to start; requires all vertices to have even degree.
Topological order — linear ordering of DAG vertices such that for every edge (u→v), u appears before v.
Clique — subset of vertices all mutually adjacent. Independent set — subset with no two adjacent vertices.
Structural Invariants
| Property | Invariant | Consequence if violated |
|---|---|---|
| DAG | No directed cycle exists | Topological sort undefined; DFS enters infinite loop |
| Bipartite | Every edge crosses between the two colour classes | Odd-length cycle exists; 2-colouring fails |
| Tree | Connected + exactly n−1 edges (no cycles) | Either disconnected or contains a cycle |
| Undirected adjacency | Edge (u,v) stored in both adj[u] and adj[v] | BFS/DFS misses paths in one direction |
| Eulerian circuit | All vertex degrees are even | Circuit cannot return to start |
| SCC | Within each SCC every vertex reaches every other | Condensation DAG has erroneous edges |
Internal Representation
Adjacency list — adj[u] stores list of neighbours (or (neighbour, weight) tuples). O(V+E) space. Preferred for sparse graphs. Iteration over neighbours is O(degree(u)).
Adjacency matrix — mat[u][v] = edge weight (or 0/∞ if absent). O(V²) space. O(1) edge existence check. Preferred when V is small (≤ 1000) and the graph is dense, or when Floyd-Warshall is needed.
Edge list — flat list of (u, v, w) triples. O(E) space. Used when sorting edges by weight (Kruskal) or iterating all edges (Bellman-Ford).
Implicit graph (grid) — the grid itself is the graph; cells are vertices, adjacency defined by 4- or 8-directional neighbours. No explicit adjacency structure needed.
Comparison:
| Representation | Space | Edge check | Neighbour iteration | Best for |
|---|---|---|---|---|
| Adjacency list | O(V+E) | O(degree) | O(degree) | Most problems |
| Adjacency matrix | O(V²) | O(1) | O(V) | Dense graphs, Floyd-Warshall |
| Edge list | O(E) | O(E) | N/A | Kruskal, Bellman-Ford |
Types / Variants
| Variant | Key property | When to use |
|---|---|---|
| Undirected | Edge (u,v) = (v,u) | Connectivity, MST, bridges |
| Directed (digraph) | Edges have direction | Dependency ordering, SCCs, flow |
| Weighted | Edges have costs | Shortest path, MST |
| DAG | Directed + acyclic | Topological sort, DP on graph |
| Bipartite | 2-colourable | Matching, assignment problems |
| Multigraph | Multiple edges between same pair | Euler path variants |
| Implicit/State-space | Vertices are states, edges are transitions | Word Ladder, sliding puzzle |
Traversals & Access Patterns
BFS and DFS are the two fundamental traversal strategies; each has its own file for full coverage.
See breadth-first-search.md for BFS and all its patterns (level-order, multi-source, 0-1 BFS, state-space BFS). See depth-first-search.md for DFS and all its patterns (cycle detection, timestamps, back/forward/cross edges).
Multi-source BFS — seed the queue with all sources simultaneously at distance 0; distances computed are to the nearest source. Used for "distance from nearest X" problems (rotting oranges, walls and gates).
DFS with timestamps — assign entry[u] and exit[u] during DFS. A vertex v is an ancestor of u iff entry[v] < entry[u] and exit[v] > exit[u]. Drives Tarjan's bridge/SCC detection.
Iterative DFS — use an explicit stack instead of recursion to avoid Python's recursion limit (default 1000).
Key Algorithms
Dijkstra's Algorithm
Single-source shortest path on graphs with non-negative edge weights. Time: O((V+E) log V) with a binary heap.
Key insight: greedy — the unvisited vertex with the smallest tentative distance is always finalised correctly because no shorter path can arrive via a negative edge.
heapq.heappush(heap, (dist, node))
while heap:
d, u = heapq.heappop(heap)
if d > dist[u]: continue # stale entry — skip
Fails when any edge weight is negative; use Bellman-Ford instead.
Bellman-Ford
Single-source shortest path on graphs that may have negative edge weights. Detects negative cycles. Time: O(VE).
Key insight: after k relaxations of all edges, shortest paths using at most k edges are correct. Run V−1 rounds; a V-th round with any update indicates a negative cycle.
for _ in range(V - 1):
for u, v, w in edges:
if dist[u] + w < dist[v]:
dist[v] = dist[u] + w
Floyd-Warshall
All-pairs shortest paths. Time: O(V³), space: O(V²). Works with negative weights; detects negative cycles when dist[i][i] < 0.
Key insight: dist[i][j] through intermediate vertex k is min(dist[i][j], dist[i][k] + dist[k][j]); iterate k in the outer loop.
Use when V ≤ 500 and you need distances between all pairs.
Kruskal's MST
Minimum spanning tree via sorted edges + Union-Find. Time: O(E log E).
Key insight: greedily add the cheapest edge that connects two different components. Correctness follows from the cut property.
edges.sort(key=lambda e: e[2])
for u, v, w in edges:
if find(u) != find(v):
union(u, v); mst_cost += w
Prim's MST
MST via greedy vertex expansion. Time: O(E log V) with a binary heap.
Key insight: maintain a min-heap of edges crossing the frontier; always expand by the cheapest crossing edge. Equivalent result to Kruskal; better when E is close to V².
Topological Sort — Kahn's Algorithm (BFS)
Linear ordering of a DAG. Time: O(V+E).
Key insight: repeatedly remove vertices with in-degree 0. If the final ordering has fewer than V vertices, a cycle exists.
queue = deque(v for v in range(V) if indegree[v] == 0)
while queue:
u = queue.popleft(); order.append(u)
for v in adj[u]:
indegree[v] -= 1
if indegree[v] == 0: queue.append(v)
Topological Sort — DFS-based
Post-order DFS; prepend each vertex to result upon completion. Cycle detected when revisiting a vertex in the current DFS path (grey node).
See depth-first-search.md for the grey/white/black colouring.
Union-Find (Disjoint Set Union)
Tracks connectivity in undirected graphs. find with path compression: O(α(V)) amortised. union by rank.
See hash-table.md for Union-Find implementation notes. Full treatment deserves its own file — cross-reference when available.
def find(x):
if parent[x] != x: parent[x] = find(parent[x])
return parent[x]
def union(x, y):
px, py = find(x), find(y)
if rank[px] < rank[py]: px, py = py, px
parent[py] = px
if rank[px] == rank[py]: rank[px] += 1
Tarjan's SCC
Finds all strongly connected components in a directed graph in a single DFS pass. Time: O(V+E).
Key insight: maintain a stack of visited nodes and low[u] = minimum discovery time reachable from u's subtree via back edges. When low[u] == disc[u], u is the root of an SCC — pop the stack until u is reached.
# low[u] = min(disc[u], min(disc[w]) for back edges to w)
if low[u] == disc[u]: # u is SCC root
scc = []
while stack[-1] != u: scc.append(stack.pop())
scc.append(stack.pop())
Kosaraju's SCC
Alternative SCC algorithm using two DFS passes. Time: O(V+E).
- Run DFS on original graph; push vertices to stack in finish-time order.
- Transpose the graph (reverse all edges).
- Run DFS on transposed graph in stack order — each DFS tree is one SCC.
Tarjan's is preferred (single pass, no transposition) unless code simplicity matters more.
Tarjan's Bridges & Articulation Points
Finds bridges and articulation points in a single DFS pass. Time: O(V+E).
Key insight: edge (u, v) is a bridge if low[v] > disc[u] — no back edge from v's subtree reaches u or higher. Vertex u is an articulation point if it is root with ≥2 DFS children, or non-root with a child v where low[v] >= disc[u].
Bipartite Check
BFS or DFS 2-colouring. Assign alternating colours; if any edge connects same-colour vertices, the graph is not bipartite. Time: O(V+E).
Cycle Detection — Undirected
DFS: a cycle exists if we reach an already-visited vertex that is not the direct parent. Union-Find: a cycle exists when find(u) == find(v) before adding edge (u, v).
Cycle Detection — Directed
DFS with three states: unvisited (white), in-current-path (grey), done (black). A back edge to a grey vertex indicates a cycle.
See depth-first-search.md for the full 3-colour DFS.
Hierholzer's Algorithm (Euler Path/Circuit)
Finds an Eulerian path or circuit in O(V+E).
Key insight: DFS; when a vertex has no remaining edges, append it to the result (in reverse). Works because dead ends are always at the "end" of the path.
def dfs(u):
while adj[u]:
dfs(adj[u].pop())
path.append(u)
# reverse path at end
Advanced Techniques
0-1 BFS
Shortest path when edges have cost 0 or 1 only. Time: O(V+E) — faster than Dijkstra.
Mechanism: use a deque; push cost-0 neighbours to the front, cost-1 neighbours to the back. Equivalent to Dijkstra but without a heap.
When to prefer over Dijkstra: edge weights are exclusively 0 or 1 (or a small integer range that allows bucketing). Dijkstra is overkill and 0-1 BFS runs in linear time.
Bidirectional BFS
Shortest path in unweighted graphs from source to target. Runs two simultaneous BFS frontiers, one from each end, meeting in the middle.
Mechanism: alternate expanding the smaller frontier; terminate when a vertex appears in both visited sets.
When to prefer over standard BFS: the branching factor is high and the path is long — bidirectional reduces the search space from O(b^d) to O(b^(d/2)).
Virtual Node
Reduces multi-source / multi-target shortest-path problems to single-source by introducing a dummy node connected to all sources (or all targets) with cost 0.
Mechanism: add virtual node S; add 0-weight edges S→each source; run Dijkstra from S.
When to prefer over multi-source BFS: the graph is weighted. Multi-source BFS only works for uniform-weight graphs.
State-Space BFS / Dijkstra
Model each unique (position, extra-state) tuple as a node in an implicit graph. Common extra states: number of moves used, keys held (bitmask), fuel remaining.
Mechanism: define visited as (node, state) pairs; expand normally with BFS or Dijkstra.
When to use: the problem asks for shortest path with a constraint that changes along the path (e.g., "at most k stops", "collect all keys"). Using node alone as visited state is incorrect — same node with different states must be explored separately.
State space can be O(V × 2^keys) — verify it fits in memory before applying.
Graph DP on DAGs
Dynamic programming where the DP recurrence is defined by the DAG's edge structure. Process vertices in topological order; DP transitions follow edges.
Mechanism: dp[v] = f(dp[u] for u in predecessors(v)). Equivalent to memoised DFS.
When to prefer over plain DP: the state transitions are naturally sparse (not all previous states are relevant — only predecessors in the DAG). Saves O(V) work per state versus a dense 1D DP sweep.
Condensation (SCC + DAG reduction)
Collapse each SCC into a single node, producing a DAG of SCCs. Enables topological reasoning about strongly connected subgraphs.
Mechanism: find SCCs (Tarjan/Kosaraju), assign each vertex an SCC ID, add edges between SCC IDs for inter-SCC edges.
When to use: problems asking about reachability, influence, or ordering at the level of "groups that are mutually reachable" (e.g., "minimum vertices to reach all nodes").
Problem Taxonomy
By Problem Category
| Problem type | What it asks | Key technique |
|---|---|---|
| Single-source shortest path | Min cost from one source to all/one target | Dijkstra (non-negative), Bellman-Ford (negative), BFS (unweighted) |
| All-pairs shortest path | Min cost between every pair | Floyd-Warshall (V ≤ 500) |
| Connectivity / reachability | Are u and v connected? | Union-Find or BFS/DFS |
| Connected component count | How many components? | DFS/BFS iterate over all unvisited; Union-Find |
| Topological ordering | Valid linear order of dependencies | Kahn's (+ cycle detection) or DFS post-order |
| Cycle detection | Does a cycle exist? | DFS (directed: 3-colour; undirected: parent check or UF) |
| MST | Min total edge weight spanning all vertices | Kruskal (sparse), Prim (dense) |
| SCC | Maximal mutually-reachable groups | Tarjan or Kosaraju |
| Bipartite check | Is the graph 2-colourable? | BFS 2-colouring |
| Bridge / articulation point | Critical edges or vertices | Tarjan's bridge algorithm |
| Euler path / circuit | Visit every edge exactly once | Hierholzer + degree check |
| Grid traversal / flood fill | Islands, regions, percolation | BFS or DFS on implicit grid graph |
| State-space shortest path | Shortest path with extra constraints | BFS/Dijkstra on (node, state) graph |
| Network flow / matching | Max flow, bipartite matching | Edmonds-Karp (out of scope for most LC problems) |
Problem Signal → Technique
| Signal in problem | Likely technique |
|---|---|
| "Shortest path", "minimum cost to reach" with weights | Dijkstra |
| "Shortest path" unweighted or uniform weight | BFS |
| "Negative weights", "detect negative cycle" | Bellman-Ford |
| "All pairs", "distance between every pair" | Floyd-Warshall |
| "Minimum spanning tree", "connect all nodes cheaply" | Kruskal / Prim |
| "Connected components", "number of islands" | DFS/BFS or Union-Find |
| "Course schedule", "task order", "dependency" | Topological sort + cycle detection |
| "Strongly connected", "mutually reachable" | Tarjan SCC |
| "Critical connection", "critical server" | Tarjan bridges |
| "Bipartite", "divide into two groups", "no odd cycle" | BFS 2-colouring |
| "Reconstruct itinerary", "Euler path" | Hierholzer |
| "At most k steps/stops" | State-space BFS with state = (node, k_remaining) |
| "Collect all keys", "visit all N nodes" | BFS with bitmask state |
| "Cheapest way with at most k stops" | Dijkstra/Bellman-Ford on (node, stops_used) |
| Edge weights are 0 or 1 | 0-1 BFS |
| Word transformation, sliding puzzle, lock combination | State-space BFS |
Common Patterns
| Pattern | When it applies | Mechanism |
|---|---|---|
| Build adjacency list from edge list | Almost every graph problem | for u, v in edges: adj[u].append(v) |
| Visited set to prevent revisiting | BFS/DFS on undirected or state graphs | if node in visited: continue; visited.add(node) |
| In-degree array for Kahn's | Topological sort / detect if DAG | Count incoming edges per vertex, decrement as edges "removed" |
| Multi-source BFS | "Distance to nearest X" | Seed queue with all sources at distance 0 |
| Grid → graph translation | Island/region problems | 4-directional offsets [(0,1),(0,-1),(1,0),(-1,0)] |
| Reverse graph | Problems asking reachability "backwards" (e.g., can all reach a single target) | Add edges in reverse direction, BFS/DFS from target |
| Union-Find for online connectivity | Dynamic edge additions, cycle detection in undirected | find + union; check find(u)==find(v) before adding |
| Lazy deletion in Dijkstra heap | Multiple updates to the same vertex | Store (dist, node); skip if popped dist > recorded dist |
| Condensation → DAG | Problems about SCC-level structure | Tarjan SCC + rebuild adjacency on SCC IDs |
| Virtual node for multi-source weighted | Weighted multi-source shortest path | Add super-source with 0-cost edges to all real sources |
Edge Cases & Pitfalls
-
Disconnected graphs — always iterate over all vertices as potential DFS/BFS roots; never assume a single traversal visits every vertex.
-
Forgetting to add reverse edge in undirected graphs — adding edge (u,v) without also adding (v,u) silently creates a directed graph, causing BFS/DFS to miss paths.
-
Directed vs. undirected in cycle detection — for directed graphs, reaching a visited vertex is not always a cycle; it's only a cycle if the visited vertex is currently on the DFS stack (grey). Using the undirected check on a directed graph produces false positives.
-
Stale entries in Dijkstra's heap — a vertex is pushed multiple times when its distance is updated. Without the
if d > dist[u]: continueguard, you process outdated entries and may relax edges from incorrect distances. -
Negative weights with Dijkstra — Dijkstra's greedy finalisation is incorrect when a shorter path can arrive later via a negative edge. Symptoms: intermittently wrong distances.
-
Self-loops — DFS cycle detection may trigger immediately on the first edge; add a check
if v == u: continuewhen appropriate, or handle self-loops before traversal. -
Modifying adjacency during Hierholzer —
adj[u].pop()consumes edges as you go; a separate copy of adj is needed if the original must be preserved. -
Euler path degree check — forgetting to check degree conditions before running Hierholzer produces a result on invalid inputs; always verify (0 or 2 odd-degree vertices for undirected, 1 source with out−in=1 and 1 sink with in−out=1 for directed).
-
Topological sort on a graph with cycles — Kahn's silently produces a partial order shorter than V; always assert
len(order) == Vto detect cycles. -
Integer overflow in Dijkstra distance initialisation — using
float('inf')avoids this in Python; in typed languages use a large but safe value (notINT_MAX, which overflows when added to a weight). -
SCCs in undirected graphs — the concept of SCC is only meaningful for directed graphs. For undirected, the equivalent is connected component.
-
DAG DP direction — if computing
dp[v]from predecessors, process in topological order (forward); if computing from successors, reverse topological order. Using the wrong direction reads uncomputed states.
Implementation Notes
- Python's
heapqis a min-heap only; negate weights for max-heap Dijkstra (e.g., furthest point). - Python's default recursion limit is 1000; call
sys.setrecursionlimit(...)or convert to iterative DFS for graphs with up to 10^5 nodes. collections.dequeis required for O(1) deque pops in 0-1 BFS — a plain list makespopleft()O(n).- When building an undirected graph from a directed edge list, add both directions:
adj[u].append(v); adj[v].append(u). - For weighted Dijkstra, store tuples
(cost, node)not(node, cost)so the heap sorts by cost automatically. - Union-Find's
rankarray must be initialised to 0 for all vertices;parent[i] = iinitialises each node as its own root. - In Tarjan's SCC, the
on_stackboolean array is separate from thevisitedset — a node can be visited (discovery time assigned) but no longer on the stack (already in a completed SCC). - Floyd-Warshall requires the intermediate vertex
kin the outermost loop; swapping k and i/j gives incorrect results.
Cross-Topic Relations
| Related topic | Connection |
|---|---|
| BFS | Primary traversal for unweighted shortest paths, level-order, multi-source; see breadth-first-search.md |
| DFS | Primary traversal for cycle detection, SCC, bridges, topological sort; see depth-first-search.md |
| Dynamic Programming | DP on DAGs; shortest paths recurrences; knapsack-style DP on graph state |
| Union-Find | Efficient connectivity queries; Kruskal's MST; online cycle detection |
| Heap / Priority Queue | Dijkstra's and Prim's implementation |
| Sorting | Kruskal's edge sorting; topological order |
| Matrix | Grids are implicit graphs; Floyd-Warshall uses a 2D matrix |
| Trees | Special case of graph (connected, acyclic, undirected); many tree algorithms derive from graph algorithms |
| Two Pointers | Rarely combined directly; occasionally in bipartite matching setups |
| Bit Manipulation | Bitmask DP on graphs (shortest path visiting all nodes; travelling salesman variants) |
Interview Reference
Must-Know Problems
Connectivity & Components
- Number of Islands (LeetCode 200) — grid BFS/DFS
- Number of Connected Components in an Undirected Graph (LC 323) — Union-Find
- Redundant Connection (LC 684) — Union-Find cycle detection
- Graph Valid Tree (LC 261) — connected + n−1 edges
Shortest Path
- Network Delay Time (LC 743) — Dijkstra single-source
- Cheapest Flights Within K Stops (LC 787) — Bellman-Ford / Dijkstra with state
- Path With Minimum Effort (LC 1631) — Dijkstra on grid
- Find the City With Smallest Number of Neighbors (LC 1334) — Floyd-Warshall
- Word Ladder (LC 127) — state-space BFS
Topological Sort
- Course Schedule (LC 207) — cycle detection via Kahn's
- Course Schedule II (LC 210) — Kahn's topological order
- Alien Dictionary (LC 269) — build DAG from constraints, Kahn's
- Longest Increasing Path in a Matrix (LC 329) — DAG DP
MST
- Min Cost to Connect All Points (LC 1584) — Kruskal or Prim
- Optimize Water Distribution in a Village (LC 1168) — virtual node + MST
Advanced / Hard
- Critical Connections in a Network (LC 1192) — Tarjan bridges
- Reconstruct Itinerary (LC 332) — Hierholzer Euler path
- Shortest Path Visiting All Nodes (LC 847) — BFS + bitmask state
- Strongly Connected Components — Tarjan / Kosaraju (common in contest problems)
- Bus Routes (LC 815) — virtual-node multi-source BFS
Additional LeetCode Coverage
- Keys and Rooms (LC 841)
- Find the Town Judge (LC 997)
- Minimum Number of Vertices to Reach All Nodes (LC 1557)
- Maximal Network Rank (LC 1615)
- Detonate the Maximum Bombs (LC 2101)
Clarification Questions to Ask
- Are edges directed or undirected?
- Can there be self-loops or parallel edges?
- Are edge weights guaranteed non-negative?
- Is the graph guaranteed to be connected?
- What is the definition of "path" — can vertices repeat? Edges?
- Are the node labels 0-indexed or 1-indexed?
- What should be returned if no path exists?
Common Interview Mistakes
- Using BFS for weighted shortest paths (BFS is only correct for unit weights).
- Not resetting the visited set when running multiple BFS/DFS from different sources in the same call.
- Forgetting that Union-Find needs explicit path compression and union by rank — naive union is O(n) per operation.
- Treating Dijkstra as correct on negative-weight graphs.
- Kahn's algorithm: not asserting
len(order) == Vto detect cycles. - Tarjan SCC: confusing the
visitedarray with theon_stackarray — they serve different roles. - Forgetting to handle disconnected graphs by looping over all vertices as starting points.
Typical Follow-Up Escalations
- "Now the graph has negative weights." → Switch from Dijkstra to Bellman-Ford; add negative-cycle detection.
- "Now you must find the path, not just the distance." → Track a
prevpointer array alongside distances. - "Now edges are added dynamically." → Use Union-Find for online connectivity; Dijkstra must rerun otherwise.
- "Now find the second-shortest path." → Relax each vertex up to twice in modified Dijkstra.
- "Can you reduce memory from O(V²) to O(V+E)?" → Switch from adjacency matrix to adjacency list.
- "What if the graph is very large and sparse?" → Ensure adjacency list representation; consider bidirectional BFS or A*.
Read Next
Start reading to get personalized recommendations
Explore Unread
Great job! You've read all available articles