Memoization
Topic type: Technique-Based
LeetCode tag: Memoization (~120 problems)
See dynamic-programming.md for full DP theory: optimal substructure, subproblem DAGs, bottom-up formulation, and DP variant taxonomy.
See recursion.md for recursion fundamentals: base cases, recurrence structure, and stack mechanics.
Core Concepts
Memoization — an optimization where each unique set of arguments to a recursive function is computed exactly once; subsequent calls return the cached result immediately. Converts exponential tree recursion to polynomial time when subproblems overlap.
Overlapping subproblems — the same (args) tuple recurs across multiple call branches. Memoization is only beneficial when this happens; pure divide-and-conquer (disjoint subproblems, e.g., merge sort) gains nothing.
Cache key — the hashable representation of the complete function state. A correct key captures everything that determines the return value; any omission causes incorrect cache hits, any addition causes unnecessary misses.
Top-down evaluation — subproblems are solved lazily, driven by which states the recursion actually visits. Opposite of bottom-up (tabulation), which fills all states in dependency order.
Reachable state space — the set of (args) tuples actually visited during a run. Top-down only pays for reachable states; bottom-up pays for the full table. When the state space is sparse (e.g., digit DP, bitmask DP with pruning), top-down is cheaper.
Subproblem independence — each unique key's result must be a pure function of that key alone; if the return value depends on mutable global state outside the key, caching is incorrect.
Types / Variants
| Variant | Mechanism | When to prefer |
|---|---|---|
@cache / @lru_cache(None) | Python decorator wraps any function; key = all args as a tuple | Default choice in Python; zero boilerplate |
| Manual dict cache | memo = {} passed or closed over; if key in memo: return memo[key] | Need finer control (e.g., custom key, clearing, cross-function sharing) |
| Class-level cache | self._memo dict on an object | When memoised function is a method and object state is part of the key |
| Multi-level cache | Nested dicts or compound tuple keys | State has multiple independent dimensions (e.g., (i, j, remaining)) |
| Iterative + explicit stack | Convert recursion to iteration with a stack; fill cache as results become available | Recursion depth exceeds ~1000; Python stack limit is a hard constraint |
Key Algorithms
Top-Down DP via Memoization
Translate the recurrence relation directly into a recursive function, then cache. The algorithm structure is: check cache → base case → recurse and compute → store in cache → return.
@cache
def dp(i, j):
if base_case(i, j): return 0
return min(dp(i-1, j), dp(i, j-1)) + cost[i][j]
Time equals O(unique states × branching factor); space equals O(unique states) for the cache plus O(depth) for the call stack.
Cache Key Construction for Complex State
When state includes a list or set, convert to a tuple or frozenset before using as a key. Hashing a mutable object by identity (not value) causes incorrect misses on logically identical states.
@cache
def solve(i, remaining: tuple): # frozenset → frozenset, list → tuple
...
Interval DP (Top-Down)
Solve dp(l, r) by trying all split points k in [l, r). State space is O(n²); each state costs O(n) to compute → O(n³) total. Prefer top-down when not all (l, r) pairs are reachable.
Digit DP (Top-Down)
State: (position, tight, leading_zero, accumulated_count). The tight flag restricts digit choices to ≤ the corresponding digit of the limit. Memoization caches (position, tight, ...) tuples; the non-tight subtree is reused across multiple prefixes.
Bitmask DP (Top-Down)
State: (mask, last) where mask is a bitmask of visited items. Reachable states ≈ 2ⁿ × n; suits n ≤ 20. Top-down naturally prunes unreachable masks and is often simpler to write than bottom-up for irregular transitions.
Advanced Techniques
Broken-Profile / Rolling State Memoization
Solves: Grid/tiling problems where state is the partial configuration of a column boundary.
Mechanism: State encodes the "profile" (a bitmask or tuple) of how the current boundary is filled; recursion advances one cell at a time.
When over simpler alternatives: Standard 2D DP can't express non-rectangular dependency shapes; use this when the boundary state has O(2^width) configurations but n×m is large.
Complexity: O(nm · 2^width).
Memoization with Continuation Passing (Re-rooting)
Solves: Tree problems where the answer at each node depends on both subtree and parent information (e.g., sum of distances to all nodes).
Mechanism: First DFS memoises down-pass values; second DFS propagates up-pass values by inverting the recurrence.
When over simpler alternatives: Pure top-down from the root can't access ancestor info without re-rooting, which would be O(n²). Use when the answer needs the full tree context at every node.
Complexity: O(n).
Lazy Memoization with Sentinel Values
Solves: State spaces where computing "no solution" is expensive and you want to short-circuit.
Mechanism: Initialize cache to a sentinel (e.g., None or float('inf')); check sentinel in cache lookup to distinguish "not computed" from "computed as infeasible."
When over simpler alternatives: @cache can't distinguish a cached None return from "not yet computed" without a separate seen-set. Use manual dict when the function can legitimately return None.
Problem Taxonomy
By Problem Category
| Category | What it asks | Key technique |
|---|---|---|
| Count paths / ways | How many ways to reach target | dp(pos, remaining) counting recurrence |
| Optimal cost / value | Min/max cost over choices | dp(i, j) with min/max over transitions |
| String matching / edit | Transform or align two strings | dp(i, j) over two string indices |
| Interval problems | Best partition / merge of a range | dp(l, r) with split-point enumeration |
| Subset / assignment | Assign items to groups | dp(i, mask) bitmask state |
| Game theory | Optimal play | dp(state) returning win/loss or score |
| Digit counting | Count numbers with property up to N | dp(pos, tight, ...) digit DP |
| Tree / graph DP | Optimal structure on a tree | dp(node, parent) or re-rooting |
Problem Signal → Technique
| Signal in problem | Likely technique |
|---|---|
| "Number of ways to …" | Counting recurrence; dp(pos, ...) |
| "Minimum/maximum cost to …" | Optimization recurrence; dp(i, j) |
| "Can you reach …" | Boolean memoization; return True/False |
| Two strings, comparing characters | 2D dp(i, j) on string indices |
| Subarray / subrange, combine adjacent | Interval DP dp(l, r) |
| Subset of items, n ≤ 20 | Bitmask DP dp(mask) |
| Digits of a number, "count integers ≤ N" | Digit DP with tight flag |
| Game, two players alternate | Game DP, minimax |
| "Return to any node in a tree" / sum-of-distances | Re-rooting DP |
| Recursion gives TLE / time limit | Add @cache; identify overlapping subproblems |
Common Patterns
| Pattern | When it applies | Mechanism |
|---|---|---|
| State compression via tuple | State includes variable-length sequence | Convert list/set to sorted tuple before caching |
| Freeze mutable argument | Input is a list passed recursively | tuple(lst) or frozenset(s) as key |
| Pass indices, not slices | Function receives a subarray | Use (l, r) indices into a global array; avoid slicing (copies break equality) |
| Explicit seen parameter | Visited set in graph traversal must be part of state | frozenset(visited) in key; only feasible for small graphs |
| Default-argument cache injection | Avoid global variable for cache | def f(i, memo={}): — but beware of shared state across test cases |
@cache + sys.setrecursionlimit | Deep recursion in Python | Always pair these; set limit before calling the top-level function |
| Clear cache between test cases | Reusing the function in multiple test runs | f.cache_clear() after each test; or use a class with instance-level cache |
Edge Cases & Pitfalls
-
Mutable key argument — passing a list directly to a
@cache-decorated function raisesTypeError: unhashable type: 'list'. Convert to tuple first;@cacherequires all arguments to be hashable. -
Missing state in key — if a variable affects the return value but isn't in the function signature (e.g., a global index used inside the function), caching by
(i, j)alone returns stale results when that hidden variable changes. All varying state must be explicit in the key. -
Shared cache across calls —
@cacheon a module-level function persists across multiple test-case invocations. Call.cache_clear()or define the function inside the problem method to get a fresh cache per call. -
Recursion depth exceeded — Python's default limit is 1000. Deep memoization on inputs of size 500+ will crash. Add
sys.setrecursionlimit(200000)before the call, or convert to iterative bottom-up. -
Returning
Noneas a valid value —if key in memo: return memo[key]is the safe check. Avoidif memo.get(key):which treats a cached0orFalseas a cache miss. -
Off-by-one in base cases — base cases must be checked before any cache lookup to avoid storing incorrect results. Placing them after the cache check can cause a cache hit that skips the base case on re-entry.
-
Counting vs. existence conflation — a boolean memoization (
True/False) is distinct from a count memoization (int). Using the wrong return type silently produces wrong answers; the cache hits will propagate the error. -
Interval DP asymmetry —
dp(l, r)withl > rmust be handled explicitly (typically return 0 or the identity). Forgetting this base case causes infinite recursion when splits produce empty intervals.
Implementation Notes
functools.cache(Python 3.9+) is identical tolru_cache(maxsize=None)and is the preferred spelling.@cachedoes not limit cache size; for memory-constrained environments use@lru_cache(maxsize=N)where N is tuned to the problem's state space.- Nested functions decorated with
@cachecreate a new cache per outer-function call, which is usually desirable for isolation but adds overhead if the outer function is called in a tight loop. - When all arguments are integers bounded by the input size, a 2D/3D list (bottom-up table) is faster than a dict cache (hash overhead per lookup); switch to tabulation for tight time limits.
tuple(lst)is O(n); if key construction dominates runtime, pre-compute keys or redesign state to use integers.- In Python,
@cacheon a method caches withselfin the key — every distinct object gets its own cache, which is usually correct but can leak memory if many objects are created.
Cross-Topic Relations
| Related topic | Specific connection |
|---|---|
| dynamic-programming.md | Memoization is the top-down formulation of DP; every memoized solution has a bottom-up equivalent |
| recursion.md | Memoization requires a working recursive solution first; the recurrence structure drives state design |
| backtracking.md | Backtracking explores all paths and undoes state; memoization assumes subproblem results are reusable (no undo needed). Cannot memoize when state includes the path taken |
| divide-and-conquer.md | D&C has disjoint subproblems — memoization adds no benefit; overlap is the key differentiator |
| bitmask.md | Bitmask DP uses memoization with a bitmask dimension; frequently combined for subset/assignment problems |
| graph-theory.md | Memoization on a DAG is equivalent to bottom-up DP in topological order; cycle detection is required before memoizing on a general graph |
Interview Reference
Must-Know Problems
1D state
- Climbing Stairs (LeetCode 70) — classic overlapping subproblem introduction
- House Robber (198) — adjacent constraint
- Coin Change (322) — unbounded knapsack
- Word Break (139) — string segmentation
2D state — two sequences
- Longest Common Subsequence (1143)
- Edit Distance (72)
- Regular Expression Matching (10) — null/wildcard transitions are the non-obvious base cases
2D state — grid
- Unique Paths (62) / Unique Paths II (63)
- Minimum Path Sum (64)
- Cherry Pickup II (1463) — two simultaneous walkers, one 3D state
Interval DP
- Burst Balloons (312) — "what if I pop this last?" framing
- Strange Printer (664)
- Palindrome Partitioning II (132)
Bitmask DP
- Shortest Path Visiting All Nodes (847)
- Partition to K Equal Sum Subsets (698)
Digit DP
- Non-negative Integers without Consecutive Ones (600)
- Count Numbers with Unique Digits (357) — simpler entry point
Tree DP
- House Robber III (337) — return two values per node (rob vs. skip)
- Binary Tree Cameras (968)
Clarification Questions to Ask
- "Are there negative values or cycles in the graph?" — determines whether memoization is safe (requires a DAG or bounded state).
- "Can the same item / character be reused?" — distinguishes bounded from unbounded knapsack; changes state transitions.
- "What is the range of inputs?" — determines whether a dict cache or array-based table is appropriate.
- "Is the answer a count or a yes/no?" — return type affects how base cases and transitions are written.
- "Does 'path' mean any route or a simple (non-revisiting) path?" — simple paths on a general graph cannot be memoized without tracking visited nodes, blowing up state space.
Common Interview Mistakes
- Writing the recursion correctly but forgetting
@cache, producing O(2ⁿ) time and TLE. - Putting mutable objects (lists, sets) into the cache key without converting to a hashable type.
- Defining base cases inside a branch that's short-circuited by the cache check, so the base case never fires on re-entry.
- Not clearing the cache between test cases when the decorated function is defined at module scope.
- Conflating "memoization is always better" — for dense state spaces with all states reachable, bottom-up tabulation avoids recursion overhead and stack limits.
Typical Follow-Up Escalations
- "Now optimize space" → identify which previous states are still needed; convert to rolling array / bottom-up with reduced dimensions.
- "The values are too large for an array (n = 10⁹)" → switch to a dict-based cache; states are sparse so array indexing is infeasible.
- "Can you do it iteratively?" → build the bottom-up table in topological order of the subproblem DAG.
- "What if there are cycles in the graph?" → memoization alone is unsafe; need cycle detection (visited set) or reformulate as shortest-path (Bellman-Ford / Dijkstra).
- "Handle multiple test cases efficiently" → restructure so the cache is built once and queried per test case (offline processing), rather than cleared and rebuilt each time.
Read Next
Start reading to get personalized recommendations
Explore Unread
Great job! You've read all available articles