Strings
Topic type: Data Structure (with heavy Technique-Based overlap) LeetCode problem count: 700+ — full depth required
Core Concepts
String — an immutable (Python, Java) or mutable (C++, C) sequence of characters stored contiguously; indexed 0-based; length accessible in O(1).
Substring — a contiguous slice s[i:j]; there are O(n²) distinct substrings of a length-n string.
Subsequence — a selection of characters preserving relative order but not necessarily contiguous; there are O(2ⁿ) subsequences.
Prefix / Suffix — prefix: s[0:i]; suffix: s[i:n]; a string is both a prefix and suffix of itself (proper prefix/suffix excludes the full string).
Palindrome — a string equal to its reverse; can be odd-length (center character) or even-length (no center).
Anagram — a rearrangement of exactly the same multiset of characters; two strings are anagrams iff their sorted forms are equal, or equivalently their character frequency maps are equal.
String hashing (polynomial rolling hash) — maps a string to an integer H = Σ s[i] * p^i mod M; enables O(1) substring comparison after O(n) preprocessing. Collision probability ≈ 1/M; use double hashing to reduce false positives.
Suffix array — sorted array of all suffix start indices; combined with the LCP array enables linear-time solutions to many substring problems.
LCP array — lcp[i] = length of longest common prefix between the suffix at sa[i] and sa[i-1]; built in O(n) via Kasai's algorithm after the suffix array is computed.
Z-array — z[i] = length of the longest substring starting at i that is also a prefix of the full string; built in O(n); central to pattern matching and period detection.
Failure function (KMP π-array) — π[i] = length of the longest proper prefix of s[0..i] that is also a suffix; drives O(n + m) pattern matching.
Border — a string that is both a proper prefix and proper suffix of another string; the failure function encodes all borders.
Period — smallest p such that s[i] = s[i+p] for all valid i; period = n - π[n-1] for the string whose failure function is computed.
Structural Invariants
| Property | Invariant | What breaks if violated |
|---|---|---|
| Immutability (Python/Java) | Any "modification" creates a new object | Repeated concatenation inside a loop is O(n²) due to new allocations |
| Null termination (C strings) | '\0' marks end; buffer must have room for it | Buffer overflow; strlen runs past buffer |
| Index validity | 0 ≤ i < len(s) | IndexError / undefined behavior |
| Hash collision assumption | Distinct strings may hash identically | False-positive substring matches; always verify on collision |
| Suffix array sortedness | sa[i]-th suffix < sa[i+1]-th suffix lexicographically | Binary search on SA gives wrong range |
| Z-array / π-array prefix identity | z[0] = n (by convention, or explicitly excluded); π[0] = 0 always | Incorrect match counts or infinite loops |
Internal Representation
Immutable string (Python str, Java String) — stored as a fixed char array; any mutation (concatenation, replace) allocates a new object. Use ''.join(list) or StringBuilder for repeated construction.
Mutable string (C++ std::string, Python bytearray) — supports in-place modification; preferred when building character-by-character.
Character array / list — Python: list(s) then ''.join(result); the standard pattern for in-place string problems.
Trie (prefix tree) — each edge is a character; enables O(m) prefix lookup where m = query length. See trie.md.
Suffix array (SA) + LCP array — two integer arrays of length n; total O(n) space; enables O(log n) or O(1) substring queries after O(n log n) or O(n) construction.
Suffix automaton (SAM) — a directed acyclic word graph; O(n) states and transitions; represents all substrings in O(n) space; supports O(m) substring occurrence counting.
Rolling hash array — two prefix arrays h[] and pw[] of length n+1; O(1) hash of any substring s[l:r] via (h[r] - h[l] * pw[r-l]) % M.
Types / Variants
| Variant | What it is | When to prefer |
|---|---|---|
| ASCII strings | Characters in [0, 127] or [0, 255] | Frequency arrays of size 26 or 128 are cheap; covers most interview problems |
| Unicode strings | Arbitrary codepoints; multi-byte encodings | When problem mentions emojis, Chinese characters, surrogate pairs |
| Binary strings | Characters are '0' and '1' only | Bit-manipulation intuition applies; XOR tricks useful |
| Palindromes | Reads same forwards and backwards | Manacher's for all palindromes; expand-around-center for single |
| Circular / rotated strings | Problem treats the string as a ring | Concatenate s + s; standard substring search in 2n-length string |
| Run-length encoded strings | "aabbc" → "a2b2c1" | Explicit decode only when unavoidable; operate on RLE form when possible |
Traversals & Access Patterns
Linear scan (left-to-right) — most common; processes each character once; O(n). Used for frequency counting, building prefix structures, simple parsing.
Two-pointer (opposite ends) — one pointer at start, one at end, converge; used for palindrome check, reversing, partition. O(n) time, O(1) space.
Sliding window — a window [l, r] expands rightward and contracts from the left when a constraint is violated; O(n) amortized. Used for longest/shortest substring with constraint.
See sliding-window.md for full coverage.
Prefix traversal with hash map — process left-to-right, store cumulative state (frequency map, XOR, count) in a hash map keyed by that state; lookup gives O(1) query over a range. Pattern: "number of substrings where property holds".
Suffix traversal (right-to-left) — useful when answers depend on what comes after the current position; palindrome DP, suffix-based DFS on trie.
Z-function traversal — a single left-to-right pass maintaining a window [l, r] of the rightmost prefix-match; each position processed in amortized O(1).
KMP automaton traversal — drives a state machine (failure function) over the text; each character either advances the state or falls back; O(n + m) total.
Key Algorithms
KMP (Knuth-Morris-Pratt) Pattern Matching
Finds all occurrences of pattern p in text t in O(n + m) time; O(m) space.
Key insight: the failure function π[i] encodes the longest border of p[0..i], so on a mismatch at position j we fall back to j = π[j-1] instead of restarting from 0.
π[0] = 0
k = 0
for i in range(1, m):
while k > 0 and p[i] != p[k]: k = π[k-1]
if p[i] == p[k]: k += 1
π[i] = k
Time: O(n + m). Space: O(m).
Rabin-Karp Rolling Hash
Finds pattern occurrences using hash comparison; O(n + m) average, O(nm) worst case.
Key insight: hash of s[i+1..i+m] can be computed from hash of s[i..i+m-1] by subtracting the leading character's contribution and appending the new character — O(1) per slide.
Use double hashing (two independent moduli) to make collision probability negligible (~10⁻¹⁸).
Time: O(n + m) average. Space: O(1).
Z-Algorithm
Builds the Z-array in O(n); solves pattern matching by constructing p + '$' + t and reading Z-values ≥ m.
Key insight: maintain the rightmost prefix-match window [l, r]; for position i, initialize z[i] = z[i - l] if i - l < z[l] (falls inside the window), then extend naively. Each character is extended at most once globally.
Time: O(n). Space: O(n).
Manacher's Algorithm
Finds all palindromic substrings and their radii in O(n) time.
Key insight: transform string to #a#b#a# (inserting separators) to unify odd/even cases; maintain a center c and right boundary r of the rightmost palindrome; use mirror symmetry to initialize p[i] ≥ min(p[mirror], r - i) then expand.
# p[i] = palindrome radius at transformed position i
p[i] = min(p[2*c - i], r - i) if i < r else 0
Time: O(n). Space: O(n).
Suffix Array Construction (SA-IS / prefix doubling)
Builds sorted array of all suffix indices.
Prefix doubling (O(n log n)): sort suffixes by their first 1 character, then 2, then 4, … doubling rank until all ranks are distinct. Each doubling step is a radix sort.
SA-IS (O(n)): classify each suffix as S-type or L-type; identify LMS suffixes; sort recursively; use induced sorting to extend.
For interviews, O(n log n) with the built sort is sufficient.
LCP Array via Kasai's Algorithm
Computes lcp[i] = LCP of sa[i] and sa[i-1] in O(n), given the suffix array.
Key insight: if the suffix starting at i has LCP k with its predecessor in SA, then the suffix starting at i+1 has LCP ≥ k-1 with its predecessor — this prevents the LCP value from increasing by more than 1 per step, guaranteeing O(n) total work.
Polynomial Rolling Hash (Preprocessing)
Builds prefix hash in O(n); answers substring hash queries in O(1).
h[0] = 0
for i, c in enumerate(s):
h[i+1] = (h[i] * BASE + ord(c)) % MOD
# hash of s[l:r]: (h[r] - h[l] * pw[r-l]) % MOD
Choice of BASE (31 or 131) and MOD (10⁹+7 and 10⁹+9 for double hashing).
Longest Common Subsequence (LCS)
DP on two strings; dp[i][j] = LCS length of s1[0..i-1] and s2[0..j-1].
See dynamic-programming.md for full coverage.
Edit Distance (Levenshtein)
dp[i][j] = minimum edits to transform s1[0..i-1] into s2[0..j-1]; recurrence: if characters match, dp[i-1][j-1]; else 1 + min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1]).
See dynamic-programming.md for full coverage.
Advanced Techniques
Suffix Automaton (SAM)
Solves: counting distinct substrings, finding the number of occurrences of every substring, longest common substring of multiple strings.
Mechanism: an online algorithm that builds a minimal DAWG (directed acyclic word graph) in O(n) time and space. Each state represents an equivalence class of end-positions (endpos sets). Adding a character requires at most two new states and pointer updates.
When to prefer over suffix array: when you need occurrence counts for all substrings simultaneously, or when building the structure online (character by character). Suffix arrays are simpler to implement and sufficient for most interview problems.
Complexity: O(n) time and space to build; O(m) to query a pattern.
Aho-Corasick Automaton
Solves: searching for multiple patterns simultaneously in a text.
Mechanism: builds a trie of all patterns, then adds failure links (analogous to KMP π-array) via BFS. Processing the text runs the trie as a DFA; failure links redirect mismatches without backtracking.
When to prefer over running KMP for each pattern: when the total pattern length is large and patterns overlap; when you need all matches in a single O(n + Σ|patterns| + matches) pass.
Complexity: O(Σ|patterns|) build + O(n + matches) search.
String Hashing with Binary Search
Solves: longest palindromic prefix, longest common substring, checking if a pattern of length exactly k exists.
Mechanism: binary search on the answer length; O(1) hash comparison confirms equality. For longest common substring: binary search on length, collect all hashes of that length from both strings using a set.
When to prefer over SA/SAM: when only one specific length matters, or when the problem has a monotonic structure that binary search exploits. SA/SAM are better when all lengths are needed simultaneously.
Complexity: O(n log n) total for binary search + hashing.
Small-to-Large String Merging (DSU on strings)
Solves: aggregating character frequency maps across groups (e.g., after union-find operations on string indices).
Mechanism: always merge the smaller frequency map into the larger; each character participates in O(log n) merges.
When to prefer: when the problem requires maintaining per-group string statistics and groups are merged dynamically. Not useful for static substring queries.
Complexity: O(n log n) total merges.
Palindrome Partitioning with DP + Manacher's
Solves: minimum cuts for palindrome partitioning, counting palindromic partitions.
Mechanism: precompute the palindrome radius table using Manacher's in O(n); then the DP uses O(1) palindrome checks instead of O(n) per query, reducing overall complexity from O(n²) to O(n) or O(n log n) depending on the DP formulation.
When to prefer over naive DP: any palindrome partitioning problem where n > 10³. The naive O(n²) DP is sufficient for n ≤ 1000.
Problem Taxonomy
By Problem Category
| Category | What it asks | Key technique |
|---|---|---|
| Palindrome detection / construction | Is this string a palindrome? Find longest / count all | Expand-around-center (single), Manacher's (all), two-pointer check |
| Pattern matching | Find first/all occurrences of pattern in text | KMP, Z-algorithm, Rabin-Karp |
| Anagram / permutation in string | Does a permutation of pattern appear in text? | Sliding window + frequency map; character count delta |
| Substring with constraint | Longest/shortest substring with at most/exactly k distinct chars, no repeats, sum = target | Sliding window (at most k); two-pass (exactly k = at-most-k minus at-most-(k-1)) |
| Subsequence problems | LCS, edit distance, distinct subsequences, wildcard matching | DP (standard 2D or optimized) |
| String construction / transformation | Rearrange to meet constraint, decode, compress, encode | Greedy, simulation, frequency counting |
| String rotation / circular | Is s2 a rotation of s1? Period of string | Concatenate s+s, search; period = n - π[n-1] |
| Bracket / parenthesis matching | Valid, minimum remove, score | Stack, greedy counting |
| Parsing / tokenization | Evaluate expression, split by delimiter, reverse words | Stack-based parsing, two-pointer |
| Multiple pattern search | Find all words from a dictionary in text | Trie + DFS, Aho-Corasick |
| Distinct / lexicographic substrings | Count distinct substrings, k-th lexicographically smallest | Suffix array + LCP, SAM |
| String DP | Interleaving strings, regular expression match, scramble string | DP with memoization |
Problem Signal → Technique
| Signal in problem | Likely technique |
|---|---|
| "find pattern in text", "first occurrence" | KMP or Z-algorithm |
| "all occurrences", "count occurrences" | KMP + count, Rabin-Karp |
| "permutation of pattern exists in window" | Sliding window + frequency delta array |
| "longest substring without repeating" | Sliding window + last-seen index map |
| "longest substring with at most k distinct" | Sliding window with frequency map |
| "exactly k distinct" | (at most k) − (at most k−1) |
| "longest palindromic substring" | Manacher's or expand-around-center |
| "minimum cuts palindrome partition" | DP + Manacher's precomputation |
| "is rotation of" | Concatenate s+s, substring search |
| "period / smallest repeating unit" | KMP π-array: period = n − π[n−1] |
| "count distinct substrings" | Suffix array + LCP: Σ(n − sa[i] − lcp[i]) |
| "k-th lexicographically smallest substring" | Suffix array traversal |
| "edit distance / minimum operations" | DP (Levenshtein) |
| "wildcard or regex matching" | DP with '?' and '*' handling |
| "words in text from dictionary" | Trie + DFS, or Aho-Corasick |
| "group anagrams" | Sort-as-key or frequency-tuple-as-key hash map |
| "valid parentheses / brackets" | Stack or greedy counter |
| "score of parentheses" | Stack tracking depth |
| "decode / encode run-length" | Linear scan with counter |
| "longest common substring (two strings)" | Suffix array (merge both) or hashing + binary search |
| "longest common subsequence" | DP — see dynamic-programming.md |
Common Patterns
| Pattern | When it applies | Mechanism |
|---|---|---|
| Fixed-size sliding window | Count/check property over all windows of size k | Maintain running state; add right char, remove left char each step |
| Variable-size sliding window | Longest/shortest window satisfying a constraint | Expand right unconditionally; shrink left while constraint is violated |
| Two-pointer palindrome check | Check if a string is a palindrome | l, r = 0, n-1; advance while s[l] == s[r]; fail on mismatch |
| Frequency delta array | Check if window is anagram of pattern | Maintain count[26]; track how many entries equal zero; O(1) updates |
| Concatenate for rotation | Detect rotation, find period visually | (s + s)[1:-1] contains all non-trivial rotations |
| Build from character list | Mutate a string | arr = list(s); ...; return ''.join(arr) |
| Sort chars as key | Anagram grouping | key = tuple(sorted(word)) or ''.join(sorted(word)) |
| Frequency tuple as key | Anagram grouping (no sort) | key = tuple(freq[c] for c in 'abcdefghijklmnopqrstuvwxyz') |
| Stack for nesting | Bracket matching, expression evaluation | Push on open; compare/pop on close; stack size = nesting depth |
| Prefix hash map | Count substrings with property (e.g., equal 0/1 counts) | Map cumulative state → first/count of positions with that state |
| String as state in BFS | Word ladder, minimum transformations | Each node is a string; neighbors differ by one edit; visited set of strings |
| Rolling hash comparison | Check substring equality in O(1) | Precompute hash arrays; compare hashes; re-verify on collision |
Edge Cases & Pitfalls
-
Empty string: most algorithms assume
n ≥ 1; initialize answer variables appropriately (max_len = 0, not1); KMP with empty pattern matches everywhere — decide how to handle. -
Single character: palindrome by definition; no "border" (proper prefix/suffix must be shorter); sliding window with
l = r = 0should work without a shrink step. -
All identical characters (
"aaaa"): KMP failure function reaches maximum values; suffix array has LCP = n−1 between consecutive entries; Manacher's radius at center = n//2; sliding window with "no repeating chars" collapses to window size 1. -
Repeating patterns and period:
n - π[n-1]is the candidate period; the string has periodpiffpdividesn; check divisibility before claiming the string is a repetition. -
Off-by-one in substring indices: Python
s[l:r]is exclusive on the right; length =r - l; when converting between (l, length) and (l, r) forms, be consistent. Manacher's transformed index to original:(i - 1) // 2for center,(i - radius) // 2for left boundary. -
Hash collision: Rabin-Karp or rolling hash will produce false positives without re-verification. Always do a character-by-character check after a hash match when correctness matters.
-
Choosing MOD and BASE: MOD must be prime; BASE must be larger than the alphabet size; avoid
BASE = 26(too many collisions). Safe choices:BASE = 131,MOD = 10^9 + 7and10^9 + 9for double hashing. -
Python string immutability in loops:
result += cinside a loop is O(n²) total. Uselist+''.join(). -
Unicode / multi-byte characters:
len(s)in Python counts codepoints, not bytes; slicing works on codepoints; be aware when the problem involves emojis or non-ASCII characters. -
KMP failure function self-loop:
π[0]is always 0 — do not recurse on it or you'll infinite-loop in the fallback chain. -
Suffix array vs. suffix array with LCP: many problems require both; forgetting to build the LCP array after the suffix array is a common incomplete implementation.
-
Palindrome expand-around-center double call: must expand for both odd (center
i) and even (center betweeniandi+1) cases; missing one case produces wrong answer for half of inputs. -
Sliding window "exactly k" requires the two-pass trick:
exactly(k) = atMost(k) - atMost(k-1). Trying to handle "exactly" in a single-pass window is error-prone.
Implementation Notes
- Python
stris immutable; any in-place modification problem requireslist(s)→ modify →''.join(). ord(c) - ord('a')maps lowercase letters to 0–25; useord(c) - ord('A')for uppercase; never mix without explicit handling.- Python
collections.Countersubtraction only keeps positive counts;Counter(s) - Counter(t)drops characters that t has more of — not the same as checking equality. ''.join(reversed(s))creates a reversed string in O(n);s[::-1]is equivalent and idiomatic.- Suffix array construction from scratch is ~50 lines; in competitions/interviews, clarify if library use is allowed; for interviews, describe the algorithm and write only the key steps.
- Manacher's transformed-index arithmetic is error-prone; test with
"a"(length 1) and"ab"(length 2) before submitting. remodule in Python can solve some pattern-matching problems but has worst-case exponential time on pathological inputs with backtracking ((a+)+bstyle); avoid for competitive problems.- Rolling hash with negative values:
(h[r] - h[l] * pw[r-l]) % MODcan be negative in Python if intermediate results underflow — add% MODagain or use((... ) % MOD + MOD) % MOD. - For sliding window on character frequency, a size-26
intarray is faster than adictand avoids key-not-found issues.
Cross-Topic Relations
| Related topic | Specific connection |
|---|---|
| Two Pointers | Palindrome checking, string reversal, partition problems on strings |
| Sliding Window | Substring constraint problems (longest with k distinct, no repeats) |
| Dynamic Programming | LCS, edit distance, palindrome partition, distinct subsequences, wildcard match |
| Trie | Multi-pattern search, prefix queries, word search in grid; see trie.md |
| Hash Map / Hash Set | Anagram grouping, prefix state tracking, rolling hash collision handling |
| Sorting | Anagram keys, suffix array construction |
| Stack | Bracket matching, expression parsing, monotonic string problems |
| BFS / DFS | Word ladder (strings as BFS states), word search in grid |
| Bit Manipulation | Binary strings, XOR-based deduplication, bitmask over alphabet subsets |
| Greedy | String rearrangement, minimum deletions, task scheduler analogs |
| Divide & Conquer | Merge sort on strings, suffix array prefix-doubling, large palindrome counting |
Interview Reference
Must-Know Problems
Palindrome
- Valid Palindrome (LC 125) — two-pointer with skip non-alphanumeric
- Longest Palindromic Substring (LC 5) — expand-around-center or Manacher's
- Palindromic Substrings (LC 647) — count all, same approach
- Palindrome Partitioning II (LC 132) — DP + Manacher's precomputation
- Valid Palindrome II (LC 680) — at most one deletion; two helper checks
Sliding Window on Strings
- Longest Substring Without Repeating Characters (LC 3) — sliding window + last-seen map
- Minimum Window Substring (LC 76) — sliding window + frequency delta
- Permutation in String (LC 567) — fixed-size window + frequency match
- Find All Anagrams in a String (LC 438) — same pattern as LC 567 with output collection
- Longest Substring with At Most K Distinct Characters (LC 340) — variable window + map size
Pattern Matching / Period
- Implement strStr() (LC 28) — KMP
- Repeated Substring Pattern (LC 459) —
(s + s)[1:-1]containssiff repeating; orn % (n - π[n-1]) == 0 - Shortest Palindrome (LC 214) — KMP on
s + '#' + reverse(s)
Anagram / Frequency
- Group Anagrams (LC 49) — sort-as-key grouping
- Valid Anagram (LC 242) — frequency comparison
- Find the Difference (LC 389) — XOR trick or frequency delta
Substring / Subsequence DP
- Longest Common Subsequence (LC 1143)
- Edit Distance (LC 72)
- Distinct Subsequences (LC 115)
- Interleaving String (LC 97)
- Wildcard Matching (LC 44) / Regular Expression Matching (LC 10)
Bracket / Parsing
- Valid Parentheses (LC 20) — stack
- Longest Valid Parentheses (LC 32) — stack with indices or DP
- Score of Parentheses (LC 856) — stack tracking depth
- Basic Calculator I/II (LC 224/227) — stack-based expression eval
- Decode String (LC 394) — stack for nested brackets
Advanced / Hard
- Minimum Window Substring (LC 76)
- Substring with Concatenation of All Words (LC 30) — fixed-size sliding window of word units
- Shortest Palindrome (LC 214) — KMP failure function on concatenation
- Longest Duplicate Substring (LC 1044) — binary search + rolling hash
- Number of Distinct Substrings — suffix array + LCP
- Word Ladder II (LC 126) — BFS on string states + backtracking
Additional LeetCode Coverage
- Fraction to Recurring Decimal (LC 166)
- Reverse String (LC 344)
- Reverse Vowels of a String (LC 345)
- Ransom Note (LC 383)
- Reverse Words in a String III (LC 557)
- Minimum Index Sum of Two Lists (LC 599)
- Dota2 Senate (LC 649)
- To Lower Case (LC 709)
- Jewels and Stones (LC 771)
- Guess the Word (LC 843)
- Remove Outermost Parentheses (LC 1021)
- Greatest Common Divisor of Strings (LC 1071)
- Parsing A Boolean Expression (LC 1106)
- Maximum Number of Balloons (LC 1189)
- Maximum Number of Vowels in a Substring of Given Length (LC 1456)
- Make The String Great (LC 1544)
- Maximum Nesting Depth of the Parentheses (LC 1614)
- Determine if Two Strings Are Close (LC 1657)
- Merge Strings Alternately (LC 1768)
- Sum of Beauty of All Substrings (LC 1781)
- Check if the Sentence Is Pangram (LC 1832)
- Largest Odd Number in String (LC 1903)
- Query Kth Smallest Trimmed Number (LC 2343)
- Removing Stars From a String (LC 2390)
- Optimal Partition of String (LC 2405)
- Zigzag Conversion (LC 6)
- String to Integer (atoi) (LC 8)
- Integer to Roman (LC 12)
- Roman to Integer (LC 13)
- Length of Last Word (LC 58)
- Compare Version Numbers (LC 165)
- Excel Sheet Column Number (LC 171)
Clarification Questions to Ask
- Character set: lowercase only? lowercase + uppercase? alphanumeric? arbitrary Unicode?
- Definition of "substring" vs. "subsequence": always confirm — they differ fundamentally.
- Empty string: is it a valid input? Is it a valid answer?
- Uniqueness: are all characters distinct? Are strings in the array unique?
- Case sensitivity: "Abc" == "abc"?
- Mutability: can we modify the input string in place, or return a new one?
- Multiple occurrences: find the first, all, or count of occurrences?
- What counts as a "palindrome": must length be ≥ 1? Is a single character always valid?
Common Interview Mistakes
- Using
result += charin a loop (O(n²)) instead of building a list and joining at the end. - Forgetting to handle both odd- and even-length palindromes in expand-around-center.
- Implementing "at most k distinct" and claiming it solves "exactly k distinct" — the two-pass reduction is required.
- In KMP, recursing
π[0]in the fallback loop causing an infinite loop; the loop must stop at 0. - Confusing
s[l:r](exclusive right) withs[l:r+1]— consistently pick a convention and document it. - Hash-only comparison without re-verification — this causes wrong answers on collision inputs.
- Forgetting that Python's
Counter(a) == Counter(b)works for anagram checks, butCounter(a) - Counter(b)is not symmetric. - Treating period detection as
n % (n - π[n-1]) == 0without also checkingn - π[n-1] < n(the string must actually contain more than one copy).
Typical Follow-Up Escalations
- "Now find the k-th lexicographically smallest substring" → suffix array traversal using LCP to skip duplicates.
- "Now do it with multiple patterns" → Aho-Corasick instead of repeated KMP.
- "What if the string is a stream (online)?" → rolling hash window; KMP automaton state is maintained incrementally.
- "What if strings can be up to 10⁶ characters?" → O(n) algorithms required; KMP/Z for matching; Manacher's for palindromes; SA-IS for suffix array.
- "Can you do substring equality in O(1)?" → rolling hash preprocessing; O(n) build, O(1) query.
- "Reduce space to O(1)" → for palindrome check, two-pointer; for pattern matching, only the π-array is needed (O(m)); rolling window approaches.
Read Next
Start reading to get personalized recommendations
Explore Unread
Great job! You've read all available articles