Suffix Array

Type: Data Structure LeetCode problems: ~20–30 (< 50)

Core Concepts

Suffix — a substring starting at position i and ending at the last character; string s of length n has exactly n suffixes.

Suffix Array (SA) — integer array of length n where SA[i] is the starting index of the i-th lexicographically smallest suffix; encodes all suffixes in sorted order without storing them explicitly.

Rank Array (RA / ISA) — inverse of SA: RA[SA[i]] = i, so RA[j] is the sorted rank of the suffix starting at position j; enables O(1) suffix comparison.

LCP Array — integer array where LCP[i] = length of the longest common prefix between SA[i-1] and SA[i] for i ≥ 1; LCP[0] is undefined (0 by convention). Critical for substring queries.

LCP between arbitrary suffixes — lcp(SA[i], SA[j]) = min(LCP[i+1 .. j]) when i < j; this is a range minimum query on the LCP array and is the key operation enabling most advanced applications.

Suffix Tree — a compressed trie of all suffixes that encodes the same information as SA + LCP but with higher constant factors. SA + LCP is preferred in practice.

See trie.md for trie background.

Suffix Automaton (SAM) — a DAG of states representing all substrings; supports online (incremental) construction unlike SA.

See suffix-automaton.md for full coverage.

Structural Invariants

Property	Invariant	What breaks if violated
Sorted order	`s[SA[0]:] < s[SA[1]:] < … < s[SA[n-1]:]` lexicographically	Binary search for pattern matching returns a wrong range
LCP consistency	`LCP[i]` equals the exact shared prefix length of `s[SA[i-1]:]` and `s[SA[i]:]`	All LCP-based queries return wrong answers
Rank is inverse of SA	`RA[SA[i]] = i` for all `i`	O(1) rank lookups are wrong; prefix doubling produces incorrect order
Completeness	Every index `0..n-1` appears exactly once in SA	Some suffixes uncounted; occurrence queries miss entries

Internal Representation

SA: integer array of length n storing starting indices. Storing suffixes explicitly would cost O(n²) space; SA uses O(n).

LCP array: integer array of length n (LCP[0] unused). Always built alongside SA using Kasai's algorithm.

Rank array: integer array of length n; maintained as a working buffer during construction, then rebuilt from the final SA by a single pass.

Sparse table on LCP: O(n log n) preprocessing for O(1) range minimum queries on LCP, needed for arbitrary lcp(i, j) queries.

Construction only retains SA and LCP as outputs; the rank array is a construction artifact rebuilt on demand.

Types / Variants

Variant	Time	Space	Use when
Prefix Doubling (Manber-Myers)	O(n log n) with radix sort	O(n)	Default choice; straightforward to implement and fast in practice
DC3 / Skew	O(n)	O(n)	Theoretical O(n) needed; complex, almost never used in interviews
SA-IS	O(n)	O(n)	Best practical O(n) implementation; used in `libdivsufsort` and CP libraries

Prefix doubling is the standard choice for interviews and most competitive programming.

Topic-Specific Operations

Construction — Prefix Doubling Sort suffixes by doubling the comparison key length each round. Round 0: rank by single character. Round k: sort by (rank[i], rank[i + 2^k]). After log n rounds all ranks are unique. Use radix sort per round for O(n log n) total; comparison sort gives O(n log² n).

LCP Array Construction — Kasai's Algorithm Given SA and RA, compute LCP in O(n). Key insight: if lcp(SA[RA[i]-1], SA[RA[i]]) = h, then lcp(SA[RA[i+1]-1], SA[RA[i+1]]) ≥ h-1. Walk suffixes in string order; reuse the previous LCP value minus 1 as a starting point, amortizing total work to O(n).

Pattern Matching Binary search on SA for the leftmost and rightmost position where pattern p is a prefix of s[SA[k]:]. All occ occurrences are at indices SA[lo], SA[lo+1], …, SA[hi-1]. Time: O(m log n) where m = len(p).

Arbitrary LCP Query lcp(i, j) = RMQ(LCP, RA[i]+1, RA[j]) (assuming RA[i] < RA[j]). Requires a sparse table on the LCP array. Time: O(1) per query after O(n log n) preprocessing.

Number of Distinct Substrings Total substrings = n*(n+1)/2. Each LCP[i] counts how many substrings of SA[i] are already represented by SA[i-1]. Formula: n*(n+1)/2 − sum(LCP[1..n-1]).

Key Algorithms

Prefix Doubling (Manber-Myers)

Iteratively refines suffix ranks using the observation that a length-2^k comparison can be assembled from two length-2^(k-1) ranks. After log n rounds, ranks uniquely identify each suffix and SA is derived directly.

O(n log n) with radix sort; the invariant after round k is that rank[i] reflects the sorted order of s[i : i + 2^k]

Key line: sort by pair (rank[i], rank[i + gap] if i + gap < n else -1), then reassign integer ranks by comparing adjacent pairs.

Kasai's LCP Algorithm

Constructs the LCP array in O(n) by iterating over suffixes in string order rather than sorted order. Starting LCP from the previous iteration, decrement by at most 1 each step, so total increments across all iterations ≤ n.

O(n) time; run immediately after building SA

h = 0
for i in range(n):
    if ra[i] > 0:
        j = sa[ra[i] - 1]
        while i + h < n and j + h < n and s[i+h] == s[j+h]:
            h += 1
        lcp[ra[i]] = h
        if h: h -= 1

Longest Common Substring of Two Strings

Concatenate s1 + sep + s2 where sep is a character below both alphabets (e.g., chr(0)). Build SA + LCP on the combined string. Scan LCP array: the maximum LCP[i] where SA[i-1] and SA[i] come from different original strings is the answer.

O((n+m) log(n+m)) construction + O(n+m) scan

Longest Repeated Substring

Maximum value in the LCP array after construction. The substring is s[SA[i] : SA[i] + LCP[i]] at the maximizing index.

O(n) after SA + LCP construction

Problem Taxonomy

By Problem Category

Category	What it asks	Key technique
Pattern search	All occurrences of pattern p	Binary search on SA
Distinct substrings	Count unique substrings	`n*(n+1)/2 − sum(LCP)`
Longest repeated substring	Longest substring appearing ≥ 2 times	Max of LCP array
Longest common substring	Longest shared substring of 2+ strings	Concatenate with separators, scan LCP
Lexicographic k-th substring	Find the k-th distinct substring	Walk SA using `(n − SA[i]) − LCP[i]` new-substring counts
Suffix comparison	Compare two suffixes in O(1)	RA lookup; ties impossible since all suffixes are distinct
Shortest unique substring	Shortest substring appearing exactly once	SA + LCP; for each suffix find min gap to both neighbors

Problem Signal → Technique

Signal in problem statement	Likely technique
"all occurrences of pattern"	Binary search on SA
"number of distinct substrings"	`n*(n+1)/2 − sum(LCP)`
"longest repeated substring"	Max LCP value
"longest common substring" of two strings	Concatenate + SA + LCP scan
"compare suffixes efficiently"	RA array + sparse table RMQ
"lexicographically smallest/largest rotation"	SA on `s + s`
"k-th lexicographically smallest substring"	Walk SA with LCP-gap counting
"shortest unique substring"	SA + LCP, find suffix with largest min-neighbor-LCP

Common Patterns

Pattern	When it applies	Mechanism
Separator concatenation	Query spans two strings	Join `s1 + chr(0) + s2`; separator sorts before all real chars, never matches
Source-check LCP scan	Find longest common substring	Walk sorted SA, track which original string each SA entry belongs to, record max LCP between adjacent different-source entries
Sparse table RMQ on LCP	Arbitrary suffix LCP needed	Precompute sparse table; `lcp(i,j) = min(LCP[RA[i]+1 .. RA[j]])`
Binary search on SA	Pattern matching	Binary search for leftmost and rightmost suffix with p as prefix
k-th substring via LCP gaps	Enumerate distinct substrings in order	SA[i] contributes `(n − SA[i]) − LCP[i]` new substrings; walk until cumulative count ≥ k
Sliding window over SA	Common substring across k strings	Window over sorted SA ensuring all k string sources are represented; shrink window when min LCP in range exceeds boundary

Edge Cases & Pitfalls

Separator character: must be strictly smaller than every character in the input alphabet. chr(0) is safe for any printable string; '#' fails if the input can contain '#'. Wrong separator causes cross-string matches to be counted as within-string repeats.
LCP[0] access: LCP[0] is undefined (no preceding suffix). All LCP scans must start at index 1; reading LCP[0] as a real value produces a spurious result.
Stale rank array: the working rank buffer used during prefix doubling reflects intermediate state, not the final sorted order. Rebuild RA from the final SA with a single O(n) pass before answering any rank-based query.
O(n log² n) TLE: using Python's sorted() in prefix doubling gives O(n log² n). For n > 10^5, implement counting sort per round or switch to SA-IS.
Empty string or single character: SA = [0], LCP = [0]. The distinct-substrings formula and max-LCP queries both return 0 — correct, but guard against max([]) errors in code.
All identical characters: LCP array values are n-1, n-2, …, 1, 0; sum is O(n²). The distinct-substrings formula still works (result = n), but intermediate sum(LCP) can overflow 32-bit integers for large n.
Two strings with same length: when checking "different source" in the separator-concatenation pattern, the separator occupies index len(s1), so an index belongs to s2 if it is > len(s1), not ≥ len(s1).
Rank vs. SA confusion: SA[rank] = position and RA[position] = rank. Swapping them in RMQ or binary search produces silently wrong results with no obvious runtime error.

Implementation Notes

Python has no built-in suffix array. For n ≤ 2×10^4, the O(n log² n) prefix-doubling with sorted() is acceptable. Above that, implement radix sort or use a C extension.
sys.stdout.reconfigure(encoding="utf-8") at the top of any script printing Unicode output (project standard).
Kasai's algorithm requires a fully finalized SA and a correctly derived RA. Passing a stale rank array from mid-construction is a common silent bug.
Sparse table for RMQ: precompute log2 as an integer array to avoid floating-point math.log calls inside the query loop.
SA indices are 0-based in almost all implementations; mixing 0-based and 1-based indexing between SA and LCP is the most common source of off-by-one bugs.
When concatenating strings with a separator, convert to a list of integers (ord values) rather than strings; this avoids Python string slicing overhead and makes the separator comparison explicit.

Cross-Topic Relations

Related topic	Connection
String Hashing / Rolling Hash	Alternative O(n log n) approach for single-pattern search; SA is strictly more powerful for multi-query scenarios
Trie	Suffix trie is the conceptual precursor; SA + LCP encodes an implicit suffix tree in O(n) space
Segment Tree / Sparse Table	RMQ on LCP array for O(1) arbitrary LCP queries; see segment-tree.md
Binary Search	Pattern matching on SA is a direct application; see binary-search.md
Z-Algorithm / KMP	Linear-time alternatives for single-pattern matching; SA generalizes to arbitrary substring queries and multiple patterns
Dynamic Programming	Some SA problems reduce to DP over the sorted suffix sequence (e.g., k-th distinct substring)
Sliding Window	Used in the k-string common-substring window scan over sorted SA

Interview Reference

Must-Know Problems

Longest Repeated / Distinct Substrings

Longest Duplicate Substring (LC 1044) — binary search + rolling hash is the standard approach; SA + max(LCP) is the direct construction
Number of Distinct Substrings (CSES 2402) — n*(n+1)/2 − sum(LCP)

Multi-String

Longest Common Substring of Two Strings (LC 718 is solved by DP, but the SA approach generalizes to k strings)
Longest Common Substring of k strings — concatenate with distinct separators, sliding window over SA maintaining coverage of all k sources

Pattern Matching

Find all occurrences of a pattern — binary search on SA, return SA[lo..hi-1]
Shortest Unique Substring — for each suffix, compute min(LCP with left neighbor, LCP with right neighbor) + 1

Lexicographic

Lexicographically Smallest Rotation (LC 1163) — SA on s + s, first entry with SA[i] < n
k-th Lexicographically Smallest Distinct Substring — walk SA with LCP-gap counting

Clarification Questions to Ask

What is the character set? Affects separator character choice and whether radix sort can be tuned.
Single query or multiple queries on the same string? Multiple queries justify O(n log n) preprocessing + O(1) or O(log n) per query.
Are the strings static? SA is a static structure; updates require full O(n log n) reconstruction.
What are the length constraints? n ≤ 10³: brute force fine; n ≤ 10^5: O(n log n) SA; n > 10^6: SA-IS required.
Does "substring" mean contiguous? Always verify — some problems intend "subsequence."

Common Interview Mistakes

Building SA but not the LCP array, then computing LCP by direct string comparison (O(n) per query instead of O(1) with sparse table).
Using '#' as a separator when the input alphabet is not guaranteed to exclude it; the separator must be provably outside the alphabet.
Confusing SA and RA: SA[i] gives the starting position of the rank-i suffix; RA[j] gives the rank of the suffix at position j. Swapping them produces silent wrong answers.
Off-by-one: LCP[i] compares SA[i-1] and SA[i]; valid for i ∈ [1, n-1]. Reading LCP[0] as a valid comparison crashes or returns garbage.
Not rebuilding RA after the final SA pass; the working rank buffer from prefix doubling may not equal the inverse of the final SA.

Typical Follow-Up Escalations

"Find all occurrences in O(m + log n) instead of O(m log n)" → use LCP-guided binary search to avoid re-scanning the matched prefix at each step.
"Now handle k strings" → concatenate all with distinct separators; sliding window over SA tracks how many distinct string sources appear in the current LCP-bounded window.
"Support online updates" → SA requires full reconstruction; suggest suffix automaton, which supports incremental O(n) amortized construction.
"What if the alphabet is huge (Unicode)?" → replace character-based radix sort with comparison sort or hash-based rank assignment; O(n log² n) becomes acceptable.

Explore Unread

Great job! You've read all available articles

Suffix Array

Type: Data Structure LeetCode problems: ~20–30 (< 50)

Core Concepts

Suffix — a substring starting at position i and ending at the last character; string s of length n has exactly n suffixes.

Rank Array (RA / ISA) — inverse of SA: RA[SA[i]] = i, so RA[j] is the sorted rank of the suffix starting at position j; enables O(1) suffix comparison.

Suffix Tree — a compressed trie of all suffixes that encodes the same information as SA + LCP but with higher constant factors. SA + LCP is preferred in practice.

See trie.md for trie background.

Suffix Automaton (SAM) — a DAG of states representing all substrings; supports online (incremental) construction unlike SA.

See suffix-automaton.md for full coverage.

Structural Invariants

Property	Invariant	What breaks if violated
Sorted order	`s[SA[0]:] < s[SA[1]:] < … < s[SA[n-1]:]` lexicographically	Binary search for pattern matching returns a wrong range
LCP consistency	`LCP[i]` equals the exact shared prefix length of `s[SA[i-1]:]` and `s[SA[i]:]`	All LCP-based queries return wrong answers
Rank is inverse of SA	`RA[SA[i]] = i` for all `i`	O(1) rank lookups are wrong; prefix doubling produces incorrect order
Completeness	Every index `0..n-1` appears exactly once in SA	Some suffixes uncounted; occurrence queries miss entries

Internal Representation

SA: integer array of length n storing starting indices. Storing suffixes explicitly would cost O(n²) space; SA uses O(n).

LCP array: integer array of length n (LCP[0] unused). Always built alongside SA using Kasai's algorithm.

Rank array: integer array of length n; maintained as a working buffer during construction, then rebuilt from the final SA by a single pass.

Sparse table on LCP: O(n log n) preprocessing for O(1) range minimum queries on LCP, needed for arbitrary lcp(i, j) queries.

Construction only retains SA and LCP as outputs; the rank array is a construction artifact rebuilt on demand.

Types / Variants

Variant	Time	Space	Use when
Prefix Doubling (Manber-Myers)	O(n log n) with radix sort	O(n)	Default choice; straightforward to implement and fast in practice
DC3 / Skew	O(n)	O(n)	Theoretical O(n) needed; complex, almost never used in interviews
SA-IS	O(n)	O(n)	Best practical O(n) implementation; used in `libdivsufsort` and CP libraries

Prefix doubling is the standard choice for interviews and most competitive programming.

Topic-Specific Operations

Arbitrary LCP Query lcp(i, j) = RMQ(LCP, RA[i]+1, RA[j]) (assuming RA[i] < RA[j]). Requires a sparse table on the LCP array. Time: O(1) per query after O(n log n) preprocessing.

Number of Distinct Substrings Total substrings = n*(n+1)/2. Each LCP[i] counts how many substrings of SA[i] are already represented by SA[i-1]. Formula: n*(n+1)/2 − sum(LCP[1..n-1]).

Key Algorithms

Prefix Doubling (Manber-Myers)

O(n log n) with radix sort; the invariant after round k is that rank[i] reflects the sorted order of s[i : i + 2^k]

Key line: sort by pair (rank[i], rank[i + gap] if i + gap < n else -1), then reassign integer ranks by comparing adjacent pairs.

Kasai's LCP Algorithm

O(n) time; run immediately after building SA

h = 0
for i in range(n):
    if ra[i] > 0:
        j = sa[ra[i] - 1]
        while i + h < n and j + h < n and s[i+h] == s[j+h]:
            h += 1
        lcp[ra[i]] = h
        if h: h -= 1

Longest Common Substring of Two Strings

O((n+m) log(n+m)) construction + O(n+m) scan

Longest Repeated Substring

Maximum value in the LCP array after construction. The substring is s[SA[i] : SA[i] + LCP[i]] at the maximizing index.

O(n) after SA + LCP construction

Problem Taxonomy

By Problem Category

Category	What it asks	Key technique
Pattern search	All occurrences of pattern p	Binary search on SA
Distinct substrings	Count unique substrings	`n*(n+1)/2 − sum(LCP)`
Longest repeated substring	Longest substring appearing ≥ 2 times	Max of LCP array
Longest common substring	Longest shared substring of 2+ strings	Concatenate with separators, scan LCP
Lexicographic k-th substring	Find the k-th distinct substring	Walk SA using `(n − SA[i]) − LCP[i]` new-substring counts
Suffix comparison	Compare two suffixes in O(1)	RA lookup; ties impossible since all suffixes are distinct
Shortest unique substring	Shortest substring appearing exactly once	SA + LCP; for each suffix find min gap to both neighbors

Problem Signal → Technique

Signal in problem statement	Likely technique
"all occurrences of pattern"	Binary search on SA
"number of distinct substrings"	`n*(n+1)/2 − sum(LCP)`
"longest repeated substring"	Max LCP value
"longest common substring" of two strings	Concatenate + SA + LCP scan
"compare suffixes efficiently"	RA array + sparse table RMQ
"lexicographically smallest/largest rotation"	SA on `s + s`
"k-th lexicographically smallest substring"	Walk SA with LCP-gap counting
"shortest unique substring"	SA + LCP, find suffix with largest min-neighbor-LCP

Common Patterns

Pattern	When it applies	Mechanism
Separator concatenation	Query spans two strings	Join `s1 + chr(0) + s2`; separator sorts before all real chars, never matches
Source-check LCP scan	Find longest common substring	Walk sorted SA, track which original string each SA entry belongs to, record max LCP between adjacent different-source entries
Sparse table RMQ on LCP	Arbitrary suffix LCP needed	Precompute sparse table; `lcp(i,j) = min(LCP[RA[i]+1 .. RA[j]])`
Binary search on SA	Pattern matching	Binary search for leftmost and rightmost suffix with p as prefix
k-th substring via LCP gaps	Enumerate distinct substrings in order	SA[i] contributes `(n − SA[i]) − LCP[i]` new substrings; walk until cumulative count ≥ k
Sliding window over SA	Common substring across k strings	Window over sorted SA ensuring all k string sources are represented; shrink window when min LCP in range exceeds boundary

Edge Cases & Pitfalls

Separator character: must be strictly smaller than every character in the input alphabet. chr(0) is safe for any printable string; '#' fails if the input can contain '#'. Wrong separator causes cross-string matches to be counted as within-string repeats.
LCP[0] access: LCP[0] is undefined (no preceding suffix). All LCP scans must start at index 1; reading LCP[0] as a real value produces a spurious result.
Stale rank array: the working rank buffer used during prefix doubling reflects intermediate state, not the final sorted order. Rebuild RA from the final SA with a single O(n) pass before answering any rank-based query.
O(n log² n) TLE: using Python's sorted() in prefix doubling gives O(n log² n). For n > 10^5, implement counting sort per round or switch to SA-IS.
Empty string or single character: SA = [0], LCP = [0]. The distinct-substrings formula and max-LCP queries both return 0 — correct, but guard against max([]) errors in code.
All identical characters: LCP array values are n-1, n-2, …, 1, 0; sum is O(n²). The distinct-substrings formula still works (result = n), but intermediate sum(LCP) can overflow 32-bit integers for large n.
Two strings with same length: when checking "different source" in the separator-concatenation pattern, the separator occupies index len(s1), so an index belongs to s2 if it is > len(s1), not ≥ len(s1).
Rank vs. SA confusion: SA[rank] = position and RA[position] = rank. Swapping them in RMQ or binary search produces silently wrong results with no obvious runtime error.

Implementation Notes

Python has no built-in suffix array. For n ≤ 2×10^4, the O(n log² n) prefix-doubling with sorted() is acceptable. Above that, implement radix sort or use a C extension.
sys.stdout.reconfigure(encoding="utf-8") at the top of any script printing Unicode output (project standard).
Kasai's algorithm requires a fully finalized SA and a correctly derived RA. Passing a stale rank array from mid-construction is a common silent bug.
Sparse table for RMQ: precompute log2 as an integer array to avoid floating-point math.log calls inside the query loop.
SA indices are 0-based in almost all implementations; mixing 0-based and 1-based indexing between SA and LCP is the most common source of off-by-one bugs.
When concatenating strings with a separator, convert to a list of integers (ord values) rather than strings; this avoids Python string slicing overhead and makes the separator comparison explicit.

Cross-Topic Relations

Related topic	Connection
String Hashing / Rolling Hash	Alternative O(n log n) approach for single-pattern search; SA is strictly more powerful for multi-query scenarios
Trie	Suffix trie is the conceptual precursor; SA + LCP encodes an implicit suffix tree in O(n) space
Segment Tree / Sparse Table	RMQ on LCP array for O(1) arbitrary LCP queries; see segment-tree.md
Binary Search	Pattern matching on SA is a direct application; see binary-search.md
Z-Algorithm / KMP	Linear-time alternatives for single-pattern matching; SA generalizes to arbitrary substring queries and multiple patterns
Dynamic Programming	Some SA problems reduce to DP over the sorted suffix sequence (e.g., k-th distinct substring)
Sliding Window	Used in the k-string common-substring window scan over sorted SA

Interview Reference

Must-Know Problems

Longest Repeated / Distinct Substrings

Longest Duplicate Substring (LC 1044) — binary search + rolling hash is the standard approach; SA + max(LCP) is the direct construction
Number of Distinct Substrings (CSES 2402) — n*(n+1)/2 − sum(LCP)

Multi-String

Longest Common Substring of Two Strings (LC 718 is solved by DP, but the SA approach generalizes to k strings)
Longest Common Substring of k strings — concatenate with distinct separators, sliding window over SA maintaining coverage of all k sources

Pattern Matching

Find all occurrences of a pattern — binary search on SA, return SA[lo..hi-1]
Shortest Unique Substring — for each suffix, compute min(LCP with left neighbor, LCP with right neighbor) + 1

Lexicographic

Lexicographically Smallest Rotation (LC 1163) — SA on s + s, first entry with SA[i] < n
k-th Lexicographically Smallest Distinct Substring — walk SA with LCP-gap counting

Clarification Questions to Ask

What is the character set? Affects separator character choice and whether radix sort can be tuned.
Single query or multiple queries on the same string? Multiple queries justify O(n log n) preprocessing + O(1) or O(log n) per query.
Are the strings static? SA is a static structure; updates require full O(n log n) reconstruction.
What are the length constraints? n ≤ 10³: brute force fine; n ≤ 10^5: O(n log n) SA; n > 10^6: SA-IS required.
Does "substring" mean contiguous? Always verify — some problems intend "subsequence."

Common Interview Mistakes

Building SA but not the LCP array, then computing LCP by direct string comparison (O(n) per query instead of O(1) with sparse table).
Using '#' as a separator when the input alphabet is not guaranteed to exclude it; the separator must be provably outside the alphabet.
Confusing SA and RA: SA[i] gives the starting position of the rank-i suffix; RA[j] gives the rank of the suffix at position j. Swapping them produces silent wrong answers.
Off-by-one: LCP[i] compares SA[i-1] and SA[i]; valid for i ∈ [1, n-1]. Reading LCP[0] as a valid comparison crashes or returns garbage.
Not rebuilding RA after the final SA pass; the working rank buffer from prefix doubling may not equal the inverse of the final SA.

Typical Follow-Up Escalations

"Find all occurrences in O(m + log n) instead of O(m log n)" → use LCP-guided binary search to avoid re-scanning the matched prefix at each step.
"Now handle k strings" → concatenate all with distinct separators; sliding window over SA tracks how many distinct string sources appear in the current LCP-bounded window.
"Support online updates" → SA requires full reconstruction; suggest suffix automaton, which supports incremental O(n) amortized construction.
"What if the alphabet is huge (Unicode)?" → replace character-based radix sort with comparison sort or hash-based rank assignment; O(n log² n) becomes acceptable.

Explore Unread

Great job! You've read all available articles

Suffix Array

Core Concepts

Structural Invariants

Internal Representation

Types / Variants

Topic-Specific Operations

Key Algorithms

Prefix Doubling (Manber-Myers)

Kasai's LCP Algorithm

Longest Common Substring of Two Strings

Longest Repeated Substring

Problem Taxonomy

By Problem Category

Problem Signal → Technique

Common Patterns

Edge Cases & Pitfalls

Implementation Notes

Cross-Topic Relations

Interview Reference

Must-Know Problems

Clarification Questions to Ask

Common Interview Mistakes

Typical Follow-Up Escalations

Read Next

Explore Unread

Suffix Array

Core Concepts

Structural Invariants

Internal Representation

Types / Variants

Topic-Specific Operations

Key Algorithms

Prefix Doubling (Manber-Myers)

Kasai's LCP Algorithm

Longest Common Substring of Two Strings

Longest Repeated Substring

Problem Taxonomy

By Problem Category

Problem Signal → Technique

Common Patterns

Edge Cases & Pitfalls

Implementation Notes

Cross-Topic Relations

Interview Reference

Must-Know Problems

Clarification Questions to Ask

Common Interview Mistakes

Typical Follow-Up Escalations

Read Next

Explore Unread