Rolling Hash

Rolling Hash is a technique for comparing fixed-length substrings in O(1) per step after O(n) preprocessing, by maintaining a hash value that updates incrementally as a window slides across a string or array. It converts exact substring comparison — naively O(m) per position — into an O(1) hash check with O(m) verification only on collision.

Core Concepts

Polynomial rolling hash — a hash of a string S[0..m−1] computed as S[0]·b^(m−1) + S[1]·b^(m−2) + … + S[m−1]·b^0 mod p, where b is a base larger than the alphabet size and p is a large prime. The polynomial structure enables O(1) removal of the leading character and addition of a trailing character.

Rolling update (slide by one) — given H = hash(S[i..i+m−1]), the hash of S[i+1..i+m] is (H − S[i] · b^(m−1)) · b + S[i+m], all mod p. The subtraction removes the outgoing character's contribution; multiplying by b shifts the remaining terms left; adding the new character appends it at degree 0.

Prefix hash array — an array H[0..n] where H[i] = S[0]·b^(i−1) + … + S[i−1]·b^0 mod p (with H[0] = 0). Enables O(1) hash of any substring S[l..r] as (H[r+1] − H[l] · b^(r−l+1)) mod p. Avoids recomputing from scratch for offline multi-length queries.

Hash collision — two distinct substrings producing the same hash value. Probability ≈ 1/p per comparison with a single modulus; with double hashing (two independent (base, mod) pairs) it drops to ≈ 1/(p₁·p₂).

Spurious match (false positive) — a window whose hash equals the pattern hash but whose content differs. Always verify with direct comparison when correctness is critical; count the expected number of spurious matches as n/p.

Base — must exceed the alphabet size: ≥ 27 for lowercase letters, ≥ 131 for general ASCII. Common choices: 31 (lowercase), 131 (ASCII).

Modulus — a large prime to minimise collisions. Common choices: 10⁹+7, 10⁹+9, 2³¹−1. Using a power-of-two modulus enables bitwise operations but increases collision risk with adversarial inputs.

Double hashing — maintaining two independent hash values (H₁ mod p₁, H₂ mod p₂) per window. A match requires both to agree, reducing false-positive rate to ≈ 1/(p₁·p₂) ≈ 10⁻¹⁸ for typical prime choices.

Power table — precomputed array pw[i] = b^i mod p. Required to evaluate b^(m−1) mod p in O(1) during the roll and to compute b^(r−l+1) for prefix hash queries.

Types / Variants

Variant	What it is	When to use	Key property
Single hash (one base, one mod)	One polynomial hash per window	Default; fast, low implementation cost	False-positive rate ≈ 1/p; verify on match
Double hash	Two independent (base, mod) pairs maintained simultaneously	When false positives are unacceptable and verification is expensive	Rate ≈ 1/(p₁·p₂); virtually eliminates collisions in practice
Prefix hash array	Precomputed cumulative hash for offline substring queries	Multiple length queries on the same string; no sliding needed	O(n) build, O(1) per query of any substring
Fixed-window slide	Maintains one hash of a length-m window as it slides	Single pattern length, online left-to-right sweep	O(n) total; constant space beyond the power table
2D rolling hash	Hash computed over rows then columns (or vice versa)	2D pattern matching in a matrix	Compose two 1D hashes; collision rate is product of rates
Polynomial hash with negative values	Apply offset so all characters map to positive integers	Numeric arrays or mixed-sign inputs	Offset = abs(min_value) + 1; same formula applies

Key Algorithms

Rabin-Karp Substring Search

Slides a length-m hash window across text T, comparing against the precomputed hash of pattern P. On hash match, verify the actual substring. O(n + m) average time (one hash computation + n slides), O(nm) worst case (all collisions with no verification short-circuit), O(1) extra space beyond the power table.

Key insight: the rolling update allows the hash to advance one character in O(1), making each position O(1) amortised instead of O(m).

h = pw = 0
for c in P: h = (h * BASE + ord(c)) % MOD; pw = pw * BASE % MOD  # pw = BASE^m % MOD... (split: pw starts at 1, multiply m times)

After computing H_P and the initial H_W over T[0..m−1], the slide step is:

H_W = (H_W - ord(T[i]) * pw % MOD + MOD) * BASE % MOD + ord(T[i + m])
H_W %= MOD

The + MOD before the multiplication prevents negative intermediate results in languages without Python's arbitrary-precision integers.

Prefix Hash Array Build and Query

Precompute cumulative hashes so any substring hash is retrievable in O(1). O(n) build, O(1) per query, O(n) space.

H[0] = 0
for i, c in enumerate(S): H[i+1] = (H[i] * BASE + ord(c)) % MOD

Query hash of S[l..r] (0-indexed, inclusive):

hash_lr = (H[r+1] - H[l] * pw[r - l + 1]) % MOD

The pw array satisfies pw[k] = BASE^k % MOD. Negative results from subtraction are handled by Python's % automatically; in other languages add MOD before taking mod.

Double Hashing

Maintain two prefix hash arrays (or two rolling hashes) with different (BASE, MOD) pairs. A substring match requires both hashes to agree. No additional algorithmic steps — it is the same rolling or prefix structure applied twice in parallel. O(n) build, O(1) query, O(n) space (two arrays).

When to use over single hashing: when the input is adversarial (e.g., a judge with anti-hash tests), when the problem has large n (n ≥ 10⁵) and the expected number of single-hash collisions n/MOD is non-negligible, or when verification by direct comparison would cost O(m) and degrade worst-case performance.

Binary Search + Rolling Hash (Longest Duplicate Substring)

Binary search on the answer length L; for each candidate L, slide a length-L hash window over the string and check membership in a set. O(n log n) time, O(n) space.

Key insight: "does a substring of length L appear twice?" is decidable in O(n) with a hash set — binary search on L turns it into O(n log n). The same skeleton solves longest common substring of two strings (concatenate with a sentinel, track which string each hash came from).

lo, hi = 1, len(S) - 1
while lo <= hi:
    mid = (lo + hi) // 2
    if has_duplicate(S, mid): lo = mid + 1; best = mid
    else: hi = mid - 1

The has_duplicate function runs a single O(n) hash-window pass. Use double hashing or verify candidates to avoid false positives causing incorrect binary search decisions.

Fixed-Window Anagram / Permutation Check

Slide a length-m window over T; maintain a rolling hash of the window's sorted characters — or, equivalently, use a character frequency vector instead of a positional hash. The frequency vector approach is O(1) per step with 26-element comparison. Rolling hash over sorted content requires re-sorting each step (not O(1)); the canonical solution uses frequency vectors.

Note: positional rolling hash is not suitable for order-insensitive matching (anagrams). Use frequency vectors or a frequency-based hash (sum of per-character hashes, order-independent). See sliding-window.md for the frequency-vector pattern.

2D Rolling Hash

Hash each row independently to produce a 1D array of row-hashes, then apply a second rolling hash over columns of that array. Enables O(1) hash of any submatrix after O(mn) preprocessing.

Mechanism: for a pattern of height r and width c, first hash every horizontal stripe of height r to a single value per column position, then slide a width-c window over those column values. Two independent hash dimensions keep the collision rate low.

Advanced Techniques

Binary Search + Rolling Hash for Longest Repeated / Common Substring

Solves: longest substring appearing ≥ k times; longest common substring of two or more strings; longest substring shared between two halves. Mechanism: binary search on length L; O(n) sliding hash pass per candidate stores all length-L hashes in a set (or checks for a repeat). Concatenate strings with a sentinel character (absent from both) for multi-string variants; filter set entries by source string. When to prefer over suffix array: when the problem only needs the length (not all positions), when implementation time is constrained, or when n ≤ 10⁶ makes the O(n log n) solution fast enough. Suffix arrays give all positions and support more complex queries; rolling hash binary search gives only the length/existence. Complexity: O(n log n) expected with O(1) verification probability; O(n log n · m) in the worst case if every candidate requires O(m) verification.

Hashing with Multiple Lengths (Multi-Pattern Rabin-Karp)

Solves: given k patterns of possibly different lengths, find all occurrences of any pattern in text T. Mechanism: group patterns by length; for each length group, compute a hash set of that group's hashes, then slide a window of that length over T and check membership. One pass per distinct pattern length. When to prefer over Aho-Corasick: when the number of distinct lengths is small (≤ log n), or when the patterns are given online and Aho-Corasick preprocessing is not feasible. For k patterns all of the same length, this is strictly O(n + km) vs. Aho-Corasick's O(Σ|patterns| + n); rolling hash wins when k is large and verification is rare. Complexity: O(n · distinct_lengths + total_pattern_length) expected.

See string-matching.md for Aho-Corasick and full multi-pattern coverage.

Problem Taxonomy

By Problem Category

Category	What it asks	Key technique
Substring search	Does pattern P appear in text T?	Rabin-Karp single-pass hash; verify on match
Duplicate substring detection	Does any substring of length L appear twice?	Fixed-window hash set membership check
Longest duplicate substring	Maximum length L with a repeated substring	Binary search on L + rolling hash
Longest common substring	Longest substring shared by two strings	Concatenate with sentinel + binary search + hash
Repeated DNA / fixed-length repeats	All substrings of length k appearing ≥ 2 times	Hash set; collect hashes seen twice
Equal / matching windows	Are two windows of the same length identical?	Compare prefix hashes in O(1)
2D pattern in matrix	Does a 2D pattern appear in a 2D grid?	2D rolling hash (row hash then column hash)
Minimum window containing pattern	Smallest window whose hash matches	Rabin-Karp with multi-length scan

Problem Signal → Technique

Signal in problem	Likely technique
"find duplicate substrings of length k"	Rolling hash set over length-k windows
"longest substring appearing at least twice"	Binary search on length + rolling hash
"longest common substring of two strings"	Concatenate with sentinel + binary search + hash
"check if two substrings are equal" (multiple queries)	Prefix hash array, O(1) per query
"rolling hash" or "Rabin-Karp" mentioned explicitly	Polynomial rolling hash with slide update
"repeated DNA sequences" / fixed-length repeated window	Hash set on fixed-length windows
"string matching in O(n)"	Rabin-Karp (or KMP — check if pattern length matters)
"2D pattern in grid"	2D rolling hash
multiple substring equality queries on the same string	Precompute prefix hash array; answer each in O(1)

Common Patterns

Pattern	When it applies	Mechanism
Precompute power table	Any prefix hash array usage	`pw[0]=1; pw[i]=pw[i-1]*BASE%MOD`; size n+1
Sentinel-padded prefix hash	Offline multi-query substring comparison	`H[0]=0; H[i+1]=(H[i]*BASE+ord(S[i]))%MOD`; query via subtraction
Double hash maintenance	Adversarial inputs or critical correctness	Carry two `(H1, H2)` tuples through every operation; match only when both agree
Hash set for window membership	Detecting any repeat at a fixed length	Insert each window hash; check before inserting; `seen = set()`
Rolling slide with `+MOD` guard	Single-modulus subtraction	`H = (H - outgoing_contrib + MOD) * BASE % MOD + incoming`
Binary search shell	Longest / minimum length satisfying a hash-checkable property	Binary search on L; call O(n) `has_property(L)` each iteration
Per-character hash for order-independent windows	Anagram hashing (not positional)	Assign each character a random value; window hash = XOR or sum of character hashes

Edge Cases & Pitfalls

Negative subtraction result in rolling update. When H - outgoing_contrib goes negative (in fixed-width integer languages), taking mod gives a negative result. The fix is (H - outgoing_contrib % MOD + MOD) % MOD. Python's % is always non-negative, so this is only a concern in Java/C++, but the +MOD guard is a good habit regardless.
Power table off-by-one. The rolling subtraction requires BASE^m mod MOD (the power of the outgoing position, which is m characters back). A common mistake is using BASE^(m−1). Derive the formula from scratch: if H = S[i]·B^(m−1) + … + S[i+m−1]·B^0, then subtracting S[i]·B^(m−1) and multiplying by B gives S[i+1]·B^(m−1) + … + S[i+m−1]·B^1.
Base too small for the alphabet. Using BASE = 26 for lowercase letters makes S[0]·26^k indistinguishable from S[0]+1·26^k when carries wrap around mod p. Always choose BASE > max(ord(c)) - ord('a'): 31 for lowercase, 131 for ASCII.
Forgetting the sentinel in multi-string concatenation. When concatenating S1 + '$' + S2 to find common substrings, the sentinel must not appear in either string. A window spanning the boundary would produce a spurious match if no sentinel is present. Use a character outside the input alphabet.
Binary search boundary on length. Searching for the longest duplicate substring: lo = 1, hi = len(S) - 1 (not len(S)). A string of length n cannot have a duplicate substring of length n (it would require the string to appear in itself at a different offset, which is only possible for periodic strings). Setting hi = len(S) causes the loop to check an impossible length, wasting one O(n) call.
False positive causing wrong binary search decision. If has_duplicate(L) returns True due to a hash collision when no real duplicate exists, the binary search moves lo right and may report a longer length than actually exists. Double hashing or verifying the candidate on match eliminates this.
Collision probability grows with n. With a single modulus of 10⁹+7 and n = 10⁵ windows, expected false positives ≈ 10⁵ / 10⁹ ≈ 0.0001 per run. For n = 10⁶ it's ≈ 0.001. Still small, but anti-hash test cases on judges can exploit a fixed (base, mod) pair — use random bases or double hashing for robustness.
Applying rolling hash to order-insensitive windows. Polynomial rolling hash is positional; hash("ab") ≠ hash("ba"). For anagram detection, use a frequency-based hash (XOR or sum of random per-character values) or a frequency vector. Do not use positional rolling hash for anagram problems.
Prefix hash query with wrong power index. The query hash(S[l..r]) = H[r+1] - H[l] * BASE^(r-l+1) uses pw[r-l+1], not pw[r-l]. The length of the substring is r - l + 1; that is the power needed to shift H[l] to align with H[r+1].

Implementation Notes

Python's arbitrary-precision integers mean subtraction never underflows, but intermediate values can grow before the % MOD is applied. Always reduce modulo at each step to avoid large intermediate products: (a * b) % MOD instead of computing a * b first for large a.
pow(BASE, k, MOD) in Python computes BASE^k mod MOD in O(log k) using fast modular exponentiation — use this to initialise the power table or compute a single power rather than looping.
Precompute pw = [1] * (n + 1) with pw[i] = pw[i-1] * BASE % MOD once per string; do not recompute inside the sliding loop.
For the prefix hash array pattern, allocate H of size len(S) + 1 to avoid index-shifting in the query formula.
Choosing a random base at runtime (e.g., BASE = random.randint(131, 10**6)) defeats fixed anti-hash tests on competitive programming judges without requiring double hashing.
Python's set stores integers efficiently; hashing 64-bit integers into a set of size n uses approximately 8n bytes — feasible for n ≤ 10⁶.
For double hashing, store tuples (h1, h2) as the set key; Python hashes tuples correctly and the collision rate is effectively zero.

Cross-Topic Relations

Related topic	Connection
string-matching.md	Rabin-Karp is the rolling-hash-based string matching algorithm; KMP and Z-algorithm are deterministic alternatives without hash collisions
sliding-window.md	Rolling hash is a specialised sliding window where the "aggregate" is a hash value; anagram detection uses a frequency-vector sliding window instead
binary-search.md	Binary search on answer length is the primary hard-problem technique combining with rolling hash
prefix-sum.md	Prefix hash array mirrors the prefix sum construction: same sentinel-padded build, same subtraction-based range query
suffix-array.md	Suffix arrays solve the same repeated/common substring problems deterministically in O(n log n); rolling hash binary search is the probabilistic alternative with simpler implementation
hash-table.md	Rolling hash produces the keys stored in hash sets; understanding modular arithmetic and collision probability requires hash table foundations

Interview Reference

Must-Know Problems

Repeated / duplicate substrings

Repeated DNA Sequences (LC 187) — hash set on length-10 windows; canonical rolling hash warm-up
Longest Duplicate Substring (LC 1044) — binary search + rolling hash; high-frequency hard problem
Longest Repeating Substring (LC 1062) — same skeleton as 1044, premium variant

Substring equality queries

Longest Common Subpath (LC 1923) — binary search + hash; multiple arrays
String Hashing / rolling hash template problems (various OJ problems explicitly testing the technique)

Pattern search / matching

Find the Index of the First Occurrence in a String (LC 28) — implement Rabin-Karp as an alternative to KMP
Implement strStr() variants asking for O(n) expected time

Multi-string / 2D

Equal Rational Numbers (LC 972) — hash-based string normalization
Longest Common Substring of Two Strings (not a standalone LeetCode problem but appears as a sub-problem in many hards)

Clarification Questions to Ask

Is the input alphabet bounded (lowercase letters, ASCII, arbitrary Unicode)? Determines the base choice.
Are there multiple queries on the same string, or a single scan? Multiple queries → prefix hash array; single scan → sliding window.
Is the answer a length, a count, or actual positions? Positions require storing indices alongside hashes.
Does "substring" mean contiguous? (Rolling hash only applies to contiguous windows.)
What is n? For n ≥ 10⁶, double hashing or random base selection is advisable to avoid anti-hash tests.
Is an exact answer required, or is probabilistic correctness acceptable? (Affects whether verification is needed.)

Common Interview Mistakes

Forgetting to add MOD before the subtraction step in the rolling update, producing negative hash values and wrong membership checks in non-Python languages.
Using a fixed (base, mod) pair from a textbook example, which is exploitable by anti-hash test cases on competitive programming judges — randomise the base or use two moduli.
Applying positional rolling hash to anagram / permutation problems, where character order should not matter — the correct structure is a frequency vector or order-independent hash.
Setting the binary search upper bound to len(S) instead of len(S) - 1, causing one unnecessary O(n) call and potential edge-case bugs.
Not verifying on hash match in problems where a single false positive changes the answer — the verification step is O(m) in the worst case but O(1) amortised across the full scan.
Computing BASE^(m-1) instead of BASE^m for the outgoing-character subtraction, shifting all subsequent hashes by a factor of BASE and causing every comparison to fail silently.

Typical Follow-Up Escalations

"Now find all positions of the duplicate substring, not just the length." → Store {hash: [start_indices]} in the hash set; collect all positions with the winning hash.
"Now handle up to 1000 pattern queries on the same text." → Build a prefix hash array once; answer each query in O(m) (hash the pattern, compare against the prefix array window).
"What if the text arrives as a stream, one character at a time?" → Use the sliding rolling hash (not the prefix array); maintain the current window hash and the outgoing-character power incrementally.
"Prove your solution is correct, not just probably correct." → Switch to Z-algorithm or KMP for deterministic O(n + m); rolling hash is a probabilistic technique.
"Now find the longest substring appearing in all k strings." → Binary search on L; for each L build a hash set per string; intersect the sets; O(nk log n) total.
"Can you reduce space to O(1)?" → Not possible with the prefix hash array; the sliding window approach uses O(1) space beyond the power value, but loses the ability to answer arbitrary substring queries.

Explore Unread

Great job! You've read all available articles

Rolling Hash

Core Concepts

Base — must exceed the alphabet size: ≥ 27 for lowercase letters, ≥ 131 for general ASCII. Common choices: 31 (lowercase), 131 (ASCII).

Power table — precomputed array pw[i] = b^i mod p. Required to evaluate b^(m−1) mod p in O(1) during the roll and to compute b^(r−l+1) for prefix hash queries.

Types / Variants

Variant	What it is	When to use	Key property
Single hash (one base, one mod)	One polynomial hash per window	Default; fast, low implementation cost	False-positive rate ≈ 1/p; verify on match
Double hash	Two independent (base, mod) pairs maintained simultaneously	When false positives are unacceptable and verification is expensive	Rate ≈ 1/(p₁·p₂); virtually eliminates collisions in practice
Prefix hash array	Precomputed cumulative hash for offline substring queries	Multiple length queries on the same string; no sliding needed	O(n) build, O(1) per query of any substring
Fixed-window slide	Maintains one hash of a length-m window as it slides	Single pattern length, online left-to-right sweep	O(n) total; constant space beyond the power table
2D rolling hash	Hash computed over rows then columns (or vice versa)	2D pattern matching in a matrix	Compose two 1D hashes; collision rate is product of rates
Polynomial hash with negative values	Apply offset so all characters map to positive integers	Numeric arrays or mixed-sign inputs	Offset = abs(min_value) + 1; same formula applies

Key Algorithms

Rabin-Karp Substring Search

Key insight: the rolling update allows the hash to advance one character in O(1), making each position O(1) amortised instead of O(m).

h = pw = 0
for c in P: h = (h * BASE + ord(c)) % MOD; pw = pw * BASE % MOD  # pw = BASE^m % MOD... (split: pw starts at 1, multiply m times)

After computing H_P and the initial H_W over T[0..m−1], the slide step is:

H_W = (H_W - ord(T[i]) * pw % MOD + MOD) * BASE % MOD + ord(T[i + m])
H_W %= MOD

The + MOD before the multiplication prevents negative intermediate results in languages without Python's arbitrary-precision integers.

Prefix Hash Array Build and Query

Precompute cumulative hashes so any substring hash is retrievable in O(1). O(n) build, O(1) per query, O(n) space.

H[0] = 0
for i, c in enumerate(S): H[i+1] = (H[i] * BASE + ord(c)) % MOD

Query hash of S[l..r] (0-indexed, inclusive):

hash_lr = (H[r+1] - H[l] * pw[r - l + 1]) % MOD

The pw array satisfies pw[k] = BASE^k % MOD. Negative results from subtraction are handled by Python's % automatically; in other languages add MOD before taking mod.

Double Hashing

Binary Search + Rolling Hash (Longest Duplicate Substring)

Binary search on the answer length L; for each candidate L, slide a length-L hash window over the string and check membership in a set. O(n log n) time, O(n) space.

lo, hi = 1, len(S) - 1
while lo <= hi:
    mid = (lo + hi) // 2
    if has_duplicate(S, mid): lo = mid + 1; best = mid
    else: hi = mid - 1

The has_duplicate function runs a single O(n) hash-window pass. Use double hashing or verify candidates to avoid false positives causing incorrect binary search decisions.

Fixed-Window Anagram / Permutation Check

Note: positional rolling hash is not suitable for order-insensitive matching (anagrams). Use frequency vectors or a frequency-based hash (sum of per-character hashes, order-independent). See sliding-window.md for the frequency-vector pattern.

2D Rolling Hash

Hash each row independently to produce a 1D array of row-hashes, then apply a second rolling hash over columns of that array. Enables O(1) hash of any submatrix after O(mn) preprocessing.

Advanced Techniques

Binary Search + Rolling Hash for Longest Repeated / Common Substring

Hashing with Multiple Lengths (Multi-Pattern Rabin-Karp)

See string-matching.md for Aho-Corasick and full multi-pattern coverage.

Problem Taxonomy

By Problem Category

Category	What it asks	Key technique
Substring search	Does pattern P appear in text T?	Rabin-Karp single-pass hash; verify on match
Duplicate substring detection	Does any substring of length L appear twice?	Fixed-window hash set membership check
Longest duplicate substring	Maximum length L with a repeated substring	Binary search on L + rolling hash
Longest common substring	Longest substring shared by two strings	Concatenate with sentinel + binary search + hash
Repeated DNA / fixed-length repeats	All substrings of length k appearing ≥ 2 times	Hash set; collect hashes seen twice
Equal / matching windows	Are two windows of the same length identical?	Compare prefix hashes in O(1)
2D pattern in matrix	Does a 2D pattern appear in a 2D grid?	2D rolling hash (row hash then column hash)
Minimum window containing pattern	Smallest window whose hash matches	Rabin-Karp with multi-length scan

Problem Signal → Technique

Signal in problem	Likely technique
"find duplicate substrings of length k"	Rolling hash set over length-k windows
"longest substring appearing at least twice"	Binary search on length + rolling hash
"longest common substring of two strings"	Concatenate with sentinel + binary search + hash
"check if two substrings are equal" (multiple queries)	Prefix hash array, O(1) per query
"rolling hash" or "Rabin-Karp" mentioned explicitly	Polynomial rolling hash with slide update
"repeated DNA sequences" / fixed-length repeated window	Hash set on fixed-length windows
"string matching in O(n)"	Rabin-Karp (or KMP — check if pattern length matters)
"2D pattern in grid"	2D rolling hash
multiple substring equality queries on the same string	Precompute prefix hash array; answer each in O(1)

Common Patterns

Pattern	When it applies	Mechanism
Precompute power table	Any prefix hash array usage	`pw[0]=1; pw[i]=pw[i-1]*BASE%MOD`; size n+1
Sentinel-padded prefix hash	Offline multi-query substring comparison	`H[0]=0; H[i+1]=(H[i]*BASE+ord(S[i]))%MOD`; query via subtraction
Double hash maintenance	Adversarial inputs or critical correctness	Carry two `(H1, H2)` tuples through every operation; match only when both agree
Hash set for window membership	Detecting any repeat at a fixed length	Insert each window hash; check before inserting; `seen = set()`
Rolling slide with `+MOD` guard	Single-modulus subtraction	`H = (H - outgoing_contrib + MOD) * BASE % MOD + incoming`
Binary search shell	Longest / minimum length satisfying a hash-checkable property	Binary search on L; call O(n) `has_property(L)` each iteration
Per-character hash for order-independent windows	Anagram hashing (not positional)	Assign each character a random value; window hash = XOR or sum of character hashes

Edge Cases & Pitfalls

Negative subtraction result in rolling update. When H - outgoing_contrib goes negative (in fixed-width integer languages), taking mod gives a negative result. The fix is (H - outgoing_contrib % MOD + MOD) % MOD. Python's % is always non-negative, so this is only a concern in Java/C++, but the +MOD guard is a good habit regardless.
Power table off-by-one. The rolling subtraction requires BASE^m mod MOD (the power of the outgoing position, which is m characters back). A common mistake is using BASE^(m−1). Derive the formula from scratch: if H = S[i]·B^(m−1) + … + S[i+m−1]·B^0, then subtracting S[i]·B^(m−1) and multiplying by B gives S[i+1]·B^(m−1) + … + S[i+m−1]·B^1.
Base too small for the alphabet. Using BASE = 26 for lowercase letters makes S[0]·26^k indistinguishable from S[0]+1·26^k when carries wrap around mod p. Always choose BASE > max(ord(c)) - ord('a'): 31 for lowercase, 131 for ASCII.
Forgetting the sentinel in multi-string concatenation. When concatenating S1 + '$' + S2 to find common substrings, the sentinel must not appear in either string. A window spanning the boundary would produce a spurious match if no sentinel is present. Use a character outside the input alphabet.
Binary search boundary on length. Searching for the longest duplicate substring: lo = 1, hi = len(S) - 1 (not len(S)). A string of length n cannot have a duplicate substring of length n (it would require the string to appear in itself at a different offset, which is only possible for periodic strings). Setting hi = len(S) causes the loop to check an impossible length, wasting one O(n) call.
False positive causing wrong binary search decision. If has_duplicate(L) returns True due to a hash collision when no real duplicate exists, the binary search moves lo right and may report a longer length than actually exists. Double hashing or verifying the candidate on match eliminates this.
Collision probability grows with n. With a single modulus of 10⁹+7 and n = 10⁵ windows, expected false positives ≈ 10⁵ / 10⁹ ≈ 0.0001 per run. For n = 10⁶ it's ≈ 0.001. Still small, but anti-hash test cases on judges can exploit a fixed (base, mod) pair — use random bases or double hashing for robustness.
Applying rolling hash to order-insensitive windows. Polynomial rolling hash is positional; hash("ab") ≠ hash("ba"). For anagram detection, use a frequency-based hash (XOR or sum of random per-character values) or a frequency vector. Do not use positional rolling hash for anagram problems.
Prefix hash query with wrong power index. The query hash(S[l..r]) = H[r+1] - H[l] * BASE^(r-l+1) uses pw[r-l+1], not pw[r-l]. The length of the substring is r - l + 1; that is the power needed to shift H[l] to align with H[r+1].

Implementation Notes

Python's arbitrary-precision integers mean subtraction never underflows, but intermediate values can grow before the % MOD is applied. Always reduce modulo at each step to avoid large intermediate products: (a * b) % MOD instead of computing a * b first for large a.
pow(BASE, k, MOD) in Python computes BASE^k mod MOD in O(log k) using fast modular exponentiation — use this to initialise the power table or compute a single power rather than looping.
Precompute pw = [1] * (n + 1) with pw[i] = pw[i-1] * BASE % MOD once per string; do not recompute inside the sliding loop.
For the prefix hash array pattern, allocate H of size len(S) + 1 to avoid index-shifting in the query formula.
Choosing a random base at runtime (e.g., BASE = random.randint(131, 10**6)) defeats fixed anti-hash tests on competitive programming judges without requiring double hashing.
Python's set stores integers efficiently; hashing 64-bit integers into a set of size n uses approximately 8n bytes — feasible for n ≤ 10⁶.
For double hashing, store tuples (h1, h2) as the set key; Python hashes tuples correctly and the collision rate is effectively zero.

Cross-Topic Relations

Related topic	Connection
string-matching.md	Rabin-Karp is the rolling-hash-based string matching algorithm; KMP and Z-algorithm are deterministic alternatives without hash collisions
sliding-window.md	Rolling hash is a specialised sliding window where the "aggregate" is a hash value; anagram detection uses a frequency-vector sliding window instead
binary-search.md	Binary search on answer length is the primary hard-problem technique combining with rolling hash
prefix-sum.md	Prefix hash array mirrors the prefix sum construction: same sentinel-padded build, same subtraction-based range query
suffix-array.md	Suffix arrays solve the same repeated/common substring problems deterministically in O(n log n); rolling hash binary search is the probabilistic alternative with simpler implementation
hash-table.md	Rolling hash produces the keys stored in hash sets; understanding modular arithmetic and collision probability requires hash table foundations

Interview Reference

Must-Know Problems

Repeated / duplicate substrings

Repeated DNA Sequences (LC 187) — hash set on length-10 windows; canonical rolling hash warm-up
Longest Duplicate Substring (LC 1044) — binary search + rolling hash; high-frequency hard problem
Longest Repeating Substring (LC 1062) — same skeleton as 1044, premium variant

Substring equality queries

Longest Common Subpath (LC 1923) — binary search + hash; multiple arrays
String Hashing / rolling hash template problems (various OJ problems explicitly testing the technique)

Pattern search / matching

Find the Index of the First Occurrence in a String (LC 28) — implement Rabin-Karp as an alternative to KMP
Implement strStr() variants asking for O(n) expected time

Multi-string / 2D

Equal Rational Numbers (LC 972) — hash-based string normalization
Longest Common Substring of Two Strings (not a standalone LeetCode problem but appears as a sub-problem in many hards)

Clarification Questions to Ask

Is the input alphabet bounded (lowercase letters, ASCII, arbitrary Unicode)? Determines the base choice.
Are there multiple queries on the same string, or a single scan? Multiple queries → prefix hash array; single scan → sliding window.
Is the answer a length, a count, or actual positions? Positions require storing indices alongside hashes.
Does "substring" mean contiguous? (Rolling hash only applies to contiguous windows.)
What is n? For n ≥ 10⁶, double hashing or random base selection is advisable to avoid anti-hash tests.
Is an exact answer required, or is probabilistic correctness acceptable? (Affects whether verification is needed.)

Common Interview Mistakes

Forgetting to add MOD before the subtraction step in the rolling update, producing negative hash values and wrong membership checks in non-Python languages.
Using a fixed (base, mod) pair from a textbook example, which is exploitable by anti-hash test cases on competitive programming judges — randomise the base or use two moduli.
Applying positional rolling hash to anagram / permutation problems, where character order should not matter — the correct structure is a frequency vector or order-independent hash.
Setting the binary search upper bound to len(S) instead of len(S) - 1, causing one unnecessary O(n) call and potential edge-case bugs.
Not verifying on hash match in problems where a single false positive changes the answer — the verification step is O(m) in the worst case but O(1) amortised across the full scan.
Computing BASE^(m-1) instead of BASE^m for the outgoing-character subtraction, shifting all subsequent hashes by a factor of BASE and causing every comparison to fail silently.

Typical Follow-Up Escalations

"Now find all positions of the duplicate substring, not just the length." → Store {hash: [start_indices]} in the hash set; collect all positions with the winning hash.
"Now handle up to 1000 pattern queries on the same text." → Build a prefix hash array once; answer each query in O(m) (hash the pattern, compare against the prefix array window).
"What if the text arrives as a stream, one character at a time?" → Use the sliding rolling hash (not the prefix array); maintain the current window hash and the outgoing-character power incrementally.
"Prove your solution is correct, not just probably correct." → Switch to Z-algorithm or KMP for deterministic O(n + m); rolling hash is a probabilistic technique.
"Now find the longest substring appearing in all k strings." → Binary search on L; for each L build a hash set per string; intersect the sets; O(nk log n) total.
"Can you reduce space to O(1)?" → Not possible with the prefix hash array; the sliding window approach uses O(1) space beyond the power value, but loses the ability to answer arbitrary substring queries.

Explore Unread

Great job! You've read all available articles

Rolling Hash

Core Concepts

Types / Variants

Key Algorithms

Rabin-Karp Substring Search

Prefix Hash Array Build and Query

Double Hashing

Binary Search + Rolling Hash (Longest Duplicate Substring)

Fixed-Window Anagram / Permutation Check

2D Rolling Hash

Advanced Techniques

Binary Search + Rolling Hash for Longest Repeated / Common Substring

Hashing with Multiple Lengths (Multi-Pattern Rabin-Karp)

Problem Taxonomy

By Problem Category

Problem Signal → Technique

Common Patterns

Edge Cases & Pitfalls

Implementation Notes

Cross-Topic Relations

Interview Reference

Must-Know Problems

Clarification Questions to Ask

Common Interview Mistakes

Typical Follow-Up Escalations

Read Next

Explore Unread

Rolling Hash

Core Concepts

Types / Variants

Key Algorithms

Rabin-Karp Substring Search

Prefix Hash Array Build and Query

Double Hashing

Binary Search + Rolling Hash (Longest Duplicate Substring)

Fixed-Window Anagram / Permutation Check

2D Rolling Hash

Advanced Techniques

Binary Search + Rolling Hash for Longest Repeated / Common Substring

Hashing with Multiple Lengths (Multi-Pattern Rabin-Karp)

Problem Taxonomy

By Problem Category

Problem Signal → Technique

Common Patterns

Edge Cases & Pitfalls

Implementation Notes

Cross-Topic Relations

Interview Reference

Must-Know Problems

Clarification Questions to Ask

Common Interview Mistakes

Typical Follow-Up Escalations

Read Next

Explore Unread