Hash Table

Core Concepts

Hash table — a data structure that maps keys to values using a hash function to compute an index (bucket) into an array of slots. Average O(1) insert, lookup, and delete.

Hash function — a deterministic function h(key) → index that maps a key to a bucket index in [0, m) where m is the table size. A good hash function distributes keys uniformly to minimize collisions.

Collision — when two distinct keys hash to the same bucket. All hash table implementations must handle collisions; the strategy is the primary design axis.

Load factor (α) — n / m where n = number of stored entries, m = number of buckets. Controls the collision probability and memory/speed tradeoff. Typical resize threshold: α > 0.75 for chaining, α > 0.5 for open addressing.

Rehashing — when the load factor threshold is exceeded, allocate a new array (typically 2× size) and reinsert all entries by recomputing hashes. Amortizes to O(1) per insertion over a sequence of n insertions.

Amortized O(1) — individual operations may trigger O(n) rehashing, but the average cost per operation across n operations is O(1).

Separate chaining — each bucket holds a linked list (or tree) of all entries that hash to it. Allows load factor > 1.

Open addressing — all entries stored inside the array itself; on collision, probe adjacent slots according to a fixed sequence until an empty slot is found. Load factor must stay below 1.

Tombstone — a sentinel value left in a slot when a key is deleted under open addressing, so probing chains are not broken for subsequent lookups.

Perfect hashing — a hash function with zero collisions for a known static set of keys. Used in compilers and read-only tables; O(1) worst-case lookup.

Universal hashing — a family of hash functions where for any two distinct keys, the probability a randomly chosen function from the family collides them is ≤ 1/m. Prevents adversarial input from causing worst-case behavior.

Structural Invariants

Invariant	What it guarantees	What breaks if violated
Each key appears in exactly one bucket	Lookup finds the unique entry	Duplicate keys yield inconsistent get/put results
Load factor stays below threshold	Expected O(1) operations	Chains or probe sequences degrade to O(n)
After rehash, every entry recomputed in new array	No entry is lost or unreachable	Silent data loss on lookup
Open addressing: deleted slots marked as tombstones, not empty	Probe chains remain intact	Lookups terminate early and falsely report key-not-found
Hash function is deterministic for equal keys	get(k) always reaches the same bucket put(k) wrote to	Infinite loop or miss on lookup

Internal Representation

Separate chaining — array of size m, each slot a pointer to a linked list of (key, value) pairs. Java HashMap uses this; switches the chain to a red-black tree when a bucket's chain length exceeds 8 (treeification threshold), reverting to a list when it shrinks below 6.

Open addressing with linear probing — array of size m, probe sequence (h(k) + i) mod m. Exhibits primary clustering: long runs of occupied slots form and attract more collisions. Cache-friendly because accesses are sequential.

Open addressing with quadratic probing — probe sequence (h(k) + i²) mod m. Reduces primary clustering but introduces secondary clustering (keys with the same initial hash follow the same probe path). Requires m to be prime or a power of 2 with specific offsets to guarantee all slots are visited.

Double hashing — probe sequence (h₁(k) + i·h₂(k)) mod m where h₂(k) ≠ 0 and is coprime to m. Eliminates secondary clustering; best distribution among open addressing methods. Slightly more expensive per probe.

Python dict — compact hash table using open addressing with a pseudo-random probe: i = (5*i + 1 + perturb) mod m, perturb >>= 5. Preserves insertion order since Python 3.7. Initial size 8; resize when 2/3 full.

Robin Hood hashing — open addressing variant where an incoming key can "steal" a slot from a key with a smaller displacement (distance from its ideal slot). Reduces variance in probe length; used in many modern implementations (Rust HashMap, C++ Abseil).

Types / Variants

Variant	What it is	When to use it	Key property
HashMap	Unordered key→value map	Default choice; O(1) avg all ops	No ordering guarantee
LinkedHashMap	HashMap + doubly linked list of insertion order	Need iteration in insertion order; LRU cache	O(1) ops, O(n) ordered iteration
TreeMap / SortedMap	BST-backed map (Red-Black)	Need sorted keys, floor/ceil, range queries	O(log n) all ops; maintains sort order
Frequency / Counter map	HashMap<key, int> tracking counts	Character frequency, word count, sliding window	Conceptual pattern, not a separate class
Multimap	One key → multiple values	Grouping problems (anagram buckets)	Values stored as list per key
Bidirectional map (Bimap)	Two HashMaps mirroring each other	Bijection problems; two-way lookup	O(1) lookup by key or value; maintain sync manually
Bloom filter	Probabilistic set membership	Check if element might be in a large set	False positives possible; no false negatives; O(1) but lossy

See trees.md for TreeMap internals (Red-Black Tree).

Topic-Specific Operations

put(key, value) — O(1) amortized

Compute h(key) → bucket. In chaining: walk the chain to update if key exists, else prepend. In open addressing: probe until empty or matching slot found, write there. If load factor exceeded after insert, trigger rehash.

get(key) — O(1) average

Compute h(key) → bucket. In chaining: walk the chain comparing keys. In open addressing: probe the sequence until key found or empty slot (not tombstone) encountered. Return null/None/default if not found.

remove(key) — O(1) average

Chaining: find in chain, unlink node. Open addressing: find slot, mark it as a tombstone (not empty) to preserve probe chain integrity. Periodically rehash to reclaim tombstone slots if deletions are frequent.

containsKey(key) — O(1) average

Same as get(key) but return boolean rather than value.

Iteration — O(n + m)

Must scan all m buckets, including empty ones. Iterating n entries costs O(n + m); if m >> n (over-allocated table), iteration is expensive despite few entries. Python dict uses compact storage so iteration is O(n).

Key Algorithms

Division Method Hashing

h(k) = k mod m. Fast; m should be a prime not close to a power of 2 to avoid patterns from binary keys. Avoid powers of 2 because they only use the low-order bits of k.

Polynomial Rolling Hash (strings)

h(s) = (s[0]·p^(n-1) + s[1]·p^(n-2) + … + s[n-1]) mod M where p is a small prime (31 for lowercase letters, 131 for all ASCII), M a large prime (1e9+7). Key insight: substring hash computable in O(1) from prefix hashes. Time O(n) preprocessing, O(1) per query.

prefix[i+1] = (prefix[i] * p + ord(s[i])) % M
h(s[l:r]) = (prefix[r] - prefix[l] * p**(r-l)) % M

Rabin-Karp Sliding Window Hash

Maintain a rolling hash of a fixed-length window: remove the outgoing character's contribution, shift remaining by p, add the incoming character. O(1) per slide after O(k) initial hash. Used for substring search and duplicate substring detection.

Double Hashing (open addressing probe)

h(k, i) = (h₁(k) + i · h₂(k)) mod m h₂(k) = q − (k mod q) for prime q < m ensures h₂(k) ≠ 0 and is coprime to m, guaranteeing all m slots probed. Better distribution than linear or quadratic probing; slower due to two hash computations.

Universal Hashing (adversarial input defense)

Choose h_{a,b}(k) = ((a·k + b) mod p) mod m where p is prime ≥ universe size, a ∈ [1, p-1], b ∈ [0, p-1] chosen randomly. Guarantees expected O(1) regardless of input. Python randomizes hash seeds per process start (PYTHONHASHSEED) for this reason.

Advanced Techniques

Prefix Sum + Hash Map (subarray problems)

What it solves: "Find subarray with sum/XOR/property equal to target" in O(n). Mechanism: Store prefixVal → earliest_index in a hash map. At each index i, check if prefix[i] - target exists in the map; if so, the subarray (map[prefix[i]-target], i] satisfies the condition. When to use over sliding window: When the array contains negative numbers (sliding window requires non-negative for monotone window expansion). If all values are positive, sliding window is simpler. Complexity: O(n) time, O(n) space.

Rolling Hash for Duplicate Substring Detection

What it solves: "Is there a repeated substring of length k?" in O(n) average (vs O(n·k) naive). Mechanism: Hash all substrings of length k using a sliding window hash; store hashes in a set. On collision, verify with string comparison (O(k)) to rule out false positives. When to use over suffix arrays: Competitive/interview settings without suffix array library; simpler to implement. Use suffix arrays when you need all duplicate substrings or exact counts. Complexity: O(n) average time; O(n) space.

Two Hash Maps for Bijection / Bidirectional Lookup

What it solves: Isomorphism, word pattern, encoding/decoding — problems requiring "every A maps to exactly one B and vice versa." Mechanism: Maintain A→B and B→A simultaneously; on each pair (a, b), check both maps before writing. When to use over single map: Whenever uniqueness must hold in both directions. A single map catches only one-to-one from A side; the reverse map catches many-to-one from B side. Complexity: O(n) time and space.

Sliding Window + Frequency Map (at-most-k distinct)

What it solves: "Longest subarray/substring with at most k distinct characters/values." Mechanism: Maintain a char→count map inside a two-pointer window. When distinct count exceeds k, shrink left pointer and decrement counts (remove entry when count hits 0). When to use over sorting: When you need the actual window or when k is small relative to n. Sorting destroys order and costs O(n log n). Complexity: O(n) time, O(k) space (at most k+1 entries in map at once).

Problem Taxonomy

By Problem Category

Category	What it asks	Key technique
Frequency counting	Count occurrences, find majority, most frequent k	Counter map; then heap or sort
Complement lookup	Two elements summing/XORing to target	Hash set for "seen"; one-pass
Subarray with target sum	Contiguous subarray summing to k	Prefix sum + hash map
Grouping / categorization	Group strings by anagram, by property	Hash map keyed by canonical form (sorted string, prime product)
Duplicate detection	Has duplicate, find duplicate, first duplicate	Hash set; or value-as-index trick
Isomorphism / pattern	Word pattern, string isomorphism	Two hash maps (bidirectional)
Substring matching	Find all anagrams, repeated substring	Sliding window frequency map; rolling hash
LRU / LFU Cache	Evict least recently / frequently used	LinkedHashMap or dict + doubly linked list
Graph / union-find auxiliary	Map node labels to integers	HashMap for label normalization
Memoization (top-down DP)	Cache recursive subproblem results	HashMap when states are sparse or non-integer

Problem Signal → Technique

Signal in problem	Likely technique
"two numbers sum to target"	Hash set, one-pass complement check
"subarray sum equals k"	Prefix sum + hash map
"longest subarray / substring with at most k distinct"	Sliding window + frequency map
"group anagrams" / "find all anagrams"	Sort-key or character-count-key hash map
"isomorphic" / "word pattern"	Two bidirectional hash maps
"find duplicate in O(n) time O(1) space"	Floyd's cycle; if O(n) space allowed → hash set
"first unique / first repeated"	LinkedHashMap or ordered dict + count
"design cache with eviction"	LinkedHashMap (Java) or OrderedDict (Python)
"count distinct values in window"	Sliding window + hash map size
"repeated DNA sequence" / "repeated substring"	Rolling hash or sliding window hash set
"check if two strings are permutations"	Frequency map equality or 26-int array

Common Patterns

Pattern	When it applies	Mechanism
Counter map	Count occurrences of any hashable	`d[k] = d.get(k, 0) + 1`; Python `Counter`
Complement lookup	Pair-sum or pair-XOR to target	Insert seen values; check `target - x` before inserting x
Canonical key grouping	Group items that are "equivalent"	Map each item to a canonical form (sorted tuple, frozenset, prime encoding) as the key
Prefix aggregate + hash map	Subarray/substring with aggregate equal to target	Store `prefix[i] → index` in map; query `prefix[i] - target`
Sliding window frequency map	Fixed or variable window with at-most-k constraint	Two pointers; map tracks counts; shrink when invariant breaks
Set for O(1) membership	"Have we seen x before?" without value storage	`seen = set(); x in seen`
Index map	Reconstruct order or find span	`val → index` map; compute `i - map[val]`
Two maps for bijection	Enforce one-to-one mapping in both directions	`fwd[a] = b` and `rev[b] = a`; check both before writing

Edge Cases & Pitfalls

Mutable keys — using a list or dict as a key raises TypeError in Python because mutable objects are not hashable. Convert to tuple or frozenset before inserting. In interviews, ask "are keys guaranteed to be hashable/immutable?"
Default value trap — dict[key] raises KeyError if key absent; use dict.get(key, default) or collections.defaultdict. Forgetting this causes runtime crashes on first-time insertions in frequency maps.
Removing entry vs. setting count to zero — in a sliding window frequency map, failing to del map[k] when map[k] reaches 0 causes the distinct-count (via len(map)) to be wrong. Always delete the entry rather than leaving a zero.
Hash collision in custom objects — if you define __eq__ without defining __hash__, Python sets __hash__ = None, making the object unhashable. Both must be defined consistently: equal objects must have equal hashes.
Integer overflow in polynomial hash — without modular arithmetic, intermediate values overflow 64-bit integers. Always reduce modulo a large prime at each step, not only at the end.
Rabin-Karp false positives — hash equality does not guarantee string equality. Always verify with a character-by-character comparison on hash match.
Concurrent modification — iterating over a dict while inserting or deleting keys causes RuntimeError in Python. Collect keys to modify first, then modify.
Load factor and worst-case — adversarial inputs can cause many collisions in a fixed-hash table, degrading every operation to O(n). In competitive programming, use randomized hash seeds or polynomial hashing with a random base to avoid this.
is vs == for keys — Python interns small integers and short strings; x is y may coincidentally be true. Always use == for key comparison logic.
Prefix sum index off-by-one — initialize prefix_map = {0: -1} (sum 0 seen before index 0) to handle subarrays starting at index 0. Forgetting this misses subarrays that start from the beginning.

Implementation Notes

Python dict and set are backed by hash tables; in check is O(1) average, not O(n) — never use a list for membership tests when a set suffices.
Python Counter is a subclass of dict; Counter(a) == Counter(b) compares frequency maps directly and works for anagram checks.
Python collections.defaultdict(int) eliminates get(k, 0) boilerplate; defaultdict(list) for grouping.
Python collections.OrderedDict maintains insertion order with O(1) move-to-end (move_to_end(k)), enabling O(1) LRU cache operations.
Java HashMap is not thread-safe; use ConcurrentHashMap for concurrent access.
Java HashMap allows one null key; Hashtable does not — a common interview distinction.
Python dict preserves insertion order since 3.7 (implementation detail since 3.6); do not rely on this in older codebases.
Recursion depth limit (Python default 1000) matters only if you implement chaining traversal recursively — use iteration instead.
frozenset is hashable and can be used as a dict key; useful for grouping by set of characters without sorting.
When implementing a custom hash for a composite key, hash(tuple(fields)) is idiomatic and correct in Python.

Cross-Topic Relations

Related topic	Specific connection
Arrays	Hash table's underlying storage; bucket array is the physical structure
Linked Lists	Chaining uses singly linked lists per bucket; LRU cache uses a doubly linked list alongside a hash map
Strings	Polynomial and rolling hash are the primary tools for O(n) substring algorithms
Prefix Sum	Combines with hash map to solve subarray-sum problems in O(n); the two techniques are almost always paired
Sliding Window	Frequency map tracks window contents; hash map enables O(1) window state updates
Two Pointers	Complement pairs (two-sum) use a hash set; two pointers use sorted order — choose based on whether sorted order is needed
Dynamic Programming	Hash map as memoization cache for top-down DP with non-integer or sparse states
Graph	Hash map normalizes node labels; adjacency list is often `dict[node] → list[neighbor]`
Heap	Combined with frequency map for top-k frequent elements: count with map, then heapify
Trie	Alternative to hash map for prefix-based string lookups; Trie wins on prefix queries, hash map wins on exact lookups

See sliding-window.md for patterns that pair with frequency maps. See prefix-sum.md for the full prefix-sum + hash map subarray technique.

Interview Reference

Must-Know Problems

Two-Sum / Complement Lookup

Two Sum (Easy) — canonical; one-pass hash map
Three Sum (Medium) — sort + two pointers; hash map variant exists
Subarray Sum Equals K (Medium) — prefix sum + hash map
Continuous Subarray Sum (Medium) — prefix sum mod k + hash map

Frequency and Counting

Top K Frequent Elements (Medium) — Counter + heap or bucket sort
First Unique Character in String (Easy)
Find All Anagrams in a String (Medium) — sliding window + frequency array
Group Anagrams (Medium) — sorted key or character-count tuple key

Sliding Window with Hash Map

Longest Substring Without Repeating Characters (Medium)
Longest Substring with At Most K Distinct Characters (Medium)
Minimum Window Substring (Hard) — need/have counter pattern
Subarrays with K Different Integers (Hard) — at-most-k minus at-most-(k-1)

Design Problems

LRU Cache (Medium) — OrderedDict or dict + doubly linked list
LFU Cache (Hard) — two hash maps + doubly linked list per frequency
Design HashMap (Easy) — implement from scratch with chaining

Isomorphism and Bijection

Isomorphic Strings (Easy) — two maps
Word Pattern (Easy) — two maps
Word Pattern II (Hard) — backtracking + two maps

Hashing Strings / Substrings

Repeated DNA Sequences (Medium) — rolling hash or sliding window set
Longest Duplicate Substring (Hard) — binary search + rolling hash
Implement Rabin-Karp (not on LC, but asked in system design rounds)

Clarification Questions to Ask

Are keys guaranteed to be hashable? (Lists, dicts as keys require conversion)
Can values be negative or zero? (Affects complement-lookup and prefix-sum logic)
Are there duplicate keys in the input? (Frequency map vs. existence set)
Is the problem asking for any valid answer or all answers? (Early return vs. full traversal)
Is "subarray" contiguous? (Prefix-sum technique applies only to contiguous)
What is the character set? (26 lowercase, ASCII, Unicode — affects array vs. map choice)
Can the answer be empty / zero-length? (Edge case for sliding window lower bound)

Common Interview Mistakes

Using dict[key] without a default, crashing on first encounter of a key — use get or defaultdict.
Forgetting prefix_map = {0: -1} initialization in prefix-sum problems, missing subarrays that start at index 0.
Checking if map[char] == 0: del map[char] after decrement but after already using len(map) for the distinct count — must delete before the length check matters.
Using a list instead of a set for "have we seen this?" checks, making the solution O(n²) instead of O(n).
Building the map first, then doing a second pass — most problems are solvable in one pass where you check before inserting.
Forgetting that Counter({'a': 0}) is not equal to Counter() — stale zero-count entries cause incorrect equality checks.
Implementing isomorphism with only one map direction and missing the many-to-one violation from the target side.

Typical Follow-Up Escalations

Follow-up	Expected answer
"Now solve it with O(1) space"	Usually forces a different algorithm (e.g., Floyd's cycle detection instead of hash set for duplicates)
"What if the input stream is too large to fit in memory?"	Approximate counting (Count-Min Sketch), bloom filters, or reservoir sampling
"Can you make the cache thread-safe?"	Add locking around get/put; or use `ConcurrentHashMap`; discuss lock granularity
"What if keys are adversarially chosen to cause collisions?"	Universal hashing with random seed; Java/Python both do this by default
"How would you extend Two Sum to a stream?"	Maintain hash set of seen values; check `target - x` before inserting each element
"Can you do Top-K without sorting?"	Quickselect O(n) average; or bucket sort by frequency O(n)

Explore Unread

Great job! You've read all available articles

Hash Table

Core Concepts

Hash table — a data structure that maps keys to values using a hash function to compute an index (bucket) into an array of slots. Average O(1) insert, lookup, and delete.

Collision — when two distinct keys hash to the same bucket. All hash table implementations must handle collisions; the strategy is the primary design axis.

Amortized O(1) — individual operations may trigger O(n) rehashing, but the average cost per operation across n operations is O(1).

Separate chaining — each bucket holds a linked list (or tree) of all entries that hash to it. Allows load factor > 1.

Open addressing — all entries stored inside the array itself; on collision, probe adjacent slots according to a fixed sequence until an empty slot is found. Load factor must stay below 1.

Tombstone — a sentinel value left in a slot when a key is deleted under open addressing, so probing chains are not broken for subsequent lookups.

Perfect hashing — a hash function with zero collisions for a known static set of keys. Used in compilers and read-only tables; O(1) worst-case lookup.

Structural Invariants

Invariant	What it guarantees	What breaks if violated
Each key appears in exactly one bucket	Lookup finds the unique entry	Duplicate keys yield inconsistent get/put results
Load factor stays below threshold	Expected O(1) operations	Chains or probe sequences degrade to O(n)
After rehash, every entry recomputed in new array	No entry is lost or unreachable	Silent data loss on lookup
Open addressing: deleted slots marked as tombstones, not empty	Probe chains remain intact	Lookups terminate early and falsely report key-not-found
Hash function is deterministic for equal keys	get(k) always reaches the same bucket put(k) wrote to	Infinite loop or miss on lookup

Internal Representation

Types / Variants

Variant	What it is	When to use it	Key property
HashMap	Unordered key→value map	Default choice; O(1) avg all ops	No ordering guarantee
LinkedHashMap	HashMap + doubly linked list of insertion order	Need iteration in insertion order; LRU cache	O(1) ops, O(n) ordered iteration
TreeMap / SortedMap	BST-backed map (Red-Black)	Need sorted keys, floor/ceil, range queries	O(log n) all ops; maintains sort order
Frequency / Counter map	HashMap<key, int> tracking counts	Character frequency, word count, sliding window	Conceptual pattern, not a separate class
Multimap	One key → multiple values	Grouping problems (anagram buckets)	Values stored as list per key
Bidirectional map (Bimap)	Two HashMaps mirroring each other	Bijection problems; two-way lookup	O(1) lookup by key or value; maintain sync manually
Bloom filter	Probabilistic set membership	Check if element might be in a large set	False positives possible; no false negatives; O(1) but lossy

See trees.md for TreeMap internals (Red-Black Tree).

Topic-Specific Operations

put(key, value) — O(1) amortized

get(key) — O(1) average

remove(key) — O(1) average

containsKey(key) — O(1) average

Same as get(key) but return boolean rather than value.

Iteration — O(n + m)

Key Algorithms

Division Method Hashing

h(k) = k mod m. Fast; m should be a prime not close to a power of 2 to avoid patterns from binary keys. Avoid powers of 2 because they only use the low-order bits of k.

Polynomial Rolling Hash (strings)

prefix[i+1] = (prefix[i] * p + ord(s[i])) % M
h(s[l:r]) = (prefix[r] - prefix[l] * p**(r-l)) % M

Rabin-Karp Sliding Window Hash

Double Hashing (open addressing probe)

Universal Hashing (adversarial input defense)

Advanced Techniques

Prefix Sum + Hash Map (subarray problems)

Rolling Hash for Duplicate Substring Detection

Two Hash Maps for Bijection / Bidirectional Lookup

Sliding Window + Frequency Map (at-most-k distinct)

Problem Taxonomy

By Problem Category

Category	What it asks	Key technique
Frequency counting	Count occurrences, find majority, most frequent k	Counter map; then heap or sort
Complement lookup	Two elements summing/XORing to target	Hash set for "seen"; one-pass
Subarray with target sum	Contiguous subarray summing to k	Prefix sum + hash map
Grouping / categorization	Group strings by anagram, by property	Hash map keyed by canonical form (sorted string, prime product)
Duplicate detection	Has duplicate, find duplicate, first duplicate	Hash set; or value-as-index trick
Isomorphism / pattern	Word pattern, string isomorphism	Two hash maps (bidirectional)
Substring matching	Find all anagrams, repeated substring	Sliding window frequency map; rolling hash
LRU / LFU Cache	Evict least recently / frequently used	LinkedHashMap or dict + doubly linked list
Graph / union-find auxiliary	Map node labels to integers	HashMap for label normalization
Memoization (top-down DP)	Cache recursive subproblem results	HashMap when states are sparse or non-integer

Problem Signal → Technique

Signal in problem	Likely technique
"two numbers sum to target"	Hash set, one-pass complement check
"subarray sum equals k"	Prefix sum + hash map
"longest subarray / substring with at most k distinct"	Sliding window + frequency map
"group anagrams" / "find all anagrams"	Sort-key or character-count-key hash map
"isomorphic" / "word pattern"	Two bidirectional hash maps
"find duplicate in O(n) time O(1) space"	Floyd's cycle; if O(n) space allowed → hash set
"first unique / first repeated"	LinkedHashMap or ordered dict + count
"design cache with eviction"	LinkedHashMap (Java) or OrderedDict (Python)
"count distinct values in window"	Sliding window + hash map size
"repeated DNA sequence" / "repeated substring"	Rolling hash or sliding window hash set
"check if two strings are permutations"	Frequency map equality or 26-int array

Common Patterns

Pattern	When it applies	Mechanism
Counter map	Count occurrences of any hashable	`d[k] = d.get(k, 0) + 1`; Python `Counter`
Complement lookup	Pair-sum or pair-XOR to target	Insert seen values; check `target - x` before inserting x
Canonical key grouping	Group items that are "equivalent"	Map each item to a canonical form (sorted tuple, frozenset, prime encoding) as the key
Prefix aggregate + hash map	Subarray/substring with aggregate equal to target	Store `prefix[i] → index` in map; query `prefix[i] - target`
Sliding window frequency map	Fixed or variable window with at-most-k constraint	Two pointers; map tracks counts; shrink when invariant breaks
Set for O(1) membership	"Have we seen x before?" without value storage	`seen = set(); x in seen`
Index map	Reconstruct order or find span	`val → index` map; compute `i - map[val]`
Two maps for bijection	Enforce one-to-one mapping in both directions	`fwd[a] = b` and `rev[b] = a`; check both before writing

Edge Cases & Pitfalls

Mutable keys — using a list or dict as a key raises TypeError in Python because mutable objects are not hashable. Convert to tuple or frozenset before inserting. In interviews, ask "are keys guaranteed to be hashable/immutable?"
Default value trap — dict[key] raises KeyError if key absent; use dict.get(key, default) or collections.defaultdict. Forgetting this causes runtime crashes on first-time insertions in frequency maps.
Removing entry vs. setting count to zero — in a sliding window frequency map, failing to del map[k] when map[k] reaches 0 causes the distinct-count (via len(map)) to be wrong. Always delete the entry rather than leaving a zero.
Hash collision in custom objects — if you define __eq__ without defining __hash__, Python sets __hash__ = None, making the object unhashable. Both must be defined consistently: equal objects must have equal hashes.
Integer overflow in polynomial hash — without modular arithmetic, intermediate values overflow 64-bit integers. Always reduce modulo a large prime at each step, not only at the end.
Rabin-Karp false positives — hash equality does not guarantee string equality. Always verify with a character-by-character comparison on hash match.
Concurrent modification — iterating over a dict while inserting or deleting keys causes RuntimeError in Python. Collect keys to modify first, then modify.
Load factor and worst-case — adversarial inputs can cause many collisions in a fixed-hash table, degrading every operation to O(n). In competitive programming, use randomized hash seeds or polynomial hashing with a random base to avoid this.
is vs == for keys — Python interns small integers and short strings; x is y may coincidentally be true. Always use == for key comparison logic.
Prefix sum index off-by-one — initialize prefix_map = {0: -1} (sum 0 seen before index 0) to handle subarrays starting at index 0. Forgetting this misses subarrays that start from the beginning.

Implementation Notes

Python dict and set are backed by hash tables; in check is O(1) average, not O(n) — never use a list for membership tests when a set suffices.
Python Counter is a subclass of dict; Counter(a) == Counter(b) compares frequency maps directly and works for anagram checks.
Python collections.defaultdict(int) eliminates get(k, 0) boilerplate; defaultdict(list) for grouping.
Python collections.OrderedDict maintains insertion order with O(1) move-to-end (move_to_end(k)), enabling O(1) LRU cache operations.
Java HashMap is not thread-safe; use ConcurrentHashMap for concurrent access.
Java HashMap allows one null key; Hashtable does not — a common interview distinction.
Python dict preserves insertion order since 3.7 (implementation detail since 3.6); do not rely on this in older codebases.
Recursion depth limit (Python default 1000) matters only if you implement chaining traversal recursively — use iteration instead.
frozenset is hashable and can be used as a dict key; useful for grouping by set of characters without sorting.
When implementing a custom hash for a composite key, hash(tuple(fields)) is idiomatic and correct in Python.

Cross-Topic Relations

Related topic	Specific connection
Arrays	Hash table's underlying storage; bucket array is the physical structure
Linked Lists	Chaining uses singly linked lists per bucket; LRU cache uses a doubly linked list alongside a hash map
Strings	Polynomial and rolling hash are the primary tools for O(n) substring algorithms
Prefix Sum	Combines with hash map to solve subarray-sum problems in O(n); the two techniques are almost always paired
Sliding Window	Frequency map tracks window contents; hash map enables O(1) window state updates
Two Pointers	Complement pairs (two-sum) use a hash set; two pointers use sorted order — choose based on whether sorted order is needed
Dynamic Programming	Hash map as memoization cache for top-down DP with non-integer or sparse states
Graph	Hash map normalizes node labels; adjacency list is often `dict[node] → list[neighbor]`
Heap	Combined with frequency map for top-k frequent elements: count with map, then heapify
Trie	Alternative to hash map for prefix-based string lookups; Trie wins on prefix queries, hash map wins on exact lookups

See sliding-window.md for patterns that pair with frequency maps. See prefix-sum.md for the full prefix-sum + hash map subarray technique.

Interview Reference

Must-Know Problems

Two-Sum / Complement Lookup

Two Sum (Easy) — canonical; one-pass hash map
Three Sum (Medium) — sort + two pointers; hash map variant exists
Subarray Sum Equals K (Medium) — prefix sum + hash map
Continuous Subarray Sum (Medium) — prefix sum mod k + hash map

Frequency and Counting

Top K Frequent Elements (Medium) — Counter + heap or bucket sort
First Unique Character in String (Easy)
Find All Anagrams in a String (Medium) — sliding window + frequency array
Group Anagrams (Medium) — sorted key or character-count tuple key

Sliding Window with Hash Map

Longest Substring Without Repeating Characters (Medium)
Longest Substring with At Most K Distinct Characters (Medium)
Minimum Window Substring (Hard) — need/have counter pattern
Subarrays with K Different Integers (Hard) — at-most-k minus at-most-(k-1)

Design Problems

LRU Cache (Medium) — OrderedDict or dict + doubly linked list
LFU Cache (Hard) — two hash maps + doubly linked list per frequency
Design HashMap (Easy) — implement from scratch with chaining

Isomorphism and Bijection

Isomorphic Strings (Easy) — two maps
Word Pattern (Easy) — two maps
Word Pattern II (Hard) — backtracking + two maps

Hashing Strings / Substrings

Repeated DNA Sequences (Medium) — rolling hash or sliding window set
Longest Duplicate Substring (Hard) — binary search + rolling hash
Implement Rabin-Karp (not on LC, but asked in system design rounds)

Clarification Questions to Ask

Are keys guaranteed to be hashable? (Lists, dicts as keys require conversion)
Can values be negative or zero? (Affects complement-lookup and prefix-sum logic)
Are there duplicate keys in the input? (Frequency map vs. existence set)
Is the problem asking for any valid answer or all answers? (Early return vs. full traversal)
Is "subarray" contiguous? (Prefix-sum technique applies only to contiguous)
What is the character set? (26 lowercase, ASCII, Unicode — affects array vs. map choice)
Can the answer be empty / zero-length? (Edge case for sliding window lower bound)

Common Interview Mistakes

Using dict[key] without a default, crashing on first encounter of a key — use get or defaultdict.
Forgetting prefix_map = {0: -1} initialization in prefix-sum problems, missing subarrays that start at index 0.
Checking if map[char] == 0: del map[char] after decrement but after already using len(map) for the distinct count — must delete before the length check matters.
Using a list instead of a set for "have we seen this?" checks, making the solution O(n²) instead of O(n).
Building the map first, then doing a second pass — most problems are solvable in one pass where you check before inserting.
Forgetting that Counter({'a': 0}) is not equal to Counter() — stale zero-count entries cause incorrect equality checks.
Implementing isomorphism with only one map direction and missing the many-to-one violation from the target side.

Typical Follow-Up Escalations

Follow-up	Expected answer
"Now solve it with O(1) space"	Usually forces a different algorithm (e.g., Floyd's cycle detection instead of hash set for duplicates)
"What if the input stream is too large to fit in memory?"	Approximate counting (Count-Min Sketch), bloom filters, or reservoir sampling
"Can you make the cache thread-safe?"	Add locking around get/put; or use `ConcurrentHashMap`; discuss lock granularity
"What if keys are adversarially chosen to cause collisions?"	Universal hashing with random seed; Java/Python both do this by default
"How would you extend Two Sum to a stream?"	Maintain hash set of seen values; check `target - x` before inserting each element
"Can you do Top-K without sorting?"	Quickselect O(n) average; or bucket sort by frequency O(n)

Explore Unread

Great job! You've read all available articles

Hash Table

Core Concepts

Structural Invariants

Internal Representation

Types / Variants

Topic-Specific Operations

put(key, value) — O(1) amortized

get(key) — O(1) average

remove(key) — O(1) average

containsKey(key) — O(1) average

Iteration — O(n + m)

Key Algorithms

Division Method Hashing

Polynomial Rolling Hash (strings)

Rabin-Karp Sliding Window Hash

Double Hashing (open addressing probe)

Universal Hashing (adversarial input defense)

Advanced Techniques

Prefix Sum + Hash Map (subarray problems)

Rolling Hash for Duplicate Substring Detection

Two Hash Maps for Bijection / Bidirectional Lookup

Sliding Window + Frequency Map (at-most-k distinct)

Problem Taxonomy

By Problem Category

Problem Signal → Technique

Common Patterns

Edge Cases & Pitfalls

Implementation Notes

Cross-Topic Relations

Interview Reference

Must-Know Problems

Clarification Questions to Ask

Common Interview Mistakes

Typical Follow-Up Escalations

Read Next

Explore Unread

Hash Table

Core Concepts

Structural Invariants

Internal Representation

Types / Variants

Topic-Specific Operations

put(key, value) — O(1) amortized

get(key) — O(1) average

remove(key) — O(1) average

containsKey(key) — O(1) average

Iteration — O(n + m)

Key Algorithms

Division Method Hashing

Polynomial Rolling Hash (strings)

Rabin-Karp Sliding Window Hash

Double Hashing (open addressing probe)

Universal Hashing (adversarial input defense)

Advanced Techniques

Prefix Sum + Hash Map (subarray problems)

Rolling Hash for Duplicate Substring Detection

Two Hash Maps for Bijection / Bidirectional Lookup

Sliding Window + Frequency Map (at-most-k distinct)

Problem Taxonomy

By Problem Category

Problem Signal → Technique

Common Patterns

Edge Cases & Pitfalls

Implementation Notes

Cross-Topic Relations

Interview Reference

Must-Know Problems

Clarification Questions to Ask

Common Interview Mistakes

Typical Follow-Up Escalations

Read Next

Explore Unread