Database (SQL)
Type: Technique-Based — SQL is a declarative querying technique applied to relational data. LeetCode has 300+ problems under this tag; all variants, advanced techniques, and full edge-case coverage are required.
Core Concepts
Relation (Table) — An unordered set of rows sharing the same named columns; "unordered" means no row position is guaranteed without ORDER BY.
Tuple (Row/Record) — A single entry in a table; uniquely identified by the primary key.
Attribute (Column/Field) — A named, typed slot in every row; every cell can hold NULL.
NULL — Absence of a value, not zero or empty string; any arithmetic or comparison with NULL yields NULL (three-valued logic: TRUE / FALSE / UNKNOWN).
Primary Key — One or more columns whose combined values uniquely identify each row; implicitly NOT NULL.
Foreign Key — A column whose values must match a primary key in another table (referential integrity); may be NULL if the relationship is optional.
Aggregate Function — Collapses many rows into one value: COUNT, SUM, AVG, MIN, MAX; ignores NULLs except COUNT(*).
Window Function — Computes a value for each row using a sliding frame of related rows, without collapsing them the way GROUP BY does.
Predicate — A boolean-valued expression used in WHERE or HAVING; evaluates to TRUE, FALSE, or UNKNOWN.
Cardinality — The number of distinct values in a column; high cardinality = many unique values.
Execution Order — The logical sequence in which clauses are evaluated (see Structural Invariants); determines what aliases and results are visible where.
Structural Invariants
SQL has a fixed logical evaluation order. Violating it (e.g., referencing a SELECT alias in WHERE) causes a syntax or runtime error.
| Step | Clause | What is visible at this step | Common violation |
|---|---|---|---|
| 1 | FROM / JOIN | Raw tables and join results | Trying to filter before the join materialises |
| 2 | WHERE | Columns from FROM result; NOT SELECT aliases | Using SELECT alias in WHERE → error |
| 3 | GROUP BY | Columns from WHERE result | Grouping on a derived alias not yet computed |
| 4 | HAVING | Aggregate results and GROUP BY keys | Filtering groups using WHERE instead of HAVING |
| 5 | SELECT | All computed expressions, aliases assigned here | Using window function output in WHERE |
| 6 | DISTINCT | SELECT output | |
| 7 | ORDER BY | SELECT output; aliases ARE visible here | |
| 8 | LIMIT / OFFSET | Final ordered result |
Key invariant: SELECT aliases do not exist until step 5, so they cannot be used in WHERE, GROUP BY, or HAVING (except MySQL's non-standard GROUP BY alias extension).
Types / Variants
JOIN Types
| Type | Rows returned | When to use |
|---|---|---|
INNER JOIN | Only rows with matching keys in both tables | Default; drops unmatched rows |
LEFT JOIN | All rows from left + matched right; NULLs where no match | Preserve left table; detect missing right rows |
RIGHT JOIN | All rows from right + matched left; NULLs where no match | Rarely used; rewrite as LEFT JOIN instead |
FULL OUTER JOIN | All rows from both, NULLs on either side | Find all unmatched on both sides |
CROSS JOIN | Cartesian product (m × n rows) | Generate all combinations; date/value grids |
| Self-join | A table joined to itself using aliases | Compare rows within the same table |
Set Operations
| Operation | Semantics | Duplicate handling |
|---|---|---|
UNION | Rows in A or B | Removes duplicates (like DISTINCT) |
UNION ALL | Rows in A or B | Keeps all duplicates; faster |
INTERSECT | Rows in both A and B | Removes duplicates |
EXCEPT / MINUS | Rows in A but not B | Removes duplicates |
Column count and types must match across all branches.
Subquery Forms
| Form | Returns | Correlated? | When to use |
|---|---|---|---|
| Scalar subquery | Single value | Either | Use in SELECT or WHERE comparisons |
| Row subquery | Single row | Either | Compare (a, b) = (SELECT ...) |
| Table subquery (derived table) | Result set | No | Use in FROM as (SELECT ...) AS alias |
| Correlated subquery | Depends on outer row | Yes | Row-by-row checks; often rewritable as JOIN |
EXISTS / NOT EXISTS | Boolean | Yes | Semi-join / anti-join; stops at first match |
Key Algorithms
Filtering with WHERE vs HAVING
WHERE filters individual rows before grouping; HAVING filters groups after aggregation. Use WHERE when the condition is on a non-aggregated column; use HAVING when it depends on an aggregate value.
Anti-Join (LEFT JOIN + IS NULL)
Find rows in A with no matching row in B.
SELECT a.* FROM a LEFT JOIN b ON a.id = b.a_id WHERE b.a_id IS NULL
Equivalent to NOT IN (SELECT a_id FROM b) but handles NULLs correctly and is typically faster.
Deduplication
Remove duplicate rows, keeping one per key group. ROW_NUMBER() PARTITION BY key ORDER BY tiebreaker, then keep rn = 1. O(n log n) sort inside the window.
Ranking / N-th Highest
DENSE_RANK() assigns the same rank to ties and has no gaps. RANK() has gaps. ROW_NUMBER() never ties.
DENSE_RANK() OVER (ORDER BY salary DESC)
To find the Nth highest: filter WHERE rnk = N.
Running Total / Moving Average
SUM(col) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) — frame accumulates from first row to current.
Changing ROWS to RANGE uses value-based boundaries instead of physical row offsets.
Conditional Aggregation (Pivot)
Collapse category rows into columns by applying SUM(CASE WHEN category = 'X' THEN value END) per group.
SUM(CASE WHEN month = 'Jan' THEN revenue END) AS Jan
No built-in PIVOT in MySQL; this is the standard substitute.
Consecutive Record Detection
Use LAG(col) OVER (ORDER BY ...) to compare each row with the previous one. Rows where col = LAG(col) form a run.
Gap-and-Island
Assign group IDs to consecutive sequences. Compute row_number - dense_rank_of_value (the gap is constant within an island). Then GROUP BY that difference.
Recursive CTE
Traverse hierarchical data (org charts, path finding within a table).
WITH RECURSIVE cte AS (
SELECT id, parent_id, 1 AS depth FROM nodes WHERE parent_id IS NULL
UNION ALL
SELECT n.id, n.parent_id, c.depth + 1 FROM nodes n JOIN cte c ON n.parent_id = c.id
)
Base case + recursive case; MySQL requires RECURSIVE keyword; depth guard prevents infinite loops on cycles.
Top-N Per Group
ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC)
Wrap in a subquery and filter WHERE rn <= N. Can use RANK() instead if ties should all qualify.
Date Arithmetic
DATEDIFF(end, start) → integer days. DATE_ADD(date, INTERVAL n MONTH). YEAR(d), MONTH(d), DAY(d) extract components. DATE_FORMAT(d, '%Y-%m') formats for grouping by month.
Advanced Techniques
Window Functions with Frame Clauses
What it solves: Rolling aggregates (7-day average, cumulative sum), comparisons to a relative window.
Mechanism: ROWS BETWEEN N PRECEDING AND CURRENT ROW defines a physical frame; RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW defines a logical frame by value.
When to prefer over GROUP BY: When you need per-row results and an aggregate of surrounding rows simultaneously — GROUP BY collapses; window does not.
Complexity: O(n log n) due to sort; frame evaluation is amortised O(1) per row for most aggregates.
Recursive CTEs
What it solves: Hierarchical traversal (org trees, path enumeration), sequence generation, graph traversal within SQL. Mechanism: Anchor query seeds the result; recursive query joins back to the CTE until no new rows are produced. When to prefer over JOIN loop: Only when depth is unbounded or unknown. For fixed depth, chained JOINs are simpler and sometimes faster. Complexity: O(n) rows processed; can be O(n²) if the recursive join scans the full table each iteration without an index.
Conditional Aggregation (Pivot / Unpivot)
What it solves: Rotating category values into columns, computing multiple statistics from one GROUP BY.
Mechanism: SUM(CASE WHEN ...) inside SELECT collapses conditionally; one GROUP BY scan produces all pivoted columns.
When to prefer over multiple queries: Always — one scan is faster than N filtered queries; output is a single result set.
Self-Join for Row Comparison
What it solves: Problems requiring comparison between two rows in the same table (consecutive dates, price changes, manager–subordinate relationships).
Mechanism: Join the table to itself on a relationship predicate (a.date = DATE_SUB(b.date, INTERVAL 1 DAY)).
When to prefer over LAG/LEAD: When the comparison relationship is not strictly sequential (e.g., find all pairs satisfying a condition). For sequential comparisons, LAG/LEAD is cleaner.
EXISTS / NOT EXISTS for Semi-Join / Anti-Join
What it solves: Check existence without returning joined columns; avoids duplicate rows that IN with a multi-valued subquery can cause.
Mechanism: Correlated subquery stops at the first matching row; returns TRUE or FALSE.
When to prefer over IN: When the subquery can return NULLs (NOT IN fails silently with NULLs; NOT EXISTS does not) or when early termination matters for performance.
Problem Taxonomy
By Problem Category
| Problem Type | What it asks | Key Technique |
|---|---|---|
| Rank / N-th value | Find the N-th highest/lowest value | DENSE_RANK() or nested subquery with LIMIT |
| Top-N per group | Best N records per category | ROW_NUMBER() PARTITION BY + filter |
| Find missing / absent | Records not present in another table | Anti-join (LEFT JOIN IS NULL) or NOT EXISTS |
| Consecutive sequence | Rows with consecutive attribute values | LAG/LEAD or self-join on offset |
| Gap-and-island | Group consecutive runs of rows | Row-number minus value-rank grouping key |
| Duplicates | Find or remove repeated rows | GROUP BY HAVING COUNT > 1; ROW_NUMBER() dedup |
| Running total | Cumulative sum over time | SUM() OVER (ORDER BY ...) |
| Moving window | Average/sum over last N days | SUM() OVER (ROWS BETWEEN N PRECEDING AND CURRENT ROW) |
| Pivot / transpose | Categories as columns | Conditional aggregation SUM(CASE WHEN ...) |
| Hierarchical data | Parent-child relationships, tree depth | Recursive CTE |
| Date range overlap | Check if intervals intersect | a.start <= b.end AND a.end >= b.start |
| Percentage / ratio | Share of total | SUM(col) / SUM(SUM(col)) OVER () |
| Median / percentile | Middle value or quantile | PERCENTILE_CONT (not in MySQL); simulate with ROW_NUMBER |
| String manipulation | Parse, match, or format strings | SUBSTRING, REGEXP_LIKE, DATE_FORMAT |
Problem Signal → Technique
| Signal in problem statement | Likely technique |
|---|---|
| "Nth highest" / "second highest" | DENSE_RANK() or LIMIT N-1,1 |
| "for each … the top/best" | ROW_NUMBER() PARTITION BY |
| "customers who never …" / "not in" | Anti-join: LEFT JOIN IS NULL or NOT EXISTS |
| "consecutive days/numbers" | LAG/LEAD or self-join on date ± 1 |
| "at least N times" / "more than once" | GROUP BY HAVING COUNT(...) >= N |
| "cumulative" / "running" | Window SUM OVER (ORDER BY ...) |
| "last 7 days" / "rolling" | Frame: ROWS/RANGE BETWEEN 6 PRECEDING AND CURRENT ROW |
| "for each month" / "by year" | GROUP BY YEAR(d), MONTH(d) or DATE_FORMAT |
| "percentage of total" | SUM(col) * 100.0 / SUM(SUM(col)) OVER () |
| "manager" / "reports to" | Self-join on employee-manager FK |
| "path" / "hierarchy" / "ancestor" | Recursive CTE |
| "pivot" / "each category as a column" | Conditional aggregation |
| "rank with ties" | DENSE_RANK() (no gaps) vs RANK() (gaps) |
| "no duplicate" in result | DISTINCT or GROUP BY |
| "between dates" | WHERE d BETWEEN start AND end (inclusive) |
| "replace NULL with 0" | COALESCE(col, 0) or IFNULL(col, 0) |
Common Patterns
| Pattern | When it applies | Mechanism |
|---|---|---|
| Anti-join | Find rows in A absent from B | LEFT JOIN B ON ... WHERE B.key IS NULL |
| Conditional aggregation | Pivot categories; multi-metric GROUP BY | SUM(CASE WHEN cat='X' THEN val END) |
| Top-N per group | Rank within a partition | ROW_NUMBER() OVER (PARTITION BY g ORDER BY v DESC) then filter |
| Self-join on offset | Compare a row to a neighbour | Join on a.date = DATE_ADD(b.date, INTERVAL 1 DAY) |
| Gap-and-island key | Group consecutive rows | ROW_NUMBER() OVER (ORDER BY x) - DENSE_RANK() OVER (ORDER BY x) |
| Percentage of total | Ratio to group sum | val / SUM(val) OVER (PARTITION BY group) |
| Recursive hierarchy | Tree traversal | Recursive CTE with anchor + recursive branch |
| Date spine join | Fill missing dates | CROSS JOIN a calendar CTE; LEFT JOIN fact data |
| Null-safe anti-membership | Exclude via NOT IN safely | Use NOT EXISTS or LEFT JOIN IS NULL; never NOT IN with nullable column |
| Deduplication with tiebreaker | Keep one row per key | ROW_NUMBER() PARTITION BY key ORDER BY tiebreaker, filter rn = 1 |
Edge Cases & Pitfalls
-
NULL in
NOT IN: If the subquery returns any NULL,x NOT IN (SELECT ...)evaluates to UNKNOWN for all rows, returning zero results. Fix: useNOT EXISTSor filter NULLs from the subquery withWHERE col IS NOT NULL. -
COUNT(*) vs COUNT(col):
COUNT(*)counts all rows including NULLs;COUNT(col)counts non-NULL values only. Mixing them produces wrong aggregates when NULLs are present. -
HAVING instead of WHERE for aggregates:
WHERE AVG(salary) > 50000is a syntax error — WHERE runs before aggregation. Use HAVING. -
SELECT alias in WHERE: MySQL executes WHERE before SELECT, so
SELECT salary * 1.1 AS adjusted WHERE adjusted > 100fails. Repeat the expression or use a subquery/CTE. -
DISTINCT with multiple columns:
SELECT DISTINCT a, bdeduplicates (a, b) pairs, not individual columns. Misunderstanding this produces the wrong row count. -
RANK() gaps vs DENSE_RANK(): Problems asking for the "Nth highest" usually expect DENSE_RANK (no gaps); using RANK skips numbers when there are ties, returning NULL for the Nth rank.
-
JOIN on NULL columns: NULL ≠ NULL in SQL, so a join on a nullable column will never match NULL rows on either side. Use
COALESCE(col, -1)or add an explicit NULL check if NULL matching is intended. -
Integer division: In MySQL,
1 / 2returns0(integer division). Cast:1.0 * col / totalorCAST(col AS DECIMAL). -
UNION silently removing rows:
UNIONdeduplicates. If two rows are legitimately identical and both should appear, useUNION ALL. -
ORDER BY inside window function vs query ORDER BY:
ORDER BYinOVER (...)only orders the window frame; it has no effect on the final result order. Add a separate query-levelORDER BY. -
LAG/LEAD across partition boundaries: By default LAG/LEAD returns NULL at partition edges. Provide a default:
LAG(col, 1, 0) OVER (PARTITION BY ...). -
Date comparison off-by-one:
BETWEENis inclusive on both ends.d BETWEEN '2024-01-01' AND '2024-01-31'includes both endpoints. -
Recursive CTE infinite loop: A cycle in hierarchical data causes infinite recursion. Guard with a depth limit (
WHERE depth < 100) or a visited-set (not natively supported in MySQL; use depth limit).
Implementation Notes
- LeetCode Database problems use MySQL 8.0 by default; window functions, CTEs, and
WITH RECURSIVEare all available. IFNULL(a, b)is MySQL-specific;COALESCE(a, b)is ANSI SQL and portable.- MySQL string comparison is case-insensitive by default (depends on collation); use
BINARYprefix orCOLLATE ... _binfor case-sensitive matching. LIMIT n OFFSET min MySQL paginates;LIMIT m, nis shorthand (reversed argument order is a common mistake).GROUP_CONCAT(col ORDER BY col SEPARATOR ',')concatenates values within a group; the result is truncated at 1024 characters by default (setgroup_concat_max_len).DATE_FORMAT(d, '%Y-%m-%d')is MySQL;TO_CHAR(d, 'YYYY-MM-DD')is PostgreSQL.- Window functions cannot be used directly in WHERE or HAVING; wrap the query in a subquery or CTE first.
EXPLAINreveals query plans; a full table scan (type=ALL) on a large table is a red flag.ROW_NUMBER()is deterministic only ifORDER BYin OVER() is fully deterministic; ties produce arbitrary ordering.
Cross-Topic Relations
| Related Topic | Connection |
|---|---|
| Hash Table | Underlying mechanism for GROUP BY and JOIN hash implementations |
| Sorting | ORDER BY and window function ordering; merge-join requires sorted input |
| Two Pointers | Date-range overlap and interval problems often reduce to two-pointer logic |
| Prefix Sum | Running totals via window SUM are the SQL equivalent of prefix sum arrays |
| Graph Theory | Recursive CTEs implement graph BFS/DFS within a relational context |
| Dynamic Programming | Some multi-step aggregation problems (e.g., stock prices) map to DP recurrences expressible as window frames |
| Sliding Window | Moving-window aggregates (ROWS BETWEEN N PRECEDING) are the SQL form of the sliding window technique |
Interview Reference
Must-Know Problems
Ranking / Top-N
- Second Highest Salary (easy)
- Nth Highest Salary (medium)
- Rank Scores (easy)
- Department Top Three Salaries (hard)
Aggregation / Grouping
- Employee Bonus (easy)
- Find Customer Referee (easy)
- Big Countries (easy)
- Customers Who Bought All Products (medium)
Anti-Join / Missing Records
- Customers Who Never Order (easy)
- Students and Examinations (easy)
- Employees Whose Manager Left (easy)
- Market Analysis I & II (medium / hard)
Consecutive / Gap-and-Island
- Consecutive Numbers (medium)
- Human Traffic of Stadium (hard)
- Find the Start and End Number of Continuous Ranges (medium)
Window Functions
- Running Total for Different Genders (medium)
- Last Person to Fit in the Bus (medium)
- Restaurant Growth (medium)
- Median Employee Salary (hard)
Date / Time
- Game Play Analysis I–IV (easy–hard)
- Active Users (medium)
- New Users Daily Count (medium)
Pivot / Conditional Aggregation
- Reformat Department Table (easy)
- Monthly Transactions I & II (medium / hard)
- Product Sales Analysis III (medium)
Advanced / Hard
- Tournament Winners (hard)
- Report Contiguous Dates (hard)
- Total Sales Amount by Year (hard)
- Average Salary: Departments vs Company (hard)
Additional LeetCode Coverage
- Delete Duplicate Emails (LC 196)
- Game Play Analysis IV (LC 550)
- Managers with at Least 5 Direct Reports (LC 570)
- Investments in 2016 (LC 585)
- Classes With at Least 5 Students (LC 596)
- Friend Requests II: Who Has the Most Friends (LC 602)
- Triangle Judgement (LC 610)
- Biggest Single Number (LC 619)
- Not Boring Movies (LC 620)
- Exchange Seats (LC 626)
- Product Sales Analysis I (LC 1068)
- Project Employees I (LC 1075)
- User Activity for the Past 30 Days I (LC 1141)
- Article Views I (LC 1148)
- Product Price at a Given Date (LC 1164)
- Immediate Food Delivery II (LC 1174)
- Queries Quality and Percentage (LC 1211)
- Average Selling Price (LC 1251)
- List the Products Ordered in a Period (LC 1327)
- Movie Rating (LC 1341)
- Replace Employee ID With The Unique Identifier (LC 1378)
- Group Sold Products By The Date (LC 1484)
- Find Users With Valid E-Mails (LC 1517)
- Patients With a Condition (LC 1527)
- Customer Who Visited but Did Not Make Any Transactions (LC 1581)
- Percentage of Users Attended a Contest (LC 1633)
- Average Time of Process per Machine (LC 1661)
- Fix Names in a Table (LC 1667)
- Invalid Tweets (LC 1683)
- Find Followers Count (LC 1729)
- The Number of Employees Which Report to Each Employee (LC 1731)
- Recyclable and Low Fat Products (LC 1757)
- Primary Department for Each Employee (LC 1789)
- Count Salary Categories (LC 1907)
- Confirmation Rate (LC 1934)
- Employees Whose Manager Left the Company (LC 1978)
- Number of Unique Subjects Taught by Each Teacher (LC 2356)
- Rising Temperature (LC 197)
Clarification Questions to Ask
- Are duplicate rows possible in the input? Should the result deduplicate?
- What should be returned when no rows match (NULL, 0, or empty result set)?
- Is "Nth" 1-indexed? How should ties be handled — include all tied rows or exclude them?
- Is the date column a
DATEtype or a string? Are timezones relevant? - Should
NULLvalues be treated as 0, excluded, or returned as NULL? - Is the comparison case-sensitive for string columns?
- What is "consecutive" — calendar days or business days?
Common Interview Mistakes
- Using
WHEREto filter aggregate results instead ofHAVING. - Writing
NOT IN (subquery)when the subquery can return NULL — returns zero rows silently. - Forgetting
PARTITION BYin a window function, applying the window to the entire table instead of per group. - Using
RANK()for "N-th highest" problems — gaps in rank meanRANK = Nmay not exist; useDENSE_RANK(). - Joining tables and forgetting that
LEFT JOINcan multiply rows when the right table has multiple matches. - Confusing the
LIMIT m, nargument order (MySQL shorthand: first is offset, second is count). - Not handling the case where the result should return NULL vs. no rows for unanswered queries.
- Using integer division without casting, producing truncated percentages.
Typical Follow-Up Escalations
- "Now return NULL instead of no row when no match exists" → Wrap in a subquery and LEFT JOIN against a constant or use a CTE with UNION.
- "Now solve it without window functions" → Use correlated subqueries or self-joins.
- "Now make it work for the top N per group, not just top 1" → Switch from
MAX()/MIN()toROW_NUMBER() PARTITION BY. - "What if there are ties?" → Replace
ROW_NUMBER()withDENSE_RANK(); adjust filter to<= N. - "How would this perform on 100M rows?" → Discuss indexes on JOIN/WHERE/ORDER BY columns, partitioning, and avoiding full scans.
- "Rewrite this without a subquery" → Convert to CTE or JOIN-based formulation.
Read Next
Start reading to get personalized recommendations
Explore Unread
Great job! You've read all available articles