Database (SQL)

Type: Technique-Based — SQL is a declarative querying technique applied to relational data. LeetCode has 300+ problems under this tag; all variants, advanced techniques, and full edge-case coverage are required.

Core Concepts

Relation (Table) — An unordered set of rows sharing the same named columns; "unordered" means no row position is guaranteed without ORDER BY.

Tuple (Row/Record) — A single entry in a table; uniquely identified by the primary key.

Attribute (Column/Field) — A named, typed slot in every row; every cell can hold NULL.

NULL — Absence of a value, not zero or empty string; any arithmetic or comparison with NULL yields NULL (three-valued logic: TRUE / FALSE / UNKNOWN).

Primary Key — One or more columns whose combined values uniquely identify each row; implicitly NOT NULL.

Foreign Key — A column whose values must match a primary key in another table (referential integrity); may be NULL if the relationship is optional.

Aggregate Function — Collapses many rows into one value: COUNT, SUM, AVG, MIN, MAX; ignores NULLs except COUNT(*).

Window Function — Computes a value for each row using a sliding frame of related rows, without collapsing them the way GROUP BY does.

Predicate — A boolean-valued expression used in WHERE or HAVING; evaluates to TRUE, FALSE, or UNKNOWN.

Cardinality — The number of distinct values in a column; high cardinality = many unique values.

Execution Order — The logical sequence in which clauses are evaluated (see Structural Invariants); determines what aliases and results are visible where.

Structural Invariants

SQL has a fixed logical evaluation order. Violating it (e.g., referencing a SELECT alias in WHERE) causes a syntax or runtime error.

Step	Clause	What is visible at this step	Common violation
1	`FROM` / `JOIN`	Raw tables and join results	Trying to filter before the join materialises
2	`WHERE`	Columns from FROM result; NOT SELECT aliases	Using SELECT alias in WHERE → error
3	`GROUP BY`	Columns from WHERE result	Grouping on a derived alias not yet computed
4	`HAVING`	Aggregate results and GROUP BY keys	Filtering groups using WHERE instead of HAVING
5	`SELECT`	All computed expressions, aliases assigned here	Using window function output in WHERE
6	`DISTINCT`	SELECT output
7	`ORDER BY`	SELECT output; aliases ARE visible here
8	`LIMIT` / `OFFSET`	Final ordered result

Key invariant: SELECT aliases do not exist until step 5, so they cannot be used in WHERE, GROUP BY, or HAVING (except MySQL's non-standard GROUP BY alias extension).

Types / Variants

JOIN Types

Type	Rows returned	When to use
`INNER JOIN`	Only rows with matching keys in both tables	Default; drops unmatched rows
`LEFT JOIN`	All rows from left + matched right; NULLs where no match	Preserve left table; detect missing right rows
`RIGHT JOIN`	All rows from right + matched left; NULLs where no match	Rarely used; rewrite as LEFT JOIN instead
`FULL OUTER JOIN`	All rows from both, NULLs on either side	Find all unmatched on both sides
`CROSS JOIN`	Cartesian product (m × n rows)	Generate all combinations; date/value grids
Self-join	A table joined to itself using aliases	Compare rows within the same table

Set Operations

Operation	Semantics	Duplicate handling
`UNION`	Rows in A or B	Removes duplicates (like DISTINCT)
`UNION ALL`	Rows in A or B	Keeps all duplicates; faster
`INTERSECT`	Rows in both A and B	Removes duplicates
`EXCEPT` / `MINUS`	Rows in A but not B	Removes duplicates

Column count and types must match across all branches.

Subquery Forms

Form	Returns	Correlated?	When to use
Scalar subquery	Single value	Either	Use in SELECT or WHERE comparisons
Row subquery	Single row	Either	Compare `(a, b) = (SELECT ...)`
Table subquery (derived table)	Result set	No	Use in FROM as `(SELECT ...) AS alias`
Correlated subquery	Depends on outer row	Yes	Row-by-row checks; often rewritable as JOIN
`EXISTS` / `NOT EXISTS`	Boolean	Yes	Semi-join / anti-join; stops at first match

Key Algorithms

Filtering with WHERE vs HAVING

WHERE filters individual rows before grouping; HAVING filters groups after aggregation. Use WHERE when the condition is on a non-aggregated column; use HAVING when it depends on an aggregate value.

Anti-Join (LEFT JOIN + IS NULL)

Find rows in A with no matching row in B.

SELECT a.* FROM a LEFT JOIN b ON a.id = b.a_id WHERE b.a_id IS NULL

Equivalent to NOT IN (SELECT a_id FROM b) but handles NULLs correctly and is typically faster.

Deduplication

Remove duplicate rows, keeping one per key group. ROW_NUMBER() PARTITION BY key ORDER BY tiebreaker, then keep rn = 1. O(n log n) sort inside the window.

Ranking / N-th Highest

DENSE_RANK() assigns the same rank to ties and has no gaps. RANK() has gaps. ROW_NUMBER() never ties.

DENSE_RANK() OVER (ORDER BY salary DESC)

To find the Nth highest: filter WHERE rnk = N.

Running Total / Moving Average

SUM(col) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) — frame accumulates from first row to current. Changing ROWS to RANGE uses value-based boundaries instead of physical row offsets.

Conditional Aggregation (Pivot)

Collapse category rows into columns by applying SUM(CASE WHEN category = 'X' THEN value END) per group.

SUM(CASE WHEN month = 'Jan' THEN revenue END) AS Jan

No built-in PIVOT in MySQL; this is the standard substitute.

Consecutive Record Detection

Use LAG(col) OVER (ORDER BY ...) to compare each row with the previous one. Rows where col = LAG(col) form a run.

Gap-and-Island

Assign group IDs to consecutive sequences. Compute row_number - dense_rank_of_value (the gap is constant within an island). Then GROUP BY that difference.

Recursive CTE

Traverse hierarchical data (org charts, path finding within a table).

WITH RECURSIVE cte AS (
  SELECT id, parent_id, 1 AS depth FROM nodes WHERE parent_id IS NULL
  UNION ALL
  SELECT n.id, n.parent_id, c.depth + 1 FROM nodes n JOIN cte c ON n.parent_id = c.id
)

Base case + recursive case; MySQL requires RECURSIVE keyword; depth guard prevents infinite loops on cycles.

Top-N Per Group

ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC)

Wrap in a subquery and filter WHERE rn <= N. Can use RANK() instead if ties should all qualify.

Date Arithmetic

DATEDIFF(end, start) → integer days. DATE_ADD(date, INTERVAL n MONTH). YEAR(d), MONTH(d), DAY(d) extract components. DATE_FORMAT(d, '%Y-%m') formats for grouping by month.

Advanced Techniques

Window Functions with Frame Clauses

What it solves: Rolling aggregates (7-day average, cumulative sum), comparisons to a relative window. Mechanism: ROWS BETWEEN N PRECEDING AND CURRENT ROW defines a physical frame; RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW defines a logical frame by value. When to prefer over GROUP BY: When you need per-row results and an aggregate of surrounding rows simultaneously — GROUP BY collapses; window does not. Complexity: O(n log n) due to sort; frame evaluation is amortised O(1) per row for most aggregates.

Recursive CTEs

What it solves: Hierarchical traversal (org trees, path enumeration), sequence generation, graph traversal within SQL. Mechanism: Anchor query seeds the result; recursive query joins back to the CTE until no new rows are produced. When to prefer over JOIN loop: Only when depth is unbounded or unknown. For fixed depth, chained JOINs are simpler and sometimes faster. Complexity: O(n) rows processed; can be O(n²) if the recursive join scans the full table each iteration without an index.

Conditional Aggregation (Pivot / Unpivot)

What it solves: Rotating category values into columns, computing multiple statistics from one GROUP BY. Mechanism: SUM(CASE WHEN ...) inside SELECT collapses conditionally; one GROUP BY scan produces all pivoted columns. When to prefer over multiple queries: Always — one scan is faster than N filtered queries; output is a single result set.

Self-Join for Row Comparison

What it solves: Problems requiring comparison between two rows in the same table (consecutive dates, price changes, manager–subordinate relationships). Mechanism: Join the table to itself on a relationship predicate (a.date = DATE_SUB(b.date, INTERVAL 1 DAY)). When to prefer over LAG/LEAD: When the comparison relationship is not strictly sequential (e.g., find all pairs satisfying a condition). For sequential comparisons, LAG/LEAD is cleaner.

EXISTS / NOT EXISTS for Semi-Join / Anti-Join

What it solves: Check existence without returning joined columns; avoids duplicate rows that IN with a multi-valued subquery can cause. Mechanism: Correlated subquery stops at the first matching row; returns TRUE or FALSE. When to prefer over IN: When the subquery can return NULLs (NOT IN fails silently with NULLs; NOT EXISTS does not) or when early termination matters for performance.

Problem Taxonomy

By Problem Category

Problem Type	What it asks	Key Technique
Rank / N-th value	Find the N-th highest/lowest value	`DENSE_RANK()` or nested subquery with `LIMIT`
Top-N per group	Best N records per category	`ROW_NUMBER() PARTITION BY` + filter
Find missing / absent	Records not present in another table	Anti-join (`LEFT JOIN IS NULL`) or `NOT EXISTS`
Consecutive sequence	Rows with consecutive attribute values	`LAG`/`LEAD` or self-join on offset
Gap-and-island	Group consecutive runs of rows	Row-number minus value-rank grouping key
Duplicates	Find or remove repeated rows	`GROUP BY HAVING COUNT > 1`; `ROW_NUMBER()` dedup
Running total	Cumulative sum over time	`SUM() OVER (ORDER BY ...)`
Moving window	Average/sum over last N days	`SUM() OVER (ROWS BETWEEN N PRECEDING AND CURRENT ROW)`
Pivot / transpose	Categories as columns	Conditional aggregation `SUM(CASE WHEN ...)`
Hierarchical data	Parent-child relationships, tree depth	Recursive CTE
Date range overlap	Check if intervals intersect	`a.start <= b.end AND a.end >= b.start`
Percentage / ratio	Share of total	`SUM(col) / SUM(SUM(col)) OVER ()`
Median / percentile	Middle value or quantile	`PERCENTILE_CONT` (not in MySQL); simulate with ROW_NUMBER
String manipulation	Parse, match, or format strings	`SUBSTRING`, `REGEXP_LIKE`, `DATE_FORMAT`

Problem Signal → Technique

Signal in problem statement	Likely technique
"Nth highest" / "second highest"	`DENSE_RANK()` or `LIMIT N-1,1`
"for each … the top/best"	`ROW_NUMBER() PARTITION BY`
"customers who never …" / "not in"	Anti-join: `LEFT JOIN IS NULL` or `NOT EXISTS`
"consecutive days/numbers"	`LAG`/`LEAD` or self-join on date ± 1
"at least N times" / "more than once"	`GROUP BY HAVING COUNT(...) >= N`
"cumulative" / "running"	Window `SUM OVER (ORDER BY ...)`
"last 7 days" / "rolling"	Frame: `ROWS/RANGE BETWEEN 6 PRECEDING AND CURRENT ROW`
"for each month" / "by year"	`GROUP BY YEAR(d), MONTH(d)` or `DATE_FORMAT`
"percentage of total"	`SUM(col) * 100.0 / SUM(SUM(col)) OVER ()`
"manager" / "reports to"	Self-join on employee-manager FK
"path" / "hierarchy" / "ancestor"	Recursive CTE
"pivot" / "each category as a column"	Conditional aggregation
"rank with ties"	`DENSE_RANK()` (no gaps) vs `RANK()` (gaps)
"no duplicate" in result	`DISTINCT` or `GROUP BY`
"between dates"	`WHERE d BETWEEN start AND end` (inclusive)
"replace NULL with 0"	`COALESCE(col, 0)` or `IFNULL(col, 0)`

Common Patterns

Pattern	When it applies	Mechanism
Anti-join	Find rows in A absent from B	`LEFT JOIN B ON ... WHERE B.key IS NULL`
Conditional aggregation	Pivot categories; multi-metric GROUP BY	`SUM(CASE WHEN cat='X' THEN val END)`
Top-N per group	Rank within a partition	`ROW_NUMBER() OVER (PARTITION BY g ORDER BY v DESC)` then filter
Self-join on offset	Compare a row to a neighbour	Join on `a.date = DATE_ADD(b.date, INTERVAL 1 DAY)`
Gap-and-island key	Group consecutive rows	`ROW_NUMBER() OVER (ORDER BY x) - DENSE_RANK() OVER (ORDER BY x)`
Percentage of total	Ratio to group sum	`val / SUM(val) OVER (PARTITION BY group)`
Recursive hierarchy	Tree traversal	Recursive CTE with anchor + recursive branch
Date spine join	Fill missing dates	CROSS JOIN a calendar CTE; LEFT JOIN fact data
Null-safe anti-membership	Exclude via NOT IN safely	Use `NOT EXISTS` or `LEFT JOIN IS NULL`; never `NOT IN` with nullable column
Deduplication with tiebreaker	Keep one row per key	`ROW_NUMBER() PARTITION BY key ORDER BY tiebreaker`, filter `rn = 1`

Edge Cases & Pitfalls

NULL in NOT IN: If the subquery returns any NULL, x NOT IN (SELECT ...) evaluates to UNKNOWN for all rows, returning zero results. Fix: use NOT EXISTS or filter NULLs from the subquery with WHERE col IS NOT NULL.
COUNT(*) vs COUNT(col): COUNT(*) counts all rows including NULLs; COUNT(col) counts non-NULL values only. Mixing them produces wrong aggregates when NULLs are present.
HAVING instead of WHERE for aggregates: WHERE AVG(salary) > 50000 is a syntax error — WHERE runs before aggregation. Use HAVING.
SELECT alias in WHERE: MySQL executes WHERE before SELECT, so SELECT salary * 1.1 AS adjusted WHERE adjusted > 100 fails. Repeat the expression or use a subquery/CTE.
DISTINCT with multiple columns: SELECT DISTINCT a, b deduplicates (a, b) pairs, not individual columns. Misunderstanding this produces the wrong row count.
RANK() gaps vs DENSE_RANK(): Problems asking for the "Nth highest" usually expect DENSE_RANK (no gaps); using RANK skips numbers when there are ties, returning NULL for the Nth rank.
JOIN on NULL columns: NULL ≠ NULL in SQL, so a join on a nullable column will never match NULL rows on either side. Use COALESCE(col, -1) or add an explicit NULL check if NULL matching is intended.
Integer division: In MySQL, 1 / 2 returns 0 (integer division). Cast: 1.0 * col / total or CAST(col AS DECIMAL).
UNION silently removing rows: UNION deduplicates. If two rows are legitimately identical and both should appear, use UNION ALL.
ORDER BY inside window function vs query ORDER BY: ORDER BY in OVER (...) only orders the window frame; it has no effect on the final result order. Add a separate query-level ORDER BY.
LAG/LEAD across partition boundaries: By default LAG/LEAD returns NULL at partition edges. Provide a default: LAG(col, 1, 0) OVER (PARTITION BY ...).
Date comparison off-by-one: BETWEEN is inclusive on both ends. d BETWEEN '2024-01-01' AND '2024-01-31' includes both endpoints.
Recursive CTE infinite loop: A cycle in hierarchical data causes infinite recursion. Guard with a depth limit (WHERE depth < 100) or a visited-set (not natively supported in MySQL; use depth limit).

Implementation Notes

LeetCode Database problems use MySQL 8.0 by default; window functions, CTEs, and WITH RECURSIVE are all available.
IFNULL(a, b) is MySQL-specific; COALESCE(a, b) is ANSI SQL and portable.
MySQL string comparison is case-insensitive by default (depends on collation); use BINARY prefix or COLLATE ... _bin for case-sensitive matching.
LIMIT n OFFSET m in MySQL paginates; LIMIT m, n is shorthand (reversed argument order is a common mistake).
GROUP_CONCAT(col ORDER BY col SEPARATOR ',') concatenates values within a group; the result is truncated at 1024 characters by default (set group_concat_max_len).
DATE_FORMAT(d, '%Y-%m-%d') is MySQL; TO_CHAR(d, 'YYYY-MM-DD') is PostgreSQL.
Window functions cannot be used directly in WHERE or HAVING; wrap the query in a subquery or CTE first.
EXPLAIN reveals query plans; a full table scan (type=ALL) on a large table is a red flag.
ROW_NUMBER() is deterministic only if ORDER BY in OVER() is fully deterministic; ties produce arbitrary ordering.

Cross-Topic Relations

Related Topic	Connection
Hash Table	Underlying mechanism for GROUP BY and JOIN hash implementations
Sorting	ORDER BY and window function ordering; merge-join requires sorted input
Two Pointers	Date-range overlap and interval problems often reduce to two-pointer logic
Prefix Sum	Running totals via window SUM are the SQL equivalent of prefix sum arrays
Graph Theory	Recursive CTEs implement graph BFS/DFS within a relational context
Dynamic Programming	Some multi-step aggregation problems (e.g., stock prices) map to DP recurrences expressible as window frames
Sliding Window	Moving-window aggregates (`ROWS BETWEEN N PRECEDING`) are the SQL form of the sliding window technique

Interview Reference

Must-Know Problems

Ranking / Top-N

Second Highest Salary (easy)
Nth Highest Salary (medium)
Rank Scores (easy)
Department Top Three Salaries (hard)

Aggregation / Grouping

Employee Bonus (easy)
Find Customer Referee (easy)
Big Countries (easy)
Customers Who Bought All Products (medium)

Anti-Join / Missing Records

Customers Who Never Order (easy)
Students and Examinations (easy)
Employees Whose Manager Left (easy)
Market Analysis I & II (medium / hard)

Consecutive / Gap-and-Island

Consecutive Numbers (medium)
Human Traffic of Stadium (hard)
Find the Start and End Number of Continuous Ranges (medium)

Window Functions

Running Total for Different Genders (medium)
Last Person to Fit in the Bus (medium)
Restaurant Growth (medium)
Median Employee Salary (hard)

Date / Time

Game Play Analysis I–IV (easy–hard)
Active Users (medium)
New Users Daily Count (medium)

Pivot / Conditional Aggregation

Reformat Department Table (easy)
Monthly Transactions I & II (medium / hard)
Product Sales Analysis III (medium)

Advanced / Hard

Tournament Winners (hard)
Report Contiguous Dates (hard)
Total Sales Amount by Year (hard)
Average Salary: Departments vs Company (hard)

Additional LeetCode Coverage

Delete Duplicate Emails (LC 196)
Game Play Analysis IV (LC 550)
Managers with at Least 5 Direct Reports (LC 570)
Investments in 2016 (LC 585)
Classes With at Least 5 Students (LC 596)
Friend Requests II: Who Has the Most Friends (LC 602)
Triangle Judgement (LC 610)
Biggest Single Number (LC 619)
Not Boring Movies (LC 620)
Exchange Seats (LC 626)
Product Sales Analysis I (LC 1068)
Project Employees I (LC 1075)
User Activity for the Past 30 Days I (LC 1141)
Article Views I (LC 1148)
Product Price at a Given Date (LC 1164)
Immediate Food Delivery II (LC 1174)
Queries Quality and Percentage (LC 1211)
Average Selling Price (LC 1251)
List the Products Ordered in a Period (LC 1327)
Movie Rating (LC 1341)
Replace Employee ID With The Unique Identifier (LC 1378)
Group Sold Products By The Date (LC 1484)
Find Users With Valid E-Mails (LC 1517)
Patients With a Condition (LC 1527)
Customer Who Visited but Did Not Make Any Transactions (LC 1581)
Percentage of Users Attended a Contest (LC 1633)
Average Time of Process per Machine (LC 1661)
Fix Names in a Table (LC 1667)
Invalid Tweets (LC 1683)
Find Followers Count (LC 1729)
The Number of Employees Which Report to Each Employee (LC 1731)
Recyclable and Low Fat Products (LC 1757)
Primary Department for Each Employee (LC 1789)
Count Salary Categories (LC 1907)
Confirmation Rate (LC 1934)
Employees Whose Manager Left the Company (LC 1978)
Number of Unique Subjects Taught by Each Teacher (LC 2356)
Rising Temperature (LC 197)

Clarification Questions to Ask

Are duplicate rows possible in the input? Should the result deduplicate?
What should be returned when no rows match (NULL, 0, or empty result set)?
Is "Nth" 1-indexed? How should ties be handled — include all tied rows or exclude them?
Is the date column a DATE type or a string? Are timezones relevant?
Should NULL values be treated as 0, excluded, or returned as NULL?
Is the comparison case-sensitive for string columns?
What is "consecutive" — calendar days or business days?

Common Interview Mistakes

Using WHERE to filter aggregate results instead of HAVING.
Writing NOT IN (subquery) when the subquery can return NULL — returns zero rows silently.
Forgetting PARTITION BY in a window function, applying the window to the entire table instead of per group.
Using RANK() for "N-th highest" problems — gaps in rank mean RANK = N may not exist; use DENSE_RANK().
Joining tables and forgetting that LEFT JOIN can multiply rows when the right table has multiple matches.
Confusing the LIMIT m, n argument order (MySQL shorthand: first is offset, second is count).
Not handling the case where the result should return NULL vs. no rows for unanswered queries.
Using integer division without casting, producing truncated percentages.

Typical Follow-Up Escalations

"Now return NULL instead of no row when no match exists" → Wrap in a subquery and LEFT JOIN against a constant or use a CTE with UNION.
"Now solve it without window functions" → Use correlated subqueries or self-joins.
"Now make it work for the top N per group, not just top 1" → Switch from MAX()/MIN() to ROW_NUMBER() PARTITION BY.
"What if there are ties?" → Replace ROW_NUMBER() with DENSE_RANK(); adjust filter to <= N.
"How would this perform on 100M rows?" → Discuss indexes on JOIN/WHERE/ORDER BY columns, partitioning, and avoiding full scans.
"Rewrite this without a subquery" → Convert to CTE or JOIN-based formulation.

Explore Unread

Great job! You've read all available articles