SQL Interview Practice Guide
The journey into SQL begins with retrieving data. The most fundamental statement for
this task is composed of three clauses: SELECT, FROM, and WHERE.
At its heart, every data retrieval query must answer two questions: what data do you
want, and where is it located? The SELECT and FROM clauses provide these answers.
● SELECT: This clause specifies the columns (or fields) you wish to retrieve from
the database. You can list one or more column names, separated by commas.4
● FROM: This clause specifies the table from which you are retrieving the data.4
These two clauses are the mandatory ingredients of any data retrieval query, and they
must appear in this specific order: SELECT followed by FROM.4
A simple query to retrieve the names and ages of all customers would look like this:
SQL
SELECT name, age
FROM Customers;
To select every column in a table without listing them individually, you can use the
asterisk (*) wildcard.4
SQL
SELECT *
FROM Customers;
While convenient for exploration, using SELECT * in production code or final interview
answers is generally discouraged. It can be inefficient as it may retrieve more data
than necessary and can make queries less readable and more prone to breaking if the
underlying table structure changes. Explicitly naming the required columns is best
practice.5
Rarely do you need every single row from a table. The WHERE clause is the primary
tool for filtering rows based on specific conditions, ensuring you only retrieve the data
that meets your criteria.1 It is applied to individual rows before any grouping or
aggregation occurs.7
The WHERE clause supports a wide array of operators to build precise conditions:
● Comparison Operators: These are used for comparing values. They include =, !=
or <> (not equal to), > (greater than), < (less than), >= (greater than or equal to),
and <= (less than or equal to).9
○ Example: Retrieve products with a price greater than 50.00.
SQL
SELECT product_name, price
FROM Products
WHERE price > 50.00;
● Logical Operators: AND, OR, and NOT are used to combine multiple conditions.1
○ AND: All conditions must be true.
○ OR: At least one of the conditions must be true.
○ NOT: Reverses the result of a condition.
○ Example: Retrieve American cities with a population over 100,000.
SQL
SELECT *
FROM CITY
WHERE COUNTRYCODE = 'USA' AND POPULATION > 100000; -- [10]
● Pattern Matching with LIKE: This operator is used for searching for a specified
pattern in a column. It is particularly useful for string data and employs two
wildcards 12:
○ %: Represents zero, one, or multiple characters.
○ _: Represents a single character.
○ Example: Find all users whose email address is from 'example.com'.
SQL
SELECT first_name, last_name, email
FROM users
WHERE email LIKE '%@example.com'; -- [13]
● Handling NULL values: NULL represents missing or unknown data, and it cannot be matched with comparison operators such as =. You must use the IS NULL or IS NOT NULL operators instead.
○ Example: Find the names of customers who were not referred by anyone.
SQL
SELECT name
FROM Customer
WHERE referee_id IS NULL; -- [15]
This is a common stumbling block for beginners and a frequent topic in interview
questions. Using WHERE referee_id = NULL will not return the expected results.
The WHERE clause functions as the first critical gatekeeper in the query execution
process. The database engine first identifies the table in the FROM clause and then
evaluates each row against the conditions specified in the WHERE clause. Only the
rows that satisfy these conditions are passed on to the subsequent stages of the
query, such as grouping, aggregation, or ordering.7 This row-by-row filtering
mechanism is precisely why aggregate functions like
SUM() or COUNT() cannot be used in the WHERE clause; at this stage, the data has
not yet been grouped to perform such calculations. This principle is a cornerstone for
understanding the difference between WHERE and the HAVING clause, which will be
discussed later.
Once you have selected and filtered your data, the next step is to present it in a clear
and meaningful way. The ORDER BY, DISTINCT, and alias (AS) clauses are essential for
this purpose.
The ORDER BY clause is used to sort the result set in ascending or descending order
based on one or more columns.16
● ASC: Sorts in ascending order (A-Z, 0-9). This is the default behavior if no order is
specified.16
● DESC: Sorts in descending order (Z-A, 9-0).
You can sort by multiple columns. The query will sort by the first column specified,
and then for any rows with the same value in the first column, it will sort them by the
second column, and so on.
Example: Show sales ordered by the most recent date first, and then alphabetically
by salesperson for sales on the same date.
SQL
SELECT *
FROM SALES
ORDER BY SaleDate DESC, Salesperson ASC; -- [16]
The DISTINCT keyword is used within the SELECT clause to return only unique
(different) values from a column or a combination of columns.3
Example: Get a list of unique cities from the STATION table that have an even ID
number.
SQL
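-- Uses the STATION table named above; some dialects write MOD(ID, 2) = 0
SELECT DISTINCT CITY
FROM STATION
WHERE ID % 2 = 0;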
In this query, if the city 'New York' appears multiple times for different even-ID
stations, it will only be listed once in the final result.
Aliases provide a temporary, more readable name for columns or tables within a
query. They are defined using the AS keyword (though in many SQL dialects, AS is
optional).1 Using aliases is not merely a convenience; it is a hallmark of clean,
professional, and maintainable code—a quality highly valued in any technical role.18
● Column Aliases: Used to make output headers more descriptive, especially for
calculated columns.
○ Example:
SQL
SELECT AVG(price) AS average_price
FROM Products;
● Table Aliases: Used to shorten table names in complex queries, especially those
involving JOINs, making the query easier to write and read.
○ Example:
SQL
SELECT c.first_name, o.order_id
FROM Customers AS c
JOIN Orders AS o ON c.customer_id = o.customer_id; -- [17]
A subtle but important point that distinguishes proficient SQL users is understanding
the order of query execution. The logical processing order is generally: FROM ->
WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY.19 Because the
SELECT clause (where column aliases are defined) is processed before the ORDER BY
clause, you can use a column alias in your ORDER BY statement.
Example:
SQL
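-- Only the total_gain alias comes from the text; other names are illustrative
SELECT product_name, (sell_price - cost_price) AS total_gain
FROM Products
ORDER BY total_gain DESC;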
Here, total_gain is a valid reference in the ORDER BY clause. However, you could not
use this alias in the WHERE or HAVING clause, as they are processed before the
SELECT clause. This understanding demonstrates a deeper grasp of SQL
mechanics beyond simple syntax memorization.
While the majority of interview questions focus on data retrieval (SELECT queries,
which belong to the Data Query Language or DQL), a well-rounded data professional
must also be familiar with commands that define and manipulate the database
structure and its data.1 These commands are categorized into Data Definition
Language (DDL) and Data Manipulation Language (DML).
DML Commands
DML commands are used to manage the data stored in existing tables.1
● INSERT INTO: Adds new rows of data to a table. You can insert a single row or
multiple rows at once.1
● UPDATE: Modifies existing records in a table.1
● DELETE: Removes existing rows from a table.1
Example: Increase the salary by 10% for all employees in the "Sales" department.
SQL
UPDATE employees
SET salary = salary * 1.10
WHERE department = 'Sales'; -- [13]
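A minimal INSERT sketch against the same assumed employees table, showing that a single statement can add one row or several:
SQL
-- Values are illustrative
INSERT INTO employees (employee_name, department, salary)
VALUES
('Dana Lee', 'Sales', 52000),
('Raj Patel', 'Engineering', 78000);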
DDL Commands
DDL commands are used to define, modify, and remove database objects themselves,
such as tables.1
● CREATE TABLE: Creates a new table in the database.1
● ALTER TABLE: Modifies the structure of an existing table (e.g., adding or dropping
a column).1
● DROP TABLE: Permanently removes an entire table, including its structure and all
data.1
● TRUNCATE TABLE: Removes all rows from a table quickly, but keeps the table
structure intact.1
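A few representative DDL statements, sketched with assumed table and column names:
SQL
-- Add a column to an existing table
ALTER TABLE employees ADD email VARCHAR(255);
-- Remove every row but keep the table definition
TRUNCATE TABLE staging_sales;
-- Permanently remove the table and all of its data
DROP TABLE staging_sales;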
The difference between DELETE and TRUNCATE is a classic interview question that separates candidates who know syntax from
those who understand database mechanics. The question is a proxy for assessing a
candidate's understanding of deeper concepts like transactional control, logging, and
performance.
DELETE is a fully logged operation for each row removed. This makes it recoverable
within a transaction but potentially very slow and resource-intensive for large tables.
In contrast, TRUNCATE is minimally logged (it only logs the deallocation of the data
pages), which makes it extremely fast but also much harder, if not impossible, to roll
back. This shows an understanding of the transaction log, a critical component of any
relational database, and the crucial trade-off between safety (DELETE) and
performance (TRUNCATE). This level of analysis signals that the candidate thinks
about the real-world consequences of their commands, a vital skill for anyone working
with production data systems.
This section delves into the core of data analysis with SQL. The ability to summarize,
group, and filter aggregated data is a non-negotiable skill for any data analyst,
scientist, or business intelligence professional. Mastery of these concepts is essential
for transforming raw data into meaningful insights.
Aggregate functions such as COUNT(), SUM(), AVG(), MIN(), and MAX() perform a calculation across a set of rows and return a single summary value.
Example: Find the difference between the maximum and minimum populations in the
CITY table.
SQL
SELECT MAX(POPULATION) - MIN(POPULATION)
FROM CITY; -- [20]
A frequent point of confusion, and thus a common interview topic, is the subtle
difference in how COUNT() behaves depending on its argument. Precision here
demonstrates a solid understanding.
● COUNT(*): Counts the total number of rows in the result set, including rows with
NULL values. It is generally the most performant way to get a total row count.8
● COUNT(column_name): Counts the number of rows where column_name is not
NULL. It ignores any rows where the specified column has a NULL value.
● COUNT(DISTINCT column_name): Counts the number of unique, non-NULL
values in column_name.
Example: Find the difference between the total number of city entries and the
number of distinct city entries. This tells you how many city names are duplicated in
the table.
SQL
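-- Assumes the STATION table from the earlier example
SELECT COUNT(CITY) - COUNT(DISTINCT CITY)
FROM STATION;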
Aggregate functions are most powerful when combined with the GROUP BY clause,
which allows you to perform calculations on subsets of your data.
Grouping with GROUP BY
The GROUP BY statement groups rows that have the same values in specified columns
into summary rows.8 It is used with aggregate functions to perform calculations on
each group. For instance, you can calculate the total sales for each region or the average salary for each department.
You can group by a single column or multiple columns. When grouping by multiple
columns, a unique group is formed for each distinct combination of values in those
columns.21
Example: Count the number of students for each subject and year.
SQL
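-- Table and column names are assumed for illustration
SELECT subject, year, COUNT(student_id) AS student_count
FROM students
GROUP BY subject, year;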
After grouping your data, you might want to filter the groups themselves. For example,
you might want to see only the departments where the average salary exceeds
$50,000. This is where the HAVING clause comes in. It is specifically designed to filter
groups based on the results of aggregate functions.22
SQL
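-- Names assumed; HAVING filters on the aggregated value
SELECT department, AVG(salary) AS average_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 50000;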
One of the most classic and revealing SQL interview questions is to explain the
difference between the WHERE and HAVING clauses. A clear answer demonstrates a
fundamental understanding of how SQL processes queries.
● WHERE filters rows before they are grouped. It operates on the raw data from
the tables.7
● HAVING filters groups after they have been created by GROUP BY. It
operates on the summarized result set.7
This operational difference is why you cannot use an aggregate function like SUM() in
a WHERE clause, but you must use one in a HAVING clause.19
Example: Find the total sales for each region, but only consider individual sales
greater than $500, and only show regions with total sales over $10,000.
SQL
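-- Names assumed; the walkthrough below explains each step
SELECT region, SUM(sale_amount) AS total_sales
FROM sales
WHERE sale_amount > 500
GROUP BY region
HAVING SUM(sale_amount) > 10000;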
Here, WHERE first filters out individual sales of $500 or less. Then, GROUP BY
calculates the total sales for the remaining transactions in each region. Finally,
HAVING filters out the regions where the calculated total is not over $10,000.
The key to truly internalizing the WHERE vs. HAVING distinction—and to debugging
many other common SQL errors—is to understand the logical order in which the
database engine processes a query. While you write a query in a certain order
(SELECT, FROM, WHERE...), the database executes it in a different, logical order.19
This section addresses the skills that separate proficient SQL users from beginners. In
any real-world database, data is strategically distributed across multiple tables to
ensure integrity and reduce redundancy—a concept known as normalization. The
ability to combine, structure, and analyze data from these disparate tables is
therefore paramount. Mastery of JOINs, subqueries, CTEs, and window functions is
essential for tackling complex analytical tasks.
JOIN clauses are the fundamental mechanism for combining rows from two or more
tables based on a related column between them.17 Without JOINs, the data that normalization spreads across separate tables could never be reassembled into a complete picture.
Types of JOINs
Understanding the different types of JOINs and when to use each is critical. The type
of join determines which rows from each table are included in the final result set.
● INNER JOIN: This is the most common type of join. It returns only the rows where
the join condition is met in both tables. If a row in one table does not have a
matching row in the other, it is excluded from the result.6
● LEFT JOIN (or LEFT OUTER JOIN): Returns all rows from the left table (the table
specified after FROM) and the matched rows from the right table (the table
specified after JOIN). If there is no match for a row from the left table, the
columns from the right table will contain NULL values.6 The
OUTER keyword is optional and does not change the behavior; LEFT JOIN and
LEFT OUTER JOIN are identical.3
● RIGHT JOIN (or RIGHT OUTER JOIN): This is the inverse of a LEFT JOIN. It
returns all rows from the right table and the matched rows from the left table. If
there is no match, columns from the left table will be NULL.6
● FULL OUTER JOIN: This join combines the behavior of LEFT and RIGHT joins. It
returns all rows when there is a match in either the left or the right table. It
effectively returns all rows from both tables, filling in NULLs where matches do
not exist on either side.6
● CROSS JOIN: This join returns the Cartesian product of the two tables. Every row
from the first table is combined with every row from the second table. It is used
less frequently and typically does not have an ON clause.26
● SELF JOIN: This is not a distinct join type but a technique where a table is joined
to itself. It is essential for querying hierarchical data (e.g., finding an employee's
manager within the same Employees table) or for making comparisons between
rows within the same table.5 A table alias is required for a self join.
Visualizing how each join type includes or excludes data is crucial for intuitive
understanding. The following sample tables are used below to compare the join types side by side.
Customers Table
CustomerID | Name
1          | Alice
2          | Bob
3          | Charlie
Orders Table
OrderID | CustomerID | Amount
101     | 1          | 50
102     | 1          | 75
103     | 2          | 120
104     | 4          | 200
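Using these sample tables, a brief sketch contrasts the two most commonly tested joins (result rows are shown as comments; the Orders column names are assumed from the data above):
SQL
-- INNER JOIN: only matching rows -> (Alice, 50), (Alice, 75), (Bob, 120)
SELECT c.Name, o.Amount
FROM Customers AS c
INNER JOIN Orders AS o ON c.CustomerID = o.CustomerID;
-- LEFT JOIN: all customers -> adds (Charlie, NULL); a FULL OUTER JOIN
-- would additionally return order 104 with a NULL Name
SELECT c.Name, o.Amount
FROM Customers AS c
LEFT JOIN Orders AS o ON c.CustomerID = o.CustomerID;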
As queries become more complex, nesting logic within a single, monolithic block
becomes unreadable and difficult to maintain. Subqueries and Common Table
Expressions (CTEs) are two powerful constructs for structuring complex logic into
manageable, modular parts.
A subquery is a SELECT statement that is nested inside another SQL statement (the
outer query).1 The result of the inner query is used by the outer query.
● Types of Subqueries:
○ Single-row subquery: Returns only one row. Can be used with standard
comparison operators (=, >, etc.).
○ Multi-row subquery: Returns multiple rows. Must be used with operators like
IN, NOT IN, ANY, or ALL.11
○ Correlated subquery: This is an advanced and powerful type where the inner
query depends on the outer query for its values. The subquery is re-evaluated
for each row processed by the outer query.11 While potent, correlated
subqueries can be slow and should be used with caution, as they can lead to
performance issues if not written carefully.28
Example: Find all employees whose salary is greater than the average salary in their
respective department (a correlated subquery).
SQL
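-- Names assumed; the inner query is re-evaluated for each outer row
SELECT e.employee_name, e.salary, e.department
FROM employees e
WHERE e.salary > (
    SELECT AVG(salary)
    FROM employees
    WHERE department = e.department
);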
A Common Table Expression (CTE) is a temporary, named result set that you can
reference within a SELECT, INSERT, UPDATE, or DELETE statement. CTEs are defined
using the WITH clause and are often preferred over subqueries for their significant
improvements in readability and modularity.30
Syntax:
SQL
WITH cte_name AS (
-- CTE query definition
SELECT...
)
-- Main query that references the CTE
SELECT * FROM cte_name; -- [11]
Advantages of CTEs:
● Readability: They break down complex queries into logical, sequential steps,
making the query easier to read and understand.30
● Reusability: A CTE can be referenced multiple times within the main query,
avoiding the need to rewrite the same subquery repeatedly.30
● Recursion: CTEs can reference themselves, a feature known as recursion. This is
the standard and most effective way to query hierarchical data, such as
organizational charts or parts explosions.32 A recursive sketch follows the example below.
Example: List each employee alongside the name of their supervisor.
SQL
WITH Supervisors AS (
SELECT e.employee_id, e.employee_name, s.employee_name AS supervisor_name
FROM employees e
LEFT JOIN employees s ON e.supervisor_id = s.employee_id
)
SELECT employee_id, employee_name, supervisor_name
FROM Supervisors; -- [31]
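Building on the recursion point above, a minimal sketch that walks an organizational chart level by level, assuming the same employees table with a supervisor_id column (NULL at the top of the hierarchy). PostgreSQL and MySQL require the RECURSIVE keyword; SQL Server omits it:
SQL
WITH RECURSIVE OrgChart AS (
-- Anchor member: employees with no supervisor
SELECT employee_id, employee_name, 1 AS level
FROM employees
WHERE supervisor_id IS NULL
UNION ALL
-- Recursive member: attach each employee to the level above them
SELECT e.employee_id, e.employee_name, o.level + 1
FROM employees e
JOIN OrgChart o ON e.supervisor_id = o.employee_id
)
SELECT employee_id, employee_name, level
FROM OrgChart;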
In an interview setting, using CTEs does more than just produce a correct answer; it
tells a story with the code. A well-structured query with clearly named CTEs acts as
self-documentation, guiding the interviewer through the logical steps of the solution.
A candidate can articulate their approach by saying, "First, I'll create a CTE named
MonthlyActiveUsers to isolate the users I need. Next, I'll create a second CTE called
UserFirstOrder to find each user's initial purchase date." This narrative approach
demonstrates structured thinking, strong communication skills, and an appreciation
for code maintainability—all qualities of a top-tier candidate. It elevates the
conversation from simply "Can you write this query?" to "Can you build a robust and
understandable solution?"
Section 8: The Analyst's Power Tool - Window Functions
Window functions are one of the most powerful features in modern SQL for advanced
analysis. They perform a calculation across a set of rows that are related to the
current row, but unlike aggregate functions with GROUP BY, they do not collapse the
rows into a single output row. Each row retains its separate identity.34
All window functions use the OVER() clause to define the "window" of rows the
function operates on. This clause has two key components:
● PARTITION BY: Divides the rows into partitions (groups). The window function is
applied independently to each partition. This is similar to GROUP BY but does not
collapse the rows.34
● ORDER BY: Orders the rows within each partition. This is crucial for functions that
depend on order, like ranking or running totals.34
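For example, combining both components to compute a per-customer running total (table and column names are assumed):
SQL
SELECT
customer_id,
order_date,
amount,
SUM(amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
) AS running_total
FROM orders;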
● Ranking Functions: These assign a rank to each row within a partition based on
the ORDER BY clause.
○ ROW_NUMBER(): Assigns a unique, sequential integer to each row, starting
from 1. It gives distinct numbers even to rows with identical values (ties).34
○ RANK(): Assigns a rank to each row. Rows with the same value in the ordering
column receive the same rank. However, RANK() leaves gaps in the sequence
after ties. For example, if two rows tie for rank 2, the next rank will be 4.28
○ DENSE_RANK(): Similar to RANK(), but it does not leave gaps in the ranking
sequence after ties. If two rows tie for rank 2, the next rank will be 3.28
● Positional Functions: These access data from other rows relative to the current
row. They are invaluable for time-series analysis.
○ LEAD(): Accesses data from a subsequent row (peeking forward).28
○ LAG(): Accesses data from a preceding row (looking backward).28
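For instance, a sketch of a month-over-month comparison using LAG (names assumed):
SQL
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) AS previous_month_revenue,
revenue - LAG(revenue) OVER (ORDER BY month) AS change_vs_previous
FROM monthly_sales;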
The difference in how ranking functions handle ties is a frequent interview topic. The
following table clarifies this behavior.
Player | Score
Amy    | 100
Bob    | 95
Carol  | 95
David  | 90
Emily  | 85
Frank  | 85
Grace  | 85
Henry  | 70
Query:
SQL
SELECT
Player,
Score,
ROW_NUMBER() OVER (ORDER BY Score DESC) AS RowNum,
RANK() OVER (ORDER BY Score DESC) AS Rank,
DENSE_RANK() OVER (ORDER BY Score DESC) AS DenseRank
FROM PlayerScores;
Result:
Player | Score | RowNum | Rank | DenseRank
Amy    | 100   | 1      | 1    | 1
Bob    | 95    | 2      | 2    | 2
Carol  | 95    | 3      | 2    | 2
David  | 90    | 4      | 4    | 3
Emily  | 85    | 5      | 5    | 4
Frank  | 85    | 6      | 5    | 4
Grace  | 85    | 7      | 5    | 4
Henry  | 70    | 8      | 8    | 5
Bob and Carol tie at 95: ROW_NUMBER continues sequentially, while RANK and DENSE_RANK give them the same rank. After the three-way tie at 85, RANK skips to 8 for Henry (leaving a gap), while DENSE_RANK continues with 5.
Excelling in SQL interviews requires more than just writing correct queries. A top-tier
candidate demonstrates a deeper architectural awareness by understanding the
"why" behind the "what." This section covers essential database theory—indexing,
normalization, and ACID properties—that underpins the practical skills of query
writing. Discussing these topics intelligently signals maturity and a holistic
understanding of data systems.
Section 9: Performance and Optimization - Indexing
What is an Index?
A database index is a data structure that improves the speed of data retrieval
operations on a table, at the cost of additional writes and storage space.37 The most
common analogy is the index at the back of a book. Instead of scanning every page to
find a topic (a "full table scan"), you can look up the topic in the index and go directly
to the correct page number.38 Similarly, a database index holds a sorted copy of one
or more columns with pointers to the corresponding data rows, allowing the database
to find records much more efficiently.39
This trade-off means that indexing is not a universal solution. Over-indexing a table
that is frequently written to can harm performance more than it helps.37 The decision
to create an index requires a careful analysis of the table's query patterns.
Types of Indexes
● Clustered vs. Non-Clustered: This is a key distinction, particularly in interviews.
○ A clustered index determines the physical order of data in a table. Because
the data itself is sorted according to the index key, a table can have only one
clustered index. The primary key of a table is often, by default, its clustered
index.37
○ A non-clustered index is a separate data structure that contains the indexed
column values and a pointer to the location of the actual data row. The data
itself is not sorted according to this index. A table can have multiple
non-clustered indexes.9
● Single-column vs. Composite:
○ A single-column index is created on a single column.
○ A composite index (or multi-column index) is created on two or more
columns. This is useful for queries that frequently filter or join on a specific
combination of columns.37 The order of columns in a composite index matters
significantly.
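A minimal sketch of creating both kinds, with assumed index and table names:
SQL
-- Single-column index to speed up lookups by customer
CREATE INDEX idx_orders_customer ON Orders (customer_id);
-- Composite index; column order matters: this typically serves queries that
-- filter on customer_id, or on customer_id and order_date together,
-- but not queries that filter on order_date alone
CREATE INDEX idx_orders_customer_date ON Orders (customer_id, order_date);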
In an interview, if you have written a query and the follow-up question is, "This query is
slow on a large table. How would you optimize it?", the first and best answer is almost
always related to indexing.5 A high-quality response would be: "My first step would be
to analyze the query's execution plan to see if the database is performing a full table
scan. If it is, I would recommend creating an index on the columns used in my
WHERE clause and in the ON conditions of my JOINs. This would allow the database
to perform a much faster index seek instead of reading every row in the table." This
answer is practical, precise, and demonstrates senior-level problem-solving skills.37
A common source of confusion for those new to SQL is why data is spread across so
many different tables. The answer lies in normalization, the process of organizing
tables in a database to minimize data redundancy and improve data integrity.27
Understanding normalization provides the foundational context for why JOINs are necessary in the first place.
Normalization involves following a set of rules called "normal forms." While there are
several, 1NF, 2NF, and 3NF are the most important for practical purposes.44
● First Normal Form (1NF): A table is in 1NF if it meets two criteria:
1. Atomicity: Each cell must hold a single, indivisible value. You cannot have a
list of values in one cell.43
2. Primary Key: The table must have a primary key that uniquely identifies each
row.44
● Second Normal Form (2NF): A table is in 2NF if:
1. It is already in 1NF.
2. It has no partial dependencies. This means that every non-key attribute
must be fully dependent on the entire primary key. This rule is only relevant
for tables with a composite primary key (a primary key made of multiple
columns).44
● Third Normal Form (3NF): A table is in 3NF if:
1. It is already in 2NF.
2. It has no transitive dependencies. This means that a non-key attribute
cannot be dependent on another non-key attribute.44 For example, if you have
EmployeeID -> DepartmentID -> DepartmentName, the dependency of
DepartmentName on DepartmentID is a transitive dependency that should be
resolved by moving DepartmentName to a separate Departments table.
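A sketch of resolving that transitive dependency, with assumed column definitions:
SQL
-- DepartmentName depended on DepartmentID, not on the key EmployeeID,
-- so department details move to their own table
CREATE TABLE Departments (
DepartmentID INT PRIMARY KEY,
DepartmentName VARCHAR(100)
);
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
EmployeeName VARCHAR(100),
DepartmentID INT,
FOREIGN KEY (DepartmentID) REFERENCES Departments (DepartmentID)
);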
The process of normalization explains why relational databases are designed the way
they are. To reduce data redundancy (e.g., storing a customer's full address with
every single order they place) and prevent data anomalies (e.g., having to update the
address in hundreds of places if they move), the data is broken apart into logically
distinct tables. The "cost" of this clean, non-redundant design is that you must use
JOINs to reassemble the information for analysis. A candidate who can articulate this
connection—"This schema appears to be in 3NF, which is why customer data is in a
separate table from their orders. To get a complete view, I'll need to JOIN them on
CustomerID"—demonstrates a holistic understanding of the data environment that
goes far beyond simple query writing.
Knowing SQL syntax and theory is the prerequisite, but success in an interview or
online assessment comes from applying that knowledge under pressure to solve
specific problems. This final part bridges the gap between knowledge and
performance, providing a concrete action plan for success by identifying common
problem patterns, recommending practice platforms, and outlining a strategic
approach to the interview itself.
Theoretical knowledge must be forged into practical skill through deliberate practice.
Several online platforms provide interactive environments to hone SQL skills for
interviews.48
Platform Analysis
Choosing the right platform depends on your current skill level and learning style. A
strategic approach often involves using a combination of platforms.
● SQLZoo: Ideal for absolute beginners. It offers bite-sized, interactive tutorials that
guide users from basic SELECT statements to more advanced concepts in a
structured, easy-to-follow manner. It is entirely browser-based, requiring no
setup.48
● LeetCode: A popular platform for general coding interviews, with a substantial
collection of SQL problems. The questions are often algorithmic or puzzle-like
and are categorized by difficulty (Easy, Medium, Hard) and topic.15 The "SQL 50"
study plan is a well-regarded starting point for interview prep.51 While excellent
for mastering complex syntax and functions, some find the problems less
representative of real-world business scenarios.52
● HackerRank: Similar to LeetCode, HackerRank offers a wide range of SQL
challenges with difficulty levels from Basic to Advanced. It has a strong emphasis
on skills certification, offering timed assessments that can be used to validate
proficiency to potential employers.48 Its advanced topics include CTEs, window
functions, and query optimization.26
● StrataScratch & DataLemur: These platforms are highly praised for providing
practice problems that mirror real-world business cases and actual interview
questions from top tech companies like Meta, Amazon, and Google.48 They are
invaluable for learning how to apply SQL to solve practical business problems,
moving beyond pure syntax to analytical thinking.
Success in a technical interview is not just about producing the correct final query. It
is a performance that tests communication, problem-solving, and strategic thinking.
Communicating Your Thought Process
The interviewer wants to understand how you think. A silent candidate who produces
a perfect query is less impressive than a vocal one who explains their approach,
discusses trade-offs, and arrives at a solid solution. Use CTEs as a narrative device to
structure your answer. Walk the interviewer through your logic: "First, I will create a
CTE to filter for active users. Then, I will join that to the orders table..." This makes
your code self-documenting and your thought process transparent.
Ultimately, consistent and strategic practice is the key to building both the skill and
the confidence needed to excel. By mastering the fundamentals, understanding the
theory, recognizing common patterns, and practicing with a clear strategy, any
aspiring data professional can confidently tackle the SQL challenges that stand
between them and their desired role.
Works cited