DEV Community

Jing for Chat2DB

Posted on

SQL Subqueries: Power Up Your Data Retrieval

Image description

I. What is a Subquery?

Definition:
A subquery, also known as an inner query or nested query, is a query embedded within another SQL query (the outer query). The subquery executes first, and its result is then used by the outer query.

Permissible Clauses:
A subquery can contain most clauses that a standard SELECT statement can, such as DISTINCT, GROUP BY, ORDER BY, LIMIT, JOIN, and UNION. The outer query, which contains the subquery, must be one of the following statements: SELECT, INSERT, UPDATE, DELETE, SET, or DO.

Placement of Subqueries:
Subqueries can typically be placed in:

  • The SELECT list
  • The FROM clause
  • The WHERE clause

Using subqueries directly within GROUP BY or ORDER BY clauses is generally not practical or common.

II. Types of Subqueries

Subqueries can be categorized based on what they return:

  1. Scalar Subquery: Returns a single value (one row, one column). This is the simplest form.
  2. Column Subquery: Returns a single column of one or more rows.
  3. Row Subquery: Returns a single row of one or more columns.
  4. Table Subquery: Returns a virtual table of one or more rows and one or more columns.

Operators for Subqueries:
Common operators used with subqueries include: =, >, <, >=, <=, <>, ANY, IN, SOME, ALL, and EXISTS.

If a subquery returns a scalar value, standard comparison operators (=, >, <, etc.) can be used. If it returns more than a single value and you attempt to use a scalar comparison operator, it will typically result in an error.

1. Scalar Subquery

A scalar subquery returns exactly one row and one column. This single value can then be used in comparisons.

Examples:

  • Find all employees in the 'Marketing' department:

    SELECT employee_name, salary FROM Employees WHERE department_id = (SELECT department_id FROM Departments WHERE department_name = 'Marketing'); 
  • Find products with the highest unit price in the 'Beverages' category:

    SELECT product_name, unit_price FROM Products WHERE unit_price = (SELECT MAX(unit_price) FROM Products WHERE category_name = 'Beverages'); 
  • Find employees whose salary matches the average salary of their respective job titles (correlated scalar subquery):

    SELECT e.employee_name, e.salary, e.job_title FROM Employees e WHERE e.salary = (SELECT AVG(emp.salary) FROM Employees emp WHERE emp.job_title = e.job_title); 

2. Column Subquery

A column subquery returns a single column of zero or more rows. These are often used with operators like IN, ANY, SOME, or ALL.

Examples:

  • Find all products supplied by suppliers located in 'USA':

    SELECT product_name FROM Products WHERE supplier_id IN (SELECT supplier_id FROM Suppliers WHERE country = 'USA'); 
  • Find employees whose salary is greater than any salary in the 'Intern' job category:

    SELECT employee_name, salary FROM Employees WHERE salary > ANY (SELECT salary FROM Employees WHERE job_title = 'Intern'); 
  • Find products more expensive than all products in the 'Accessories' category:

    SELECT product_name, price FROM Products WHERE price > ALL (SELECT price FROM Products WHERE category_id = (SELECT id FROM Categories WHERE name = 'Accessories')); 

    Note: NOT IN is equivalent to <> ALL.
    Special Cases with ALL:

    • If the subquery returns an empty set, column > ALL (subquery) evaluates to TRUE.
    • If the subquery returns values including NULL (e.g., (10, NULL, 20)), and the comparison value is greater than all non-NULL values (e.g., 30 > ALL (10, NULL, 20)), the result is UNKNOWN.

3. Row Subquery

A row subquery returns a single row with one or more columns. The comparison must match the structure of the row.

Examples:

  • Find the employee who has the same job title and hire date as 'John Smith':

    SELECT employee_name, department FROM Employees WHERE (job_title, hire_date) = (SELECT job_title, hire_date FROM Employees WHERE employee_name = 'John Smith'); 

    (Note: (value1, value2) is often equivalent to ROW(value1, value2))

  • Find orders that match a specific customer's latest order details:

    SELECT order_id, order_date FROM Orders WHERE (customer_id, product_id, quantity) = ( SELECT customer_id, product_id, quantity FROM RecentCustomerPurchases WHERE customer_id = 12345 AND purchase_type = 'latest' ); 

4. Table Subquery

A table subquery returns multiple rows and multiple columns (a virtual table). These are most commonly used in the FROM clause and are often referred to as derived tables.

Example (in FROM clause):

  • Find the average salary for each department:

    SELECT d.department_name, AvgSalaries.avg_salary FROM Departments d JOIN ( SELECT department_id, AVG(salary) AS avg_salary FROM Employees GROUP BY department_id ) AS AvgSalaries ON d.department_id = AvgSalaries.department_id; 

    Example (with IN for multiple columns, if supported or as a conceptual illustration):

  • Find students enrolled in the same set of (course_id, semester_code) as those in 'Advanced Studies Program':

    SELECT student_name FROM StudentEnrollments se WHERE (se.course_id, se.semester_code) IN ( SELECT asp.course_id, asp.semester_code FROM AdvancedProgramCourses asp ); 

III. Subquery Usage with Keywords

1. Subqueries with ANY (or SOME)

The ANY keyword (and its alias SOME) returns TRUE if the comparison is true for at least one of the values returned by the column subquery.

  • Example: score > ANY (SELECT min_score FROM ExamRequirements) means the score is greater than at least one of the minimum scores.

    SELECT product_name, list_price FROM Products WHERE list_price > ANY ( SELECT discounted_price FROM SpecialOffers WHERE start_date <= CURRENT_DATE AND end_date >= CURRENT_DATE ); -- This finds products whose list price is greater than at least one currently active discounted price. 

2. Subqueries with IN

The IN operator checks if a value matches any value in the list returned by the subquery. It's an alias for = ANY.
NOT IN is the negation and is an alias for <> ALL.

SELECT c.customer_name FROM Customers c WHERE c.country IN (SELECT country_name FROM EuropeanCountries); -- Finds customers located in any European country listed in the EuropeanCountries table. 
Enter fullscreen mode Exit fullscreen mode

3. Subqueries with ALL

The ALL keyword returns TRUE if the comparison is true for all values returned by the column subquery.
Example: score > ALL (SELECT passing_score FROM PreviousExams) means the score is greater than every passing score from previous exams.

SELECT employee_name, salary FROM Employees WHERE salary >= ALL ( SELECT minimum_wage FROM RegionalWageStandards WHERE region_id = Employees.region_id -- Correlated example ); -- Finds employees whose salary is greater than or equal to all minimum wage standards in their respective regions. 
Enter fullscreen mode Exit fullscreen mode

4. Scalar vs. Multi-Value Subqueries (Revisited)

Scalar Subqueries: Return a single value. Essential when using direct comparison operators (=, >, <).

Multi-Value Subqueries: Return a set of values (a column, a row, or a table). Used with operators like IN, ANY, ALL, EXISTS. Using scalar comparison operators with multi-value subqueries will typically result in an error unless the operator is modified by ANY, SOME, or ALL.

5. Independent vs. Correlated Subqueries

Independent Subquery: Can be executed on its own, without depending on the outer query. The subquery is typically evaluated once.

-- Find orders for products in the 'Electronics' category SELECT order_id, order_date FROM Orders WHERE product_id IN ( SELECT product_id FROM Products WHERE category = 'Electronics' -- Independent subquery ); 
Enter fullscreen mode Exit fullscreen mode

Correlated Subquery: References one or more columns from the outer query. The subquery is evaluated for each row processed by the outer query. This can impact performance.

-- Find employees who earn more than the average salary in their respective departments SELECT e1.employee_name, e1.salary, d.department_name FROM Employees e1 JOIN Departments d ON e1.department_id = d.department_id WHERE e1.salary > ( SELECT AVG(e2.salary) FROM Employees e2 WHERE e2.department_id = e1.department_id -- Correlated: e1.department_id links to outer query ); 
Enter fullscreen mode Exit fullscreen mode

When dealing with performance, EXPLAIN (or your database's equivalent) is your friend to understand how the database executes the query. Independent subqueries are often more efficient (O(m+n)) than correlated ones (O(m*n) in naive execution).

6. The EXISTS Predicate

EXISTS checks if a subquery returns any rows. It returns TRUE if one or more rows are returned, and FALSE otherwise. It never returns UNKNOWN.

-- Find departments that have at least one employee SELECT d.department_name FROM Departments d WHERE EXISTS ( SELECT 1 FROM Employees e WHERE e.department_id = d.department_id ); 
Enter fullscreen mode Exit fullscreen mode

IN vs. EXISTS:

While they can often achieve similar results, EXISTS focuses on the existence of rows, while IN compares values. EXISTS can be more efficient, especially for large subquery result sets, as it can stop processing as soon as a matching row is found. EXISTS handles NULLs more predictably than IN (especially NOT IN). NOT EXISTS is often preferred over NOT IN when NULL values might be present in the subquery's result set.

7. Derived Tables

A derived table is a subquery used in the FROM clause of an outer query. The result of this subquery is treated as a temporary, virtual table.

-- Select the top 3 most expensive products from each category SELECT dt.category_name, dt.product_name, dt.price FROM ( SELECT c.category_name, p.product_name, p.price, ROW_NUMBER() OVER(PARTITION BY c.category_name ORDER BY p.price DESC) as rn FROM Products p JOIN Categories c ON p.category_id = c.id ) AS dt WHERE dt.rn <= 3; 
Enter fullscreen mode Exit fullscreen mode

Subquery Optimization

While subqueries offer flexibility, they can sometimes lead to inefficient query execution, often because the database might create temporary tables for the subquery results.
Using JOINs instead of Subqueries:
In many cases, rewriting a subquery using a JOIN can improve performance. JOIN operations are often more directly optimizable by the database.

Example 1: Replacing NOT IN with LEFT JOIN ... IS NULL

-- Original (Find departments with no employees) SELECT department_name FROM Departments WHERE department_id NOT IN (SELECT DISTINCT department_id FROM Employees WHERE department_id IS NOT NULL); -- Optimized with LEFT JOIN SELECT d.department_name FROM Departments d LEFT JOIN Employees e ON d.department_id = e.department_id WHERE e.employee_id IS NULL; 
Enter fullscreen mode Exit fullscreen mode

Example 2: Replacing IN with INNER JOIN (for existence check)

-- Original (Find customers who have placed orders) SELECT customer_name FROM Customers WHERE customer_id IN (SELECT customer_id FROM Orders); -- Optimized with INNER JOIN (or EXISTS) SELECT DISTINCT c.customer_name FROM Customers c INNER JOIN Orders o ON c.customer_id = o.customer_id; 
Enter fullscreen mode Exit fullscreen mode

When Optimization is Challenging:

Not all subqueries can be easily or effectively optimized into JOINs by all database systems. This can be true for:

  • Certain subqueries involving aggregate functions that are difficult to "flatten."
  • Complex correlated subqueries.
  • Specific uses of ANY, ALL, or NOT IN where NULL values are involved, as their three-valued logic (TRUE, FALSE, UNKNOWN) can be tricky.
  • Limitations within the database optimizer itself.

Always consult your database's EXPLAIN plan to understand how it's executing your query and identify potential bottlenecks.

Ready to take your SQL skills to the next level and manage your databases with unparalleled ease?

Discover Chat2DB – your intelligent, AI-powered SQL client and reporting tool! Whether you're crafting complex subqueries like the ones we've explored, optimizing query performance, or generating insightful data visualizations, Chat2DB is designed to streamline your workflow. With features like AI-assisted query generation, schema exploration, and direct data editing, Chat2DB

Top comments (0)