Can you write this from memory?
Select all products where the price is greater than the average price of all products.
A common SQL subquery bug: you write NOT IN (subquery) to find "customers who never ordered," but the subquery returns a NULL—so your query returns zero rows. No error, no warning, just silently wrong results.
SQL uses three-valued logic. NOT IN compares with <> ALL, and x <> NULL is UNKNOWN, making the entire expression UNKNOWN. Use NOT EXISTS for anti-joins unless you can guarantee no NULLs.
These exercises drill subquery patterns until you know when IN beats EXISTS, when to rewrite a correlated subquery as a JOIN, and when LATERAL is the right tool.
This is the most common subquery bug in production:
-- Suppose orders.customer_id contains NULL values
SELECT * FROM customers
WHERE id NOT IN (SELECT customer_id FROM orders);
-- Returns 0 rows! Even for customers who never ordered.
Why? NOT IN (1, 2, NULL) evaluates as id <> 1 AND id <> 2 AND id <> NULL. The id <> NULL yields UNKNOWN, making the entire expression UNKNOWN. WHERE only includes rows where the condition is TRUE.
Solution: Use NOT EXISTS
SELECT * FROM customers c
WHERE NOT EXISTS (
SELECT 1 FROM orders o WHERE o.customer_id = c.id
);
NOT EXISTS returns TRUE when the subquery finds no rows—NULL values in the subquery don't affect the result. You can also express this as a LEFT JOIN anti-pattern.
See our filtering guide for a comprehensive explanation of three-valued logic and NULL handling. For a quick-reference card of subquery syntax, see the SQL subqueries cheat sheet.
The short answer
Modern query optimizers often transform IN and EXISTS into the same execution plan (semijoin). Choose based on clarity:
| Use Case | Recommendation |
|---|---|
| Filter by a list of values | IN |
| Check if related rows exist | EXISTS |
| Anti-join (no match) | NOT EXISTS (NULL-safe) |
| Large subquery result | EXISTS (can short-circuit) |
The detailed answer
-- These are often optimized identically:
-- IN version
SELECT * FROM customers
WHERE id IN (SELECT customer_id FROM orders WHERE total > 1000);
-- EXISTS version
SELECT * FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.customer_id = c.id AND o.total > 1000
);
When EXISTS is better:
- Anti-joins: NOT EXISTS is NULL-safe, NOT IN is not
- Early termination: EXISTS can stop at first match
- Correlated conditions: EXISTS naturally expresses "for this row, check if..."
When IN is better:
- Simple set membership: WHERE status IN ('active', 'pending')
- Readability for non-correlated checks
The equivalences
PostgreSQL documents these equivalences:
x IN (subquery)=x = ANY (subquery)x NOT IN (subquery)=x <> ALL (subquery)
Understanding this helps explain the NULL trap: <> ALL means "not equal to every value," and comparing to NULL yields UNKNOWN.
A subquery in FROM creates a derived table (also called an inline view):
SELECT country, avg_spend
FROM (
SELECT country, AVG(total_spend) AS avg_spend
FROM customers
GROUP BY country
) t
WHERE avg_spend > 1000;
When to use derived tables
- Filter aggregated results (alternative to HAVING)
- Pre-aggregate before joining (prevent row explosion)
- Compute intermediate results for further processing
Derived tables vs CTEs
-- Derived table
SELECT * FROM (SELECT ... ) t WHERE ...;
-- CTE (equivalent)
WITH t AS (SELECT ...)
SELECT * FROM t WHERE ...;
CTEs are better when:
- You need to reference the result multiple times
- The query is complex and benefits from named steps
- You need recursion (WITH RECURSIVE)
Derived tables are fine when:
- Single use, simple transformation
- You want to keep related logic together
Most developers never use ANY/ALL directly, but understanding them clarifies IN/NOT IN behavior:
-- These are equivalent:
WHERE id IN (1, 2, 3)
WHERE id = ANY (ARRAY[1, 2, 3])
-- These are equivalent:
WHERE id NOT IN (1, 2, 3)
WHERE id <> ALL (ARRAY[1, 2, 3])
ALL for "greater than every value"
-- Products more expensive than ALL clearance items
SELECT * FROM products
WHERE price > ALL (
SELECT price FROM products WHERE category = 'clearance'
);
-- Equivalent to:
WHERE price > (SELECT MAX(price) FROM products WHERE category = 'clearance')
ANY for "matches at least one"
-- Products cheaper than ANY premium item
SELECT * FROM products
WHERE price < ANY (
SELECT price FROM products WHERE category = 'premium'
);
-- Equivalent to:
WHERE price < (SELECT MAX(price) FROM products WHERE category = 'premium')
-
Zero rows from NOT IN? Check if the subquery returns any NULL values. Use NOT EXISTS instead.
-
Error: "subquery returns more than one row"? You used a scalar subquery where multiple rows match. Add LIMIT 1 or use IN/EXISTS instead.
-
Slow query? Check if a correlated subquery is running per-row. Consider rewriting as JOIN + GROUP BY.
-
Wrong counts? Scalar subquery counting may differ from JOIN + COUNT due to NULL handling. Use COUNT(column) not COUNT(*) with LEFT JOIN.
-
LATERAL not working? Check database support: PostgreSQL has LATERAL, MySQL 8.0.14+ has LATERAL, SQL Server uses CROSS APPLY / OUTER APPLY.
When to Use SQL Subqueries
- Filter by a derived set of values (IN subquery).
- Check existence without returning data (EXISTS).
- Compute a single value per row (scalar subquery).
- Create a derived table in FROM for intermediate calculations.
- Access correlated data with LATERAL (top-N per group, row-level lookups).
- Compare against all/any values in a set (ANY/ALL operators).
Check Your Understanding: SQL Subqueries
List customers who have never placed an order.
Use NOT EXISTS with a correlated subquery. NOT EXISTS is NULL-safe and clearly expresses "there is no matching order." ```sql -- Recommended: NOT EXISTS (NULL-safe, clear intent) SELECT c.* FROM customers c WHERE NOT EXISTS ( SELECT 1 FROM orders o WHERE o.customer_id = c.id ); ``` Alternative: LEFT JOIN anti-pattern (same result, sometimes preferred for readability): ```sql SELECT c.* FROM customers c LEFT JOIN orders o ON o.customer_id = c.id WHERE o.customer_id IS NULL; ``` **Avoid NOT IN** unless you can guarantee the subquery never returns NULL—otherwise the query may return zero rows silently.
What You'll Practice: SQL Subqueries
Common SQL Subqueries Pitfalls
- NOT IN + NULL returns no rows—use NOT EXISTS instead
- Correlated subqueries may run per row (check execution plans)
- Scalar subquery returning multiple rows causes an error
- IN/NOT IN can return NULL (unknown), not just true/false
- Multiple scalar subqueries on same table—rewrite as single JOIN
- Assuming subquery evaluation order (side effects are unpredictable)
SQL Subqueries FAQ
Why does NOT IN (subquery) return no rows?
If the subquery returns any NULL values, NOT IN returns UNKNOWN for every row (due to SQL's three-valued logic), so nothing matches. Use NOT EXISTS instead—it's NULL-safe. See our [filtering guide](/sql/filtering) for a detailed explanation of the NOT IN + NULL trap.
Is IN faster than EXISTS?
Often equivalent. Modern optimizers transform both into semijoin/antijoin plans. Performance depends on indexes and cardinality, not the keyword. Use EXISTS when you only care about existence (it can stop at first match) and IN when checking set membership. For NOT IN vs NOT EXISTS, prefer NOT EXISTS to avoid NULL issues.
What does SELECT 1 mean in EXISTS?
EXISTS only checks if any row exists—it ignores what you select. SELECT 1, SELECT *, and SELECT column all behave identically inside EXISTS. SELECT 1 is conventional because it signals 'I only care about existence, not the data.'
Will the database evaluate the whole subquery?
Not necessarily. EXISTS can stop after finding the first matching row. IN with a subquery may be executed once and cached, or transformed into a semijoin. PostgreSQL docs warn against relying on full evaluation order—side effects in subqueries are unpredictable.
When should I use a subquery in FROM vs a CTE?
CTEs (WITH clauses) are more readable for complex queries and can be referenced multiple times. Subqueries in FROM are fine for simple derived tables. Some databases materialize CTEs differently—PostgreSQL 12+ inlines most CTEs like subqueries unless you use MATERIALIZED. See our [CTE guide](/sql/ctes) for details.
What's LATERAL and when do I need it?
LATERAL lets a subquery in FROM reference columns from preceding tables—like a correlated subquery but in the FROM clause. Use it for 'top-1 per row' queries: for each customer, get their most recent order. Without LATERAL, you'd need window functions or correlated scalar subqueries.
What are ANY and ALL in SQL?
ANY/ALL compare a value against all values from a subquery. `= ANY(subquery)` is equivalent to IN. `<> ALL(subquery)` is equivalent to NOT IN (with the same NULL pitfall). `> ALL(subquery)` means greater than every value. These are rarely used directly but help understand how IN/NOT IN work internally.
Why is my correlated subquery slow?
Correlated subqueries can execute once per row (nested-loop style). Rewrite as a JOIN with aggregation when possible: instead of SELECT (SELECT COUNT(*) FROM orders WHERE customer_id = c.id), use LEFT JOIN orders GROUP BY customer. The optimizer may rewrite it anyway, but explicit joins are often clearer and faster.
Scalar subquery vs JOIN—which should I use?
Scalar subqueries return one value per row. For simple lookups, they're readable. For aggregations across many rows, JOIN + GROUP BY is usually faster and clearer. If you're doing multiple scalar subqueries on the same table, definitely rewrite as a single JOIN.
SQL Subqueries Syntax Quick Reference
SELECT * FROM orders
WHERE customer_id IN (
SELECT id FROM customers WHERE country = 'USA'
);-- Customers who have placed at least one order
SELECT * FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders WHERE customer_id = c.id
);-- Customers who have never ordered
SELECT c.*
FROM customers c
WHERE NOT EXISTS (
SELECT 1 FROM orders o
WHERE o.customer_id = c.id
);-- Order count per customer (can be slow for large tables)
SELECT c.id, c.name,
(SELECT COUNT(*) FROM orders o
WHERE o.customer_id = c.id) AS order_count
FROM customers c;-- Same result, usually faster for large tables
SELECT c.id, c.name, COUNT(o.id) AS order_count
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name;-- Filter aggregated results without HAVING
SELECT country, avg_spend
FROM (
SELECT country, AVG(total_spend) AS avg_spend
FROM customers
GROUP BY country
) t
WHERE avg_spend > 1000;-- = ANY is equivalent to IN
SELECT * FROM products
WHERE category = ANY (
SELECT category FROM featured_categories
);-- Price greater than ALL clearance items
SELECT * FROM products p
WHERE p.price > ALL (
SELECT price FROM products
WHERE category = 'clearance'
);-- Most recent order per customer (PostgreSQL)
SELECT c.id, c.name, last_order.id AS order_id, last_order.created_at
FROM customers c
LEFT JOIN LATERAL (
SELECT o.*
FROM orders o
WHERE o.customer_id = c.id
ORDER BY o.created_at DESC
LIMIT 1
) last_order ON true;-- Alternative with window function (no LATERAL needed)
SELECT c.id, c.name, o.id AS order_id, o.created_at
FROM customers c
LEFT JOIN (
SELECT *, ROW_NUMBER() OVER (
PARTITION BY customer_id ORDER BY created_at DESC
) AS rn
FROM orders
) o ON o.customer_id = c.id AND o.rn = 1;SQL Subqueries Sample Exercises
Write a query that selects all columns from the products table, plus a column showing the average price of all products (name it avg_price).
SELECT *, (SELECT AVG(price) FROM products) AS avg_price FROM productsFill in the blank to find products priced above average
AVG(price)If the orders table has orders totaling 100, 200, and 300, what does this subquery return?
300+ 12 more exercises