SQL Window Functions Practice
Master SQL window functions: ROW_NUMBER for top-N per group, running totals (cumulative sum) with explicit ROWS frames, ROWS vs RANGE differences, LAG/LEAD for row comparisons, LAST_VALUE frame trap, and the CTE/QUALIFY pattern for filtering window results.
Can you write this from memory?
Select total and the overall sum of all totals from the orders table. Use a window function to show the sum alongside each row.
A common SQL window function mistake: you write WHERE row_number <= 3 and the query fails. Window functions can only appear in SELECT and ORDER BY—they're computed after WHERE/HAVING/GROUP BY.
Window functions run after aggregation but before final output. To filter on a window function, wrap your query in a CTE or subquery, then filter in the outer query.
Another common mistake: you use SUM() OVER (ORDER BY date) for a running total and get unexpected results when dates have ties. The default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW—which includes all rows with the same ORDER BY value.
These exercises drill window function patterns until PARTITION BY, frame clauses, and ranking functions are automatic.
Understanding SQL's logical execution order clarifies why window functions work the way they do:
1. FROM / JOIN -- assemble raw rows
2. WHERE -- filter rows
3. GROUP BY -- form groups
4. aggregates -- compute COUNT, SUM, etc.
5. HAVING -- filter groups
6. WINDOW -- compute window functions ← HERE
7. SELECT -- pick columns
8. ORDER BY -- sort output
9. LIMIT -- cap results
Window functions run at step 6—after WHERE, GROUP BY, and HAVING. That's why you can't write WHERE row_number() <= 3. The WHERE clause has already executed before window functions are computed.
The CTE pattern for filtering
Wrap the window calculation, then filter in the outer query:
WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC, id DESC) AS rn
FROM products
)
SELECT * FROM ranked WHERE rn <= 5;
QUALIFY: the shortcut (BigQuery, Snowflake, Databricks)
Some databases offer QUALIFY as a window-filtering clause:
-- Snowflake/BigQuery/Databricks
SELECT *
FROM products
QUALIFY ROW_NUMBER() OVER (
PARTITION BY category ORDER BY price DESC, id DESC
) <= 5;
This is cleaner but not portable. Use CTEs for cross-database compatibility.
This trips up a lot of people.
The default frame
With ORDER BY in a window, the default frame is:
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
The key word is RANGE. RANGE groups rows with equal ORDER BY values (ties).
Why this matters for running totals
-- Sample data with ties:
-- | date | amount |
-- | 2024-01-01 | 100 |
-- | 2024-01-01 | 200 | -- same date!
-- | 2024-01-02 | 150 |
-- Default RANGE behavior:
SELECT date, amount,
SUM(amount) OVER (ORDER BY date) AS running_total
FROM transactions;
-- Result (both Jan 1 rows get the same total!):
-- | date | amount | running_total |
-- | 2024-01-01 | 100 | 300 | -- includes both Jan 1 rows
-- | 2024-01-01 | 200 | 300 | -- same total!
-- | 2024-01-02 | 150 | 450 |
Fix: use ROWS for deterministic running totals (cumulative sum)
SELECT date, amount,
SUM(amount) OVER (
ORDER BY date, id -- add tie-breaker
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM transactions;
-- Result (each row has unique cumulative sum):
-- | date | amount | running_total |
-- | 2024-01-01 | 100 | 100 |
-- | 2024-01-01 | 200 | 300 |
-- | 2024-01-02 | 150 | 450 |
Rule: When you need a "running" calculation (running sum, running average, running count), use ROWS with a tie-breaker in ORDER BY.
LAST_VALUE with the default frame returns the current row—not the partition's last row:
-- WRONG: Returns current row's amount (useless)
SELECT customer_id, order_date, amount,
LAST_VALUE(amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
) AS last_amount -- equals current amount!
FROM orders;
Why? The default frame ends at CURRENT ROW. "Last value in frame" means "last value up to here"—the current row.
Fix: extend the frame
-- RIGHT: Returns actual last order amount per customer
SELECT customer_id, order_date, amount,
LAST_VALUE(amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_amount
FROM orders;
Note: FIRST_VALUE doesn't have this problem because the default frame starts at UNBOUNDED PRECEDING, which includes the first row.
This confuses many learners:
SELECT
name,
score,
ROW_NUMBER() OVER (ORDER BY score DESC) AS rank
FROM players;
-- Output order is NOT guaranteed to be by score!
The ORDER BY score DESC inside OVER tells the window function which rows are "before" and "after" for numbering purposes. It does not sort the query output.
Always add a query-level ORDER BY
SELECT
name,
score,
ROW_NUMBER() OVER (ORDER BY score DESC) AS rank
FROM players
ORDER BY score DESC; -- THIS sorts the output
-- UNSTABLE: Same-score players get arbitrary row numbers
ROW_NUMBER() OVER (ORDER BY score DESC)
-- STABLE: id ensures consistent ordering
ROW_NUMBER() OVER (ORDER BY score DESC, id ASC)
Without a tie-breaker, the database can return different row numbers for tied rows on each execution. This breaks pagination, deduplication, and top-N queries. The same tie-breaker principle applies to ORDER BY with LIMIT in pagination queries.
Dedup: keep one row per group
-- Keep most recent login per user
WITH latest AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY user_id
ORDER BY login_at DESC, id DESC
) AS rn
FROM logins
)
SELECT * FROM latest WHERE rn = 1;
Running total with partitions
-- Running total per customer (resets for each customer)
SELECT
customer_id,
order_date,
amount,
SUM(amount) OVER (
PARTITION BY customer_id
ORDER BY order_date, id
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS customer_running_total
FROM orders;
Percent of total
-- Each row's amount as percentage of category total
SELECT
category,
product_name,
amount,
ROUND(100.0 * amount / SUM(amount) OVER (PARTITION BY category), 2) AS pct_of_category
FROM products;
Gap detection with LAG
-- Find gaps in sequential IDs (must use CTE to filter)
WITH sequenced AS (
SELECT id,
LAG(id) OVER (ORDER BY id) AS prev_id
FROM records
)
SELECT id, prev_id, id - prev_id AS gap
FROM sequenced
WHERE id - prev_id > 1;
-
Filtering by window function fails? Window functions can't appear in WHERE/HAVING. Wrap in CTE, then filter.
-
Results not sorted? ORDER BY in OVER doesn't sort output. Add a query-level ORDER BY.
-
Running total groups ties together? Default frame is RANGE. Use explicit ROWS frame.
-
LAST_VALUE returns current row? Default frame ends at current row. Extend to UNBOUNDED FOLLOWING.
-
ROW_NUMBER gives different results each run? Add a unique tie-breaker column to ORDER BY.
-
Unexpected values at partition boundaries? Check LAG/LEAD—they return NULL at edges by default. Use the default parameter:
LAG(col, 1, 0).
When to Use SQL Window Functions
- Rank rows within groups (top-N per category, dedup by keeping latest).
- Calculate running totals or moving averages.
- Compare a row to previous/next rows with LAG/LEAD.
- Compute percentiles or cumulative distributions.
- Add row numbers for pagination or deduplication.
Check Your Understanding: SQL Window Functions
Return the top 3 orders per customer by amount.
Wrap ROW_NUMBER in a CTE, then filter in the outer query. Add a deterministic tie-breaker (id) to prevent unstable results: ```sql WITH ranked AS ( SELECT o.*, ROW_NUMBER() OVER ( PARTITION BY customer_id ORDER BY amount DESC, id DESC -- tie-breaker ) AS rn FROM orders o ) SELECT * FROM ranked WHERE rn <= 3 ORDER BY customer_id, rn; ``` **Why a CTE?** Window functions are computed after WHERE, so you can't filter on them directly. The CTE computes the ranking, then the outer query filters. **Dialect note:** BigQuery and Snowflake support `QUALIFY rn <= 3` directly without a CTE—handy but not portable.
What You'll Practice: SQL Window Functions
Common SQL Window Functions Pitfalls
- Can't filter window functions in WHERE/HAVING—use CTE/subquery (or QUALIFY in some DBs)
- ORDER BY in OVER does not sort query output—add a separate ORDER BY clause
- Default frame with ORDER BY is RANGE (includes ties)—use ROWS for deterministic running totals
- LAST_VALUE returns current row with default frame—extend frame to UNBOUNDED FOLLOWING
- ROW_NUMBER without tie-breaker gives unstable results—always add a unique column
- Confusing RANK vs DENSE_RANK (gaps vs no gaps after ties)
- Performance with large partitions—window functions may require full partition scans
SQL Window Functions FAQ
Why can't I use ROW_NUMBER() in WHERE or HAVING?
Window functions are computed after WHERE, HAVING, and GROUP BY in the query pipeline. They only exist in SELECT and ORDER BY. To filter by a window function result, wrap your query in a CTE or subquery, then filter in the outer query. Some dialects (BigQuery, Snowflake) offer QUALIFY as a shortcut.
How is PARTITION BY different from GROUP BY?
GROUP BY collapses rows into aggregates—one output row per group. PARTITION BY keeps all rows and computes windowed values across each partition. Use GROUP BY when you want aggregated output; use PARTITION BY when you want per-row calculations within groups.
What is the default window frame?
It depends on whether you have ORDER BY. With ORDER BY: `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` (includes ties). Without ORDER BY: effectively the entire partition. This default is why running totals can behave unexpectedly with ties—use explicit ROWS frames for precision.
ROWS vs RANGE: what is the difference?
ROWS counts physical row positions (row 1, row 2, row 3). RANGE groups rows with equal ORDER BY values (ties). For `SUM() OVER (ORDER BY date)`, if three rows have the same date, RANGE includes all three at once while ROWS processes them one at a time. Use ROWS for predictable running totals.
Why does ORDER BY in OVER not sort my final results?
ORDER BY inside OVER() only controls the window frame ordering—which rows are 'before' and 'after' for the window calculation. It does NOT sort the query output. You need a separate ORDER BY clause at the end of your query to sort the final results.
Why does LAST_VALUE return the current row instead of the partition last?
The default frame ends at CURRENT ROW, so 'last value in frame' is the current row itself. To get the actual last value in the partition, use an explicit frame: `LAST_VALUE(col) OVER (PARTITION BY x ORDER BY y ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)`.
Do I always need a frame clause?
Not always—but explicit frames prevent surprises. Ranking functions (ROW_NUMBER, RANK, DENSE_RANK) ignore frames. Aggregate windows (SUM, AVG, COUNT) use frames and have subtle defaults with ties. When precision matters, always specify ROWS BETWEEN explicitly.
What is the difference between ROW_NUMBER, RANK, and DENSE_RANK?
All three assign numbers within partitions, but handle ties differently. ROW_NUMBER: unique numbers (1,2,3,4) even with ties—result depends on internal ordering. RANK: same number for ties, then gaps (1,1,3,4). DENSE_RANK: same number for ties, no gaps (1,1,2,3). Use ROW_NUMBER for dedup (keep one row per group); RANK/DENSE_RANK for leaderboards where ties matter.
How do I get a moving average?
Use AVG with a ROWS frame: `AVG(value) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)` gives a 7-day moving average. The frame includes the current row plus the 6 preceding rows.
What is a named window (WINDOW clause)?
PostgreSQL and MySQL support defining reusable window specifications: `WINDOW w AS (PARTITION BY x ORDER BY y)`, then use `SUM(a) OVER w, AVG(b) OVER w`. Reduces repetition when multiple columns use the same window. Not supported in all databases.
SQL Window Functions Syntax Quick Reference
-- Unique row numbers; id breaks ties deterministically
SELECT *,
ROW_NUMBER() OVER (
ORDER BY created_at DESC, id DESC
) AS row_num
FROM orders;-- Top 3 orders per customer by amount
WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY amount DESC, id DESC
) AS rn
FROM orders
)
SELECT * FROM ranked WHERE rn <= 3;-- Keep only the most recent event per user
WITH deduped AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY user_id
ORDER BY updated_at DESC, id DESC
) AS rn
FROM user_events
)
SELECT * FROM deduped WHERE rn = 1;-- ROWS avoids RANGE tie-grouping surprises
SELECT
date,
amount,
SUM(amount) OVER (
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM transactions;-- Current row + 6 preceding = 7 rows
SELECT
date,
value,
AVG(value) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS avg_7d
FROM metrics;-- Calculate day-over-day change
SELECT
date,
value,
value - LAG(value) OVER (ORDER BY date) AS change,
LAG(value) OVER (ORDER BY date) AS prev_value
FROM metrics;-- Show next scheduled date
SELECT
event_name,
event_date,
LEAD(event_date) OVER (ORDER BY event_date) AS next_event
FROM events;-- Extend frame to see actual partition last
SELECT
customer_id,
order_date,
amount,
LAST_VALUE(amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_order_amount
FROM orders;-- First order amount per customer
SELECT
customer_id,
order_date,
amount,
FIRST_VALUE(amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
) AS first_order_amount
FROM orders;-- RANK: gaps after ties (1,1,3)
-- DENSE_RANK: no gaps (1,1,2)
SELECT
name,
score,
RANK() OVER (ORDER BY score DESC) AS rank,
DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rank
FROM players;-- PERCENT_RANK: 0-1 relative position
-- NTILE: divide into N buckets
SELECT
name,
score,
PERCENT_RANK() OVER (ORDER BY score) AS pct_rank,
NTILE(4) OVER (ORDER BY score) AS quartile
FROM players;-- Define once, reuse multiple times
SELECT
date,
amount,
SUM(amount) OVER w AS running_sum,
AVG(amount) OVER w AS running_avg
FROM transactions
WINDOW w AS (
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
);SQL Window Functions Sample Exercises
Complete the window function to count orders per user
OVERComplete the clause to partition by product_id
PARTITIONA table has 10 rows with 3 distinct user_ids. How many rows does this query return? SELECT user_id, SUM(total) OVER(PARTITION BY user_id) FROM orders
10+ 17 more exercises