Window_functions

Chapter 20: Window Functions

Advanced Analytics in PostgreSQL

20.1 Window Function Fundamentals

Window functions perform calculations across a set of rows related to the current row, without collapsing the result set.

                    Window Function vs Aggregate Function
    ========================================================================

    Aggregate Function:
    ===================

    SELECT category, SUM(amount) FROM sales GROUP BY category;

    Returns: 3 rows (one per category)
    ┌─────────────┬──────────┐
    │ category    │ sum     │
    ├─────────────┼──────────┤
    │ Electronics │ 2229.97 │
    │ Books       │ 44.98   │
    │ Clothing    │ 39.99   │
    └─────────────┴──────────┘

    Window Function:
    ================

    SELECT *, SUM(amount) OVER () FROM sales;

    Returns: Same number of rows as input (6 rows)
    ┌─────────┬────────────┬──────────┐
    │ product │ category   │ sum     │
    ├─────────┼────────────┼──────────┤
    │ Laptop  │ Electronics│ 2314.94 │
    │ Mouse   │ Electronics│ 2314.94 │
    │ Book    │ Books      │ 2314.94 │
    │ Laptop  │ Electronics│ 2314.94 │
    │ Shirt   │ Clothing   │ 2314.94 │
    │ Book    │ Books     │ 2314.94 │
    └─────────┴────────────┴──────────┘

    Window Function Components:
    ===========================

    • OVER: Defines the window
    • PARTITION BY: Groups rows
    • ORDER BY: Orders rows within partition
    • ROWS/RANGE: Frame within partition

20.2 Window Function Syntax

-- Basic window function
SELECT
    product,
    amount,
    SUM(amount) OVER () AS total_amount
FROM sales;

-- Window with PARTITION BY
SELECT
    product,
    category,
    amount,
    SUM(amount) OVER (PARTITION BY category) AS category_total
FROM sales;

-- Window with ORDER BY
SELECT
    product,
    sale_date,
    amount,
    SUM(amount) OVER (ORDER BY sale_date) AS running_total
FROM sales;

-- Window with PARTITION BY and ORDER BY
SELECT
    product,
    category,
    sale_date,
    amount,
    SUM(amount) OVER (
        PARTITION BY category
        ORDER BY sale_date
    ) AS running_total_by_category
FROM sales;

-- Multiple window functions
SELECT
    product,
    amount,
    SUM(amount) OVER () AS total,
    AVG(amount) OVER () AS average,
    COUNT(*) OVER () AS count,
    MIN(amount) OVER () AS minimum,
    MAX(amount) OVER () AS maximum
FROM sales;

20.3 Ranking Functions

-- ROW_NUMBER: Sequential number within partition
SELECT
    product,
    category,
    amount,
    ROW_NUMBER() OVER (ORDER BY amount DESC) AS row_num
FROM sales;

-- RANK: Same value = same rank, with gaps
SELECT
    product,
    category,
    amount,
    RANK() OVER (PARTITION BY category ORDER BY amount DESC) AS rank
FROM sales;

-- DENSE_RANK: Same value = same rank, without gaps
SELECT
    product,
    category,
    amount,
    DENSE_RANK() OVER (PARTITION BY category ORDER BY amount DESC) AS dense_rank
FROM sales;

-- NTILE: Distribute rows into n groups
SELECT
    product,
    amount,
    NTILE(4) OVER (ORDER BY amount) AS quartile
FROM sales;

-- Compare ranking functions
SELECT
    product,
    amount,
    ROW_NUMBER() OVER (ORDER BY amount DESC) AS row_num,
    RANK() OVER (ORDER BY amount DESC) AS rank,
    DENSE_RANK() OVER (ORDER BY amount DESC) AS dense_rank
FROM sales
ORDER BY amount DESC;

20.4 Lead and Lag Functions

Access values from subsequent or previous rows.

-- LAG: Previous row value
SELECT
    sale_date,
    amount,
    LAG(amount) OVER (ORDER BY sale_date) AS prev_amount
FROM sales
ORDER BY sale_date;

-- LEAD: Next row value
SELECT
    sale_date,
    amount,
    LEAD(amount) OVER (ORDER BY sale_date) AS next_amount
FROM sales
ORDER BY sale_date;

-- LAG with offset
SELECT
    sale_date,
    amount,
    LAG(amount, 1) OVER (ORDER BY sale_date) AS prev_1,
    LAG(amount, 2) OVER (ORDER BY sale_date) AS prev_2
FROM sales
ORDER BY sale_date;

-- Compare to previous period
SELECT
    sale_date,
    amount,
    LAG(amount) OVER (ORDER BY sale_date) AS prev_amount,
    amount - LAG(amount) OVER (ORDER BY sale_date) AS change,
    ROUND(
        (amount - LAG(amount) OVER (ORDER BY sale_date)) /
        NULLIF(LAG(amount) OVER (ORDER BY sale_date), 0) * 100,
        2
    ) AS pct_change
FROM sales
ORDER BY sale_date;

-- Partitioned LAG/LAG
SELECT
    category,
    sale_date,
    amount,
    LAG(amount) OVER (PARTITION BY category ORDER BY sale_date) AS prev_amount
FROM sales
ORDER BY category, sale_date;

20.5 First and Last Values

-- FIRST_VALUE: First row in partition
SELECT
    sale_date,
    amount,
    FIRST_VALUE(amount) OVER (ORDER BY sale_date) AS first_amount
FROM sales
ORDER BY sale_date;

-- LAST_VALUE: Last row in partition (needs frame specification)
SELECT
    sale_date,
    amount,
    LAST_VALUE(amount) OVER (
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_amount
FROM sales
ORDER BY sale_date;

-- FIRST_VALUE by partition
SELECT
    category,
    sale_date,
    amount,
    FIRST_VALUE(amount) OVER (
        PARTITION BY category
        ORDER BY sale_date
    ) AS first_in_category
FROM sales;

-- NTH_VALUE: nth row
SELECT
    sale_date,
    amount,
    NTH_VALUE(amount, 2) OVER (
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS second_amount
FROM sales
ORDER BY sale_date;

20.6 Window Frame

Define which rows to include in the window calculation.

-- ROWS: Physical row offsets
SELECT
    sale_date,
    amount,
    SUM(amount) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
    ) AS moving_sum_3
FROM sales;

-- RANGE: Value-based offsets
SELECT
    amount,
    SUM(amount) OVER (
        ORDER BY amount
        RANGE BETWEEN 10 PRECEDING AND 10 FOLLOWING
    ) AS range_sum
FROM (SELECT DISTINCT amount FROM sales) sub;

-- Common frame specifications:
-- ROWS UNBOUNDED PRECEDING - from start to current
-- ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING - entire partition
-- ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING - moving window

-- Moving average example
SELECT
    sale_date,
    amount,
    AVG(amount) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg_3
FROM sales
ORDER BY sale_date;

-- Cumulative sum
SELECT
    sale_date,
    amount,
    SUM(amount) OVER (
        ORDER BY sale_date
        ROWS UNBOUNDED PRECEDING
    ) AS cumulative_sum
FROM sales
ORDER BY sale_date;

20.7 Practical Window Function Examples

-- Top N per group
SELECT *
FROM (
    SELECT
        category,
        product,
        amount,
        ROW_NUMBER() OVER (PARTITION BY category ORDER BY amount DESC) AS rank
    FROM sales
) ranked
WHERE rank <= 2;

-- Calculate percentage of group total
SELECT
    category,
    product,
    amount,
    ROUND(
        amount::numeric / SUM(amount) OVER (PARTITION BY category) * 100,
        2
    ) AS pct_of_category
FROM sales;

-- Running total by category
SELECT
    category,
    sale_date,
    amount,
    SUM(amount) OVER (
        PARTITION BY category
        ORDER BY sale_date
        ROWS UNBOUNDED PRECEDING
    ) AS running_total
FROM sales
ORDER BY category, sale_date;

-- Compare to category average
SELECT
    category,
    product,
    amount,
    AVG(amount) OVER (PARTITION BY category) AS category_avg,
    amount - AVG(amount) OVER (PARTITION BY category) AS diff_from_avg
FROM sales;

-- Percent rank within group
SELECT
    category,
    product,
    amount,
    ROUND(PERCENT_RANK() OVER (PARTITION BY category ORDER BY amount), 2) AS pct_rank
FROM sales;

-- Median using window function
SELECT
    category,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) OVER (PARTITION BY category) AS median
FROM sales;

20.8 DISTINCT with Window Functions

-- Calculate aggregate once, use multiple times
SELECT DISTINCT
    category,
    SUM(amount) OVER (PARTITION BY category) AS category_sum,
    AVG(amount) OVER (PARTITION BY category) AS category_avg,
    COUNT(*) OVER (PARTITION BY category) AS category_count
FROM sales;

-- This is cleaner than:
SELECT
    category,
    SUM(amount) AS category_sum,
    AVG(amount) AS category_avg,
    COUNT(*) AS category_count
FROM sales
GROUP BY category;

20.9 Best Practices

                    Window Function Best Practices
    ========================================================================

    1. Always Define the Window
       ┌──────────────────────────────────────────────────────────────┐
       │  • OVER() for entire result set                            │
       │  • OVER (PARTITION BY ...) for groups                      │
       │  • OVER (ORDER BY ...) for sequential processing           │
       └──────────────────────────────────────────────────────────────┘

    2. Use PARTITION BY for Group-Level Calculations
       ┌──────────────────────────────────────────────────────────────┐
       │  • Similar to GROUP BY but preserves all rows              │
       │  • Enables comparisons within groups                        │
       └──────────────────────────────────────────────────────────────┘

    3. Choose Correct Frame
       ┌──────────────────────────────────────────────────────────────┐
       │  • ROWS for position-based frames                           │
       │  • RANGE for value-based frames                             │
       │  • Default is often RANGE UNBOUNDED PRECEDING               │
       └──────────────────────────────────────────────────────────────┘

    4. Use QUALIFY for Filtering Window Results
       ┌──────────────────────────────────────────────────────────────┐
       │  PostgreSQL 13+:                                             │
       │  SELECT *, ROW_NUMBER() OVER (...) AS rn                    │
       │  FROM (...)                                                 │
       │  WHERE rn = 1;                                             │
       │                                                              │
       │  Can be simplified to:                                      │
       │  SELECT * FROM (...)                                        │
       │  QUALIFY ROW_NUMBER() OVER (...) = 1;                       │
       └──────────────────────────────────────────────────────────────┘

    5. Consider Performance
       ┌──────────────────────────────────────────────────────────────┘
       │  • Window functions can be expensive on large datasets     │
       │  • Create indexes on ORDER BY columns                       │
       │  • Use EXPLAIN ANALYZE to check performance                │
       └──────────────────────────────────────────────────────────────┘
    ========================================================================

Next Chapter

Chapter 21: Understanding Indexes

Last Updated: February 2026