Group_by
Chapter 18: GROUP BY & Aggregation
Section titled “Chapter 18: GROUP BY & Aggregation”Summarizing and Analyzing Data
Section titled “Summarizing and Analyzing Data”18.1 Aggregation Fundamentals
Section titled “18.1 Aggregation Fundamentals”Aggregation functions perform calculations on sets of rows and return a single result.
Aggregation Process ========================================================================
┌─────────────────────────────────────────────────────────────────────┐ │ GROUP BY Processing │ │ │ │ Original Data: │ │ ┌──────┬────────────┬───────┐ │ │ │ user │ category │ amount│ │ │ ├──────┼────────────┼───────┤ │ │ │ A │ Electronics│ 100 │ │ │ │ A │ Electronics│ 50 │────► GROUP BY user, category │ │ │ A │ Books │ 30 │────► Aggregates per group │ │ │ B │ Electronics│ 80 │────► │ │ │ B │ Books │ 40 │────► Results: │ │ └──────┴────────────┴───────┘ ┌──────┬────────────┬────────┐ │ │ │ user │ category │ SUM() │ │ │ ├──────┼────────────┼────────┤ │ │ │ A │ Electronics│ 150 │ │ │ │ A │ Books │ 30 │ │ │ │ B │ Electronics│ 80 │ │ │ │ B │ Books │ 40 │ │ │ └──────┴────────────┴────────┘ │ └─────────────────────────────────────────────────────────────────────┘18.2 Basic Aggregate Functions
Section titled “18.2 Basic Aggregate Functions”-- Sample dataCREATE TABLE sales ( id SERIAL, product VARCHAR(100), category VARCHAR(50), amount DECIMAL(10,2), sale_date DATE, region VARCHAR(20));
INSERT INTO sales (product, category, amount, sale_date, region) VALUES ('Laptop', 'Electronics', 999.99, '2024-01-15', 'North'), ('Mouse', 'Electronics', 29.99, '2024-01-16', 'North'), ('Book', 'Books', 19.99, '2024-01-17', 'South'), ('Laptop', 'Electronics', 1199.99, '2024-01-18', 'South'), ('Shirt', 'Clothing', 39.99, '2024-01-19', 'North'), ('Book', 'Books', 24.99, '2024-01-20', 'North');
-- COUNT - count rowsSELECT COUNT(*) FROM sales; -- Total rowsSELECT COUNT(DISTINCT product) FROM sales; -- Unique products
-- SUM - total sumSELECT SUM(amount) FROM sales;SELECT SUM(amount) FROM sales WHERE category = 'Electronics';
-- AVG - averageSELECT AVG(amount) FROM sales;SELECT category, AVG(amount) FROM sales GROUP BY category;
-- MIN and MAXSELECT MIN(amount), MAX(amount) FROM sales;SELECT category, MIN(amount) AS min_sale, MAX(amount) AS max_sale, AVG(amount) AS avg_saleFROM salesGROUP BY category;
-- Multiple aggregationsSELECT COUNT(*) AS total_sales, SUM(amount) AS total_revenue, AVG(amount) AS avg_sale, MIN(amount) AS min_sale, MAX(amount) AS max_saleFROM sales;18.3 GROUP BY
Section titled “18.3 GROUP BY”-- Basic GROUP BYSELECT category, SUM(amount) FROM sales GROUP BY category;
-- GROUP BY multiple columnsSELECT category, region, SUM(amount)FROM salesGROUP BY category, region;
-- GROUP BY with multiple aggregationsSELECT category, region, COUNT(*) AS transaction_count, SUM(amount) AS total_sales, AVG(amount) AS avg_sale, MIN(amount) AS min_sale, MAX(amount) AS max_saleFROM salesGROUP BY category, regionORDER BY category, region;
-- GROUP BY with string functionsSELECT EXTRACT(YEAR FROM sale_date) AS year, EXTRACT(MONTH FROM sale_date) AS month, SUM(amount) AS total_salesFROM salesGROUP BY EXTRACT(YEAR FROM sale_date), EXTRACT(MONTH FROM sale_date)ORDER BY year, month;
-- GROUP BY with filtering before aggregationSELECT region, SUM(amount)FROM salesWHERE sale_date >= '2024-01-18'GROUP BY region;18.4 HAVING
Section titled “18.4 HAVING”The HAVING clause filters groups after aggregation.
-- HAVING vs WHERE-- WHERE filters rows BEFORE aggregation-- HAVING filters groups AFTER aggregation
-- Find categories with total sales > 100SELECT category, SUM(amount) AS totalFROM salesGROUP BY categoryHAVING SUM(amount) > 100;
-- Find products sold more than onceSELECT product, COUNT(*) AS times_soldFROM salesGROUP BY productHAVING COUNT(*) > 1;
-- Complex HAVING with multiple conditionsSELECT region, category, SUM(amount) AS total, COUNT(*) AS transactionsFROM salesGROUP BY region, categoryHAVING SUM(amount) > 50 AND COUNT(*) >= 2ORDER BY total DESC;
-- HAVING with subquerySELECT category, SUM(amount) AS totalFROM salesGROUP BY categoryHAVING SUM(amount) > ( SELECT AVG(total) FROM (SELECT SUM(amount) AS total FROM sales GROUP BY category) sub);18.5 GROUPING SETS
Section titled “18.5 GROUPING SETS”Generate multiple grouping levels in one query.
-- GROUP BY ROLLUP - hierarchical subtotalsSELECT region, category, SUM(amount) AS totalFROM salesGROUP BY ROLLUP (region, category);
-- Equivalent to:-- GROUP BY region, category-- UNION ALL-- GROUP BY region-- UNION ALL-- GROUP BY () (grand total)
-- GROUP BY CUBE - all combinationsSELECT region, category, SUM(amount) AS totalFROM salesGROUP BY CUBE (region, category);
-- Equivalent to all combinations:-- region, category-- region, NULL-- NULL, category-- NULL, NULL (grand total)
-- GROUPING() function to identify grouped rowsSELECT GROUPING(region) AS is_region_grouped, GROUPING(category) AS is_category_grouped, region, category, SUM(amount) AS totalFROM salesGROUP BY ROLLUP (region, category);
-- 0 = grouped by this column-- 1 = not grouped by this column (subtotal row)18.6 FILTER Clause
Section titled “18.6 FILTER Clause”Use FILTER to aggregate only certain rows.
-- Using FILTER instead of CASESELECT category, SUM(amount) AS total, SUM(amount) FILTER (WHERE region = 'North') AS north_sales, SUM(amount) FILTER (WHERE region = 'South') AS south_sales, COUNT(*) AS total_transactions, COUNT(*) FILTER (WHERE amount > 100) AS high_value_transactionsFROM salesGROUP BY category;
-- Multiple FILTER conditionsSELECT category, SUM(amount) FILTER (WHERE sale_date >= '2024-01-18') AS recent_sales, COUNT(*) FILTER (WHERE amount > 50) AS premium_salesFROM salesGROUP BY category;
-- FILTER with other aggregatesSELECT region, AVG(amount) AS avg_amount, AVG(amount) FILTER (WHERE category = 'Electronics') AS avg_electronics, AVG(amount) FILTER (WHERE category = 'Books') AS avg_booksFROM salesGROUP BY region;18.7 DISTINCT with Aggregates
Section titled “18.7 DISTINCT with Aggregates”-- COUNT DISTINCTSELECT COUNT(*) AS total_rows, COUNT(DISTINCT product) AS unique_products, COUNT(DISTINCT category) AS unique_categories, COUNT(DISTINCT region) AS unique_regionsFROM sales;
-- SUM DISTINCTSELECT SUM(DISTINCT amount) AS sum_unique_amountsFROM sales;
-- Multiple distinct aggregatesSELECT COUNT(DISTINCT product) AS products, COUNT(DISTINCT region) AS regions, COUNT(DISTINCT category) AS categoriesFROM salesWHERE amount > 20;
-- DISTINCT vs GROUP BY-- These are equivalent:SELECT category, SUM(amount) FROM sales GROUP BY category;SELECT SUM(amount) FILTER (WHERE category = 'Books') FROM sales;18.8 Combining Aggregates
Section titled “18.8 Combining Aggregates”-- Aggregate of aggregatesSELECT category, SUM(amount) AS total, AVG(SUM(amount)) OVER () AS grand_avgFROM salesGROUP BY category;
-- Running total with window function (for comparison)SELECT sale_date, amount, SUM(amount) OVER (ORDER BY sale_date) AS running_total, SUM(amount) OVER () AS grand_totalFROM sales;
-- Percentage of totalSELECT category, SUM(amount) AS amount, ROUND( SUM(amount)::numeric / SUM(SUM(amount)) OVER() * 100, 2 ) AS percentageFROM salesGROUP BY categoryORDER BY percentage DESC;
-- Aggregates in subqueriesSELECT category, (SELECT SUM(amount) FROM sales s2 WHERE s2.category = s1.category) AS totalFROM (SELECT DISTINCT category FROM sales) s1;18.9 Practical Aggregation Examples
Section titled “18.9 Practical Aggregation Examples”-- Monthly sales summarySELECT TO_CHAR(sale_date, 'YYYY-MM') AS month, COUNT(*) AS transactions, SUM(amount) AS revenue, AVG(amount) AS avg_transaction, MIN(amount) AS min_sale, MAX(amount) AS max_saleFROM salesGROUP BY TO_CHAR(sale_date, 'YYYY-MM')ORDER BY month;
-- Top products by categorySELECT category, product, SUM(amount) AS totalFROM salesGROUP BY category, productQUALIFY ROW_NUMBER() OVER (PARTITION BY category ORDER BY SUM(amount) DESC) <= 2ORDER BY category, total DESC;
-- Customer purchase analysisWITH customer_sales AS ( SELECT customer_id, SUM(amount) AS total_spent, COUNT(*) AS order_count, MAX(created_at) AS last_order FROM orders GROUP BY customer_id)SELECT CASE WHEN total_spent > 1000 THEN 'VIP' WHEN total_spent > 500 THEN 'Premium' ELSE 'Regular' END AS customer_tier, COUNT(*) AS customers, SUM(total_spent) AS revenue, AVG(order_count) AS avg_ordersFROM customer_salesGROUP BY CASE WHEN total_spent > 1000 THEN 'VIP' WHEN total_spent > 500 THEN 'Premium' ELSE 'Regular' END;18.10 Best Practices
Section titled “18.10 Best Practices” Aggregation Best Practices ========================================================================
1. Use FILTER Instead of CASE ┌──────────────────────────────────────────────────────────────┐ │ ✓ SELECT SUM(amount) FILTER (WHERE status = 'completed') │ │ ✗ SELECT SUM(CASE WHEN status = 'completed' THEN amount) │ └──────────────────────────────────────────────────────────────┘
2. Use GROUP BY ROLLUP for Hierarchical Summaries ┌──────────────────────────────────────────────────────────────┐ │ • Generate subtotals automatically │ │ • One query instead of multiple UNION ALL │ └──────────────────────────────────────────────────────────────┘
3. Always Include Non-Aggregated Columns in GROUP BY ┌──────────────────────────────────────────────────────────────┐ │ • PostgreSQL requires this │ │ • Ensures correct grouping │ └──────────────────────────────────────────────────────────────┘
4. Filter Before Grouping When Possible ┌──────────────────────────────────────────────────────────────┐ │ • Use WHERE to filter rows before aggregation │ │ • More efficient than HAVING │ └──────────────────────────────────────────────────────────────┘
5. Use COALESCE for NULL Handling ┌──────────────────────────────────────────────────────────────┐ │ • COALESCE(SUM(amount), 0) prevents NULL in results │ └──────────────────────────────────────────────────────────────┘ ========================================================================Next Chapter
Section titled “Next Chapter”Last Updated: February 2026