Subqueries
Chapter 15: Subqueries
Section titled “Chapter 15: Subqueries”Mastering Subqueries in PostgreSQL
Section titled “Mastering Subqueries in PostgreSQL”15.1 Subquery Fundamentals
Section titled “15.1 Subquery Fundamentals”A subquery is a query nested inside another SQL statement. They provide powerful ways to perform complex data operations.
Subquery Types ========================================================================
┌─────────────────────────────────────────────────────────────────────┐ │ Subquery Types │ │ │ │ ┌──────────────────┐ ┌──────────────────┐ │ │ │ Scalar │ │ Table │ │ │ │ Subquery │ │ Subquery │ │ │ │ │ │ │ │ │ │ Returns single │ │ Returns multiple │ │ │ │ value │ │ rows/columns │ │ │ │ │ │ (used as table) │ │ │ └──────────────────┘ └──────────────────┘ │ │ │ │ ┌──────────────────┐ ┌──────────────────┐ │ │ │ Correlated │ │ EXISTS │ │ │ │ Subquery │ │ Subquery │ │ │ │ │ │ │ │ │ │ References outer │ │ Checks for row │ │ │ │ query columns │ │ existence │ │ │ └──────────────────┘ └──────────────────┘ │ │ │ └─────────────────────────────────────────────────────────────────────┘
Where Subqueries Can Be Used: ============================
• SELECT clause (scalar subquery) • FROM clause (derived table) • WHERE clause (scalar or IN/ANY/ALL) • HAVING clause • INSERT/UPDATE/DELETE VALUES • CREATE TABLE AS15.2 Scalar Subqueries
Section titled “15.2 Scalar Subqueries”Return a single value, used in expressions.
-- Sample tablesCREATE TABLE orders ( id SERIAL PRIMARY KEY, user_id INTEGER, total DECIMAL(10,2), status VARCHAR(20), created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP);
CREATE TABLE users ( id SERIAL PRIMARY KEY, username VARCHAR(50), email VARCHAR(255));
INSERT INTO users (username, email) VALUES ('john', 'john@example.com'), ('jane', 'jane@example.com'), ('bob', 'bob@example.com');
INSERT INTO orders (user_id, total, status) VALUES (1, 150.00, 'completed'), (1, 75.00, 'pending'), (2, 200.00, 'completed'), (2, 50.00, 'completed'), (3, 100.00, 'cancelled');
-- Scalar subquery in SELECTSELECT username, (SELECT COUNT(*) FROM orders WHERE user_id = users.id) AS order_countFROM users;
-- Scalar subquery for comparisonSELECT username, total AS order_total, (SELECT AVG(total) FROM orders) AS avg_orderFROM users uJOIN orders o ON u.id = o.user_id;
-- Scalar subquery in WHERESELECT * FROM ordersWHERE total > (SELECT AVG(total) FROM orders);
-- Multiple scalar subqueriesSELECT u.username, (SELECT MAX(total) FROM orders WHERE user_id = u.id) AS max_order, (SELECT MIN(total) FROM orders WHERE user_id = u.id) AS min_order, (SELECT COUNT(*) FROM orders WHERE user_id = u.id) AS order_countFROM users u;15.3 Table Subqueries (Derived Tables)
Section titled “15.3 Table Subqueries (Derived Tables)”Return a result set used as a temporary table.
-- Basic derived tableSELECT sub.category, sub.order_count, sub.totalFROM ( SELECT u.username AS category, COUNT(o.id) AS order_count, SUM(o.total) AS total FROM users u JOIN orders o ON u.id = o.user_id GROUP BY u.username) subWHERE sub.order_count > 1;
-- Derived table with multiple columnsSELECT category, MAX(order_count) as max_orders, AVG(total) as avg_totalFROM ( SELECT u.username AS category, u.email, COUNT(o.id) AS order_count, AVG(o.total) AS total FROM users u LEFT JOIN orders o ON u.id = o.user_id GROUP BY u.username, u.email) subGROUP BY category;
-- Complex derived tableSELECT product_category, product_count, total_value, CASE WHEN product_count > 5 THEN 'High Volume' WHEN product_count > 2 THEN 'Medium Volume' ELSE 'Low Volume' END AS category_typeFROM ( SELECT 'Products' AS product_category, COUNT(*) AS product_count, SUM(price) AS total_value FROM products WHERE price > 50) sub;15.4 IN and NOT IN with Subqueries
Section titled “15.4 IN and NOT IN with Subqueries”Filter rows based on membership in a result set.
-- IN with subquerySELECT * FROM ordersWHERE user_id IN ( SELECT id FROM users WHERE username IN ('john', 'jane'));
-- NOT IN with subquerySELECT * FROM ordersWHERE user_id NOT IN ( SELECT id FROM users WHERE username = 'bob');
-- IN with multiple columnsSELECT * FROM ordersWHERE (user_id, status) IN ( SELECT id, 'completed' FROM users WHERE email LIKE '%@example.com');
-- IN with correlated subquerySELECT * FROM users uWHERE id IN ( SELECT user_id FROM orders WHERE total > 100);
-- Performance consideration: IN vs EXISTS-- Both are often equivalent in PostgreSQL, but test with EXPLAIN15.5 ANY and ALL Operators
Section titled “15.5 ANY and ALL Operators”Compare to any or all values returned by subquery.
-- ANY: TRUE if comparison is TRUE for ANY valueSELECT * FROM ordersWHERE total > ANY ( SELECT total FROM orders WHERE user_id = 1);
-- Same as:SELECT * FROM ordersWHERE total > ( SELECT MIN(total) FROM orders WHERE user_id = 1);
-- ALL: TRUE if comparison is TRUE for ALL valuesSELECT * FROM ordersWHERE total > ALL ( SELECT total FROM orders WHERE user_id = 1);
-- Same as:SELECT * FROM ordersWHERE total > ( SELECT MAX(total) FROM orders WHERE user_id = 1);
-- Practical example: Products more expensive than any ElectronicsSELECT * FROM productsWHERE price > ANY ( SELECT price FROM products WHERE category = 'Electronics');
-- Products more expensive than all Electronics productsSELECT * FROM productsWHERE price > ALL ( SELECT price FROM products WHERE category = 'Electronics');15.6 EXISTS and NOT EXISTS
Section titled “15.6 EXISTS and NOT EXISTS”Check if subquery returns any rows.
-- EXISTS: Find users with ordersSELECT * FROM users uWHERE EXISTS ( SELECT 1 FROM orders WHERE user_id = u.id);
-- NOT EXISTS: Find users without ordersSELECT * FROM users uWHERE NOT EXISTS ( SELECT 1 FROM orders WHERE user_id = u.id);
-- EXISTS with multiple conditionsSELECT * FROM users uWHERE EXISTS ( SELECT 1 FROM orders WHERE user_id = u.id AND status = 'completed');
-- NOT EXISTS for finding "latest" recordsSELECT * FROM orders o1WHERE NOT EXISTS ( SELECT 1 FROM orders o2 WHERE o2.user_id = o1.user_id AND o2.created_at > o1.created_at);
-- EXISTS vs IN: When to use each-- EXISTS: Better when checking existence (often faster)-- IN: Better when comparing to small, static list15.7 Subqueries in FROM Clause
Section titled “15.7 Subqueries in FROM Clause”-- Nested derived tablesSELECT category, COUNT(*) AS count, SUM(total) AS totalFROM ( SELECT u.username AS category, o.total FROM users u LEFT JOIN orders o ON u.id = o.user_id) subGROUP BY category;
-- Multiple derived tablesSELECT sub1.category, sub1.total AS completed_total, sub2.total AS cancelled_totalFROM ( SELECT u.username AS category, SUM(o.total) AS total FROM users u JOIN orders o ON u.id = o.user_id WHERE o.status = 'completed' GROUP BY u.username) sub1JOIN ( SELECT u.username AS category, SUM(o.total) AS total FROM users u JOIN orders o ON u.id = o.user_id WHERE o.status = 'cancelled' GROUP BY u.username) sub2 ON sub1.category = sub2.category;
-- LATERAL subquery (PostgreSQL 9.3+)-- Allows subquery to reference columns from preceding tablesSELECT u.username, o.*FROM users uLEFT JOIN LATERAL ( SELECT * FROM orders WHERE user_id = u.id ORDER BY created_at DESC LIMIT 3) o ON TRUE;15.8 Subqueries in WHERE and HAVING
Section titled “15.8 Subqueries in WHERE and HAVING”-- Subquery in WHERESELECT * FROM productsWHERE category_id = ( SELECT id FROM categories WHERE name = 'Electronics');
-- Subquery in HAVINGSELECT user_id, COUNT(*) AS order_countFROM ordersGROUP BY user_idHAVING COUNT(*) > ( SELECT AVG(order_count) FROM (SELECT COUNT(*) AS order_count FROM orders GROUP BY user_id) sub);
-- Subquery with aggregate in HAVINGSELECT status, AVG(total) AS avg_totalFROM ordersGROUP BY statusHAVING AVG(total) > ( SELECT AVG(total) FROM orders WHERE status = 'pending');
-- Complex HAVING with subquerySELECT u.username, COUNT(o.id) AS total_orders, SUM(o.total) AS total_spentFROM users uJOIN orders o ON u.id = o.user_idGROUP BY u.usernameHAVING SUM(o.total) > ( SELECT AVG(total) FROM orders WHERE status = 'completed');15.9 Subqueries in DML Statements
Section titled “15.9 Subqueries in DML Statements”-- INSERT with subqueryINSERT INTO order_stats (user_id, total_orders, total_spent)SELECT user_id, COUNT(*) AS total_orders, SUM(total) AS total_spentFROM ordersWHERE status = 'completed'GROUP BY user_id;
-- UPDATE with subqueryUPDATE products p SET price = price * 1.1WHERE p.category_id IN ( SELECT id FROM categories WHERE name IN ('Electronics', 'Toys'));
-- DELETE with subqueryDELETE FROM ordersWHERE user_id IN ( SELECT id FROM users WHERE username = 'inactive_user');
-- UPDATE with correlated subqueryUPDATE products p SET price = ( SELECT AVG(price) FROM products WHERE category_id = p.category_id )WHERE p.price > ( SELECT AVG(price) * 1.5 FROM products WHERE category_id = p.category_id);15.10 Performance Considerations
Section titled “15.10 Performance Considerations”-- Always EXPLAIN your queries with subqueriesEXPLAIN ANALYZESELECT * FROM ordersWHERE user_id IN (SELECT id FROM users WHERE is_active = TRUE);
-- Convert IN to EXISTS often betterEXPLAIN ANALYZESELECT * FROM orders oWHERE EXISTS (SELECT 1 FROM users u WHERE u.id = o.user_id AND u.is_active);
-- Use LATERAL for better performance in some casesEXPLAIN ANALYZESELECT u.username, o.*FROM users uLEFT JOIN LATERAL ( SELECT * FROM orders WHERE user_id = u.id LIMIT 1) o ON TRUE;
-- Consider materialization-- PostgreSQL may materialize subqueries for performance-- Check query plan for materialize nodes15.11 Best Practices
Section titled “15.11 Best Practices” Subquery Best Practices ========================================================================
1. Use EXISTS for Existence Checks ┌──────────────────────────────────────────────────────────────┐ │ ✓ SELECT * FROM users u │ │ WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u│ │ │ │ ✗ SELECT * FROM users u │ │ WHERE u.id IN (SELECT user_id FROM orders) │ └──────────────────────────────────────────────────────────────┘
2. Avoid Unnecessary Nested Subqueries ┌──────────────────────────────────────────────────────────────┐ │ ✗ Complex nested subqueries can be slow │ │ ✓ Use CTEs (WITH clause) for readability │ └──────────────────────────────────────────────────────────────┘
3. Index Subquery Conditions ┌──────────────────────────────────────────────────────────────┐ │ • Add indexes on columns used in subquery WHERE clauses │ │ • This dramatically improves performance │ └──────────────────────────────────────────────────────────────┘
4. Consider JOIN Instead of Subquery ┌──────────────────────────────────────────────────────────────┐ │ • Sometimes JOIN is more efficient than subquery │ │ • Test with EXPLAIN ANALYZE │ └──────────────────────────────────────────────────────────────┘
5. Use LATERAL for Complex Requirements ┌──────────────────────────────────────────────────────────────┐ │ • LATERAL allows subqueries to reference outer query cols │ │ • Useful for top-N per group and similar patterns │ └──────────────────────────────────────────────────────────────┘ ========================================================================Next Chapter
Section titled “Next Chapter”Chapter 17: Common Table Expressions (CTEs)
Last Updated: February 2026