Query_optimization
Chapter 24: Query Optimization Techniques
Section titled “Chapter 24: Query Optimization Techniques”Making Your Queries Faster
Section titled “Making Your Queries Faster”24.1 Index Optimization Strategies
Section titled “24.1 Index Optimization Strategies”Indexes are the primary tool for query optimization. Understanding when and how to create them is essential.
Index Strategy Overview ========================================================================
┌─────────────────────────────────────────────────────────────────────┐ │ When to Use Indexes │ │ │ │ ✓ YES - WHERE clause columns │ │ ✓ YES - JOIN conditions (foreign keys) │ │ ✓ YES - ORDER BY columns │ │ ✓ YES - Columns in GROUP BY │ │ ✓ NO - Low cardinality (few unique values) │ │ ✓ NO - Frequently updated columns │ │ ✓ NO - Small tables (full scan is faster) │ │ │ └─────────────────────────────────────────────────────────────────────┘Creating Effective Indexes
Section titled “Creating Effective Indexes”-- Sample tableCREATE TABLE products ( product_id SERIAL PRIMARY KEY, category VARCHAR(50), name VARCHAR(200), price DECIMAL(10,2), created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, status VARCHAR(20) DEFAULT 'active');
-- Single column index for WHERECREATE INDEX idx_products_category ON products(category);CREATE INDEX idx_products_status ON products(status);
-- Composite index for multi-column queries-- Order matters! Put high-selectivity columns firstCREATE INDEX idx_products_category_status ON products(category, status);
-- Partial index - only for active rowsCREATE INDEX idx_products_active ON products(category, name) WHERE status = 'active';
-- Covering index - includes all columns needed by queryCREATE INDEX idx_products_covering ON products(product_id, category, name, price) INCLUDE (created_at);
-- Expression indexCREATE INDEX idx_products_lower_name ON products(LOWER(name));
-- Unique indexCREATE UNIQUE INDEX idx_products_sku ON products(category, sku);24.2 Query Rewriting Techniques
Section titled “24.2 Query Rewriting Techniques”Sometimes the best optimization is rewriting the query itself.
-- Instead of:SELECT * FROM ordersWHERE status = 'completed' AND total > 1000;
-- Use covering index and SELECT only needed columns:CREATE INDEX idx_orders_status_total ON orders(status, total) INCLUDE (user_id, order_date);
SELECT order_id, user_id, total, order_dateFROM ordersWHERE status = 'completed' AND total > 1000;
-- Instead of multiple OR conditions:SELECT * FROM productsWHERE category = 'Electronics' OR category = 'Clothing' OR category = 'Books';
-- Use IN:SELECT * FROM productsWHERE category IN ('Electronics', 'Clothing', 'Books');
-- Instead of NOT IN (can be slow):SELECT * FROM usersWHERE user_id NOT IN (SELECT user_id FROM orders);
-- Use NOT EXISTS or LEFT JOIN:SELECT * FROM users uWHERE NOT EXISTS ( SELECT 1 FROM orders o WHERE o.user_id = u.user_id);
-- Or use LEFT JOIN with NULL check:SELECT u.* FROM users uLEFT JOIN orders o ON u.user_id = o.user_idWHERE o.order_id IS NULL;24.3 JOIN Optimization
Section titled “24.3 JOIN Optimization”-- Optimizing join order-- Put smallest/deterministic tables firstEXPLAIN (ANALYZE)SELECT u.*, o.*FROM orders oINNER JOIN users u ON o.user_id = u.user_idWHERE u.status = 'active';
-- Use appropriate join type-- For small outer table: Nested Loop-- For large tables: Hash Join or Merge Join
-- Avoid SELECT * in joins - specify columnsSELECT u.username, o.order_id, o.totalFROM users uINNER JOIN orders o ON u.user_id = o.user_id;
-- Use STRAIGHT_JOIN hint (if needed) to force join orderSELECT STRAIGHT_JOIN u.username, o.order_idFROM users uINNER JOIN orders o ON u.user_id = o.user_id;24.4 Pagination Optimization
Section titled “24.4 Pagination Optimization”Pagination with OFFSET can be slow for large offsets.
-- Slow: Using OFFSETSELECT * FROM ordersORDER BY order_idLIMIT 10 OFFSET 10000;
-- Better: Keyset pagination (seek method)-- Instead of OFFSET, use WHERE with last seen valueSELECT * FROM ordersWHERE order_id > 10000 -- Last order_id from previous pageORDER BY order_idLIMIT 10;
-- For non-sequential pagination, use covering indexesCREATE INDEX idx_orders_pagination ON orders(status, order_id DESC);
-- Then paginate:SELECT * FROM ordersWHERE status = 'completed'ORDER BY order_id DESCLIMIT 10 OFFSET 1000;24.5 Aggregation Optimization
Section titled “24.5 Aggregation Optimization”-- Use materialized views for frequent aggregationsCREATE MATERIALIZED VIEW sales_daily_summary ASSELECT DATE_TRUNC('day', created_at) as day, COUNT(*) as order_count, SUM(total) as revenueFROM ordersGROUP BY DATE_TRUNC('day', created_at);
-- Refresh periodicallyREFRESH MATERIALIZED VIEW CONCURRENTLY sales_daily_summary;
-- Use approximate aggregates for large datasets-- Instead of COUNT(DISTINCT):SELECT approximate_count_distinct(user_id) FROM orders;
-- Use tablesample for approximate resultsSELECT * FROM orders TABLESAMPLE BERNOULLI(10);
-- Pre-aggregate in application for real-time dashboards24.6 Subquery Optimization
Section titled “24.6 Subquery Optimization”-- Correlated subquery (executed per row):SELECT p.name, (SELECT SUM(quantity) FROM orders WHERE product_id = p.product_id) as total_soldFROM products p;
-- Convert to JOIN (usually faster):SELECT p.name, COALESCE(o.total_sold, 0) as total_soldFROM products pLEFT JOIN ( SELECT product_id, SUM(quantity) as total_sold FROM orders GROUP BY product_id) o ON p.product_id = o.product_id;
-- Use CTEs for readability (PostgreSQL optimizes them):WITH product_sales AS ( SELECT product_id, SUM(quantity) as total_sold FROM orders GROUP BY product_id)SELECT p.name, COALESCE(ps.total_sold, 0)FROM products pLEFT JOIN product_sales ps ON p.product_id = ps.product_id;24.7 Configuration Parameters
Section titled “24.7 Configuration Parameters”-- Important settings for query performance
-- shared_buffers: Memory for caching table data-- Should be ~25% of RAMSHOW shared_buffers;-- SET shared_buffers = '4GB';
-- work_mem: Memory for sorting/hashing per operation-- Increase for complex queriesSHOW work_mem;-- SET work_mem = '256MB';
-- maintenance_work_mem: For VACUUM, CREATE INDEX, etc.SHOW maintenance_work_mem;-- SET maintenance_work_mem = '1GB';
-- effective_cache_size: Planner's assumption about cache-- Set to expected cache sizeSHOW effective_cache_size;-- SET effective_cache_size = '12GB';
-- random_page_cost: Cost of non-sequential disk read-- Lower for SSDsSHOW random_page_cost;-- SET random_page_cost = 1.1;
-- Check current settingsSELECT name, setting, unit FROM pg_settingsWHERE name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem', 'effective_cache_size', 'random_page_cost');24.8 Query Analysis Checklist
Section titled “24.8 Query Analysis Checklist” Query Optimization Checklist ========================================================================
Before Running: ──────────────── □ Do you SELECT only needed columns? □ Are WHERE conditions sargable? □ Is the join order optimal? □ Can you add appropriate indexes?
After EXPLAIN ANALYZE: ─────────────────────── □ Any sequential scans on large tables? □ Are row estimates accurate? □ Is join type appropriate? □ Are sorts using memory or disk? □ Any expensive operations that can be avoided?
Advanced Checks: ───────────────── □ Consider partitioning for large tables □ Review buffer hit ratio □ Check for missing foreign key indexes □ Review statistics freshness24.9 Practical Example: Step-by-Step Optimization
Section titled “24.9 Practical Example: Step-by-Step Optimization”-- Step 1: Original slow queryEXPLAIN (ANALYZE, TIMING)SELECT u.username, o.order_id, o.total, p.name as product_nameFROM users uINNER JOIN orders o ON u.user_id = o.user_idINNER JOIN order_items oi ON o.order_id = oi.order_idINNER JOIN products p ON oi.product_id = p.product_idWHERE o.created_at >= '2024-01-01' AND u.status = 'active'ORDER BY o.total DESCLIMIT 100;
-- Issues identified:-- - Sequential scans-- - Large join result before LIMIT
-- Step 2: Add indexesCREATE INDEX idx_orders_user_date ON orders(user_id, created_at);CREATE INDEX idx_orders_status_date ON orders(status, created_at);CREATE INDEX idx_order_items_order ON order_items(order_id);CREATE INDEX idx_products_id_name ON products(product_id, name);
-- Step 3: Rewrite queryEXPLAIN (ANALYZE, TIMING)SELECT u.username, o.order_id, oi.quantity * oi.unit_price as total, p.name as product_nameFROM ( SELECT order_id, user_id, quantity * unit_price as total FROM order_items) oiINNER JOIN ( SELECT order_id, user_id, total FROM orders WHERE created_at >= '2024-01-01' AND status = 'completed') o ON oi.order_id = o.order_idINNER JOIN users u ON o.user_id = u.user_idINNER JOIN products p ON oi.product_id = p.product_idWHERE u.status = 'active'ORDER BY oi.quantity * oi.unit_price DESCLIMIT 100;Summary
Section titled “Summary”| Technique | When to Use |
|---|---|
| Index creation | WHERE/JOIN/ORDER BY columns |
| Query rewriting | Avoid subqueries, use IN not OR |
| Pagination | Use keyset, not OFFSET |
| CTEs | Readability, can be optimized |
| Configuration | Tune memory parameters |