Indexes
Chapter 21: Understanding Indexes
Section titled “Chapter 21: Understanding Indexes”Database Performance Fundamentals
Section titled “Database Performance Fundamentals”21.1 Index Fundamentals
Section titled “21.1 Index Fundamentals”Indexes are data structures that improve the speed of data retrieval operations on database tables.
How Indexes Work ========================================================================
Without Index (Full Table Scan): ================================
Table: users (1 million rows)
SELECT * FROM users WHERE email = 'john@example.com';
┌─────┬──────────┬─────────────────────┐ │ id │ name │ email │ ├─────┼──────────┼─────────────────────┤ │ 1 │ Alice │ alice@example.com │ ← Check │ 2 │ Bob │ bob@example.com │ ← Check │ ... │ ... │ ... │ ← Check │99999│ John │ john@example.com │ ← Found! └─────┴──────────┴─────────────────────┘
→ Must check every row in worst case → O(n) complexity
With Index: ==========
Index: idx_users_email (B-tree)
┌────────────────────────┐ │ B-Tree Structure │ └────────────────────────┘ / \ ... ... / \ / \ (a-d) (e-h) (i-l) (m-z) | | | | ┌─────────┐ ┌─────────┐ │email │ │email │ │pointer │ │pointer │ └─────────┘ └─────────┘
SELECT * FROM users WHERE email = 'john@example.com';
→ Navigate tree directly to email → O(log n) complexity → Very fast lookup21.2 Creating Indexes
Section titled “21.2 Creating Indexes”-- Sample tableCREATE TABLE users ( id SERIAL PRIMARY KEY, username VARCHAR(50) NOT NULL, email VARCHAR(255) NOT NULL, phone VARCHAR(20), status VARCHAR(20) DEFAULT 'active', created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP);
-- Basic B-tree index (default)CREATE INDEX idx_users_email ON users(email);
-- Index with specific index typeCREATE INDEX idx_users_email_hash ON users USING HASH (email);
-- Composite index (multiple columns)CREATE INDEX idx_users_status_created ON users(status, created_at DESC);
-- Unique indexCREATE UNIQUE INDEX idx_users_username ON users(username);
-- Index with WHERE clause (partial index)CREATE INDEX idx_users_active ON users(email) WHERE status = 'active';
-- Index on expressionCREATE INDEX idx_users_lower_email ON users(LOWER(email));
-- Index on JSON fieldCREATE INDEX idx_users_data ON users USING GIN(data);
-- Index with custom collationCREATE INDEX idx_users_name_collate ON users(name COLLATE "en_US");21.3 Index Types
Section titled “21.3 Index Types”-- B-tree Index (default, most common)-- Best for: =, <, >, <=, >=, BETWEEN, LIKE
CREATE INDEX idx_users_email ON users(email); -- email = 'x'CREATE INDEX idx_users_id ON users(id); -- id < 100CREATE INDEX idx_users_created ON users(created_at); -- date range
-- Hash Index-- Best for: Simple equality (=) only
CREATE INDEX idx_users_phone_hash ON users USING HASH (phone);
-- GiST Index-- Best for: Geometric data, full-text search, range types
CREATE INDEX idx_locations_coords ON locations USING GIST (coordinates);CREATE INDEX idx_events_range ON events USING GIST (time_range);
-- GIN Index-- Best for: Arrays, JSONB, full-text search
CREATE INDEX idx_products_tags ON products USING GIN (tags);CREATE INDEX idx_documents_content ON documents USING GIN (content gin_trgm_ops);
-- BRIN Index-- Best for: Large tables with sequential data
CREATE INDEX idx_logs_date ON logs USING BRIN (created_at);
-- SP-GiST Index-- Best for: Non-balanced data structures
CREATE INDEX idx_customers_region ON customers USING SPGIST (region);21.4 Analyzing Index Usage
Section titled “21.4 Analyzing Index Usage”-- Check if index exists\d users-- Shows: indexes, constraints, triggers
-- Get index informationSELECT indexname, indexdefFROM pg_indexesWHERE tablename = 'users';
-- Check index sizeSELECT pg_size_pretty(pg_relation_size('idx_users_email'));
-- Explain query to see index usageEXPLAINSELECT * FROM users WHERE email = 'john@example.com';
EXPLAIN ANALYZESELECT * FROM users WHERE email = 'john@example.com';
-- Find unused indexesSELECT schemaname, tablename, indexname, idx_tup_read, idx_tup_fetch, idx_scanFROM pg_stat_user_indexesWHERE idx_scan = 0AND indexname NOT LIKE '%pkey%';
-- Index usage statisticsSELECT schemaname || '.' || tablename AS table_name, indexname, idx_blks_read, idx_blks_hit, ROUND( 100.0 * idx_blks_hit / NULLIF(idx_blks_read + idx_blks_hit, 0) ) AS hit_rateFROM pg_statio_user_indexesORDER BY idx_blks_read DESC;21.5 Index Maintenance
Section titled “21.5 Index Maintenance”-- Reindex to rebuild fragmented indexREINDEX INDEX idx_users_email;
-- Reindex entire tableREINDEX TABLE users;
-- Concurrent index creation (no locks)-- Useful for production systemsCREATE INDEX CONCURRENTLY idx_users_phone ON users(phone);
-- Rebuild index with CONCURRENTLY (PostgreSQL 12+)REINDEX INDEX CONCURRENTLY idx_users_email;
-- Drop indexDROP INDEX idx_users_old;
-- Rename indexALTER INDEX idx_users_email RENAME TO idx_users_email_unique;
-- Disable index (for bulk loads)ALTER INDEX idx_users_email DISABLE;
-- Enable indexALTER INDEX idx_users_email ENABLE;21.6 Composite Indexes
Section titled “21.6 Composite Indexes”-- Create composite indexCREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- Column order matters!-- For: WHERE user_id = 1 AND status = 'completed'-- Index (user_id, status) is optimal
-- For: WHERE status = 'completed'-- Partial index might be betterCREATE INDEX idx_orders_status ON orders(status)WHERE status = 'completed';
-- Covering index (INCLUDE)CREATE INDEX idx_users_email ON users(email) INCLUDE (username, phone);-- Allows index-only scans without fetching from table
-- Index selectivity-- High cardinality (many unique values): B-tree-- Low cardinality (few unique values): Consider partial index21.7 Partial Indexes
Section titled “21.7 Partial Indexes”-- Index only active usersCREATE INDEX idx_users_active_email ON users(email)WHERE status = 'active';
-- Index high-value ordersCREATE INDEX idx_orders_high_value ON orders(user_id)WHERE total > 1000;
-- Index recent recordsCREATE INDEX idx_orders_2024 ON orders(created_at)WHERE created_at >= '2024-01-01';
-- Multiple partial indexesCREATE INDEX idx_orders_pending ON orders(user_id)WHERE status = 'pending';
CREATE INDEX idx_orders_completed ON orders(user_id)WHERE status = 'completed';
-- Partial index for NULL valuesCREATE INDEX idx_users_phone ON users(phone)WHERE phone IS NOT NULL;21.8 Expression Indexes
Section titled “21.8 Expression Indexes”-- Index on function resultCREATE INDEX idx_users_lower_email ON users(LOWER(email));
-- Query using expression indexSELECT * FROM users WHERE LOWER(email) = 'john@example.com';
-- Index on JSON fieldCREATE INDEX idx_users_data ON users USING GIN (data);
-- JSONB path indexCREATE INDEX idx_users_name ON users ((data->>'name'));
-- Index on type castCREATE INDEX idx_orders_date ON orders ((created_at::date));
-- Composite with expressionCREATE INDEX idx_users_name_email ON users (LOWER(username), email);21.9 Index Selection Guidelines
Section titled “21.9 Index Selection Guidelines” When to Create Indexes ========================================================================
✓ Index columns used in WHERE clauses ✓ Index foreign key columns ✓ Index columns used in JOIN conditions ✓ Index columns used in ORDER BY ✓ Index columns used in GROUP BY ✓ Index high-cardinality columns
✗ Don't index low-cardinality columns (e.g., boolean) ✗ Don't index frequently updated columns ✗ Don't index small tables (full scan is faster) ✗ Don't over-index (too many indexes slow writes)
Index Order Guidelines: =======================
1. Equality conditions first (column = value) 2. Range conditions last (column > value) 3. High-cardinality columns first
Example: WHERE status = 'active' AND created_at > '2024-01-01'
Best: INDEX (status, created_at) ```
---
## 21.10 Best Practices Index Best Practices========================================================================
1. Analyze Query Patterns First ┌──────────────────────────────────────────────────────────────┐ │ • Use EXPLAIN ANALYZE to identify slow queries │ │ • Check for sequential scans │ │ • Look for missing indexes │ └──────────────────────────────────────────────────────────────┘
2. Choose Right Index Type ┌──────────────────────────────────────────────────────────────┐ │ • B-tree: Most queries (default) │ │ • GIN: Arrays, JSONB, full-text │ │ • BRIN: Time-series, append-only data │ │ • Hash: Simple equality only │ └──────────────────────────────────────────────────────────────┘
3. Consider Composite Index Column Order ┌──────────────────────────────────────────────────────────────┐ │ • Put equality columns first │ │ • Put range/sort columns last │ │ • Consider query patterns │ └──────────────────────────────────────────────────────────────┘
4. Use Partial Indexes ┌──────────────────────────────────────────────────────────────┐ │ • Smaller and faster than full indexes │ │ • Perfect for common filters │ └──────────────────────────────────────────────────────────────┘
5. Monitor and Maintain ┌──────────────────────────────────────────────────────────────┐ │ • Check for unused indexes │ │ • Reindex periodically to reduce bloat │ │ • Consider index size vs. query improvement │ └──────────────────────────────────────────────────────────────┘========================================================================---
## Next Chapter
[Chapter 23: Query Execution Plans](./10_performance/23_explain_analyze.md)
---
*Last Updated: February 2026*