Performance_monitoring
Chapter 25: Performance Monitoring
Section titled “Chapter 25: Performance Monitoring”Monitoring and Diagnosing PostgreSQL Performance
Section titled “Monitoring and Diagnosing PostgreSQL Performance”25.1 System Views for Monitoring
Section titled “25.1 System Views for Monitoring”PostgreSQL provides extensive system views for monitoring performance.
Key Monitoring Views ========================================================================
┌─────────────────────────────────────────────────────────────────────┐ │ PostgreSQL Monitoring Views │ │ │ │ Activity & Connections: │ │ ──────────────────────── │ │ • pg_stat_activity → Current connections and queries │ │ • pg_stat_database → Database-level statistics │ │ • pg_stat_user_tables → Table access statistics │ │ • pg_stat_user_indexes → Index usage statistics │ │ │ │ Performance: │ │ ───────────── │ │ • pg_stat_statements → Query performance tracking │ │ • pg_stat_io → I/O statistics │ │ • pg_stat_bgwriter → Background writer stats │ │ • pg_statio_user_tables → Table I/O statistics │ │ │ │ Locks: │ │ ────── │ │ • pg_locks → Current locks held │ │ • pg_blocking_pids → Blocking processes │ │ │ └─────────────────────────────────────────────────────────────────────┘Basic Monitoring Queries
Section titled “Basic Monitoring Queries”-- Current active connectionsSELECT pid, usename, application_name, client_addr, state, query, wait_event_type, wait_event, query_startFROM pg_stat_activityWHERE state != 'idle'ORDER BY query_start;
-- Long running queriesSELECT pid, usename, query, state, NOW() - query_start as duration, wait_eventFROM pg_stat_activityWHERE state != 'idle' AND query_start < NOW() - INTERVAL '5 minutes'ORDER BY query_start;
-- Database statisticsSELECT datname, numbackends, xact_commit, xact_rollback, blks_read, blks_hit, ROUND(100.0 * blks_hit / NULLIF(blks_hit + blks_read, 0), 2) as cache_hit_ratio, tup_returned, tup_fetched, tup_inserted, tup_updated, tup_deletedFROM pg_stat_databaseWHERE datname = current_database();25.2 Table and Index Statistics
Section titled “25.2 Table and Index Statistics”-- Table access patternsSELECT schemaname, relname, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch, n_tup_ins, n_tup_upd, n_tup_del, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum, last_analyzeFROM pg_stat_user_tablesORDER BY seq_scan DESC;
-- Index usage statisticsSELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read, idx_tup_fetch, pg_size_pretty(pg_relation_size(indexrelid)) as index_sizeFROM pg_stat_user_indexesORDER BY idx_scan DESC;
-- Tables needing vacuum (high dead tuples)SELECT schemaname, relname, n_dead_tup, n_live_tup, ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2) as dead_tuple_pct, last_autovacuum, last_vacuumFROM pg_stat_user_tablesWHERE n_dead_tup > 1000ORDER BY n_dead_tup DESC;25.3 Query Performance Tracking
Section titled “25.3 Query Performance Tracking”First, enable the pg_stat_statements extension:
-- Enable pg_stat_statementsCREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- View top queries by total timeSELECT query, calls, total_exec_time, mean_exec_time, max_exec_time, min_exec_time, rows, shared_blks_hit, shared_blks_read, local_blks_hit, temp_blks_read, temp_blks_writtenFROM pg_stat_statementsORDER BY total_exec_time DESCLIMIT 20;
-- Find most frequently called queriesSELECT query, calls, total_exec_time, ROUND(total_exec_time / calls, 2) as avg_time_ms, rowsFROM pg_stat_statementsORDER BY calls DESCLIMIT 20;
-- Find slowest individual executionsSELECT query, calls, mean_exec_time, max_exec_time, min_exec_time, stddev_exec_timeFROM pg_stat_statementsORDER BY max_exec_time DESCLIMIT 20;
-- Find queries with high I/OSELECT query, shared_blks_read, shared_blks_hit, ROUND(100.0 * shared_blks_hit / NULLIF(shared_blks_hit + shared_blks_read, 0), 2) as hit_ratioFROM pg_stat_statementsORDER BY shared_blks_read DESCLIMIT 20;25.4 Lock Monitoring
Section titled “25.4 Lock Monitoring”-- Current locksSELECT l.locktype, l.relation::regclass, l.mode, l.granted, l.pid, l.mode, a.usename, a.queryFROM pg_locks lLEFT JOIN pg_stat_activity a ON l.pid = a.pidWHERE l.relation IS NOT NULLORDER BY l.relation, l.pid;
-- Blocked queries (waiting for locks)SELECT blocked.pid as blocked_pid, blocked.usename as blocked_user, blocked.query as blocked_query, blocking.pid as blocking_pid, blocking.usename as blocking_user, blocking.query as blocking_query, blocked.state as blocked_state, blocked.wait_event as blocked_wait, blocking.state as blocking_stateFROM pg_stat_activity blockedJOIN pg_locks l ON blocked.pid = l.pid AND NOT l.grantedJOIN pg_locks blocking ON l.relation = blocking.relation AND blocking.grantedJOIN pg_stat_activity blocking ON blocking.pid = blocking.pidWHERE blocked.pid != blocking.pid;
-- Wait eventsSELECT wait_event_type, wait_event, COUNT(*) as count, pg_blocking_pids(pid) as blocked_byFROM pg_stat_activityWHERE state = 'active'GROUP BY wait_event_type, wait_eventORDER BY count DESC;25.5 Cache and Buffer Analysis
Section titled “25.5 Cache and Buffer Analysis”-- Cache hit ratioSELECT 'Shared Buffers' as cache_type, ROUND(100.0 * sum(blks_hit) / NULLIF(sum(blks_hit + blks_read), 0), 2) as hit_ratio, sum(blks_hit) as hits, sum(blks_read) as readsFROM pg_stat_databaseUNION ALLSELECT 'Template/Plans' as cache_type, ROUND(100.0 * sum(hit) / NULLIF(sum(hit) + sum(miss), 0), 2), sum(hit), sum(miss)FROM pg_stat_statements;
-- Table I/O statisticsSELECT schemaname, relname, heap_blks_read, heap_blks_hit, ROUND(100.0 * heap_blks_hit / NULLIF(heap_blks_hit + heap_blks_read, 0), 2) as hit_ratio, idx_blks_read, idx_blks_hit, ROUND(100.0 * idx_blks_hit / NULLIF(idx_blks_hit + idx_blks_read, 0), 2) as idx_hit_ratio, toast_blks_read, toast_blks_hitFROM pg_statio_user_tablesORDER BY heap_blks_read DESC;
-- Effective cache size recommendationSELECT pg_size_pretty(sum(heap_blks_read + idx_blks_read) * 8192 / 3) as recommended_cacheFROM pg_statio_user_tables;25.6 Performance Monitoring Queries
Section titled “25.6 Performance Monitoring Queries”-- Connection pool statusSELECT state, COUNT(*) as count, ARRAY_AGG(application_name) as appsFROM pg_stat_activityGROUP BY state;
-- Database sizeSELECT datname, pg_size_pretty(pg_database_size(datname)) as sizeFROM pg_databaseORDER BY pg_database_size(datname) DESC;
-- Table sizesSELECT schemaname, relname, pg_size_pretty(pg_total_relation_size(relid)) as total_size, pg_size_pretty(pg_relation_size(relid)) as table_size, pg_size_pretty(pg_indexes_size(relid)) as indexes_sizeFROM pg_user_tablesORDER BY pg_total_relation_size(relid) DESCLIMIT 20;
-- Index bloatSELECT schemaname, tablename, indexname, pg_size_pretty(pg_relation_size(indexrelid)) as index_size, ROUND(100.0 * (pg_relation_size(indexrelid) - pg_get_indexdef(indexrelid)::bytea) / pg_relation_size(indexrelid), 2) as bloat_pctFROM pg_stat_user_indexesORDER BY bloat_pct DESC;25.7 Setting Up Continuous Monitoring
Section titled “25.7 Setting Up Continuous Monitoring”-- Create monitoring functionCREATE OR REPLACE FUNCTION get_performance_stats()RETURNS TABLE ( metric_name TEXT, metric_value NUMERIC) AS $$BEGIN RETURN QUERY SELECT 'connections'::TEXT, COUNT(*)::NUMERIC FROM pg_stat_activity WHERE state = 'active' UNION ALL SELECT 'idle_connections'::TEXT, COUNT(*)::NUMERIC FROM pg_stat_activity WHERE state = 'idle' UNION ALL SELECT 'database_size_mb'::TEXT, pg_database_size(current_database()) / 1024 / 1024::NUMERIC UNION ALL SELECT 'cache_hit_ratio'::TEXT, (SELECT ROUND(100.0 * sum(blks_hit) / NULLIF(sum(blks_hit + blks_read), 0), 2) FROM pg_stat_database WHERE datname = current_database());END;$$ LANGUAGE plpgsql;
-- Use itSELECT * FROM get_performance_stats();
-- Create a query history table for slow queriesCREATE TABLE query_log ( id SERIAL PRIMARY KEY, logged_at TIMESTAMP DEFAULT NOW(), query_text TEXT, execution_time_ms NUMERIC, planner_time_ms NUMERIC);
-- Function to log slow queriesCREATE OR REPLACE FUNCTION log_slow_queries()RETURNS void AS $$BEGIN INSERT INTO query_log (query_text, execution_time_ms, planner_time_ms) SELECT query, (total_exec_time / calls)::NUMERIC, 0 FROM pg_stat_statements WHERE (total_exec_time / calls) > 100 -- > 100ms average ON CONFLICT DO NOTHING;END;$$ LANGUAGE plpgsql;25.8 Creating a Monitoring Dashboard
Section titled “25.8 Creating a Monitoring Dashboard”-- Comprehensive health check viewCREATE VIEW db_health_check ASSELECT (SELECT COUNT(*) FROM pg_stat_activity) as total_connections, (SELECT COUNT(*) FROM pg_stat_activity WHERE state = 'active') as active_connections, (SELECT COUNT(*) FROM pg_stat_activity WHERE state = 'idle') as idle_connections, (SELECT ROUND(100.0 * sum(blks_hit) / NULLIF(sum(blks_hit + blks_read), 0), 2) FROM pg_stat_database WHERE datname = current_database()) as cache_hit_ratio, (SELECT count(*) FROM pg_locks WHERE NOT granted) as waiting_locks, (SELECT pg_database_size(current_database()) / 1024 / 1024) as db_size_mb, (SELECT MAX(age(now(), xact_start)) FROM pg_stat_activity WHERE state != 'idle') as oldest_transaction, (SELECT count(*) FROM pg_stat_activity WHERE state != 'idle' AND query_start < now() - interval '5 minutes') as long_running_queries;
-- Query the health checkSELECT * FROM db_health_check;Summary
Section titled “Summary”| View | Purpose |
|---|---|
| pg_stat_activity | Current connections/queries |
| pg_stat_statements | Query performance (needs extension) |
| pg_statio_* | I/O statistics |
| pg_locks | Lock monitoring |