Architecture
Chapter 3: PostgreSQL Architecture
Section titled βChapter 3: PostgreSQL ArchitectureβUnderstanding the Internal Working of PostgreSQL
Section titled βUnderstanding the Internal Working of PostgreSQLβ3.1 Overview of PostgreSQL Architecture
Section titled β3.1 Overview of PostgreSQL ArchitectureβPostgreSQL follows a client-server architecture where multiple processes work together to handle database operations. Understanding this architecture is crucial for effective database administration and performance tuning.
PostgreSQL Architecture Overview ========================================================================
ββββββββββββββββββββ β Client β β Applications β ββββββββββ¬ββββββββββ β Network β (TCP/IP) βΌ ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β PostgreSQL Server β β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β Postmaster (Parent Process) β β β β Port: 5432 β β β β β β β β β’ Listens for incoming connections β β β β β’ Forks backend processes for each connection β β β β β’ Manages shared memory and semaphores β β β β β’ Handles crash recovery β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β βββββββββββββββ βββββββββββββββ βββββββββββββββ β β β Backend β β Backend β β Backend β β β β Process 1 β β Process 2 β β Process N β β β β β β β β β β β β - Handles β β - Handles β β - Handles β β β β queries β β queries β β queries β β β β - Executes β β - Executes β β - Executes β β β β SQL β β SQL β β SQL β β β βββββββββββββββ βββββββββββββββ βββββββββββββββ β β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β Background Processes β β β β β β β β ββββββββββββββ ββββββββββββββ ββββββββββββββ βββββββββββββββ β β β β Logger β β Auto β β WAL β β Checkpoint ββ β β β β β β vacuum β β Writer β β ββ β β β ββββββββββββββ ββββββββββββββ ββββββββββββββ βββββββββββββββ β β β β β β β ββββββββββββββ ββββββββββββββ ββββββββββββββ βββββββββββββββ β β β β Backgroundβ β Stats β β Archiver β β Replicationββ β β β β Writer β β Collector β β β β Launcher ββ β β β ββββββββββββββ ββββββββββββββ ββββββββββββββ βββββββββββββββ β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β βΌ ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β Shared Memory β β ββββββββββββ ββββββββββββ ββββββββββββ ββββββββββββ β β β Shared β β WAL β β Clog β β Lock β β β β Buffers β β Buffers β β Buffer β β Manager β β β ββββββββββββ ββββββββββββ ββββββββββββ ββββββββββββ β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β βΌ ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β Disk Storage β β ββββββββββββ ββββββββββββ ββββββββββββ ββββββββββββ β β β Data β β WAL β β Logs β β Stats β β β β Files β β Files β β β β Files β β β ββββββββββββ ββββββββββββ ββββββββββββ ββββββββββββ β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ ========================================================================3.2 Postmaster Process
Section titled β3.2 Postmaster ProcessβThe postmaster is the primary process that manages the PostgreSQL database cluster. It acts as a coordinator for all other processes.
Postmaster Process Role ========================================================================
Postmaster Responsibilities: ============================
1. Initialization βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β - Initializes shared memory structures β β - Starts background processes β β - Sets up signal handlers β β - Listens on port 5432 (default) β β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
2. Connection Management βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β - Listens for incoming client connections β β - Authenticates clients (via pg_hba.conf) β β - Forks backend processes for each connection β β - Manages connection limits β β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
3. Process Management βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β - Monitors backend processes β β - Handles abnormal terminations β β - Performs crash recovery β β - Coordinates checkpoint operations β β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
4. System Lifecycle βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β pg_ctl start β Start postmaster β β pg_ctl stop β Graceful shutdown β β pg_ctl restart β Stop and start postmaster β β pg_ctl reload β Signal postmaster to reload config β β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ ========================================================================3.3 Backend Process
Section titled β3.3 Backend ProcessβEach client connection is handled by a separate backend process. This architecture provides isolation between connections.
Backend Process Architecture ========================================================================
Connection Flow: =================
ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β Client β β Postmaster β β Backend β β Application βββββββΆβ βββββββΆβ Process β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β β β β β forks β β β βΌ βΌ ββββββββββββββββ ββββββββββββββββ β listens β β executes β β for new β β queries β β connections β ββββββββββββββββ ββββββββββββββββ
Backend Process Lifecycle: ==========================
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β START β β β β β βΌ β β ββββββββββββββββββββββββ β β β Authentication β β Handshake with client β β β Phase β β β ββββββββββββ¬ββββββββββββ β β β β β βΌ β β ββββββββββββββββββββββββ β β β Session β β Initialize session state β β β Initialization β - Set search_path β β β β - Load extensions β β ββββββββββββ¬ββββββββββββ - Initialize temp tables β β β β β βΌ β β ββββββββββββββββββββββββ β β β Query Processing β β Main loop: β β β Phase β - Receive query β β β β - Parse & execute β β β β - Send results β β ββββββββββββ¬ββββββββββββ β β β β β βΌ β β ββββββββββββββββββββββββ β β β Disconnection β β Client disconnects or β β β Phase β connection timeout β β ββββββββββββ¬ββββββββββββ β β β β β βΌ β β END β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ ========================================================================3.4 Query Processing Flow
Section titled β3.4 Query Processing FlowβUnderstanding how PostgreSQL processes queries is essential for optimization and debugging.
Query Processing Pipeline ========================================================================
SQL Query Processing Stages: =============================
βββββββββββββββ βββββββββββββββ βββββββββββββββ βββββββββββββββ β SQL βββββΆβ Parser βββββΆβ Optimizer βββββΆβ Executor β β Input β β β β β β β βββββββββββββββ βββββββββββββββ βββββββββββββββ βββββββββββββββ β β β βΌ βΌ βΌ βββββββββββββββ βββββββββββββββ βββββββββββββββ β Parse Tree β β Plan Tree β β Results β β β β β β β β - SQL syntaxβ β - Query β β - Row data β β - Semantic β β rewrite β β - Error β β analysis β β - Plan β β messages β β β β selection β β β βββββββββββββββ βββββββββββββββ βββββββββββββββ
Detailed Query Processing: ===========================
1. PARSER βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β Input: SELECT * FROM users WHERE id = 1; β β β β Parser creates Parse Tree: β β β β βββββββββββββββββββββββββββββββββββββββββββββββ β β β SELECTStmt β β β β βββ targetList: * β β β β βββ fromClause: users β β β β βββ whereClause: id = 1 β β β βββββββββββββββββββββββββββββββββββββββββββββββ β β β β Checks: SQL syntax, table existence, column existence β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
2. TRANSFORMER βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β Transforms Parse Tree into Query Tree: β β β β βββββββββββββββββββββββββββββββββββββββββββββββ β β β Query β β β β βββ resultRelation: users β β β β βββ jointree: users β β β β βββ targetList: id, name, email β β β β βββ qual: id = 1 β β β βββββββββββββββββββββββββββββββββββββββββββββββ β β β β Adds type information, resolves column types β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
3. REWRITER βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β Applies rules (views, rules): β β β β CREATE VIEW user_summary AS β β SELECT id, name FROM users; β β β β SELECT * FROM user_summary; β β β β Rewritten to: β β SELECT id, name FROM users; β β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
4. OPTIMIZER (Planner) βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β Creates optimal execution plan: β β β β Consider available indexes: β β - users_pkey (B-tree on id) β β β β Possible plans: β β 1. Seq Scan on users, Filter: id = 1 β β 2. Index Scan using users_pkey, WHERE: id = 1 β β β β Selects: Index Scan (cheaper) β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
5. EXECUTOR βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β Executes the plan: β β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β Executor Steps (Index Scan on users_pkey): β β β β β β β β 1. Initialize index scan β β β β 2. Search index for id = 1 β β β β 3. Fetch heap tuple for each index entry β β β β 4. Apply visibility check (MVCC) β β β β 5. Return matching tuples β β β β β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β Returns: (1, 'John', 'john@example.com') β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ ========================================================================3.5 Background Processes
Section titled β3.5 Background ProcessesβPostgreSQL uses multiple background processes to handle various system tasks efficiently.
Background Processes Overview ========================================================================
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β Background Processes β β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β 1. Logger Process (logger) β β β β β β β β β’ Captures stderr output from other processes β β β β β’ Writes to PostgreSQL log files β β β β β’ Manages log rotation β β β β β’ Controlled by logging_collector setting β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β 2. Autovacuum Launcher β β β β β β β β β’ Automatically runs VACUUM on bloated tables β β β β β’ Runs ANALYZE to update statistics β β β β β’ Prevents transaction ID wraparound β β β β β’ Configurable via autovacuum_* parameters β β β β β β β β ββββββββββββββββββββββββββββββββββββββββββββ β β β β β Autovacuum Process β β β β β β β β β β β β Table: users β β β β β β Dead tuples: 15,000 β β β β β β Threshold: 10,000 β β β β β β β β β β β β Action: VACUUM users; β β β β β β Action: ANALYZE users; β β β β β β β β β β β ββββββββββββββββββββββββββββββββββββββββββββ β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β 3. WAL Writer β β β β β β β β β’ Writes WAL records to disk periodically β β β β β’ Ensures durability of transactions β β β β β’ Balances between performance and durability β β β β β’ Controlled by wal_writer_delay setting β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β 4. Checkpointer β β β β β β β β β’ Performs checkpoints at regular intervals β β β β β’ Writes all dirty buffers to disk β β β β β’ Updates pg_control file β β β β β’ Controlled by checkpoint_* parameters β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β 5. Background Writer (bgwriter) β β β β β β β β β’ Periodically writes dirty shared buffers β β β β β’ Reduces checkpoint load β β β β β’ Controlled by bgwriter_* parameters β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β 6. Statistics Collector (stats collector) β β β β β β β β β’ Collects statistics about database activity β β β β β’ Tracks table access, index usage, row counts β β β β β’ Used by pg_stat_* views β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β 7. WAL Archiver (archiver) β β β β β β β β β’ Archives completed WAL segments β β β β β’ Used for point-in-time recovery β β β β β’ Controlled by archive_* parameters β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β 8. Replication Launcher β β β β β β β β β’ Starts WAL sender processes for streaming replication β β β β β’ Manages replication slots β β β β β’ Controlled by max_wal_senders parameter β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ ========================================================================3.6 Memory Architecture
Section titled β3.6 Memory ArchitectureβPostgreSQL uses both shared memory (accessible by all processes) and local memory (per-process).
PostgreSQL Memory Architecture ========================================================================
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β SYSTEM MEMORY β β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β SHARED MEMORY β β β β β β β β ββββββββββββββββββ ββββββββββββββββββ ββββββββββββββββββ β β β β β Shared β β WAL β β Clog β β β β β β Buffers β β Buffers β β (XLOG) β β β β β β β β β β β β β β β β Caches table β β WAL records β β Transaction β β β β β β and index data β β before write β β status bits β β β β β β β β β β β β β β β β shared_buffersβ β wal_buffers β β ~2-4 MB β β β β β β Default: 128MB β β Default: -1 β β β β β β β β β β (auto-tuned) β β β β β β β ββββββββββββββββββ ββββββββββββββββββ ββββββββββββββββββ β β β β β β β β ββββββββββββββββββ ββββββββββββββββββ ββββββββββββββββββ β β β β β Lock β β ValidF β β Free Space β β β β β β Manager β β (VM) β β Map β β β β β β β β β β β β β β β β Row-level β β Visibility β β Track free β β β β β β locks β β map β β space β β β β β β β β β β β β β β β β ~2-256MB β β ~1% of heap β β ~1% of heap β β β β β ββββββββββββββββββ ββββββββββββββββββ ββββββββββββββββββ β β β β β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β LOCAL MEMORY (per backend process) β β β β β β β β ββββββββββββββββββ ββββββββββββββββββ ββββββββββββββββββ β β β β β Sort/Hash β β Hash Join β β Bitmap β β β β β β Memory β β Memory β β Memory β β β β β β β β β β β β β β β β work_mem: 4MB β β work_mem: 4MB β β work_mem: 4MB β β β β β β β β β β β β β β β β Used for: β β Used for: β β Used for: β β β β β β - ORDER BY β β - Hash Join β β - Bitmap Scan β β β β β β - DISTINCT β β - Hash Agg β β - Index Only β β β β β β - GROUP BY β β - Set Ops β β Scan β β β β β ββββββββββββββββββ ββββββββββββββββββ ββββββββββββββββββ β β β β β β β β ββββββββββββββββββ ββββββββββββββββββ β β β β β Maintenance β β Logical β β β β β β Memory β β Decoding β β β β β β β β Work Mem β β β β β β maintenance β β β β β β β β _work_mem: 64M β β ~10MB β β β β β β β β β β β β β β Used for: β β Used for: β β β β β β - VACUUM β β - Decode β β β β β β - CREATE INDEXβ β changes β β β β β β - ALTER TABLE β β for β β β β β ββββββββββββββββββ β replication β β β β β ββββββββββββββββββ β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ ========================================================================3.7 Storage Architecture
Section titled β3.7 Storage ArchitectureβPostgreSQL stores data in files on disk with a specific organization.
PostgreSQL Storage Architecture ========================================================================
Database Cluster Directory Structure: =====================================
$PGDATA/ βββ base/ -- Tablespace for default tablespace β βββ 1/ -- Database OID 1 (template0) β βββ 16384/ -- Database OID 16384 (postgres) β βββ ... β βββ global/ -- System catalogs (pg_database, etc.) β βββ pg_wal/ -- Write-Ahead Log (WAL) β βββ 000000010000000000000001 β βββ ... β βββ pg_xact/ -- Transaction commit log (Clog) β βββ 0000 β βββ ... β βββ pg_notify/ -- LISTEN/NOTIFY state β βββ pg_serial/ -- Serializable transaction information β βββ pg_snapshots/ -- Exported snapshots β βββ pg_subtrans/ -- Subtransaction status β βββ pg_twophase/ -- Two-phase commit state β βββ pg_dynshmem/ -- Dynamic shared memory β βββ pg_replslot/ -- Replication slots β βββ pg_tblspc/ -- Symbolic links to tablespaces β βββ pg_stat/ -- Statistics files β βββ pg_stat_tmp/ -- Statistics temporary files β βββ pg_log/ -- Log files (if logging_collector is on) β βββ postgresql.conf -- Configuration file
Per-Database Directory (base/16384/): ======================================
16384/ βββ 1255 -- Table: users (relfilenode) βββ 1255_vm -- Visibility map for users βββ 1255_fsm -- Free space map for users βββ 1255_init -- Initialization fork for users βββ 1259 -- Index: users_pkey βββ 1259_vm βββ 1259_fsm βββ 16385 -- Table: orders βββ 16386 -- Index: orders_pkey βββ ... -- More tables and indexes βββ PG_VERSION -- PostgreSQL version file
Page/File Structure: ====================
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β 8KB Page Layout β β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β PageHeaderData (24 bytes) β β β β βββ pd_lsn: WAL location of this page β β β β βββ pd_tli: Timeline ID β β β β βββ pd_flags: Page type flags β β β β βββ pd_lower: Offset to free space start β β β β βββ pd_upper: Offset to free space end β β β β βββ pd_special: Offset to special space β β β β βββ pd_pagesize_version: Page size and version β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β β βΌ β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β ItemIdData (each 4 bytes) β β β β βββ lp_off: Offset to tuple start β β β β βββ lp_len: Length of tuple β β β β βββ lp_flags: Status (used/dead/normal) β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β β βΌ β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β Tuple Data (actual row data) β β β β β β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β β HeapTupleHeaderData β β β β β β βββ t_xmin: Transaction that inserted this tuple β β β β β β βββ t_xmax: Transaction that deleted/updated β β β β β β βββ t_cid: Command ID within transaction β β β β β β βββ t_ctid: Current tuple ID (for versioning) β β β β β β βββ t_hoff: Offset to data β β β β β β βββ t_bits: Null bitmap (if any columns are NULL) β β β β β β βββ t_oid: Object ID (for tables with OID) β β β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β β β β β β βΌ β β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β β Column Values β β β β β β id (4 bytes) | name (variable) | created_at (8 bytes)β β β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β β βΌ β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β Special Space (for index-specific data) β β β β For B-tree: includes tree ordering information β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ ========================================================================3.8 Write-Ahead Log (WAL)
Section titled β3.8 Write-Ahead Log (WAL)βWAL is fundamental to PostgreSQLβs durability and recovery mechanisms.
Write-Ahead Log (WAL) System ========================================================================
WAL Purpose: ============
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β Transaction Durability β β β β 1. When a transaction commits, changes are written to WAL first β β 2. WAL records are flushed to disk before returning to client β β 3. Data files are modified asynchronously β β 4. After crash, WAL is replayed to recover changes β β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
WAL Write Flow: ===============
ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β Transaction β β WAL Buffer β β WAL β β Commit βββββββΆβ (memory) βββββββΆβ Files β β β β β β (disk) β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β β β β β βΌ β ββββββββββββββββ β β pg_wal/ β β β β β β 0000000100 β β β 0000000100 β β β 0000000100 β β ββββββββββββββββ β β Return "COMMIT" to client β (after WAL flush) βΌ Success!
Checkpoint Process: ===================
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β Checkpoint β β β β Every checkpoint (controlled by checkpoint_timeout): β β β β 1. Write all dirty buffers to data files β β 2. Write checkpoint record to WAL β β 3. Update pg_control file β β β β pg_control contains: β β - Latest checkpoint location β β - Database cluster state β β - WAL segment location β β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Recovery Process: =================
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β Crash Recovery Flow β β β β Start: PostgreSQL detects unclean shutdown β β β β β βΌ β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β 1. Read pg_control file β β β β - Find location of last checkpoint β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β β βΌ β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β 2. Scan WAL from checkpoint β β β β - Redo all transactions marked as committed β β β β - Undo all transactions marked as aborted β β β β β β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β β WAL Record Types: β β β β β β - XLOG::Transaction::COMMIT β β β β β β - XLOG::Transaction::ABORT β β β β β β - XLOG::Heap::INSERT β β β β β β - XLOG::Heap::UPDATE β β β β β β - XLOG::Heap::DELETE β β β β β β - XLOG::Btree::INSERT β β β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β β βΌ β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β 3. Open database for connections β β β β - Database is now consistent β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β β βΌ β β Ready! β β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ ========================================================================3.9 MVCC Implementation
Section titled β3.9 MVCC ImplementationβPostgreSQL uses Multi-Version Concurrency Control to allow concurrent transactions without locking.
MVCC Implementation in PostgreSQL ========================================================================
How MVCC Works: ===============
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β Tuple Version Chain β β β β Each tuple (row) has two transaction IDs: β β β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β t_xmin: Transaction that INSERTED this tuple β β β β t_xmax: Transaction that DELETED/UPDATED this tuple β β β β t_cid: Command ID within the transaction β β β β t_ctid: Pointer to newer version of this tuple β β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β Example: Update a row β β β β Before: β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β users (id=1, name='Alice', t_xmin=100, t_xmax=0) β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β After UPDATE: β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β users (id=1, name='Alice', t_xmin=100, t_xmax=101) β β β β β β β β β β β t_ctid βββββββββββββββββββββΊβ β β β β β β β β β β users (id=1, name='Alice Smith', t_xmin=101, t_xmax=0) β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Visibility Check: =================
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β Transaction Visibility Rules β β β β A tuple is visible to transaction T if: β β β β 1. t_xmin is COMMITTED and NOT in T's snapshot β β AND β β 2. t_xmax is neither COMMITTED nor valid (not updated/deleted) β β OR t_xmax is in T's snapshot (T updated it itself) β β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Transaction Isolation Levels: =============================
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β Isolation Level β Behavior β ββββββββββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββββββ€ β READ UNCOMMITTED (not β Can see uncommitted changes β β actually implemented) β (rarely used in PG) β ββββββββββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββββββ€ β READ COMMITTED (default) β See only committed changes β β β from other transactions β ββββββββββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββββββ€ β REPEATABLE READ β See consistent snapshot as of β β β transaction start β ββββββββββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββββββ€ β SERIALIZABLE β Full MVCC isolation, ensures β β β equivalent to serial execution β ββββββββββββββββββββββββββββββββ΄βββββββββββββββββββββββββββββββββββββ
Example: READ COMMITTED vs REPEATABLE READ ==========================================
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β Transaction A (READ COMMITTED) β β βββββββββββββββββββββββββββββββββββββ β β BEGIN; β β SELECT * FROM users WHERE id = 1; -- sees Alice β β βββββββββββββββββββββββββββ β β β Transaction B β β β β βββββββββββββββββββββββ β β β β UPDATE users β β β β SET name = 'Bob' β β β β WHERE id = 1; β β β β COMMIT; β β β βββββββββββββββββββββββββββ β β SELECT * FROM users WHERE id = 1; -- now sees Bob! β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β Transaction A (REPEATABLE READ) β β βββββββββββββββββββββββββββββββββββββ β β BEGIN; β β SELECT * FROM users WHERE id = 1; -- sees Alice β β βββββββββββββββββββββββββββ β β β Transaction B β β β β βββββββββββββββββββββββ β β β β UPDATE users β β β β SET name = 'Bob' β β β β WHERE id = 1; β β β β COMMIT; β β β βββββββββββββββββββββββββββ β β SELECT * FROM users WHERE id = 1; -- still sees Alice! β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ ========================================================================3.10 Best Practices
Section titled β3.10 Best Practicesβ Architecture Best Practices ========================================================================
1. Memory Configuration ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β’ Set shared_buffers to 25% of available RAM β β β’ Adjust work_mem based on query complexity β β β’ Use maintenance_work_mem for VACUUM and INDEX β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
2. WAL Configuration ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β’ Use dedicated disk for WAL (SSD recommended) β β β’ Set appropriate checkpoint intervals β β β’ Enable WAL archiving for production β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
3. Process Management ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β’ Configure max_connections appropriately β β β’ Use connection pooler for high traffic β β β’ Monitor backend process count β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
4. Monitoring ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β’ Enable pg_stat_statements extension β β β’ Monitor shared buffer hit ratio β β β’ Track query execution times β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ ========================================================================Next Chapter
Section titled βNext ChapterβChapter 4: Database Objects & Concepts
Last Updated: February 2026