Chapter 8: Database Replication

Ensuring High Availability and Scalability

8.1 Introduction to Replication

Database replication is the process of copying data from one database to another to provide redundancy, improve availability, and scale read operations.
Why Replication?

Without replication:
- DB1 is the only database, so it is a single point of failure.

With replication:
- Primary DB --> Replica DB

Benefits:
- High availability
- Fault tolerance
- Read scalability
- Geographic distribution
- Backup capability
8.2 Replication Types

8.2.1 Primary-Replica Replication

Primary-Replica Architecture

- Application --> Primary DB (reads and writes)
- Application --> Replicas (reads only)
- Primary DB --> Replica 1, Replica 2, Replica 3 (replication)

Flow:
1. Writes go to the primary.
2. The primary replicates changes to the replicas.
3. Reads can be served by the replicas.
8.2.2 Master-Slave Replication

Master-slave replication is the same architecture as primary-replica replication under older names:
- Master = Primary
- Slave = Replica

Terminology:
- "Master" and "slave" are being deprecated.
- "Primary" and "replica" are preferred.

Read/Write Splitting

- Application --> Load Balancer
- Load Balancer --> write node (writes)
- Load Balancer --> read nodes (reads)
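The read/write split above can be sketched in a few lines of Python. This is a minimal illustration, not a production router: the class name `ReadWriteRouter`, the node names, and the rule "anything starting with SELECT is a read" are all assumptions made for the example.

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Send reads to replicas (round robin) and everything else to the primary."""

    def __init__(self, primary: str, replicas: List[str]) -> None:
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._read_nodes = itertools.cycle(replicas or [primary])

    def route(self, sql: str) -> str:
        """Return the name of the node that should execute this statement."""
        # Simplifying assumption: only statements starting with SELECT are reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._read_nodes) if is_read else self.primary


router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
print(router.route("INSERT INTO users VALUES (1)"))  # primary
print(router.route("SELECT * FROM users"))           # replica-1
```

A real router would also need to keep reads that belong to an open write transaction (or that lock rows) on the primary. The prefix check here ignores those cases.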
8.2.3 Multi-Primary (Multi-Master)

Multi-Master Architecture

- Master A (primary) <--> Master B (primary)
- Master A --> Replica 1
- Master B --> Replica 2

Write flow:
- The application can write to any master.
- Masters replicate to each other.
- Masters replicate to their replicas.

Advantages:
- No single point of write failure
- Lower write latency when clients write to a geographically nearby master

Challenges:
- Conflict resolution
- Complexity
8.3 Replication Methods

8.3.1 Synchronous Replication

Application --> Primary --> Replica 1 --> Replica 2 --> Response

The write is sent to all replicas, and the primary waits for every acknowledgement (ACK) before responding.

Timeline (elapsed time):
1. Write to primary (10 ms)
2. Wait for Replica 1 (20 ms)
3. Wait for Replica 2 (30 ms)
4. Return to application (40 ms)

Pros:
- Strong consistency
- No loss of acknowledged writes

Cons:
- High latency
- If one replica is down, the write fails
8.3.2 Asynchronous Replication

Application --> Primary --> Return to application --> Replicas updated in the background

The primary writes immediately and replicates asynchronously.

Timeline (elapsed time):
1. Write to primary (10 ms)
2. Return to application (10 ms)
3. Replica writes (100 ms)

Pros:
- Low latency
- Writes continue if replicas are down

Cons:
- Eventual consistency
- Potential data loss if the primary fails before recent writes are replicated
8.3.3 Semi-Synchronous Replication

Primary --> at least one replica confirms --> Return to application

All other replicas receive the write afterwards.

Compromise:
- Waits for at least one replica (not all)
- Better performance than synchronous replication
- Better consistency than asynchronous replication
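One way to see how the three methods relate is to treat them as the same write path with a different number of required acknowledgements. The sketch below is a simulation under stated assumptions: `Replica`, `write_latency_ms`, and the latency figures are hypothetical, and replicas are contacted in parallel (a sequential wait, as in the synchronous timeline, would add the delays instead).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    name: str
    ack_latency_ms: int  # hypothetical time for this replica to confirm a write
    up: bool = True


def write_latency_ms(primary_ms: int, replicas: List[Replica], required_acks: int) -> int:
    """Return the simulated client-visible latency of one write.

    required_acks = 0              -> asynchronous
    required_acks = 1              -> semi-synchronous
    required_acks = len(replicas)  -> synchronous
    """
    acks = sorted(r.ack_latency_ms for r in replicas if r.up)
    if len(acks) < required_acks:
        raise RuntimeError("not enough live replicas to acknowledge the write")
    if required_acks == 0:
        return primary_ms
    # Replicas are contacted in parallel, so the wait is set by the
    # slowest of the acknowledgements we actually need.
    return primary_ms + acks[required_acks - 1]


replicas = [Replica("replica-1", 10), Replica("replica-2", 20)]
for mode, needed in [("async", 0), ("semi-sync", 1), ("sync", 2)]:
    print(mode, write_latency_ms(10, replicas, needed), "ms")
```

Marking a replica as down (`up=False`) shows the availability trade-off: the synchronous call raises, while the semi-synchronous and asynchronous calls still succeed.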
8.4 Replication Topologies

8.4.1 Single Primary

Primary (read/write) --> Replica 1, Replica 2, Replica 3

Use cases:
- Standard read scaling
- High availability
- Geographic distribution
8.4.2 Chain Replication

Primary --> Replica 1 --> Replica 2 --> Replica 3

Write flow:
1. Write to the primary
2. Primary to Replica 1
3. Replica 1 to Replica 2
4. Replica 2 to Replica 3
5. Return to the application

Pros:
- Simple
- Lower bandwidth per node
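The chain write flow can be modelled as each node applying the write and handing it to its successor. This is a toy in-memory sketch: `ChainNode` and `build_chain` are hypothetical names, and failure handling (what happens when a middle node dies) is deliberately left out.

```python
class ChainNode:
    """One node in the chain; it only knows about its direct successor."""

    def __init__(self, name, successor=None):
        self.name = name
        self.successor = successor
        self.data = {}

    def write(self, key, value):
        """Apply the write locally, forward it, and return the path it took."""
        self.data[key] = value
        if self.successor is None:
            return [self.name]  # tail reached: the write is fully replicated
        return [self.name] + self.successor.write(key, value)


def build_chain(names):
    """Link nodes head -> ... -> tail and return the head."""
    head = None
    for name in reversed(names):
        head = ChainNode(name, successor=head)
    return head


primary = build_chain(["primary", "replica1", "replica2", "replica3"])
print(primary.write("x", 1))
# ['primary', 'replica1', 'replica2', 'replica3']
```

Each node sends the write to only one other node, which is where the lower per-node bandwidth comes from. The cost is that the write is acknowledged only after it has travelled the whole chain.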
8.4.3 All-to-All Replication

Node A <-> Node B <-> Node C <-> Node A

Each node replicates to all others.

Pros:
- Most resilient
- Any node can be primary

Cons:
- Complex
- High bandwidth
8.5 Conflict Resolution

Types of Conflicts

Write Conflicts

- Node A writes: user.balance = 100
- Node B writes: user.balance = 50

A conflict occurs: which write wins?

Resolution strategies:

| Strategy | Description |
|---|---|
| Last-Write-Wins | Timestamp-based (simple) |
| Vector Clocks | Track causality |
| CRDTs | Conflict-free replicated data types |
| Manual | Queue for resolution |
| Merge | Automatic merge rules |
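To make "track causality" concrete, here is a small sketch of comparing two vector clocks. It assumes each clock is a plain dictionary mapping a node name to a counter, with missing nodes counted as zero. The function name `compare` and the return labels are chosen for this example.

```python
from typing import Dict


def compare(a: Dict[str, int], b: Dict[str, int]) -> str:
    """Compare two vector clocks: 'before', 'after', 'equal' or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a happened before b, so b can safely overwrite a
    if b_le_a:
        return "after"       # b happened before a
    return "concurrent"      # neither saw the other: a genuine conflict


# Node A and Node B both update user.balance without seeing each other's write.
print(compare({"A": 1}, {"B": 1}))          # concurrent
print(compare({"A": 1}, {"A": 1, "B": 1}))  # before
```

Only the "concurrent" case needs one of the other strategies in the table. Writes that are ordered by causality can be applied without losing anything.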
Last Write Wins (LWW)

Each write includes a timestamp, and the write with the latest timestamp wins.

Example:
- Node A: write X=100 at T1
- Node B: write X=50 at T2

Resolution: X=50 (T2 > T1)

Issues:
- Requires synchronized clocks
- May lose updates
- Not suitable for all cases
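A minimal LWW merge might look like the following. `Versioned` and `lww_merge` are illustrative names, and breaking timestamp ties by node ID is an assumption added here so that every node picks the same winner; the text above does not specify a tie-break rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: int
    timestamp: float  # taken from the writing node's clock
    node_id: str


def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    """Keep the write with the larger (timestamp, node_id) pair."""
    # node_id is only a tie-breaker (an assumption of this sketch).
    return max(a, b, key=lambda v: (v.timestamp, v.node_id))


write_a = Versioned(value=100, timestamp=1.0, node_id="A")  # X=100 at T1
write_b = Versioned(value=50, timestamp=2.0, node_id="B")   # X=50 at T2
print(lww_merge(write_a, write_b).value)  # 50
```

The losing write is simply discarded, which is the "may lose updates" issue. If Node A's clock ran ahead of Node B's, the older write would win instead.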
8.6 Failover

Automatic Failover Process

Failover steps:
1. Detect failure: the primary fails and the replicas can no longer connect to it.
2. Elect new primary: the replicas vote and choose a new primary.
3. Promote: the new primary is promoted and writes are enabled.
4. Reconfigure: the old primary is removed from the pool.
5. Heal: the old primary comes back as a replica.
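The election step can be sketched as follows. The rules used here are assumptions for illustration, not a description of any particular database: a promotion needs a majority of the configured replicas to be reachable, and the reachable replica with the highest applied log position wins. `ReplicaState` and `elect_new_primary` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReplicaState:
    name: str
    reachable: bool
    applied_position: int  # hypothetical replication log position


def elect_new_primary(replicas: List[ReplicaState]) -> Optional[str]:
    """Pick the most up-to-date reachable replica, or None without a majority."""
    live = [r for r in replicas if r.reachable]
    if len(live) * 2 <= len(replicas):
        return None  # no majority: refuse to promote anyone
    # Ties on position are broken by name so the choice is deterministic.
    return max(live, key=lambda r: (r.applied_position, r.name)).name


cluster = [
    ReplicaState("replica-1", reachable=True, applied_position=90),
    ReplicaState("replica-2", reachable=True, applied_position=120),
    ReplicaState("replica-3", reachable=False, applied_position=150),
]
print(elect_new_primary(cluster))  # replica-2
```

Requiring a majority means that a minority partition cannot promote its own primary, which is one common guard against two primaries being active at once. Choosing the highest applied position limits how many writes are lost under asynchronous replication.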
Failover Considerations

| Consideration | Impact |
|---|---|
| Data Loss | Asynchronous replication may lose writes that were not yet replicated |
| Downtime | Time to detect the failure and promote a replica |
| Split Brain | Two primaries active at the same time |
| Clients | Must be redirected to the new primary |
8.7 Read Replicas
Scaling Reads

Read Replica Architecture

- Write path: App --> Primary DB --> Primary storage
- Read path: App --> Load Balancer --> Replica 1, Replica 2, or Replica 3

Read distribution:
- Round robin
- Least connections
- Geographic affinity
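Two of the read distribution strategies can be combined in a short selection function. The sketch assumes the load balancer already knows each replica's region and current connection count. `ReplicaInfo`, `pick_replica`, and the region names are made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReplicaInfo:
    name: str
    region: str
    active_connections: int


def pick_replica(replicas: List[ReplicaInfo], client_region: str) -> ReplicaInfo:
    """Prefer replicas in the client's region, then the fewest connections."""
    if not replicas:
        raise ValueError("no replicas available")
    local = [r for r in replicas if r.region == client_region]
    candidates = local or replicas  # fall back to any region if none is local
    return min(candidates, key=lambda r: (r.active_connections, r.name))


pool = [
    ReplicaInfo("replica-1", "eu", 12),
    ReplicaInfo("replica-2", "eu", 4),
    ReplicaInfo("replica-3", "us", 1),
]
print(pick_replica(pool, "eu").name)  # replica-2
```

Geographic affinity is applied first here, so a lightly loaded remote replica is ignored while a local one exists. Whether that ordering is right depends on how cross-region latency compares with queueing on a busy local replica.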
Lag Monitoring

Replication lag is how far a replica is behind the primary.

How to measure (MySQL):

    SHOW SLAVE STATUS\G
    Seconds_Behind_Master: 5

MySQL 8.0.22 and later also accept SHOW REPLICA STATUS, which reports the same value as Seconds_Behind_Source.

Impact of lag:

| Lag | Impact |
|---|---|
| < 1 second | Minimal |
| 1-30 seconds | Noticeable |
| > 30 seconds | Significant stale data |
| > 5 minutes | Major issue |

Solutions:
- Add replicas
- Use a faster network
- Reduce write volume
- Read from the primary when fresh data is required
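The lag thresholds and the "read from the primary when fresh data is required" rule can be expressed as two small functions. This is a sketch under assumptions: the lag value is taken to be already fetched (for example from the status output above), `None` stands for "lag unknown", and the function names and the default 1-second limit are invented for the example.

```python
from typing import Optional


def classify_lag(seconds: float) -> str:
    """Map a lag measurement onto the impact levels in the table above."""
    if seconds < 1:
        return "minimal"
    if seconds <= 30:
        return "noticeable"
    if seconds <= 300:
        return "significant"
    return "major"


def choose_read_target(lag_seconds: Optional[float],
                       needs_fresh_data: bool,
                       max_lag_seconds: float = 1.0) -> str:
    """Return 'replica' only when stale reads are acceptable and lag is known and low."""
    if needs_fresh_data or lag_seconds is None or lag_seconds > max_lag_seconds:
        return "primary"
    return "replica"


print(classify_lag(5))                    # noticeable
print(choose_read_target(5, False, 10))   # replica
print(choose_read_target(5, True, 10))    # primary
```

A typical use is to send a user's reads to the primary right after that user writes, and to let everything else tolerate a few seconds of lag.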
8.8 Cloud Database Replication

AWS RDS Multi-AZ

Within a single AWS Region:
- Availability Zone 1: Primary DB
- Availability Zone 2: Standby DB
- Primary DB --> Standby DB (synchronous replication)

Features:
- Automatic failover
- Synchronous replication
- Single endpoint
- Automatic backups
Read Replica Configuration

AWS RDS Read Replicas

Primary DB (primary region) --> Replica DB (read replica region), using cross-region asynchronous replication

Use cases:
- Read scaling
- Cross-region disaster recovery
- Dev/test environments
- Analytics queries
8.9 Best Practices

Replication Design

| Best Practice | Description |
|---|---|
| Monitor lag | Track replication delay |
| Test failover | Run regular disaster recovery (DR) tests |
| Size replicas | Give replicas the same capacity as the primary |
| Network | Keep latency low between nodes |
| Backups | Keep taking backups; replication does not replace them |
| Security | Encrypt replication traffic |
Summary
Key replication concepts:
- Choose replication type - Sync for consistency, async for performance
- Plan for failures - Automatic failover ensures availability
- Monitor lag - Track replica delay
- Handle conflicts - Choose resolution strategy
- Scale reads - Use read replicas effectively
- Consider multi-region - Global distribution