Amazon Aurora
Chapter 22: Amazon Aurora - Cloud-Native Database
High-Performance Relational Database

22.1 Overview

Amazon Aurora is a MySQL- and PostgreSQL-compatible relational database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases.
Aurora Overview

Amazon Aurora is offered in three variants:

| Variant | Highlight |
|---|---|
| MySQL-compatible | Up to 5x the throughput of standard MySQL |
| PostgreSQL-compatible | Up to 3x the throughput of standard PostgreSQL |
| Serverless v2 | Automatic capacity scaling |

Key Features:
- Up to 5x the performance of MySQL
- Up to 3x the performance of PostgreSQL
- Up to 15 read replicas
- Automatic failover
- Storage auto-scaling
- Global Database (cross-region)

22.2 Aurora Architecture
Storage Architecture

All instances in a cluster share one Aurora cluster volume. The storage layer keeps 6 copies of the data across 3 Availability Zones (AZs), 2 copies per AZ:

| AZ 1 | AZ 2 | AZ 3 |
|---|---|---|
| Copy 1 | Copy 3 | Copy 5 |
| Copy 2 | Copy 4 | Copy 6 |

Features:
- 6 copies of data across 3 AZs
- Automatic replication
- Self-healing
- Storage auto-scaling (up to 128 TiB)
- Designed for 99.99% availability
- Continuous backup to Amazon S3, which is designed for 99.999999999% (11 9s) durability

Cluster Architecture
Aurora Cluster Architecture

An Aurora cluster has one writer instance (the primary) and, in this example, two reader instances (Reader 1 and Reader 2, both replicas). All instances attach to the same shared cluster volume.

Endpoints:
- Cluster (`my-cluster.cluster-xxxx.region.rds.amazonaws.com`): points to the writer (read/write)
- Reader (`my-cluster.cluster-ro-xxxx.region.rds.amazonaws.com`): load balances across readers (read-only)
- Instance: direct connection to a specific instance
- Custom: custom endpoint for specific readers

22.3 Aurora High Availability
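The endpoint list above suggests a simple routing rule: send writes to the cluster endpoint and reads to the reader endpoint, so that a failover never requires an application configuration change. A minimal sketch of that rule, assuming the placeholder hostnames from the diagram; the function name `pick_endpoint` is hypothetical and not part of any AWS tool:

```shell
#!/usr/bin/env bash
# Sketch: choose an Aurora endpoint by workload type.
# The hostnames are the placeholders used in this chapter; replace them
# with the endpoints of a real cluster.

CLUSTER_ENDPOINT="my-cluster.cluster-xxxx.region.rds.amazonaws.com"
READER_ENDPOINT="my-cluster.cluster-ro-xxxx.region.rds.amazonaws.com"

# pick_endpoint read|write
# Prints the endpoint to connect to; returns 2 on an unknown workload type.
pick_endpoint() {
  case "$1" in
    write) printf '%s\n' "$CLUSTER_ENDPOINT" ;;  # always follows the current writer
    read)  printf '%s\n' "$READER_ENDPOINT" ;;   # spreads connections across readers
    *)
      echo "usage: pick_endpoint read|write" >&2
      return 2
      ;;
  esac
}
```

For example, `mysql -h "$(pick_endpoint read)" -u admin -p` would open a session through the reader endpoint, while the same command with `write` would reach whichever instance is currently the writer.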
Failover Process

Normal operation:
- Writer instance (primary) in AZ-a
- Reader 1 in AZ-b
- Reader 2 in AZ-c

Failover scenario:
1. The writer instance in AZ-a fails.
2. Reader 1 in AZ-b is promoted and becomes the new writer.
3. Reader 2 in AZ-c keeps serving reads.
4. DNS is updated automatically so the cluster endpoint points at the new writer.

Failover time: typically 20-30 seconds.

Read Replica Auto-Scaling
Aurora Replica Auto Scaling configuration:
- Policy: Target Tracking
- Metric: CPU Utilization
- Target: 70%
- Min Replicas: 1
- Max Replicas: 15

Scaling process:
1. Average reader CPU rises above the 70% target
2. A new replica is added
3. The replica joins the cluster
4. Load is balanced across readers

22.4 Aurora Serverless
Aurora Serverless v2

Capacity range: 0.5 - 128 ACUs (Aurora Capacity Units); recent engine versions widen this to 0 - 256 ACUs.

Scaling:
- Near-instant scaling in place, without restarting the instance
- Per-second billing granularity
- Scale to zero (automatic pause/resume), available on engine versions that accept a minimum capacity of 0 ACUs

As the workload grows or shrinks, capacity moves automatically within the configured range, for example from 1 ACU up through 2, 4, 8, 16, and 32 ACUs and back down.

Use Cases:
- Unpredictable workloads
- Development and testing
- Low-traffic applications
- Multi-tenant applications

22.5 Aurora Global Database
Global architecture:
- Primary Region (us-east-1): the writer (primary instance) and its cluster volume
- Secondary Region (eu-west-1): Reader 1 and Reader 2 on a read-only cluster volume
- Replication from the primary Region to the secondary Region typically takes less than 1 second

Features:
- Up to 5 secondary regions
- < 1 second replication lag (typical)
- Promote secondary to primary for DR
- Read from any region
- 99.99% availability across regions

22.6 Aurora Features
Fast Database Cloning

Traditional clone:
- Copies all data
- Time consuming
- Additional storage cost

Aurora fast clone:
- Copy-on-write protocol
- Near-instant clone creation, because no data is copied up front
- No additional storage initially
- Only changed data is stored separately

Use Cases:
- Development environments
- Testing with production data
- Data analysis

Backtrack
Purpose: undo changes without restoring from backup. Backtrack is available for Aurora MySQL only.

How it works: Aurora keeps a record of changes, so the cluster can be rewound in place to any point within the backtrack window, for example 5, 10, or 35 minutes ago.

Backtrack window:
- Default: disabled (a window of 0) until a target window is configured
- Maximum: 72 hours

Benefits:
- No need to restore from snapshot
- Fast recovery from errors
- No new DB instance needed

22.7 Practical Configuration
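The Backtrack window above is quoted in hours, but the RDS API parameter `BacktrackWindow` and Terraform's `backtrack_window` argument both expect seconds, which is an easy source of misconfiguration. A small sketch of the conversion with the 72-hour cap applied; the helper name `backtrack_seconds` is an assumption made for illustration:

```shell
#!/usr/bin/env bash
# Sketch: convert a Backtrack window from hours to seconds.
# backtrack_seconds is a hypothetical helper, not an AWS CLI command.

MAX_BACKTRACK_HOURS=72  # maximum window stated in this chapter

# backtrack_seconds HOURS
# Prints the window in seconds; returns 2 on bad input, 1 if over the cap.
backtrack_seconds() {
  local hours="$1"
  # Accept only non-negative integers.
  if ! [[ "$hours" =~ ^[0-9]+$ ]]; then
    echo "hours must be a non-negative integer" >&2
    return 2
  fi
  hours=$((10#$hours))  # force base 10 so "08" is not parsed as octal
  if (( hours > MAX_BACKTRACK_HOURS )); then
    echo "window exceeds ${MAX_BACKTRACK_HOURS} hours" >&2
    return 1
  fi
  echo $(( hours * 3600 ))
}
```

`backtrack_seconds 72` prints `259200`, the value to use for a 72-hour window.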
Aurora with Terraform

```hcl
# ============================================================
# Aurora Cluster
# ============================================================

resource "aws_rds_cluster" "main" {
  cluster_identifier = "main-aurora-cluster"

  # Engine
  engine         = "aurora-mysql"
  engine_version = "8.0.mysql_aurora.3.02.0"
  engine_mode    = "provisioned" # or "serverless"

  # Database
  database_name   = "appdb"
  master_username = "admin"
  master_password = var.db_password

  # Parameters (cluster parameter group defined further down)
  db_cluster_parameter_group_name = aws_rds_cluster_parameter_group.main.name

  # Network
  db_subnet_group_name   = aws_db_subnet_group.main.name
  vpc_security_group_ids = [aws_security_group.aurora.id]

  # Encryption
  storage_encrypted = true
  kms_key_id        = aws_kms_key.aurora.arn

  # Availability
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]

  # Backup
  backup_retention_period = 30
  preferred_backup_window = "03:00-04:00"

  # Backtrack (value is in seconds: 259200 = 72 hours, the maximum)
  backtrack_window = 259200

  # Deletion protection
  deletion_protection       = true
  skip_final_snapshot       = false
  final_snapshot_identifier = "main-aurora-final"

  # Performance Insights
  # Note: Enabled at instance level

  tags = {
    Name = "main-aurora-cluster"
  }
}

# ============================================================
# Aurora Writer Instance
# ============================================================

resource "aws_rds_cluster_instance" "writer" {
  identifier         = "main-aurora-writer"
  cluster_identifier = aws_rds_cluster.main.id
  instance_class     = "db.r6g.large"
  engine             = aws_rds_cluster.main.engine
  engine_version     = aws_rds_cluster.main.engine_version

  # Performance Insights
  performance_insights_enabled    = true
  performance_insights_kms_key_id = aws_kms_key.aurora.arn

  # Monitoring
  monitoring_interval = 60
  monitoring_role_arn = aws_iam_role.rds_monitoring.arn

  # Promotion tier (lower = higher priority for failover)
  promotion_tier = 1

  tags = {
    Name = "main-aurora-writer"
  }
}

# ============================================================
# Aurora Reader Instances
# ============================================================

resource "aws_rds_cluster_instance" "readers" {
  count = 2

  identifier         = "main-aurora-reader-${count.index + 1}"
  cluster_identifier = aws_rds_cluster.main.id
  instance_class     = "db.r6g.large"
  engine             = aws_rds_cluster.main.engine
  engine_version     = aws_rds_cluster.main.engine_version

  # Performance Insights
  performance_insights_enabled = true

  # Promotion tier (higher = lower priority for failover)
  promotion_tier = count.index + 2

  tags = {
    Name = "main-aurora-reader-${count.index + 1}"
  }
}

# ============================================================
# Aurora Serverless v2
# ============================================================

resource "aws_rds_cluster" "serverless" {
  cluster_identifier = "serverless-aurora-cluster"

  # Serverless v2 uses the "provisioned" engine mode
  engine         = "aurora-mysql"
  engine_mode    = "provisioned"
  engine_version = "8.0.mysql_aurora.3.02.0"

  database_name   = "appdb"
  master_username = "admin"
  master_password = var.db_password

  db_subnet_group_name   = aws_db_subnet_group.main.name
  vpc_security_group_ids = [aws_security_group.aurora.id]

  storage_encrypted = true
  kms_key_id        = aws_kms_key.aurora.arn

  # Serverless v2 configuration
  serverlessv2_scaling_configuration {
    min_capacity = 0.5 # ACUs
    max_capacity = 64  # ACUs
  }

  tags = {
    Name = "serverless-aurora-cluster"
  }
}

resource "aws_rds_cluster_instance" "serverless" {
  identifier         = "serverless-aurora-instance"
  cluster_identifier = aws_rds_cluster.serverless.id
  instance_class     = "db.serverless"
  engine             = aws_rds_cluster.serverless.engine
  engine_version     = aws_rds_cluster.serverless.engine_version

  tags = {
    Name = "serverless-aurora-instance"
  }
}

# ============================================================
# Aurora Global Cluster
# ============================================================

resource "aws_rds_global_cluster" "main" {
  global_cluster_identifier = "main-global-cluster"

  engine         = "aurora-mysql"
  engine_version = "8.0.mysql_aurora.3.02.0"

  storage_encrypted = true
}

# Primary cluster
resource "aws_rds_cluster" "primary" {
  global_cluster_identifier = aws_rds_global_cluster.main.id
  cluster_identifier        = "primary-cluster"

  engine         = aws_rds_global_cluster.main.engine
  engine_version = aws_rds_global_cluster.main.engine_version

  database_name   = "appdb"
  master_username = "admin"
  master_password = var.db_password

  # Keep encryption consistent with the global cluster
  storage_encrypted = true

  db_subnet_group_name   = aws_db_subnet_group.main.name
  vpc_security_group_ids = [aws_security_group.aurora.id]

  # Primary cluster must have at least 2 instances
  # for Multi-AZ
}

# Secondary cluster (in different region)
resource "aws_rds_cluster" "secondary" {
  provider = aws.dr_region

  global_cluster_identifier = aws_rds_global_cluster.main.id
  cluster_identifier        = "secondary-cluster"

  engine         = aws_rds_global_cluster.main.engine
  engine_version = aws_rds_global_cluster.main.engine_version

  db_subnet_group_name   = aws_db_subnet_group.dr.name
  vpc_security_group_ids = [aws_security_group.aurora_dr.id]

  # Secondary cluster is read-only
  # No master_username/password needed
  # An encrypted secondary also needs kms_key_id set to a KMS key
  # that lives in the secondary region.
}

# ============================================================
# Aurora Cluster Parameter Group
# ============================================================

resource "aws_rds_cluster_parameter_group" "main" {
  name        = "main-aurora-params"
  family      = "aurora-mysql8.0"
  description = "Aurora MySQL 8.0 parameter group"

  parameter {
    name  = "time_zone"
    value = "UTC"
  }

  parameter {
    name  = "character_set_server"
    value = "utf8mb4"
  }

  parameter {
    name  = "aurora_enable_repl_log"
    value = "1"
  }

  tags = {
    Name = "main-aurora-params"
  }
}

# ============================================================
# Auto Scaling for Read Replicas
# ============================================================

resource "aws_appautoscaling_target" "aurora" {
  max_capacity       = 15
  min_capacity       = 1
  resource_id        = "cluster:${aws_rds_cluster.main.cluster_identifier}"
  scalable_dimension = "rds:cluster:ReadReplicaCount"
  service_namespace  = "rds"
}

resource "aws_appautoscaling_policy" "aurora" {
  name               = "aurora-read-replica-scaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.aurora.resource_id
  scalable_dimension = aws_appautoscaling_target.aurora.scalable_dimension
  service_namespace  = aws_appautoscaling_target.aurora.service_namespace

  target_tracking_scaling_policy_configuration {
    target_value = 70 # average reader CPU utilization, in percent

    predefined_metric_specification {
      predefined_metric_type = "RDSReaderAverageCPUUtilization"
    }

    scale_in_cooldown  = 300 # seconds
    scale_out_cooldown = 60  # seconds
  }
}
```

22.8 Aurora vs RDS
Aurora vs RDS Comparison

| Feature | Aurora | RDS |
|---|---|---|
| Storage | Auto-scaling | Provisioned |
| Max Storage | 128 TiB | 64 TiB |
| Read Replicas | Up to 15 | Up to 5 or 15, depending on engine |
| Replication | Asynchronous | Async (read replicas), sync (Multi-AZ) |
| Failover | ~30 seconds | ~60-120 seconds |
| Multi-Master | Legacy feature (Aurora MySQL version 1 only) | No |
| Serverless | Yes | No |
| Global DB | Yes | Cross-Region read replicas |
| Backtrack | Yes (Aurora MySQL) | No |
| Fast Clone | Yes | No |
| Cost | Higher | Lower |
| Use Case | High performance | General purpose |

22.9 Why This Matters in DevOps/SRE
Aurora is a common default for production relational workloads on AWS. SREs rely on its self-healing storage, fast failover, and auto-scaling replicas to build highly available systems. Key operational areas: failover testing, clone management for staging, Global Database for DR, and cost control when choosing between provisioned and serverless capacity.
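The provisioned-versus-serverless cost question mentioned above comes down to simple arithmetic: a provisioned instance bills a flat hourly rate, while Serverless v2 bills for the ACUs actually consumed. A rough sketch of that comparison follows. Every price in it is a hypothetical placeholder rather than a real AWS rate, the function names are invented for illustration, and storage and I/O charges are ignored:

```shell
#!/usr/bin/env bash
# Sketch: compare monthly compute cost, provisioned vs Serverless v2.
# All prices passed to these functions are HYPOTHETICAL placeholders.

HOURS_PER_MONTH=730  # average number of hours in a month

# provisioned_monthly HOURLY_PRICE
provisioned_monthly() {
  LC_ALL=C awk -v price="$1" -v hours="$HOURS_PER_MONTH" \
    'BEGIN { printf "%.2f\n", price * hours }'
}

# serverless_monthly AVERAGE_ACUS PRICE_PER_ACU_HOUR
serverless_monthly() {
  LC_ALL=C awk -v acus="$1" -v price="$2" -v hours="$HOURS_PER_MONTH" \
    'BEGIN { printf "%.2f\n", acus * price * hours }'
}
```

With placeholder inputs, `provisioned_monthly 0.25` prints `182.50` and `serverless_monthly 4 0.125` prints `365.00`. The general pattern is that serverless capacity tends to pay off only when the average ACU consumption stays well below what an always-on provisioned instance would provide.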
22.10 Linux Systems Perspective
Aurora Operations from Arch Linux

```shell
# Install tools (Arch ships MariaDB's client as its MySQL-compatible CLI)
sudo pacman -S aws-cli-v2 jq mariadb-clients postgresql
```

```shell
#!/bin/bash
# ~/bin/aurora-status.sh
# === Aurora Cluster Status ===
echo "=== Aurora Clusters ==="
aws rds describe-db-clusters \
  --query 'DBClusters[*].{Cluster:DBClusterIdentifier,Engine:Engine,Status:Status,Writer:Endpoint,Readers:ReaderEndpoint,MultiAZ:MultiAZ}' \
  --output table

echo ""
echo "=== Cluster Members ==="
# [] flattens the per-cluster member lists into a single table
aws rds describe-db-clusters \
  --query 'DBClusters[].DBClusterMembers[].{Instance:DBInstanceIdentifier,IsWriter:IsClusterWriter,FailoverPriority:PromotionTier}' \
  --output table
```

```shell
# === Fast Clone for Staging ===
# Creates the cloned cluster only; add a DB instance to it before connecting.
aws rds restore-db-cluster-to-point-in-time \
  --source-db-cluster-identifier prod-aurora \
  --db-cluster-identifier staging-aurora \
  --restore-type copy-on-write \
  --use-latest-restorable-time

# === Trigger Manual Failover (chaos testing) ===
aws rds failover-db-cluster \
  --db-cluster-identifier prod-aurora \
  --target-db-instance-identifier prod-aurora-reader-1

# === Monitor Replication Lag ===
# AuroraReplicaLag is reported in milliseconds.
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name AuroraReplicaLag \
  --dimensions Name=DBClusterIdentifier,Value=prod-aurora \
  --start-time "$(date -d '1 hour ago' -u +%Y-%m-%dT%H:%M:%S)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
  --period 60 --statistics Average --output table
```

22.11 Troubleshooting Guide

| Issue | Cause | Solution |
|---|---|---|
| Writer failover took >30s | Application connects to an instance endpoint, or caches DNS for too long | Always use the cluster endpoint, not an instance endpoint, and keep client-side DNS caching short |
| Reader endpoint returning stale data | Replica lag | Monitor AuroraReplicaLag (typically well under 100 ms) and check for long-running transactions |
| Storage growing unexpectedly | Backtrack or deleted data not reclaimed | Current engine versions release space when tables or databases are dropped; on older versions the volume never shrinks, so migrate into a new cluster with a logical dump and restore to reclaim space |
| Serverless v2 scaling too slow | Min ACU too low | Increase the minimum capacity; the scaling rate depends on current capacity, so a higher floor scales up faster |
| Global database promotion failed | Secondary not in sync | Check that replication lag is <1s before promotion |
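The first row of the table concerns clients that do not reconnect cleanly during a failover. One common mitigation is to retry the connection against the cluster endpoint with exponential backoff until DNS points at the new writer. A minimal sketch, assuming a hypothetical helper named `retry_with_backoff` that wraps any command:

```shell
#!/usr/bin/env bash
# Sketch: retry a command with exponential backoff.
# retry_with_backoff is a hypothetical helper, not part of any AWS tool.

# retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# The delay doubles after every failed attempt.
retry_with_backoff() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if (( attempt >= max_attempts )); then
      echo "giving up after ${attempt} attempts" >&2
      return 1
    fi
    sleep "$delay"
    delay=$(( delay * 2 ))
    attempt=$(( attempt + 1 ))
  done
}
```

With a variable such as `CLUSTER_ENDPOINT` (an assumed name) holding the cluster endpoint, `retry_with_backoff 6 1 mysql -h "$CLUSTER_ENDPOINT" -e 'SELECT 1'` waits 1, 2, 4, 8, and 16 seconds between its six attempts. That is 31 seconds of waiting in total, enough to cover the typical 20-30 second failover window.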
22.12 Interview Questions
- Q: How does Aurora’s storage differ from standard RDS?
  - A: Aurora uses a distributed, fault-tolerant storage layer: 6 copies across 3 AZs, self-healing (it detects and repairs corruption), and auto-scaling up to 128 TiB. Storage is separate from compute, and all instances share a single cluster volume. As a result, replicas have very low lag because they read from the shared volume, the writer carries little replication overhead, and failover is fast because no data has to be synced to the promoted instance.
- Q: Aurora Global Database vs RDS cross-region read replica?
  - A: Aurora Global Database: typically <1s replication lag (physical replication at the storage layer), up to 5 secondary regions, RPO typically <1s, and RTO typically <1 min when a secondary is promoted with managed failover. RDS cross-region read replica: asynchronous engine-level replication whose lag can reach minutes, manual promotion, and higher RPO/RTO. Choose Aurora Global Database for mission-critical apps that need fast cross-region DR.
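The six-copy layout in the first answer is usually explained in terms of quorums. AWS describes Aurora storage as tolerating the loss of two copies without affecting writes and three copies without affecting reads, which corresponds to a write quorum of 4 and a read quorum of 3 out of 6. Those quorum sizes are background knowledge rather than something stated in this chapter, so treat the following as an illustrative sketch; `storage_status` is a hypothetical function name:

```shell
#!/usr/bin/env bash
# Sketch: what an Aurora cluster volume can still do after losing copies.
# Quorum sizes are assumptions taken from AWS's public description of Aurora.

COPIES_PER_AZ=2
AZ_COUNT=3
WRITE_QUORUM=4  # copies that must acknowledge a write
READ_QUORUM=3   # copies needed to serve a read

# storage_status LOST_COPIES
# Prints "read-write", "read-only", or "unavailable".
storage_status() {
  local lost="$1"
  local total=$(( COPIES_PER_AZ * AZ_COUNT ))
  local alive=$(( total - lost ))
  if (( alive >= WRITE_QUORUM )); then
    echo "read-write"
  elif (( alive >= READ_QUORUM )); then
    echo "read-only"
  else
    echo "unavailable"
  fi
}
```

Losing a whole AZ removes 2 copies, so `storage_status 2` prints `read-write`; losing an AZ plus one more copy (`storage_status 3`) prints `read-only`.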
22.13 Exam Tips
- Storage: 6 copies across 3 AZs, auto-scaling up to 128 TiB
- Read Replicas: Up to 15, asynchronous replication
- Failover: ~30 seconds, automatic promotion
- Endpoints: Cluster (writer), Reader (load balanced), Instance (direct), Custom (chosen readers)
- Serverless v2: 0.5-128 ACUs (0-256 on recent engine versions), near-instant scaling
- Global Database: Up to 5 secondary regions, < 1 second lag (typical)
- Fast Clone: Near-instant, copy-on-write, no extra storage until data changes
- Backtrack: Undo changes in place, up to 72 hours (Aurora MySQL only)
- Multi-Master: Multiple writers (legacy feature, Aurora MySQL version 1 only)
- Performance: Up to 5x MySQL, up to 3x PostgreSQL
Next Chapter
Chapter 23: Amazon DynamoDB - NoSQL Database
Last Updated: March 2026