Amazon Aurora

Chapter 22: Amazon Aurora - Cloud-Native Database

Amazon Aurora is a MySQL and PostgreSQL-compatible relational database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases.

Aurora Overview
+------------------------------------------------------------------+
| |
| +------------------------+ |
| | Amazon Aurora | |
| +------------------------+ |
| | |
| +---------------------+---------------------+ |
| | | | |
| v v v |
| +----------+ +----------+ +----------+ |
| | MySQL | | Postgre | | Serverless| |
| | Compatible| | SQL Comp | | v2 | |
| | | | | | | |
| | 5x MySQL | | 3x | | Auto | |
| | perf | | PostgreSQL| | scaling | |
| +----------+ +----------+ +----------+ |
| |
| Key Features: |
| - 5x performance of MySQL |
| - 3x performance of PostgreSQL |
| - Up to 15 read replicas |
| - Automatic failover |
| - Storage auto-scaling |
| - Global Database (cross-region) |
| |
+------------------------------------------------------------------+

Aurora Storage Architecture
+------------------------------------------------------------------+
| |
| Aurora Cluster Volume |
| +----------------------------------------------------------+ |
| | | |
| | +----------------------------------------------------+ | |
| | | Storage Layer | | |
| | | (6 copies across 3 AZs) | | |
| | +----------------------------------------------------+ | |
| | | |
| | AZ 1 AZ 2 AZ 3 | |
| | +----------+ +----------+ +----------+ | |
| | | Copy 1 | | Copy 3 | | Copy 5 | | |
| | | Copy 2 | | Copy 4 | | Copy 6 | | |
| | +----------+ +----------+ +----------+ | |
| | | |
| | Features: | |
| | - 6 copies of data across 3 AZs | |
| | - Automatic replication | |
| | - Self-healing | |
| | - Storage auto-scaling (up to 128 TB) | |
| | - 99.99% availability | |
| | - 99.999999999% durability (11 9s) | |
| | | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
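The six-copy layout is easier to reason about with the quorum sizes Aurora's storage design uses: a write must be acknowledged by 4 of the 6 copies, and a read quorum needs 3 of 6. The sketch below is plain shell arithmetic rather than an AWS API call, and `aurora_quorum_state` is a made-up helper name used only for illustration.

```shell
#!/bin/sh
# Sketch: what a 6-copy volume with a 4/6 write quorum and a 3/6 read quorum
# can tolerate. aurora_quorum_state is a hypothetical helper, not an AWS tool.

COPIES=6
WRITE_QUORUM=4
READ_QUORUM=3

# Usage: aurora_quorum_state <number_of_lost_copies>
aurora_quorum_state() {
  remaining=$((COPIES - $1))
  if [ "$remaining" -ge "$WRITE_QUORUM" ]; then
    echo "read-write"
  elif [ "$remaining" -ge "$READ_QUORUM" ]; then
    # Simplification: data is still readable and repairable, writes stall.
    echo "read-only"
  else
    echo "unavailable"
  fi
}

# Each AZ holds 2 copies, so losing one AZ leaves 4 copies.
echo "lose 1 AZ:          $(aurora_quorum_state 2)"   # read-write
# Losing an AZ plus one more copy leaves 3 copies.
echo "lose 1 AZ + 1 copy: $(aurora_quorum_state 3)"   # read-only
echo "lose 2 AZs:         $(aurora_quorum_state 4)"   # unavailable
```

This is why the diagram spreads two copies into each of three AZs: a full AZ outage still leaves a write quorum, and one further lost copy still leaves enough copies to read and repair the data.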

Aurora Cluster Architecture
+------------------------------------------------------------------+
| |
| Aurora Cluster |
| +----------------------------------------------------------+ |
| | | |
| | Cluster Endpoint (Writer) | |
| | +----------------------------------------------------+ | |
| | | my-cluster.cluster-xxxx.region.rds.amazonaws.com | | |
| | +----------------------------------------------------+ | |
| | | | |
| | +-------------+-------------+ | |
| | | | | | |
| | v v v | |
| | +----------+ +----------+ +----------+ | |
| | | Writer | | Reader 1 | | Reader 2 | | |
| | | Instance | | Instance | | Instance | | |
| | | (Primary)| | (Replica)| | (Replica)| | |
| | +----------+ +----------+ +----------+ | |
| | | | | | |
| | +------+-------+------+-------+ | |
| | | | | |
| | v v | |
| | +----------------------------------------------------+ | |
| | | Cluster Volume | | |
| | | (Shared Storage) | | |
| | +----------------------------------------------------+ | |
| | | |
| | Reader Endpoint | |
| | +----------------------------------------------------+ | |
| | | my-cluster.cluster-ro-xxxx.region.rds.amazonaws.com| | |
| | +----------------------------------------------------+ | |
| | | |
| +----------------------------------------------------------+ |
| |
| Endpoints: |
| - Cluster: Points to writer (read/write) |
| - Reader: Load balances across readers (read-only) |
| - Instance: Direct connection to specific instance |
| - Custom: Custom endpoint for specific readers |
| |
+------------------------------------------------------------------+
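The four endpoint types can be told apart from the hostname alone, which is handy when auditing application configs for writes pointed at a reader endpoint. The following is a minimal sketch that assumes the standard `rds.amazonaws.com` naming shown in the diagram; `endpoint_kind` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: classify an Aurora endpoint by its hostname pattern.
# endpoint_kind is a hypothetical helper; order matters because the
# reader and custom patterns would also match the generic cluster pattern.

endpoint_kind() {
  case "$1" in
    *.cluster-ro-*.rds.amazonaws.com)     echo "reader" ;;
    *.cluster-custom-*.rds.amazonaws.com) echo "custom" ;;
    *.cluster-*.rds.amazonaws.com)        echo "cluster" ;;
    *.rds.amazonaws.com)                  echo "instance" ;;
    *)                                    echo "unknown" ;;
  esac
}

endpoint_kind "my-cluster.cluster-xxxx.region.rds.amazonaws.com"      # cluster
endpoint_kind "my-cluster.cluster-ro-xxxx.region.rds.amazonaws.com"   # reader
```

A config linter could call this on every connection string and flag write paths that resolve to anything other than `cluster`.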

Aurora Failover
+------------------------------------------------------------------+
| |
| Normal Operation |
| +----------------------------------------------------------+ |
| | | |
| | Writer (Primary) | |
| | +------------------+ | |
| | | AZ-a | | |
| | | +------------+ | | |
| | | | Writer | | | |
| | | | Instance | | | |
| | | +------------+ | | |
| | +------------------+ | |
| | | | |
| | v | |
| | Reader 1 Reader 2 | |
| | +----------+ +----------+ | |
| | | AZ-b | | AZ-c | | |
| | | +----+ | | +----+ | | |
| | | |Reader| | | |Reader| | | |
| | | +----+ | | +----+ | | |
| | +----------+ +----------+ | |
| | | |
| +----------------------------------------------------------+ |
| |
| Failover Scenario |
| +----------------------------------------------------------+ |
| | | |
| | Writer (FAILED) | |
| | +------------------+ | |
| | | AZ-a | | |
| | | +------------+ | | |
| | | | Writer | | <-- FAILURE | |
| | | | (DOWN) | | | |
| | | +------------+ | | |
| | +------------------+ | |
| | | | |
| | v | |
| | Reader 1 (Promoted) Reader 2 | |
| | +----------+ +----------+ | |
| | | AZ-b | | AZ-c | | |
| | | +----+ | | +----+ | | |
| | | |NEW | | | |Reader| | | |
| | | |WRITER| | | | | | | |
| | | +----+ | | +----+ | | |
| | +----------+ +----------+ | |
| | | |
| | Failover Time: Typically 20-30 seconds | |
| | DNS automatically updated | |
| | | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
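To see the promotion in the failover scenario above from the control plane, one option is to poll the cluster membership until a different instance reports itself as writer. This is a rough sketch: the function names, the 120-second default timeout, and the 5-second poll interval are all assumptions, and the elapsed time it prints reflects what the RDS API reports rather than the downtime an application actually experiences.

```shell
#!/bin/sh
# Sketch: wait until the cluster reports a different writer instance.
# current_writer and wait_for_new_writer are hypothetical helper names.

# current_writer <cluster-id>: prints the instance id that is the writer now.
current_writer() {
  aws rds describe-db-clusters \
    --db-cluster-identifier "$1" \
    --query 'DBClusters[0].DBClusterMembers[?IsClusterWriter==`true`].DBInstanceIdentifier' \
    --output text
}

# wait_for_new_writer <cluster-id> <old-writer> [timeout-seconds] [poll-seconds]
wait_for_new_writer() {
  cluster=$1
  old=$2
  timeout=${3:-120}
  poll=${4:-5}
  waited=0
  while [ "$waited" -le "$timeout" ]; do
    now=$(current_writer "$cluster")
    if [ -n "$now" ] && [ "$now" != "$old" ]; then
      echo "new writer: $now (after ~${waited}s)"
      return 0
    fi
    sleep "$poll"
    waited=$((waited + poll))
  done
  echo "no writer change within ${timeout}s" >&2
  return 1
}

# Example (not run here):
#   old=$(current_writer prod-aurora)
#   aws rds failover-db-cluster --db-cluster-identifier prod-aurora
#   wait_for_new_writer prod-aurora "$old"
```

Running this during a game day gives a repeatable number to compare against the 20-30 second figure; measuring from the application side, through the cluster endpoint, remains the more honest test.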

Aurora Auto-Scaling
+------------------------------------------------------------------+
| |
| Scaling Configuration |
| +----------------------------------------------------------+ |
| | | |
| | Aurora Replica Auto Scaling | |
| | +----------------------------------------------------+ | |
| | | | | |
| | | Policy: Target Tracking | | |
| | | - Metric: CPU Utilization | | |
| | | - Target: 70% | | |
| | | | | |
| | | Min Replicas: 1 | | |
| | | Max Replicas: 15 | | |
| | | | | |
| | +----------------------------------------------------+ | |
| | | |
| | Scaling Process: | |
| | 1. CPU > 70% threshold | |
| | 2. Add new replica | |
| | 3. Replica joins cluster | |
| | 4. Load balanced across readers | |
| | | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
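For capacity planning, the scaling process above can be approximated with a proportional estimate: scale the current reader count by the ratio of observed CPU to the target, round up, and clamp to the min/max bounds. This is a back-of-the-envelope sketch, not Application Auto Scaling's actual algorithm (which also applies cooldowns and alarm evaluation periods); `estimate_replicas` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: proportional estimate of the reader count a target-tracking policy
# would settle near. Not the real Application Auto Scaling algorithm.

# estimate_replicas <current-readers> <avg-cpu-percent> [target] [min] [max]
estimate_replicas() {
  current=$1
  cpu=$2
  target=${3:-70}
  min=${4:-1}
  max=${5:-15}
  # Integer ceiling of current * cpu / target
  want=$(( (current * cpu + target - 1) / target ))
  if [ "$want" -lt "$min" ]; then want=$min; fi
  if [ "$want" -gt "$max" ]; then want=$max; fi
  echo "$want"
}

estimate_replicas 2 90    # 3  (2 readers at 90% CPU)
estimate_replicas 4 35    # 2  (4 readers at 35% CPU)
estimate_replicas 10 200  # 15 (clamped to the maximum)
```

The estimate is mainly useful for sanity-checking `max_capacity`: if expected peak load pushes the result to the clamp, the policy cannot bring CPU back to target on its own.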

Aurora Serverless v2
+------------------------------------------------------------------+
| |
| Architecture |
| +----------------------------------------------------------+ |
| | | |
| | Capacity Range: 0.5 - 128 ACUs | |
| | (Aurora Capacity Units) | |
| | | |
| | Scaling: | |
| | - Instant scaling | |
| | - Per-second granularity | |
| | - Scale to zero (auto-pause) only when min capacity is 0; newer engine versions | |
| | | |
| | +----------------------------------------------------+ | |
| | | | | |
| | | Workload | | |
| | | | | | |
| | | v | | |
| | | +--+ +--+ +--+ +--+ +--+ +--+ | | |
| | | |ACU| |ACU| |ACU| |ACU| |ACU| |ACU| | | |
| | | | 1 | | 2 | | 4 | | 8 | | 16| | 32| | | |
| | | +--+ +--+ +--+ +--+ +--+ +--+ | | |
| | | ^ | | |
| | | | | | |
| | | Auto-scaling | | |
| | | | | |
| | +----------------------------------------------------+ | |
| | | |
| +----------------------------------------------------------+ |
| |
| Use Cases: |
| - Unpredictable workloads |
| - Development and testing |
| - Low-traffic applications |
| - Multi-tenant applications |
| |
+------------------------------------------------------------------+
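An ACU corresponds to roughly 2 GiB of memory plus matching CPU and networking, so the capacity range translates directly into instance memory and into ACU-hours on the bill. The sketch below shows both conversions; the helper names are hypothetical, and the price passed to `acu_cost` is a placeholder argument, not a quoted AWS rate.

```shell
#!/bin/sh
# Sketch: convert ACUs to approximate memory and estimate ACU-hour cost.
# Assumes ~2 GiB of memory per ACU. Helper names are hypothetical.

# acu_to_gib <acus>: approximate memory in GiB
acu_to_gib() {
  LC_ALL=C awk -v acu="$1" 'BEGIN { printf "%.1f\n", acu * 2 }'
}

# acu_cost <average-acus> <hours> <price-per-acu-hour>
# The price is supplied by the caller; look up the current rate for your region.
acu_cost() {
  LC_ALL=C awk -v acu="$1" -v hours="$2" -v price="$3" \
    'BEGIN { printf "%.2f\n", acu * hours * price }'
}

acu_to_gib 0.5          # 1.0   (minimum in the range above)
acu_to_gib 128          # 256.0 (maximum in the range above)
acu_cost 2 730 0.12     # 175.20 with a placeholder price of 0.12 per ACU-hour
```

Comparing the `acu_cost` result for a realistic average ACU figure against the monthly price of a fixed instance class is a quick way to decide between provisioned and serverless for a given workload.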

Aurora Global Database
+------------------------------------------------------------------+
| |
| Global Architecture |
| +----------------------------------------------------------+ |
| | | |
| | Primary Region (US-East-1) | |
| | +----------------------------------------------+ | |
| | | | | |
| | | Writer | | |
| | | +----------+ | | |
| | | | Primary | | | |
| | | +----------+ | | |
| | | | | | |
| | | v | | |
| | | Cluster Volume | | |
| | | +------------------------------------------+| | |
| | | |||||||||||||||||||||||||||||||||||||||||||| | |
| | | +------------------------------------------+| | |
| | | | | |
| | +----------------------------------------------+ | |
| | | | |
| | | Replication | |
| | | (< 1 second) | |
| | v | |
| | Secondary Region (EU-West-1) | |
| | +----------------------------------------------+ | |
| | | | | |
| | | Readers | | |
| | | +----------+ +----------+ | | |
| | | | Reader 1 | | Reader 2 | | | |
| | | +----------+ +----------+ | | |
| | | | | | |
| | | v | | |
| | | Cluster Volume (Read-Only) | | |
| | | +------------------------------------------+| | |
| | | |||||||||||||||||||||||||||||||||||||||||||| | |
| | | +------------------------------------------+| | |
| | | | | |
| | +----------------------------------------------+ | |
| | | |
| +----------------------------------------------------------+ |
| |
| Features: |
| - Up to 5 secondary regions |
| - < 1 second replication lag |
| - Promote secondary to primary for DR |
| - Read from any region |
| - 99.99% availability across regions |
| |
+------------------------------------------------------------------+
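Before promoting a secondary region for DR, it helps to confirm that its replication lag really is under the one-second mark. The sketch below reads the `AuroraGlobalDBReplicationLag` CloudWatch metric (reported in milliseconds) for the secondary cluster and applies a threshold. The function names and the 1000 ms default are assumptions, and `date -d` requires GNU date.

```shell
#!/bin/sh
# Sketch: gate a Global Database promotion on replication lag.
# global_lag_ms and safe_to_promote are hypothetical helper names.

# global_lag_ms <secondary-cluster-id> <secondary-region>
# Prints the worst lag (ms) seen in the last 5 minutes, or "None" if no data.
global_lag_ms() {
  aws cloudwatch get-metric-statistics \
    --region "$2" \
    --namespace AWS/RDS \
    --metric-name AuroraGlobalDBReplicationLag \
    --dimensions Name=DBClusterIdentifier,Value="$1" \
    --start-time "$(date -u -d '5 minutes ago' +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --period 300 --statistics Maximum \
    --query 'max(Datapoints[].Maximum)' --output text
}

# safe_to_promote <lag-ms> [max-lag-ms]: succeeds only if lag is known and low.
safe_to_promote() {
  case "$1" in
    ''|None|null) return 1 ;;   # no datapoints: treat as unsafe
  esac
  LC_ALL=C awk -v lag="$1" -v max="${2:-1000}" 'BEGIN { exit !(lag + 0 < max + 0) }'
}

# Example (not run here):
#   lag=$(global_lag_ms secondary-cluster eu-west-1)
#   safe_to_promote "$lag" && echo "lag ${lag}ms: OK to promote"
```

Treating missing datapoints as unsafe is a deliberate choice: a secondary that has stopped reporting lag is exactly the case where a promotion is most likely to lose data.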

Aurora Fast Clone
+------------------------------------------------------------------+
| |
| Traditional Clone vs Aurora Fast Clone |
| +----------------------------------------------------------+ |
| | | |
| | Traditional Clone: | |
| | - Copy all data | |
| | - Time consuming | |
| | - Additional storage cost | |
| | | |
| | Aurora Fast Clone: | |
| | - Copy-on-write protocol | |
| | - Instant clone | |
| | - No additional storage initially | |
| | - Only changed data stored separately | |
| | | |
| | Use Cases: | |
| | - Development environments | |
| | - Testing with production data | |
| | - Data analysis | |
| | | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+

Aurora Backtrack
+------------------------------------------------------------------+
| |
| Purpose: Undo changes without restoring from backup |
| |
| How it works: |
| +----------------------------------------------------------+ |
| | | |
| | Timeline: | |
| | |----|----|----|----|----|----|----|----| | |
| | ^ ^ ^ ^ ^ ^ ^ ^ ^ | |
| | | | | | | | | | | | |
| | Now -5m -10m -15m -20m -25m -30m -35m | |
| | | |
| | Backtrack to any point within window | |
| | - Disabled unless a backtrack window is configured (Aurora MySQL only) | |
| | - Maximum: 72 hours | |
| | | |
| | Benefits: | |
| | - No need to restore from snapshot | |
| | - Fast recovery from errors | |
| | - No new DB instance needed | |
| | | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
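A backtrack is issued with `aws rds backtrack-db-cluster`, which takes the cluster identifier and a target timestamp. The wrapper below is a minimal sketch that refuses requests outside the window before calling the API; the helper names are hypothetical, the 72-hour default should be replaced by the window actually configured on the cluster, and `date -d` requires GNU date.

```shell
#!/bin/sh
# Sketch: rewind a cluster by N minutes, guarding against requests
# outside the backtrack window. Helper names are hypothetical.

MAX_BACKTRACK_HOURS=72

# within_backtrack_window <minutes-back> [window-hours]
within_backtrack_window() {
  minutes=$1
  window_h=${2:-$MAX_BACKTRACK_HOURS}
  [ "$minutes" -gt 0 ] && [ "$minutes" -le $((window_h * 60)) ]
}

# backtrack_cluster <cluster-id> <minutes-back> [window-hours]
backtrack_cluster() {
  if ! within_backtrack_window "$2" "${3:-$MAX_BACKTRACK_HOURS}"; then
    echo "refusing: $2 minutes is outside the backtrack window" >&2
    return 1
  fi
  target=$(date -u -d "$2 minutes ago" +%Y-%m-%dT%H:%M:%SZ)
  aws rds backtrack-db-cluster \
    --db-cluster-identifier "$1" \
    --backtrack-to "$target"
}

# Example (not run here): undo the last 15 minutes on a cluster
#   backtrack_cluster prod-aurora 15
```

Backtracking rewinds the whole cluster in place, so connections are interrupted and every change after the target time is undone for all schemas, not just the table that was damaged.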

# ============================================================
# Aurora Cluster
# ============================================================
resource "aws_rds_cluster" "main" {
  cluster_identifier = "main-aurora-cluster"

  # Engine
  engine         = "aurora-mysql"
  engine_version = "8.0.mysql_aurora.3.02.0"
  engine_mode    = "provisioned" # Serverless v2 also uses "provisioned"; "serverless" means v1

  # Database
  database_name   = "appdb"
  master_username = "admin"
  master_password = var.db_password

  # Network
  db_subnet_group_name   = aws_db_subnet_group.main.name
  vpc_security_group_ids = [aws_security_group.aurora.id]

  # Encryption
  storage_encrypted = true
  kms_key_id        = aws_kms_key.aurora.arn

  # Availability
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]

  # Backup
  backup_retention_period = 30
  preferred_backup_window = "03:00-04:00"

  # Backtrack (Aurora MySQL only); the value is in seconds, not hours
  backtrack_window = 259200 # 72 hours, the maximum

  # Deletion protection
  deletion_protection       = true
  skip_final_snapshot       = false
  final_snapshot_identifier = "main-aurora-final"

  # Performance Insights
  # Note: Enabled at instance level

  tags = {
    Name = "main-aurora-cluster"
  }
}
# ============================================================
# Aurora Writer Instance
# ============================================================
resource "aws_rds_cluster_instance" "writer" {
  identifier         = "main-aurora-writer"
  cluster_identifier = aws_rds_cluster.main.id
  instance_class     = "db.r6g.large"
  engine             = aws_rds_cluster.main.engine
  engine_version     = aws_rds_cluster.main.engine_version

  # Performance Insights
  performance_insights_enabled    = true
  performance_insights_kms_key_id = aws_kms_key.aurora.arn

  # Monitoring
  monitoring_interval = 60
  monitoring_role_arn = aws_iam_role.rds_monitoring.arn

  # Promotion tier (lower = higher priority for failover)
  promotion_tier = 1

  tags = {
    Name = "main-aurora-writer"
  }
}
# ============================================================
# Aurora Reader Instances
# ============================================================
resource "aws_rds_cluster_instance" "readers" {
  count              = 2
  identifier         = "main-aurora-reader-${count.index + 1}"
  cluster_identifier = aws_rds_cluster.main.id
  instance_class     = "db.r6g.large"
  engine             = aws_rds_cluster.main.engine
  engine_version     = aws_rds_cluster.main.engine_version

  # Performance Insights
  performance_insights_enabled = true

  # Promotion tier (higher = lower priority for failover)
  promotion_tier = count.index + 2

  tags = {
    Name = "main-aurora-reader-${count.index + 1}"
  }
}
# ============================================================
# Aurora Serverless v2
# ============================================================
resource "aws_rds_cluster" "serverless" {
  cluster_identifier     = "serverless-aurora-cluster"
  engine                 = "aurora-mysql"
  engine_mode            = "provisioned"
  engine_version         = "8.0.mysql_aurora.3.02.0"
  database_name          = "appdb"
  master_username        = "admin"
  master_password        = var.db_password
  db_subnet_group_name   = aws_db_subnet_group.main.name
  vpc_security_group_ids = [aws_security_group.aurora.id]
  storage_encrypted      = true
  kms_key_id             = aws_kms_key.aurora.arn

  # Serverless v2 configuration
  serverlessv2_scaling_configuration {
    min_capacity = 0.5 # ACUs
    max_capacity = 64  # ACUs
  }

  tags = {
    Name = "serverless-aurora-cluster"
  }
}

resource "aws_rds_cluster_instance" "serverless" {
  identifier         = "serverless-aurora-instance"
  cluster_identifier = aws_rds_cluster.serverless.id
  instance_class     = "db.serverless"
  engine             = aws_rds_cluster.serverless.engine
  engine_version     = aws_rds_cluster.serverless.engine_version

  tags = {
    Name = "serverless-aurora-instance"
  }
}
# ============================================================
# Aurora Global Cluster
# ============================================================
resource "aws_rds_global_cluster" "main" {
  global_cluster_identifier = "main-global-cluster"
  engine                    = "aurora-mysql"
  engine_version            = "8.0.mysql_aurora.3.02.0"
  storage_encrypted         = true
}

# Primary cluster
resource "aws_rds_cluster" "primary" {
  global_cluster_identifier = aws_rds_global_cluster.main.id
  cluster_identifier        = "primary-cluster"
  engine                    = aws_rds_global_cluster.main.engine
  engine_version            = aws_rds_global_cluster.main.engine_version
  database_name             = "appdb"
  master_username           = "admin"
  master_password           = var.db_password
  db_subnet_group_name      = aws_db_subnet_group.main.name
  vpc_security_group_ids    = [aws_security_group.aurora.id]

  # Member clusters must match the global cluster's encryption setting
  storage_encrypted = true

  # Add aws_rds_cluster_instance resources for this cluster;
  # use at least 2 instances for Multi-AZ failover
}

# Secondary cluster (in different region)
resource "aws_rds_cluster" "secondary" {
  provider                  = aws.dr_region
  global_cluster_identifier = aws_rds_global_cluster.main.id
  cluster_identifier        = "secondary-cluster"
  engine                    = aws_rds_global_cluster.main.engine
  engine_version            = aws_rds_global_cluster.main.engine_version
  db_subnet_group_name      = aws_db_subnet_group.dr.name
  vpc_security_group_ids    = [aws_security_group.aurora_dr.id]

  # An encrypted secondary also needs kms_key_id set to a key in its own region
  storage_encrypted = true

  # Secondary cluster is read-only
  # No master_username/password needed

  # Create the secondary only after the primary exists
  depends_on = [aws_rds_cluster.primary]
}
# ============================================================
# Aurora Cluster Parameter Group
# ============================================================
resource "aws_rds_cluster_parameter_group" "main" {
  name        = "main-aurora-params"
  family      = "aurora-mysql8.0"
  description = "Aurora MySQL 8.0 parameter group"

  parameter {
    name  = "time_zone"
    value = "UTC"
  }

  parameter {
    name  = "character_set_server"
    value = "utf8mb4"
  }

  parameter {
    name  = "aurora_enable_repl_log"
    value = "1"
  }

  tags = {
    Name = "main-aurora-params"
  }
}
# ============================================================
# Auto Scaling for Read Replicas
# ============================================================
resource "aws_appautoscaling_target" "aurora" {
  max_capacity       = 15
  min_capacity       = 1
  resource_id        = "cluster:${aws_rds_cluster.main.cluster_identifier}"
  scalable_dimension = "rds:cluster:ReadReplicaCount"
  service_namespace  = "rds"
}

resource "aws_appautoscaling_policy" "aurora" {
  name               = "aurora-read-replica-scaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.aurora.resource_id
  scalable_dimension = aws_appautoscaling_target.aurora.scalable_dimension
  service_namespace  = aws_appautoscaling_target.aurora.service_namespace

  target_tracking_scaling_policy_configuration {
    target_value = 70

    predefined_metric_specification {
      predefined_metric_type = "RDSReaderAverageCPUUtilization"
    }

    scale_in_cooldown  = 300
    scale_out_cooldown = 60
  }
}

Aurora vs RDS Comparison
+------------------------------------------------------------------+
| |
| Feature | Aurora | RDS |
| -----------------|------------------|--------------------------|
| Storage | Auto-scaling | Provisioned |
| Max Storage | 128 TB | 64 TB |
| Read Replicas | Up to 15 | Up to 15 (5 for Oracle, SQL Server) |
| Replication | Asynchronous | Async (RR), Sync (MAZ) |
| Failover | ~30 seconds | ~60-120 seconds |
| Multi-Master | Retired (Aurora MySQL v1 only) | No |
| Serverless | Yes (Serverless v2) | No |
| Global DB | Yes | Cross-Region RR |
| Backtrack | Yes (Aurora MySQL only) | No |
| Fast Clone | Yes | No |
| -----------------|------------------|--------------------------|
| Cost | Higher | Lower |
| Use Case | High performance | General purpose |
| |
+------------------------------------------------------------------+

Aurora is a common default for production relational workloads on AWS. SREs rely on its self-healing storage, fast failover, and auto-scaling replicas to build highly available systems. Key operational areas: failover testing, clone management for staging, Global Database for DR, and cost control when choosing between provisioned and serverless capacity.


# Install tools (Arch Linux shown; use your distribution's package manager)
sudo pacman -S aws-cli-v2 jq mysql postgresql

# === Aurora Cluster Status ===
#!/bin/bash
# ~/bin/aurora-status.sh
echo "=== Aurora Clusters ==="
aws rds describe-db-clusters \
  --query 'DBClusters[*].{Cluster:DBClusterIdentifier,Engine:Engine,Status:Status,Writer:Endpoint,Readers:ReaderEndpoint,MultiAZ:MultiAZ}' \
  --output table
echo ""
echo "=== Cluster Members ==="
aws rds describe-db-clusters \
  --query 'DBClusters[*].DBClusterMembers[*].{Instance:DBInstanceIdentifier,IsWriter:IsClusterWriter,FailoverPriority:PromotionTier}' \
  --output table

# === Fast Clone for Staging ===
# Creates the cluster only; add a DB instance with
# "aws rds create-db-instance" before connecting to the clone.
aws rds restore-db-cluster-to-point-in-time \
  --source-db-cluster-identifier prod-aurora \
  --db-cluster-identifier staging-aurora \
  --restore-type copy-on-write \
  --use-latest-restorable-time

# === Trigger Manual Failover (chaos testing) ===
aws rds failover-db-cluster \
  --db-cluster-identifier prod-aurora \
  --target-db-instance-identifier prod-aurora-reader-1

# === Monitor Replication Lag ===
# Timestamps are UTC (trailing Z); "date -d" requires GNU date.
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name AuroraReplicaLag \
  --dimensions Name=DBClusterIdentifier,Value=prod-aurora \
  --start-time "$(date -d '1 hour ago' -u +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 60 --statistics Average --output table

Issue | Cause | Solution
------|-------|---------
Writer failover took >30s | Application not using cluster endpoint | Always use the cluster endpoint, not an instance endpoint, and avoid long client-side DNS caching
Reader endpoint returning stale data | Replica lag | Monitor AuroraReplicaLag (normally <20ms); check for long transactions
Storage growing unexpectedly | Backtrack change records, or deleted data not reclaimed | Newer engine versions shrink the billed volume after data is dropped; on older versions, dump and reload into a new cluster to reclaim space
Serverless v2 scaling too slow | Min ACU too low | Increase min capacity so scale-up starts from a larger base
Global database promotion failed | Secondary not in sync | Check replication lag <1s before promotion

  1. Q: How does Aurora’s storage differ from standard RDS?

    • A: Aurora uses a distributed, fault-tolerant storage layer: 6 copies across 3 AZs, self-healing (detects and repairs corruption), auto-scaling up to 128 TB. Storage is separate from compute, and all instances share a single cluster volume. As a result, replicas read from the same storage and typically lag the writer by only tens of milliseconds, the writer does not ship full data copies to each replica, and failover is fast (around 30 seconds) because the promoted replica needs no data copy.
  2. Q: Aurora Global Database vs RDS cross-region read replica?

    • A: Aurora Global: typically <1s replication lag (physical replication at the storage layer), up to 5 secondary regions, RPO around 1s and RTO typically under 1 minute for an unplanned cross-region failover; a managed planned switchover loses no data. RDS cross-region RR: asynchronous engine-level replication whose lag can grow to minutes under load, manual promotion, higher RPO/RTO. Choose Aurora Global for mission-critical apps needing fast cross-region DR.

Exam Tip

  1. Storage: 6 copies across 3 AZs, auto-scaling up to 128 TB
  2. Read Replicas: Up to 15, asynchronous replication
  3. Failover: ~30 seconds, automatic promotion
  4. Endpoints: Cluster (writer), Reader (load balanced), Instance (direct)
  5. Serverless v2: 0.5-128 ACUs, instant scaling
  6. Global Database: Up to 5 secondary regions, < 1 second lag
  7. Fast Clone: Instant, copy-on-write, no extra storage until data changes
  8. Backtrack: Undo changes, up to 72 hours (Aurora MySQL only)
  9. Multi-Master: Multiple writers (Aurora MySQL v1 only; retired with v1)
  10. Performance: 5x MySQL, 3x PostgreSQL


Last Updated: March 2026