Amazon ElastiCache
Chapter 24: Amazon ElastiCache - In-Memory Caching
Managed Redis and Memcached

24.1 Overview

Amazon ElastiCache is a fully managed in-memory data store service, compatible with Redis and Memcached.
ElastiCache Overview

                      +---------------+
                      |  ElastiCache  |
                      +---------------+
                              |
        +---------------------+---------------------+
        |                     |                     |
        v                     v                     v
+---------------+     +---------------+     +---------------+
| Redis         |     | Memcached     |     | Managed       |
|               |     |               |     | Service       |
| - Multi-AZ    |     | - Simple      |     | - Setup       |
| - Cluster     |     | - Scale out   |     | - Patch       |
| - Replication |     | - Cache       |     | - Monitor     |
| - Persistence |     |               |     |               |
+---------------+     +---------------+     +---------------+

Redis: Advanced features, persistence, replication
Memcached: Simple caching, multi-threaded
24.2 Redis vs Memcached

Redis vs Memcached Comparison

| Feature | Redis | Memcached |
|---|---|---|
| Data structures | Strings, lists, sets, hashes, etc. | Simple key-value |
| Persistence | Yes (AOF, RDB) | No |
| Replication | Yes (primary/replica) | No |
| Multi-AZ | Yes (with automatic failover) | No failover (nodes can still be spread across AZs) |
| Clustering | Yes (sharding) | Yes (distributed) |
| Transactions | Yes | No |
| Pub/Sub | Yes | No |
| Sorting | Yes | No |
| Threading | Single-threaded | Multi-threaded |
| Use case | Complex data, leaderboards, session store | Simple caching, session store |
24.3 ElastiCache Redis Architecture
Cluster Mode Disabled

Redis Cluster Mode Disabled

Single Node

+------------------+
|   Primary Node   |
|   (Read/Write)   |
+------------------+

Primary-Replica (No Cluster)

        AZ A                        AZ B
+------------------+        +------------------+
|   Primary Node   |        |   Replica Node   |
|   (Read/Write)   |        |   (Read Only)    |
+------------------+        +------------------+
         |                           ^
         +--------Replication--------+

Features:
- Single shard
- Up to 5 replicas
- Automatic failover
- Data size: Up to node capacity
Cluster Mode Enabled

Redis Cluster Mode Enabled

Sharded Cluster

      Shard 1                Shard 2                Shard 3
+-----------------+    +-----------------+    +-----------------+
| Primary Replica |    | Primary Replica |    | Primary         |
| Node 1  Node 1  |    | Node 2  Node 2  |    | Node 3          |
| (AZ-a)  (AZ-b)  |    | (AZ-a)  (AZ-b)  |    | (AZ-a)          |
+-----------------+    +-----------------+    +-----------------+
         |                      |                      |
         v                      v                      v
   Slots 0-5460         Slots 5461-10922       Slots 10923-16383

Features:
- Up to 500 shards
- Data partitioned across shards
- 16384 hash slots
- Automatic failover per shard
- Scale out by adding shards
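The mapping from a key to one of the 16384 hash slots can be sketched in a few lines. This is a minimal illustration that assumes the standard Redis Cluster rule (CRC16 of the key, modulo 16384, with `{...}` hash tags); the shard names and the `shard_for` helper are hypothetical and simply mirror the three slot ranges in the diagram above.

```python
import binascii

TOTAL_SLOTS = 16384

# Slot ranges copied from the three-shard diagram; shard names are illustrative.
SHARD_RANGES = {
    "shard-1": (0, 5460),
    "shard-2": (5461, 10922),
    "shard-3": (10923, 16383),
}


def hash_slot(key: str) -> int:
    """Return the Redis Cluster hash slot (0-16383) for a key."""
    data = key.encode("utf-8")
    # Hash tags: if the key contains "{...}" with a non-empty body, only the
    # text between the first "{" and the next "}" is hashed.
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    # crc_hqx with an initial value of 0 is the CRC16 (XMODEM) variant.
    return binascii.crc_hqx(data, 0) % TOTAL_SLOTS


def shard_for(key: str) -> str:
    """Look up which of the illustrative shards owns the key's slot."""
    slot = hash_slot(key)
    for name, (low, high) in SHARD_RANGES.items():
        if low <= slot <= high:
            return name
    raise ValueError(f"slot {slot} is not covered by any shard")


if __name__ == "__main__":
    for key in ("user:1", "{user1000}.following", "{user1000}.followers"):
        print(key, hash_slot(key), shard_for(key))
```

Keys that share a hash tag, such as the two `{user1000}` keys, land in the same slot, which is what makes multi-key operations on them possible in cluster mode.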
24.4 ElastiCache Memcached Architecture

Memcached Architecture

Memcached Cluster

          +------------------+
          |   Application    |
          +------------------+
                   |
                   v
         Configuration Endpoint
          +------------------+
          | mycache.cfg.cache|
          | .amazonaws.com   |
          +------------------+
                   |
          +--------+--------+
          |        |        |
          v        v        v
       +------+ +------+ +------+
       | Node | | Node | | Node |
       |  1   | |  2   | |  3   |
       +------+ +------+ +------+

Features:
- No replication
- Each node independent
- Client-side sharding
- Auto-discovery via config endpoint
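Because Memcached nodes are independent, the client decides which node holds each key. The sketch below shows one common way to do that, a consistent-hash ring; it is an illustration only, not the algorithm of any particular Memcached client, and the node names and the `HashRing` class are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring for client-side sharding (illustrative)."""

    def __init__(self, nodes, vnodes=100):
        # Each node is placed on the ring many times ("virtual nodes") so that
        # keys spread more evenly across nodes.
        ring = []
        for node in nodes:
            for i in range(vnodes):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._points = [point for point, _ in ring]

    @staticmethod
    def _hash(value: str) -> int:
        digest = hashlib.md5(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first ring point after the key's hash,
        # wrapping around to the start of the ring if necessary.
        index = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[index][1]


if __name__ == "__main__":
    ring = HashRing(["node-1", "node-2", "node-3"])
    for key in ("session:42", "user:1", "cart:7"):
        print(key, "->", ring.node_for(key))
```

The point of the ring over a plain `hash(key) % node_count` is that removing or adding a node only remaps the keys that belonged to that node, so most of the cache stays warm when the cluster is resized.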
24.5 Caching Patterns
Lazy Loading (Cache-Aside)

Lazy Loading Pattern

Read Flow:

        Application
             |
             v
       +-----------+
       |   Check   |
       |   Cache   |
       +-----------+
             |
        +----+----+
        |         |
        v         v
       HIT       MISS
        |         |
        v         v
     Return  +-----------+
      Data   |   Query   |
             | Database  |
             +-----------+
                   |
                   v
             +-----------+
             | Write to  |
             |   Cache   |
             +-----------+
                   |
                   v
              Return Data

Pros: Only requested data cached
Cons: Cache miss penalty, stale data possible
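The read flow above can be sketched as follows. This is a minimal illustration, assuming an in-process stand-in for the cache and the database so that it runs without any network access; `TTLCache`, `FakeDatabase`, and `get_with_lazy_loading` are hypothetical names, not part of any ElastiCache or Redis client API.

```python
import time


class TTLCache:
    """In-process stand-in for a Redis/Memcached client (hypothetical helper)."""

    def __init__(self):
        self._entries = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._entries[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._entries[key] = (value, time.monotonic() + ttl_seconds)


class FakeDatabase:
    """Stand-in for the system of record; counts reads so misses are visible."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.reads = 0

    def query(self, key):
        self.reads += 1
        return self.rows.get(key)


def get_with_lazy_loading(cache, database, key, ttl_seconds=300):
    """Check the cache first; on a miss, read the database and populate the cache."""
    value = cache.get(key)
    if value is not None:
        return value  # HIT: the database is not touched
    value = database.query(key)  # MISS: pay the extra round trip
    if value is not None:
        cache.set(key, value, ttl_seconds)  # the TTL bounds how stale it can get
    return value


if __name__ == "__main__":
    cache = TTLCache()
    db = FakeDatabase({"user:1": {"name": "John"}})
    print(get_with_lazy_loading(cache, db, "user:1"), "db reads:", db.reads)
    print(get_with_lazy_loading(cache, db, "user:1"), "db reads:", db.reads)
```

The second call is served from the cache, so the database read counter does not increase; the TTL is what limits the "stale data possible" drawback.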
Write-Through

Write-Through Pattern

Write Flow:

   Application
        |
        v
  +-----------+
  | Write to  |
  |   Cache   |
  +-----------+
        |
        v
  +-----------+
  | Write to  |
  | Database  |
  +-----------+
        |
        v
  Return Success

Pros: Cache always fresh
Cons: Write latency, wasted cache for unread data
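A minimal sketch of the write flow above, using plain dictionaries as stand-ins for the cache and the database. The `WriteThroughStore` class is hypothetical, and the rollback on a failed database write is an added assumption rather than something the diagram specifies.

```python
class WriteThroughStore:
    """Writes go to the cache and the database in the same synchronous call."""

    def __init__(self, cache, database):
        self.cache = cache        # dict-like stand-in for Redis/Memcached
        self.database = database  # dict-like stand-in for the system of record

    def write(self, key, value):
        # Same order as the diagram: cache first, then database.
        self.cache[key] = value
        try:
            self.database[key] = value
        except Exception:
            # Assumption: if the database write fails, drop the cached value so
            # the cache never serves data the database does not have.
            self.cache.pop(key, None)
            raise

    def read(self, key):
        # Reads can trust the cache for any key that was written through it.
        return self.cache.get(key)


if __name__ == "__main__":
    store = WriteThroughStore(cache={}, database={})
    store.write("session:42", {"user": "john"})
    print(store.read("session:42"))
    print(store.database)
```

The caller only gets a success once both writes have completed, which is where the extra write latency listed under "Cons" comes from.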
Write-Behind (Write-Back)

Write-Behind Pattern

Write Flow:

   Application
        |
        v
  +-----------+
  | Write to  |
  |   Cache   |
  +-----------+
        |
        v
  Return Success (immediate)
        |
        v (async)
  +-----------+
  | Write to  |
  | Database  |
  |  (async)  |
  +-----------+

Pros: Fast writes, reduced database load
Cons: Data loss risk, complexity
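The write-behind flow can be sketched with an in-memory queue of pending writes. This is a simplified illustration: `WriteBehindStore` is a hypothetical class, the cache and database are plain dictionaries, and `flush` is called explicitly here where a real system would run it from a background worker or timer.

```python
from collections import deque


class WriteBehindStore:
    """Acknowledge writes after updating the cache; persist them later."""

    def __init__(self, cache, database):
        self.cache = cache        # dict-like stand-in for Redis
        self.database = database  # dict-like stand-in for the system of record
        self._pending = deque()   # writes not yet persisted to the database

    def write(self, key, value):
        # Fast path: update the cache, queue the database write, return at once.
        self.cache[key] = value
        self._pending.append((key, value))

    def pending_count(self):
        return len(self._pending)

    def flush(self, max_items=None):
        """Persist queued writes in order; returns how many were written."""
        flushed = 0
        while self._pending and (max_items is None or flushed < max_items):
            key, value = self._pending.popleft()
            self.database[key] = value
            flushed += 1
        return flushed


if __name__ == "__main__":
    store = WriteBehindStore(cache={}, database={})
    store.write("counter", 1)
    store.write("counter", 2)
    print("before flush:", store.database, "pending:", store.pending_count())
    store.flush()
    print("after flush:", store.database)
```

Anything still in the pending queue is lost if the process or cache node fails before `flush` runs, which is the data loss risk named in the cons.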
24.6 Redis Features
Data Structures

Redis Data Structures

Strings
  SET key value
  GET key
  INCR counter
  Use: Caching, counters, session data

Hashes
  HSET user:1 name "John" email "john@ex.com"
  HGET user:1 name
  HGETALL user:1
  Use: User profiles, product info

Lists
  LPUSH queue task1
  RPOP queue
  LRANGE queue 0 -1
  Use: Message queues, activity feeds

Sets
  SADD tags "redis" "database"
  SMEMBERS tags
  SINTER set1 set2
  Use: Tags, unique items, social graphs

Sorted Sets
  ZADD leaderboard 100 "player1"
  ZRANGE leaderboard 0 -1 WITHSCORES
  ZREVRANK leaderboard "player1"
  Use: Leaderboards, rankings, rate limiting
Persistence Options

Redis Persistence

RDB (Redis Database)

Features:
- Point-in-time snapshots
- Compact file
- Faster recovery

Configuration (self-managed Redis, redis.conf):
  save 900 1        # Save after 900s if >= 1 changes
  save 300 10       # Save after 300s if >= 10 changes
  save 60 10000     # Save after 60s if >= 10000 changes

AOF (Append Only File)

Features:
- Logs every write operation
- Higher durability
- Larger file size

Configuration (self-managed Redis, redis.conf):
  appendonly yes
  appendfsync everysec   # Sync every second
  appendfsync always     # Sync every write (slowest)

Recommended: Enable both RDB and AOF on self-managed Redis.

On ElastiCache the save directives are not exposed. RDB-style snapshots are taken as backups and are scheduled with the snapshot window and retention settings. AOF is not supported on ElastiCache for Redis 2.8.22 and later, so durability there relies on Multi-AZ replication plus backups.
24.7 Practical Configuration
ElastiCache with Terraform

# ============================================================
# ElastiCache Subnet Group
# ============================================================

resource "aws_elasticache_subnet_group" "main" {
  name       = "main-cache-subnet"
  subnet_ids = var.private_subnet_ids

  tags = {
    Name = "main-cache-subnet-group"
  }
}

# ============================================================
# ElastiCache Parameter Group
# ============================================================

resource "aws_elasticache_parameter_group" "redis" {
  name   = "redis-params"
  family = "redis7"

  parameter {
    name  = "maxmemory-policy"
    value = "allkeys-lru"
  }

  parameter {
    name  = "timeout"
    value = "300"
  }

  tags = {
    Name = "redis-parameter-group"
  }
}
# ============================================================
# Redis Cluster Mode Disabled
# ============================================================

resource "aws_elasticache_replication_group" "redis" {
  replication_group_id = "main-redis"
  # "description" replaces the deprecated replication_group_description
  # argument in AWS provider v4 and later.
  description = "Main Redis cluster"

  # Engine
  engine               = "redis"
  engine_version       = "7.0"
  parameter_group_name = aws_elasticache_parameter_group.redis.name

  # Node type
  node_type = "cache.r6g.large"

  # Cluster mode disabled
  num_cache_clusters = 2 # Primary + 1 replica

  # Network
  subnet_group_name  = aws_elasticache_subnet_group.main.name
  security_group_ids = [aws_security_group.redis.id]

  # Availability
  multi_az_enabled           = true
  automatic_failover_enabled = true

  # Encryption (auth_token requires in-transit encryption)
  at_rest_encryption_enabled = true
  transit_encryption_enabled = true
  auth_token                 = var.redis_password

  # Snapshot
  snapshot_retention_limit = 7
  snapshot_window          = "03:00-05:00"

  # Maintenance
  maintenance_window = "Mon:05:00-Mon:07:00"

  tags = {
    Name = "main-redis"
  }
}
# ============================================================
# Redis Cluster Mode Enabled
# ============================================================

resource "aws_elasticache_replication_group" "cluster" {
  replication_group_id = "cluster-redis"
  description          = "Redis cluster mode enabled"

  engine         = "redis"
  engine_version = "7.0"
  # Cluster mode needs a parameter group with cluster-enabled = yes, so the
  # non-cluster "redis-params" group above cannot be reused here.
  parameter_group_name = "default.redis7.cluster.on"

  node_type = "cache.r6g.large"

  # Cluster mode enabled (top-level arguments; the cluster_mode block is
  # deprecated in AWS provider v4 and removed in v5)
  num_node_groups         = 3 # 3 shards
  replicas_per_node_group = 1

  # Network
  subnet_group_name  = aws_elasticache_subnet_group.main.name
  security_group_ids = [aws_security_group.redis.id]

  # Availability
  automatic_failover_enabled = true
  multi_az_enabled           = true

  # Encryption
  at_rest_encryption_enabled = true
  transit_encryption_enabled = true
  auth_token                 = var.redis_password

  tags = {
    Name = "cluster-redis"
  }
}
# ============================================================
# Memcached Cluster
# ============================================================

resource "aws_elasticache_cluster" "memcached" {
  cluster_id     = "main-memcached"
  engine         = "memcached"
  engine_version = "1.6.22"

  node_type       = "cache.r6g.large"
  num_cache_nodes = 3

  # Network
  subnet_group_name  = aws_elasticache_subnet_group.main.name
  security_group_ids = [aws_security_group.memcached.id]

  # Parameter group
  parameter_group_name = "default.memcached1.6"

  # Maintenance
  maintenance_window = "Mon:05:00-Mon:07:00"

  tags = {
    Name = "main-memcached"
  }
}
# ============================================================
# Security Groups
# ============================================================

resource "aws_security_group" "redis" {
  name        = "redis-sg"
  description = "Security group for Redis"
  vpc_id      = var.vpc_id

  ingress {
    description     = "Redis from application"
    from_port       = 6379
    to_port         = 6379
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "redis-sg"
  }
}

resource "aws_security_group" "memcached" {
  name        = "memcached-sg"
  description = "Security group for Memcached"
  vpc_id      = var.vpc_id

  ingress {
    description     = "Memcached from application"
    from_port       = 11211
    to_port         = 11211
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "memcached-sg"
  }
}
# ============================================================
# Global Datastore (Redis)
# ============================================================

# Primary region
resource "aws_elasticache_replication_group" "global_primary" {
  provider = aws.primary

  replication_group_id = "global-redis"
  description          = "Global Redis"

  engine             = "redis"
  engine_version     = "7.0"
  node_type          = "cache.r6g.large"
  num_cache_clusters = 2

  subnet_group_name  = aws_elasticache_subnet_group.main.name
  security_group_ids = [aws_security_group.redis.id]

  automatic_failover_enabled = true

  # Do not set global_replication_group_id here: the global replication
  # group below is created from this primary, and referencing it back
  # would create a dependency cycle.
}

# Global datastore, created from the primary replication group
# (same provider/region as the primary)
resource "aws_elasticache_global_replication_group" "main" {
  provider = aws.primary

  global_replication_group_id_suffix = "global"
  primary_replication_group_id       = aws_elasticache_replication_group.global_primary.id
}

# Secondary region
resource "aws_elasticache_replication_group" "global_secondary" {
  provider = aws.secondary

  replication_group_id = "global-redis-secondary"
  description          = "Global Redis Secondary"

  # Join the global datastore; engine, version, and node type come from it
  global_replication_group_id = aws_elasticache_global_replication_group.main.global_replication_group_id

  subnet_group_name  = aws_elasticache_subnet_group.secondary.name
  security_group_ids = [aws_security_group.redis_secondary.id]
}
24.8 Best Practices

ElastiCache Best Practices

1. Instance Selection
   - Use memory-optimized instances (r6g family)
   - Consider network bandwidth
   - Right-size based on data volume

2. Eviction Policy
   - allkeys-lru: Evict least recently used keys
   - volatile-lru: Evict LRU among keys with a TTL
   - allkeys-random: Evict random keys
   - noeviction: Return an error when memory is full

3. Connection Management
   - Use connection pooling
   - Set appropriate timeouts
   - Handle connection failures gracefully

4. Security
   - Enable encryption in transit and at rest
   - Use an AUTH password
   - Deploy in private subnets
   - Use security groups

5. Monitoring
   - Monitor CPU, memory, connections
   - Set up CloudWatch alarms
   - Monitor cache hit ratio
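To make the `allkeys-lru` policy concrete, here is a minimal sketch of least-recently-used eviction. It is a simplification under stated assumptions: the hypothetical `LRUCache` limits the number of items, whereas Redis limits memory (`maxmemory`) and approximates LRU by sampling keys rather than tracking exact order.

```python
from collections import OrderedDict


class LRUCache:
    """Exact LRU eviction by item count (illustrative, not how Redis stores data)."""

    def __init__(self, max_items):
        self.max_items = max_items
        self._items = OrderedDict()  # oldest access first, newest last
        self.evictions = 0

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # reading a key marks it as recently used
        return self._items[key]

    def set(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        while len(self._items) > self.max_items:
            self._items.popitem(last=False)  # evict the least recently used key
            self.evictions += 1

    def keys(self):
        return list(self._items)


if __name__ == "__main__":
    cache = LRUCache(max_items=2)
    cache.set("a", 1)
    cache.set("b", 2)
    cache.get("a")     # "a" is now more recently used than "b"
    cache.set("c", 3)  # over capacity: "b" is evicted
    print(cache.keys(), "evictions:", cache.evictions)
```

A steadily rising eviction counter in this toy model corresponds to the high eviction rate that the monitoring advice above is meant to catch.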
24.9 Why This Matters in DevOps/SRE

Caching is often one of the highest-impact performance optimizations available to an application. SREs use ElastiCache to reduce database load, improve response times, and handle session management. Key operational concerns are eviction rate monitoring, cluster scaling, cache invalidation strategies, and Redis failover testing.
24.10 Linux Systems Perspective
ElastiCache Operations from Arch Linux

# Install tools
sudo pacman -S aws-cli-v2 jq redis

# === Cluster Status Dashboard ===
#!/bin/bash
# ~/bin/cache-status.sh
echo "=== Redis Clusters ==="
aws elasticache describe-replication-groups \
  --query 'ReplicationGroups[*].{Name:ReplicationGroupId,Status:Status,Shards:NodeGroups|length(@),AutoFailover:AutomaticFailover}' \
  --output table

echo ""
echo "=== Node Health ==="
aws elasticache describe-cache-clusters --show-cache-node-info \
  --query 'CacheClusters[*].{Cluster:CacheClusterId,Engine:Engine,Status:CacheClusterStatus,NodeType:CacheNodeType,Nodes:NumCacheNodes}' \
  --output table

# === Connect to Redis directly (from bastion/within VPC) ===
redis-cli -h my-redis.xxxx.cache.amazonaws.com -p 6379 \
  --tls --askpass

# Useful Redis commands for SRE
redis-cli INFO memory        # Memory usage
redis-cli INFO stats         # Hit/miss counters (keyspace_hits, keyspace_misses)
redis-cli INFO replication   # Replication status
redis-cli DBSIZE             # Number of keys
redis-cli SLOWLOG GET 10     # Last 10 slow commands
redis-cli CLIENT LIST        # Connected clients

# === Monitor cache hit ratio ===
# ElastiCache node metrics are keyed by CacheClusterId (the member node),
# not ReplicationGroupId. "main-redis-001" assumes the default node naming
# for the "main-redis" replication group; adjust it to your node ID.
aws cloudwatch get-metric-statistics \
  --namespace AWS/ElastiCache \
  --metric-name CacheHitRate \
  --dimensions Name=CacheClusterId,Value=main-redis-001 \
  --start-time "$(date -d '1 hour ago' -u +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 --statistics Average --output table
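The hit ratio can also be derived by hand from the `keyspace_hits` and `keyspace_misses` counters that `INFO stats` reports. The sketch below parses that text format; the sample output and the helper names `parse_info` and `cache_hit_ratio` are made up for illustration.

```python
def parse_info(text):
    """Parse `redis-cli INFO` output ("name:value" lines) into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as "# Stats"
        name, _, value = line.partition(":")
        fields[name] = value
    return fields


def cache_hit_ratio(fields):
    """Return hits / (hits + misses), or None if there were no lookups yet."""
    hits = int(fields.get("keyspace_hits", 0))
    misses = int(fields.get("keyspace_misses", 0))
    total = hits + misses
    return hits / total if total else None


# Illustrative sample only; real INFO output contains many more fields.
SAMPLE_INFO = "# Stats\r\nkeyspace_hits:900\r\nkeyspace_misses:100\r\n"

if __name__ == "__main__":
    ratio = cache_hit_ratio(parse_info(SAMPLE_INFO))
    print(f"hit ratio: {ratio:.2%}")
```

Both counters are cumulative since the node started, so for a recent picture you would compare two readings taken some time apart rather than the raw totals.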
24.11 Troubleshooting Guide

| Issue | Cause | Solution |
|---|---|---|
| High eviction rate | Maxmemory reached | Scale up node type or enable cluster mode for more shards |
| Low cache hit ratio | Wrong caching strategy or short TTL | Analyze access patterns, increase TTL, pre-warm cache |
| Connection refused | Security group or AUTH token | Verify SG allows port 6379/11211, check auth token |
| Replication lag (Redis) | Large write volume | Monitor ReplicationLag, consider scaling up node type |
| Failover causes app errors | App not handling reconnects | Implement retry logic, use cluster endpoints |
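The "implement retry logic" advice in the last row can be sketched as a small wrapper with exponential backoff and jitter. This is an illustration under assumptions: `call_with_retry` is a hypothetical helper, and it retries on Python's built-in `ConnectionError` and `TimeoutError`; a real Redis client raises its own exception types, which you would pass in through `retry_on`.

```python
import random
import time


def call_with_retry(operation, attempts=5, base_delay=0.1, max_delay=2.0,
                    retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Run operation(), retrying transient failures with backoff and jitter."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Exponential backoff capped at max_delay, with full jitter so
            # many clients do not reconnect at the same instant after failover.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_ping():
        # Simulates a node that is unreachable for the first two calls.
        calls["count"] += 1
        if calls["count"] < 3:
            raise ConnectionError("connection reset during failover")
        return "PONG"

    print(call_with_retry(flaky_ping), "after", calls["count"], "calls")
```

Keeping the retry budget small matters during a failover: the new primary is usually available within a short time, and unbounded retries would only pile load onto it.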
24.12 Interview Questions
- Q: Lazy loading vs write-through: when to use each?
  - A: Lazy loading caches data only on a read miss, which suits workloads where most data is rarely read (it avoids caching unused data). Write-through updates the cache on every write, which suits data where stale reads are unacceptable. Best practice is to combine both: write-through for critical data (user sessions), lazy loading with a TTL for less critical data (product catalog). Add a TTL to both strategies to prevent indefinite staleness.
- Q: How do you handle a Redis failover with zero data loss?
  - A: Enable Multi-AZ with automatic failover, but note that strictly zero data loss cannot be guaranteed: Redis replicates asynchronously, so a small window of writes (typically <1s) can be lost during failover. To minimize it: (1) use Multi-AZ, (2) monitor `ReplicationLag`, which should be near zero, (3) for critical data, also persist to a durable store (DynamoDB/RDS), (4) on self-managed Redis, use AOF with `appendfsync everysec` for the best durability/performance balance. ElastiCache for Redis 2.8.22 and later does not support AOF, so rely on replication and backups there.
24.13 Exam Tips
- Redis vs Memcached: Redis has persistence, replication, data structures
- Cluster Mode Disabled: Single shard, up to 5 replicas
- Cluster Mode Enabled: Up to 500 shards, data partitioned
- Lazy Loading: Cache on read, stale data possible
- Write-Through: Write to cache and DB, always fresh
- Eviction Policies: LRU, volatile-LRU, noeviction
- Persistence: RDB (snapshots), AOF (append-only)
- Multi-AZ: Automatic failover for Redis
- Encryption: At-rest and in-transit for Redis
- Global Datastore: Cross-region replication for Redis
Next Chapter
Chapter 25: Other AWS Database Services
Last Updated: March 2026