Auto Scaling & Elastic Load Balancing

Building Scalable and Highly Available Applications

Auto Scaling and Elastic Load Balancing work together to provide automatic scaling and high availability for your applications.

Auto Scaling & Load Balancing Architecture
+------------------------------------------------------------------+
|                                                                  |
|                            Internet                              |
|                                |                                 |
|                                v                                 |
|                        +---------------+                         |
|                        |   Route 53    |                         |
|                        +---------------+                         |
|                                |                                 |
|                                v                                 |
|              +-----------------------------------+               |
|              |     Application Load Balancer     |               |
|              |               (ALB)               |               |
|              +-----------------------------------+               |
|                 |                              |                 |
|                 v                              v                 |
|      +---------------------+        +---------------------+      |
|      |    Auto Scaling     |        |    Auto Scaling     |      |
|      |       Group 1       |        |       Group 2       |      |
|      |                     |        |                     |      |
|      | +----+ +----+ +----+|        | +----+ +----+ +----+|      |
|      | |EC2 | |EC2 | |EC2 ||        | |EC2 | |EC2 | |EC2 ||      |
|      | +----+ +----+ +----+|        | +----+ +----+ +----+|      |
|      |  AZ-A   AZ-B   AZ-C |        |  AZ-A   AZ-B   AZ-C |      |
|      +---------------------+        +---------------------+      |
|                                                                  |
|  Components:                                                     |
|  - Load Balancer: Distributes traffic                            |
|  - Auto Scaling Group: Manages instance count                    |
|  - Launch Template: Defines instance configuration               |
|                                                                  |
+------------------------------------------------------------------+

Load Balancer Types
+------------------------------------------------------------------+
| |
| Application Load Balancer (ALB) |
| +----------------------------------------------------------+ |
| | Layer: 7 (HTTP/HTTPS) | |
| | Features: | |
| | - Content-based routing | |
| | - Host-based routing | |
| | - Path-based routing | |
| | - WebSocket support | |
| | - HTTP/2 support | |
| | - TLS termination | |
| | Use Cases: | |
| | - Web applications | |
| | - Microservices | |
| | - Containerized applications | |
| +----------------------------------------------------------+ |
| |
| Network Load Balancer (NLB) |
| +----------------------------------------------------------+ |
| | Layer: 4 (TCP/UDP) | |
| | Features: | |
| | - Ultra-high performance | |
| | - Static IP address | |
| | - TLS passthrough | |
| | - UDP support | |
| | - Millions of requests/second | |
| | Use Cases: | |
| | - Real-time gaming | |
| | - IoT | |
| | - Non-HTTP workloads | |
| +----------------------------------------------------------+ |
| |
| Gateway Load Balancer (GWLB) |
| +----------------------------------------------------------+ |
| | Layer: 3 (IP) | |
| | Features: | |
| | - Transparent network gateway | |
| | - Third-party security appliances | |
| | - Inline traffic inspection | |
| | Use Cases: | |
| | - Firewalls | |
| | - IDS/IPS | |
| | - Deep packet inspection | |
| +----------------------------------------------------------+ |
| |
| Classic Load Balancer (CLB) - Legacy |
| +----------------------------------------------------------+ |
| | Layer: 4 & 7 | |
| | Status: Previous generation (use ALB/NLB instead) | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
ALB Request Routing
+------------------------------------------------------------------+
|                                                                  |
|                         Client Request                           |
|                                |                                 |
|                                v                                 |
|                        +---------------+                         |
|                        |   Listener    |                         |
|                        |  (Port 443)   |                         |
|                        +---------------+                         |
|                                |                                 |
|                                v                                 |
|                        +---------------+                         |
|                        |     Rules     |                         |
|                        |  Evaluation   |                         |
|                        +---------------+                         |
|                                |                                 |
|             +------------------+------------------+              |
|             |                  |                  |              |
|             v                  v                  v              |
|        +---------+        +---------+        +---------+         |
|        | Rule 1  |        | Rule 2  |        | Default |         |
|        |         |        |         |        |  Rule   |         |
|        |Host:    |        |Path:    |        |         |         |
|        |api.     |        |/images  |        |         |         |
|        |example. |        |         |        |         |         |
|        |com      |        |         |        |         |         |
|        +---------+        +---------+        +---------+         |
|             |                  |                  |              |
|             v                  v                  v              |
|        +---------+        +---------+        +---------+         |
|        |Target   |        |Target   |        |Target   |         |
|        |Group 1  |        |Group 2  |        |Group 3  |         |
|        |         |        |         |        |         |         |
|        |API      |        |Image    |        |Web      |         |
|        |Servers  |        |Servers  |        |Servers  |         |
|        +---------+        +---------+        +---------+         |
|                                                                  |
+------------------------------------------------------------------+
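The rule evaluation in the diagram can be sketched in a few lines of bash. This is only an illustration of "first matching rule wins, otherwise the default rule applies", not how the ALB is implemented; the function name `route_request` and the target group labels are hypothetical, taken from the boxes in the diagram.

```bash
#!/usr/bin/env bash
# Sketch of ALB rule evaluation: rules are checked in priority order and
# the first match wins; the default rule catches everything else.
# route_request and the returned labels are hypothetical names.
route_request() {
  local host="$1" path="$2"
  if [[ "$host" == "api.example.com" ]]; then
    echo "api-servers"       # Rule 1: host-based match
  elif [[ "$path" == "/images" || "$path" == /images/* ]]; then
    echo "image-servers"     # Rule 2: path-based match
  else
    echo "web-servers"       # Default rule
  fi
}

route_request api.example.com /v1/users         # prints: api-servers
route_request www.example.com /images/logo.png  # prints: image-servers
route_request www.example.com /checkout         # prints: web-servers
```

Because the host rule is evaluated first, a request to `api.example.com/images/x` goes to the API servers, which mirrors why rule priority matters on a real listener.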
Target Group Configuration
+------------------------------------------------------------------+
| |
| Target Group Settings: |
| +----------------------------------------------------------+ |
| | Protocol: HTTP/HTTPS/TCP | |
| | Port: Application port | |
| | Health Check: | |
| | - Path: /health | |
| | - Interval: 30 seconds | |
| | - Timeout: 5 seconds | |
| | - Healthy threshold: 3 | |
| | - Unhealthy threshold: 2 | |
| +----------------------------------------------------------+ |
| |
| Target Types: |
| +----------------------------------------------------------+ |
| | | |
| | Instance | IP Address | Lambda | |
| | +----------+ | +----------+ | +----------+ | |
| | | EC2 | | | Private | | | Function | | |
| | | Instance | | | IP | | | | | |
| | | ID | | | Address | | | | | |
| | +----------+ | +----------+ | +----------+ | |
| | | | | |
| | Use: EC2 in ASG | Use: Containers | Use: Serverless | |
| | | on ECS/EKS | | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
Health Check Flow
+------------------------------------------------------------------+
| |
| Load Balancer |
| +----------------------------------------------------------+ |
| | | |
| | Health Check Request (every 30s) | |
| | | | |
| | v | |
| | +----------+ +----------+ +----------+ | |
| | | Target 1 | | Target 2 | | Target 3 | | |
| | | | | | | | | |
| | | GET | | GET | | GET | | |
| | | /health | | /health | | /health | | |
| | | | | | | | | |
| | | 200 OK | | 200 OK | | 503 | | |
| | | HEALTHY | | HEALTHY | | UNHEALTHY| | |
| | +----------+ +----------+ +----------+ | |
| | | |
| | Traffic Routing: | |
| | - Only routes to HEALTHY targets | |
| | - Unhealthy targets removed from rotation | |
| | - Auto Scaling can replace unhealthy instances | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
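The threshold behaviour can be illustrated with a small simulation. This sketch assumes the thresholds from the Target Group Settings box (healthy: 3, unhealthy: 2) and treats any non-200 response as a failure; `health_state` is a hypothetical helper, and real load balancers also track states such as initial and draining that are left out here.

```bash
#!/usr/bin/env bash
# Usage: health_state <initial_state> <status_code>...
# Replays a sequence of health check responses and prints the final state.
# Thresholds are assumptions copied from the example settings above.
health_state() {
  local state="$1"; shift
  local healthy_threshold=3 unhealthy_threshold=2
  local ok=0 fail=0 code
  for code in "$@"; do
    if [[ "$code" == "200" ]]; then
      ok=$((ok + 1)); fail=0
      # Enough consecutive successes: target becomes healthy.
      if (( ok >= healthy_threshold )); then state="healthy"; fi
    else
      fail=$((fail + 1)); ok=0
      # Enough consecutive failures: target becomes unhealthy.
      if (( fail >= unhealthy_threshold )); then state="unhealthy"; fi
    fi
  done
  echo "$state"
}

health_state healthy 200 503 200       # prints: healthy (one failure is tolerated)
health_state healthy 503 503           # prints: unhealthy
health_state unhealthy 200 200 200     # prints: healthy
```

A single failed check does not remove a target from rotation; only consecutive failures that reach the unhealthy threshold do.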

Auto Scaling Group Components
+------------------------------------------------------------------+
|                                                                  |
|                   +------------------------+                     |
|                   |   Auto Scaling Group   |                     |
|                   +------------------------+                     |
|                                |                                 |
|          +---------------------+---------------------+           |
|          |                     |                     |           |
|          v                     v                     v           |
|    +----------+          +----------+          +----------+      |
|    |  Launch  |          | Scaling  |          |  Health  |      |
|    | Template |          | Policies |          |  Checks  |      |
|    +----------+          +----------+          +----------+      |
|                                                                  |
|  Launch Template: Instance configuration                         |
|  Scaling Policies: When to scale                                 |
|  Health Checks: Instance health monitoring                       |
|                                                                  |
+------------------------------------------------------------------+
Auto Scaling Policies
+------------------------------------------------------------------+
| |
| 1. Simple Scaling |
| +----------------------------------------------------------+ |
| | | |
| | CloudWatch Alarm | |
| | | | |
| | v | |
| | +----------+ +----------+ | |
| | | CPU > 80%| --> | Add 2 | | |
| | | | | Instances| | |
| | +----------+ +----------+ | |
| | | |
| | Cooldown Period: Wait before next scaling action | |
| +----------------------------------------------------------+ |
| |
| 2. Step Scaling |
| +----------------------------------------------------------+ |
| | | |
| | CPU Utilization Instances to Add | |
| | +----------------+-------------------+ | |
| | | 60-70% | +1 instance | | |
| | | 70-80% | +2 instances | | |
| | | 80-90% | +3 instances | | |
| | | > 90% | +4 instances | | |
| | +----------------+-------------------+ | |
| +----------------------------------------------------------+ |
| |
| 3. Target Tracking |
| +----------------------------------------------------------+ |
| | | |
| | Target: CPU = 50% | |
| | | |
| | Actual CPU Action | |
| | +----------------+-------------------+ | |
| | | 70% | Scale out | | |
| | | 40% | Scale in | | |
| | | 50% | No action | | |
| | +----------------+-------------------+ | |
| | | |
| | AWS automatically adjusts capacity | |
| +----------------------------------------------------------+ |
| |
| 4. Predictive Scaling |
| +----------------------------------------------------------+ |
| | | |
| | Uses ML to predict traffic patterns | |
| | | |
| | Time Predicted Traffic Instances | |
| | +----------------+-------------------+----------------+ | |
| | | 09:00 | High | 10 | | |
| | | 12:00 | Peak | 15 | | |
| | | 18:00 | Low | 5 | | |
| | +----------------+-------------------+----------------+ | |
| | | |
| | Pre-provisions capacity before traffic spikes | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
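The arithmetic behind step scaling and target tracking can be sketched as follows. The step bands come from the table above, with the lower bound of each band treated as inclusive (an assumption, since the table does not say). The target tracking formula is a rough proportional approximation chosen for illustration; it is not AWS's actual algorithm. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Step scaling: map an integer CPU percentage to the number of instances
# to add, using the bands from the table (lower bound inclusive: assumption).
step_adjustment() {
  local cpu="$1"
  if   (( cpu > 90 ));  then echo 4
  elif (( cpu >= 80 )); then echo 3
  elif (( cpu >= 70 )); then echo 2
  elif (( cpu >= 60 )); then echo 1
  else                       echo 0
  fi
}

# Target tracking, approximated as ceil(current * actual / target).
# This proportional rule is an illustrative assumption only.
target_tracking_desired() {
  local current="$1" actual="$2" target="$3"
  echo $(( (current * actual + target - 1) / target ))
}

echo "step scaling at 75% CPU: +$(step_adjustment 75)"
echo "10 instances at 70% CPU, target 50%: $(target_tracking_desired 10 70 50)"
echo "10 instances at 40% CPU, target 50%: $(target_tracking_desired 10 40 50)"
```

With a 50% target, ten instances at 70% CPU come out at 14 (scale out) and ten instances at 40% come out at 8 (scale in), which matches the direction of the actions in the target tracking table.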
Auto Scaling Process
+------------------------------------------------------------------+
| |
| Scale Out (Add Instances) |
| +----------------------------------------------------------+ |
| | | |
| | 1. CloudWatch Alarm triggers | |
| | | | |
| | v | |
| | 2. ASG evaluates scaling policy | |
| | | | |
| | v | |
| | 3. Launch new instance using Launch Template | |
| | | | |
| | v | |
| | 4. Instance boots and registers with the target group | |
| | | | |
| | v | |
| | 5. Instance passes load balancer health checks | |
| | | | |
| | v | |
| | 6. Traffic routed to new instance | |
| | | |
| +----------------------------------------------------------+ |
| |
| Scale In (Remove Instances) |
| +----------------------------------------------------------+ |
| | | |
| | 1. CloudWatch Alarm triggers (low utilization) | |
| | | | |
| | v | |
| | 2. ASG selects instance to terminate | |
| | | | |
| | v | |
| | 3. Instance enters the Terminating state | |
| | | | |
| | v | |
| | 4. Connection draining (if enabled) | |
| | | | |
| | v | |
| | 5. Instance removed from target group | |
| | | | |
| | v | |
| | 6. Instance terminated | |
| | | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
Instance Protection Options
+------------------------------------------------------------------+
|                                                                  |
|  Scale-In Protection                                             |
|  +----------------------------------------------------------+    |
|  |                                                          |    |
|  |  Protected Instances (not terminated during scale-in)    |    |
|  |                                                          |    |
|  |  +----------+    +----------+    +----------+            |    |
|  |  | Instance |    | Instance |    | Instance |            |    |
|  |  |    1     |    |    2     |    |    3     |            |    |
|  |  |  [LOCK]  |    |          |    |  [LOCK]  |            |    |
|  |  |Protected |    |  Can be  |    |Protected |            |    |
|  |  +----------+    |terminated|    +----------+            |    |
|  |                  +----------+                            |    |
|  |                                                          |    |
|  |  Enable:                                                 |    |
|  |    aws autoscaling set-instance-protection \             |    |
|  |      --instance-ids i-12345 \                            |    |
|  |      --auto-scaling-group-name my-asg \                  |    |
|  |      --protected-from-scale-in                           |    |
|  +----------------------------------------------------------+    |
|                                                                  |
|  Standby State                                                   |
|  +----------------------------------------------------------+    |
|  |                                                          |    |
|  |  Instance in Standby:                                    |    |
|  |  - Not serving traffic                                   |    |
|  |  - Not replaced by ASG                                   |    |
|  |  - Can be updated/troubleshooted                         |    |
|  |                                                          |    |
|  |  Enter Standby:                                          |    |
|  |    aws autoscaling enter-standby \                       |    |
|  |      --instance-ids i-12345 \                            |    |
|  |      --auto-scaling-group-name my-asg \                  |    |
|  |      --should-decrement-desired-capacity                 |    |
|  +----------------------------------------------------------+    |
|                                                                  |
+------------------------------------------------------------------+

Complete ALB + ASG Setup
+------------------------------------------------------------------+
|                                                                  |
|                            Internet                              |
|                                |                                 |
|                                v                                 |
|                        +---------------+                         |
|                        |   Route 53    |                         |
|                        |     (DNS)     |                         |
|                        +---------------+                         |
|                                |                                 |
|                                v                                 |
|              +-----------------------------------+               |
|              |     Application Load Balancer     |               |
|              |                                   |               |
|              |  Listeners:                       |               |
|              |  - Port 80 (HTTP) -> Redirect     |               |
|              |  - Port 443 (HTTPS)               |               |
|              |                                   |               |
|              |  Target Groups:                   |               |
|              |  - Web-TG (Port 8080)             |               |
|              |  - API-TG (Port 3000)             |               |
|              +-----------------------------------+               |
|               /                            \                     |
|             /                               \                    |
|            v                                 v                   |
| +---------------------+    +-----------------------------------+ |
| | Auto Scaling        |    | Auto Scaling                      | |
| | Group: Web          |    | Group: API                        | |
| |                     |    |                                   | |
| | Min: 2              |    | Min: 2                            | |
| | Max: 10             |    | Max: 20                           | |
| | Desired: 3          |    | Desired: 5                        | |
| |                     |    |                                   | |
| | +----+ +----+ +----+|    | +----+ +----+ +----+ +----+ +----+| |
| | |Web | |Web | |Web ||    | |API | |API | |API | |API | |API || |
| | +----+ +----+ +----+|    | +----+ +----+ +----+ +----+ +----+| |
| +---------------------+    +-----------------------------------+ |
|                                                                  |
+------------------------------------------------------------------+

# ============================================================
# Application Load Balancer
# ============================================================
resource "aws_lb" "main" {
  name                       = "my-alb"
  internal                   = false
  load_balancer_type         = "application"
  security_groups            = [aws_security_group.alb.id]
  subnets                    = var.public_subnet_ids
  enable_deletion_protection = false
}

# Target group for the web tier; health check values match the
# Target Group Settings shown earlier.
resource "aws_lb_target_group" "web" {
  name     = "web-tg"
  port     = 8080
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  health_check {
    enabled             = true
    healthy_threshold   = 3
    interval            = 30
    matcher             = "200"
    path                = "/health"
    port                = "traffic-port"
    protocol            = "HTTP"
    timeout             = 5
    unhealthy_threshold = 2
  }
}

# HTTPS listener: terminates TLS and forwards to the web target group.
resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.main.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = var.certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.web.arn
  }
}

# ============================================================
# Auto Scaling Group
# ============================================================
resource "aws_launch_template" "web" {
  name_prefix   = "web-"
  image_id      = var.ami_id
  instance_type = "t3.medium"
  key_name      = var.key_name

  iam_instance_profile {
    name = aws_iam_instance_profile.web.name
  }

  network_interfaces {
    associate_public_ip_address = false
    security_groups             = [aws_security_group.web.id]
  }

  user_data = base64encode(<<-EOF
    #!/bin/bash
    yum install -y httpd
    systemctl start httpd
    EOF
  )

  tag_specifications {
    resource_type = "instance"
    tags = {
      Name = "WebServer"
    }
  }
}

resource "aws_autoscaling_group" "web" {
  name                = "web-asg"
  vpc_zone_identifier = var.private_subnet_ids
  min_size            = 2
  max_size            = 10
  desired_capacity    = 3

  launch_template {
    id      = aws_launch_template.web.id
    version = "$Latest"
  }

  # Register instances with the ALB target group and use its health checks.
  target_group_arns         = [aws_lb_target_group.web.arn]
  health_check_type         = "ELB"
  health_check_grace_period = 300

  tag {
    key                 = "Name"
    value               = "WebServer"
    propagate_at_launch = true
  }
}

# Simple scaling policy: add 2 instances, then wait out the cooldown.
resource "aws_autoscaling_policy" "scale_out" {
  name                   = "scale-out"
  scaling_adjustment     = 2
  adjustment_type        = "ChangeInCapacity"
  cooldown               = 300
  autoscaling_group_name = aws_autoscaling_group.web.name
}

# Alarm: average CPU above 80% for two consecutive 120-second periods.
resource "aws_cloudwatch_metric_alarm" "high_cpu" {
  alarm_name          = "high-cpu"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = 120
  statistic           = "Average"
  threshold           = 80

  dimensions = {
    AutoScalingGroupName = aws_autoscaling_group.web.name
  }

  alarm_actions = [aws_autoscaling_policy.scale_out.arn]
}
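
The configuration above wires up a simple scaling policy. As a hedged sketch of the target tracking alternative described earlier, the script below builds the equivalent AWS CLI call and only prints it by default. The `web-asg` name and the 50% target come from this chapter's examples; the policy name `cpu-target-tracking`, the `APPLY` switch, and the `tracking_config` helper are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Dry-run sketch: prints the AWS CLI call for a target tracking policy.
# Set APPLY=1 to actually run it (requires AWS credentials and permissions).
ASG_NAME="${ASG_NAME:-web-asg}"    # ASG name used in the Terraform example
TARGET_CPU="${TARGET_CPU:-50}"     # target value from the policy diagram

# Build the target tracking configuration as JSON.
tracking_config() {
  printf '{"PredefinedMetricSpecification":{"PredefinedMetricType":"ASGAverageCPUUtilization"},"TargetValue":%s}\n' "$1"
}

cmd=(aws autoscaling put-scaling-policy
  --auto-scaling-group-name "$ASG_NAME"
  --policy-name cpu-target-tracking       # hypothetical policy name
  --policy-type TargetTrackingScaling
  --target-tracking-configuration "$(tracking_config "$TARGET_CPU")")

if [[ "${APPLY:-0}" == "1" ]]; then
  "${cmd[@]}"
else
  echo "DRY RUN: ${cmd[*]}"
fi
```

Target tracking creates and manages its own CloudWatch alarms, so a policy like this would not need the separate `high_cpu` alarm.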

Auto Scaling & Load Balancing Best Practices
+------------------------------------------------------------------+
| |
| 1. Multi-AZ Deployment |
| +----------------------------------------------------------+ |
| | - Deploy across minimum 2 AZs | |
| | - Use all available AZs for maximum availability | |
| | - Configure subnets in each AZ | |
| +----------------------------------------------------------+ |
| |
| 2. Health Check Configuration |
| +----------------------------------------------------------+ |
| | - Use meaningful health check endpoints | |
| | - Set appropriate thresholds | |
| | - Configure grace period for instance startup | |
| +----------------------------------------------------------+ |
| |
| 3. Scaling Policies |
| +----------------------------------------------------------+ |
| | - Use target tracking for simplicity | |
| | - Set appropriate cooldown periods | |
| | - Consider predictive scaling for known patterns | |
| +----------------------------------------------------------+ |
| |
| 4. Instance Warm-up |
| +----------------------------------------------------------+ |
| | - Allow time for instance initialization | |
| | - Use lifecycle hooks for custom initialization | |
| | - Configure appropriate grace period | |
| +----------------------------------------------------------+ |
| |
| 5. Monitoring |
| +----------------------------------------------------------+ |
| | - Monitor scaling activities | |
| | - Set up CloudWatch alarms | |
| | - Use ASG notifications | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+

Auto Scaling and Load Balancing are the foundation of self-healing infrastructure. They’re central to achieving high availability, handling traffic spikes, and enabling zero-downtime deployments.

ASG/ELB in DevOps Workflow
+------------------------------------------------------------------+
| |
| Core SRE Use Cases: |
| |
| 1. Zero-Downtime Deployments |
| +----------------------------------------------------------+ |
| | - Rolling updates via ASG instance refresh | |
| | - Blue/green deployments with weighted target groups | |
| | - Canary releases using ALB routing rules | |
| +----------------------------------------------------------+ |
| |
| 2. Self-Healing Infrastructure |
| +----------------------------------------------------------+ |
| | - ELB health checks detect failed instances | |
| | - ASG replaces unhealthy instances automatically | |
| | - No manual intervention required during failures | |
| +----------------------------------------------------------+ |
| |
| 3. Cost-Efficient Scaling |
| +----------------------------------------------------------+ |
| | - Scale to zero during off-hours (dev/staging) | |
| | - Mixed instance policies (Spot + On-Demand) | |
| | - Predictive scaling for known traffic patterns | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
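The "scale to zero during off-hours" idea is usually implemented with scheduled actions. The sketch below only prints the AWS CLI commands rather than running them; the group name `dev-asg`, the action names, and the cron times (interpreted as UTC unless a time zone is set) are hypothetical values for illustration.

```bash
#!/usr/bin/env bash
# Prints (does not execute) scheduled-action commands for a dev/staging ASG.
ASG_NAME="${ASG_NAME:-dev-asg}"   # hypothetical group name

# Usage: schedule_cmd <action-name> <cron-expression> <min> <max> <desired>
schedule_cmd() {
  echo "aws autoscaling put-scheduled-update-group-action" \
    "--auto-scaling-group-name $ASG_NAME" \
    "--scheduled-action-name $1" \
    "--recurrence '$2'" \
    "--min-size $3 --max-size $4 --desired-capacity $5"
}

# Weekdays at 20:00: scale to zero. Weekdays at 07:00: restore capacity.
schedule_cmd scale-down-evening "0 20 * * 1-5" 0 0 0
schedule_cmd scale-up-morning   "0 7 * * 1-5"  2 10 3
```

Setting the maximum to 0 as well as the minimum keeps scaling policies from launching instances overnight; review the printed commands before running them against a real group.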

# Install monitoring tools (pacman is Arch Linux; use your own package manager elsewhere)
sudo pacman -S aws-cli-v2 jq

# ASG status dashboard script (save as ~/bin/asg-dashboard.sh)
#!/bin/bash
# ~/bin/asg-dashboard.sh
set -euo pipefail

echo "=== Auto Scaling Groups Status ==="
echo ""

# Print size limits and per-instance health for every ASG in the region
for asg in $(aws autoscaling describe-auto-scaling-groups \
    --query 'AutoScalingGroups[*].AutoScalingGroupName' \
    --output text); do
  echo "--- $asg ---"
  aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names "$asg" \
    --query 'AutoScalingGroups[0].{
      Min:MinSize,
      Max:MaxSize,
      Desired:DesiredCapacity,
      Instances:Instances[*].{
        Id:InstanceId,
        Health:HealthStatus,
        Lifecycle:LifecycleState
      }
    }' --output yaml
  echo ""
done

# Monitor ALB health in real-time (refreshes every 10 seconds)
watch -n 10 'aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-tg/12345 \
  --query "TargetHealthDescriptions[*].[Target.Id,TargetHealth.State,TargetHealth.Description]" \
  --output table'

# Trigger instance refresh (rolling deployment)
aws autoscaling start-instance-refresh \
  --auto-scaling-group-name my-asg \
  --preferences '{
    "MinHealthyPercentage": 90,
    "InstanceWarmup": 300
  }' \
  --desired-configuration '{
    "LaunchTemplate": {
      "LaunchTemplateId": "lt-12345",
      "Version": "$Latest"
    }
  }'

| Issue                                       | Cause                           | Solution                                         |
|---------------------------------------------|---------------------------------|--------------------------------------------------|
| Instances launching but immediately failing | Health check misconfigured      | Increase grace period, verify health check path  |
| ASG not scaling out                         | CloudWatch alarm not triggering | Check alarm thresholds and evaluation periods    |
| Scaling oscillation                         | Cooldown too short              | Increase cooldown period, use target tracking    |
| 5xx errors during deployment                | No connection draining          | Enable deregistration delay on target group      |
| Uneven traffic distribution                 | Cross-zone LB disabled          | Enable cross-zone load balancing                 |
| New instances failing health check          | App startup time too long       | Increase health check grace period               |
# Debug scaling issues

# Check recent scaling activities
aws autoscaling describe-scaling-activities \
  --auto-scaling-group-name my-asg \
  --max-items 5 \
  --query 'Activities[*].[StartTime,StatusCode,Description]' \
  --output table

# Check ALB target health
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:....:targetgroup/my-tg/123 \
  --query 'TargetHealthDescriptions[*].[Target.Id,TargetHealth.State,TargetHealth.Reason]' \
  --output table

ASG/ELB Anti-Patterns
+------------------------------------------------------------------+
| |
| ❌ Mistake 1: Single-AZ Deployment |
| +----------------------------------------------------------+ |
| | Problem: All instances in one AZ | |
| | Impact: Total outage if AZ fails | |
| | Fix: Spread instances across at least 2 AZs | |
| +----------------------------------------------------------+ |
| |
| ❌ Mistake 2: Missing Health Check Grace Period |
| +----------------------------------------------------------+ |
| | Problem: ASG terminates instances during boot | |
| | Impact: Infinite launch/terminate loop | |
| | Fix: Set grace period > application startup time | |
| +----------------------------------------------------------+ |
| |
| ❌ Mistake 3: No Connection Draining |
| +----------------------------------------------------------+ |
| | Problem: In-flight requests dropped during scale-in | |
| | Impact: User-facing errors, data loss | |
| | Fix: Keep deregistration delay on (default 300s) | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
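The three anti-patterns above lend themselves to a quick pre-deployment check. The sketch below is a hypothetical helper, not an AWS tool: `check_asg_config` and its four arguments are assumptions, and the values would have to be supplied by hand or pulled from your own tooling.

```bash
#!/usr/bin/env bash
# Usage: check_asg_config <az_count> <grace_period_s> <app_startup_s> <dereg_delay_s>
# Prints one WARN line per anti-pattern found, or OK if none apply.
check_asg_config() {
  local az_count="$1" grace="$2" startup="$3" dereg="$4" problems=0
  if (( az_count < 2 )); then
    echo "WARN: single-AZ deployment; spread instances across at least 2 AZs"
    problems=$((problems + 1))
  fi
  if (( grace <= startup )); then
    echo "WARN: grace period (${grace}s) should exceed app startup time (${startup}s)"
    problems=$((problems + 1))
  fi
  if (( dereg == 0 )); then
    echo "WARN: deregistration delay is 0; in-flight requests may be dropped"
    problems=$((problems + 1))
  fi
  if (( problems == 0 )); then
    echo "OK"
  fi
}

check_asg_config 3 300 120 300   # prints: OK
check_asg_config 1 60 120 0      # prints three WARN lines
```

A check like this could run in CI before applying infrastructure changes, so that a missing grace period is caught before it causes a launch/terminate loop.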

  1. Q: Explain ALB vs NLB. When would you use each?

    • A: ALB operates at Layer 7 (HTTP/HTTPS) and supports content-based, path-based, and host-based routing. NLB operates at Layer 4 (TCP/UDP) and offers ultra-low latency, static IP addresses, and millions of requests per second. Use ALB for web apps, APIs, and microservices. Use NLB for real-time gaming, IoT, TCP services, or when you need a static IP.
  2. Q: What’s the difference between target tracking and step scaling?

    • A: Target tracking is simpler: you set a target metric (e.g., CPU 50%) and AWS scales automatically to maintain it. Step scaling gives granular control: you define different scaling amounts for different alarm thresholds. Target tracking is recommended for most cases; use step scaling when you need different responses at different severity levels.
  3. Q: How would you implement zero-downtime deployments with ASG?

    • A: Use ASG Instance Refresh with MinHealthyPercentage=90 and InstanceWarmup=300. Update the launch template with the new AMI/config, then trigger an instance refresh. ASG replaces instances in batches, keeping healthy capacity at or above 90%. Alternatively, use blue/green with two ASGs and weighted target groups for more control.

Exam Tip

  1. ALB vs NLB: ALB for HTTP/HTTPS (Layer 7), NLB for TCP/UDP (Layer 4)
  2. Target Groups: Can target instances, IPs, or Lambda functions
  3. Health Checks: ELB health checks + EC2 health checks for ASG
  4. Scaling Policies: Target tracking is simplest, step scaling for granular control
  5. Cooldown: Prevents rapid scaling cycles
  6. Instance Protection: Prevents scale-in termination
  7. Connection Draining: Allows in-flight requests to complete
  8. Cross-Zone Load Balancing: Distributes traffic evenly across registered targets in all enabled AZs


Last Updated: March 2026