AWS Global Infrastructure


AWS operates a global cloud infrastructure built from Regions, Availability Zones, and Edge Locations, which lets organizations deploy applications closer to end users while maintaining high availability and fault tolerance.

AWS Global Infrastructure Map
================================================================================
                                  NORTH AMERICA
                                        |
                   +--------------------+--------------------+
                   |                    |                    |
               US-East-1            US-West-1            US-West-2
             (N. Virginia)       (N. California)         (Oregon)
                   |                    |                    |
                   v                    v                    v
              +---------+          +---------+          +---------+
              |  6 AZs  |          |  3 AZs  |          |  4 AZs  |
              +---------+          +---------+          +---------+

                                     EUROPE
                                        |
               +------------------------+------------------------+
               |                        |                        |
           EU-West-1              EU-Central-1               EU-West-2
           (Ireland)               (Frankfurt)               (London)
               |                        |                        |
               v                        v                        v
          +---------+              +---------+              +---------+
          |  3 AZs  |              |  3 AZs  |              |  3 AZs  |
          +---------+              +---------+              +---------+

                                  ASIA PACIFIC
                                        |
               +------------------------+------------------------+
               |                        |                        |
        AP-Southeast-1           AP-Northeast-1             AP-South-1
          (Singapore)                (Tokyo)                 (Mumbai)
               |                        |                        |
               v                        v                        v
          +---------+              +---------+              +---------+
          |  3 AZs  |              |  4 AZs  |              |  3 AZs  |
          +---------+              +---------+              +---------+
================================================================================

A Region is a physical geographic location where AWS clusters data centers.

Region Architecture:
+------------------------------------------------------------------+
| AWS Region |
| |
| +----------------+ +----------------+ +----------------+ |
| | Availability | | Availability | | Availability | |
| | Zone A (AZ-a) | | Zone B (AZ-b) | | Zone C (AZ-c) | |
| | | | | | | |
| | +----------+ | | +----------+ | | +----------+ | |
| | |Datacenter| | | |Datacenter| | | |Datacenter| | |
| | | DC-1 | | | | DC-3 | | | | DC-5 | | |
| | +----------+ | | +----------+ | | +----------+ | |
| | +----------+ | | +----------+ | | +----------+ | |
| | |Datacenter| | | |Datacenter| | | |Datacenter| | |
| | | DC-2 | | | | DC-4 | | | | DC-6 | | |
| | +----------+ | | +----------+ | | +----------+ | |
| +----------------+ +----------------+ +----------------+ |
| |
| AZs are: |
| - Physically separated (km apart) |
| - Connected via low-latency links |
| - Isolated from failures in other AZs |
+------------------------------------------------------------------+
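
+----------------------+---------------------------------+------------------------------+
| Factor               | Description                     | Example                      |
+----------------------+---------------------------------+------------------------------+
| Latency              | Choose region closest to users  | Asia users -> AP-Southeast-1 |
| Cost                 | Prices vary by region           | US-East-1 often cheapest     |
| Compliance           | Data residency requirements     | EU data -> EU-West-1         |
| Service Availability | Not all services in all regions | New services often US first  |
| SLA Requirements     | Some regions have better SLAs   | GovCloud for government      |
+----------------------+---------------------------------+------------------------------+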

An Availability Zone is one or more discrete data centers with redundant power, networking, and connectivity.

Availability Zone Deep Dive:
+------------------------------------------------------------------+
| Availability Zone Architecture |
| |
| +------------------------------------------------------------+ |
| | Physical Data Center | |
| | | |
| | +-------------+ +-------------+ +-------------+ | |
| | | Power | | Cooling | | Network | | |
| | | Grid A | | System A | | Provider A| | |
| | +-------------+ +-------------+ +-------------+ | |
| | | | | | |
| | v v v | |
| | +----------------------------------------------------+ | |
| | | Redundant Infrastructure | | |
| | +----------------------------------------------------+ | |
| | | | | | |
| | v v v | |
| | +-------------+ +-------------+ +-------------+ | |
| | | Power | | Cooling | | Network | | |
| | | Grid B | | System B | | Provider B| | |
| | +-------------+ +-------------+ +-------------+ | |
| | | |
| | +----------------------------------------------------+ | |
| | | Server Racks (Thousands) | | |
| | | +--------+ +--------+ +--------+ +--------+ | | |
| | | | Rack 1 | | Rack 2 | | Rack 3 | | Rack N | | | |
| | | +--------+ +--------+ +--------+ +--------+ | | |
| | +----------------------------------------------------+ | |
| +------------------------------------------------------------+ |
+------------------------------------------------------------------+
Multi-AZ Deployment Pattern
+------------------------------------------------------------------+
| |
| Internet |
| | |
| v |
| +----------+ |
| |Route 53/ | |
| |CloudFront| |
| +----------+ |
| | |
| v |
| +----------------------------------------------------------------+
| | Application Load Balancer |
| +----------------------------------------------------------------+
| | | | |
| v v v |
| +----------+ +----------+ +----------+ |
| | AZ-A | | AZ-B | | AZ-C | |
| | | | | | | |
| | +------+ | | +------+ | | +------+ | |
| | | EC2 | | | | EC2 | | | | EC2 | | |
| | | App | | | | App | | | | App | | |
| | +------+ | | +------+ | | +------+ | |
| | | | | | | |
| | +------+ | | +------+ | | +------+ | |
| | | RDS |<-------->| | RDS |<-------->| | RDS | |
| | |Primary| | | |Replica| | | |Replica| | |
| | +------+ | | +------+ | | +------+ | |
| +----------+ +----------+ +----------+ |
| |
| Benefits: |
| - Fault tolerance (survive AZ failure) |
| - High availability (99.99% uptime) |
| - Disaster recovery built-in |
+------------------------------------------------------------------+
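
The fault-tolerance benefit listed above depends on how evenly capacity is spread across zones. The following is a minimal sketch, not part of any AWS SDK: `spread_across_azs` and `capacity_after_az_loss` are hypothetical helper names, and the zone names are only examples.

```python
from typing import Dict, List


def spread_across_azs(instance_count: int, azs: List[str]) -> Dict[str, int]:
    """Distribute instances as evenly as possible across the given AZs."""
    if not azs:
        raise ValueError("at least one AZ is required")
    base, extra = divmod(instance_count, len(azs))
    # The first `extra` zones take one additional instance each.
    return {az: base + (1 if i < extra else 0) for i, az in enumerate(azs)}


def capacity_after_az_loss(placement: Dict[str, int]) -> int:
    """Worst case: instances left if the most heavily loaded AZ fails."""
    return sum(placement.values()) - max(placement.values())


if __name__ == "__main__":
    placement = spread_across_azs(7, ["us-east-1a", "us-east-1b", "us-east-1c"])
    print(placement)
    print("instances left after losing one AZ:", capacity_after_az_loss(placement))
```

Sizing the fleet so that the worst-case remainder can still carry peak load is one way to make "survive AZ failure" a measurable target.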

Edge Locations are endpoints for the AWS content delivery network (CloudFront) and DNS service (Route 53).

Edge Location Network
+------------------------------------------------------------------+
| |
| AWS Global Network Backbone |
| ============================================================ |
| |
| +-------------+ +-------------+ +-------------+ |
| | Edge Loc 1 | | Edge Loc 2 | | Edge Loc N | |
| | (New York) | | (London) | | (Tokyo) | |
| +------+------+ +------+------+ +------+------+ |
| | | | |
| +--------+----------+--------+----------+ |
| | | |
| v v |
| +-------------+ +-------------+ |
| | Region | | Region | |
| | (us-east-1) | | (eu-west-1) | |
| +-------------+ +-------------+ |
| |
| Edge Locations: |
| - 400+ locations globally |
| - Lower latency for end users |
| - Cache content closer to users |
| - DNS resolution endpoints |
+------------------------------------------------------------------+

AWS Global Network Architecture
+------------------------------------------------------------------+
| |
| AWS Global Network |
| ============================================================ |
| |
| +----------------------------------------------------------+ |
| | Network Backbone | |
| | | |
| | Region A <=======> Region B <=======> Region C | |
| | | | | | |
| | v v v | |
| | +--+--+ +--+--+ +--+--+ | |
| | | VPC | | VPC | | VPC | | |
| | +--+--+ +--+--+ +--+--+ | |
| | | | | | |
| | +--------+----------+--------+----------+ | |
| | | | | |
| | v v | |
| | +-------+ +-------+ | |
| | | Edge | | Edge | | |
| | | Loc 1 | | Loc 2 | | |
| | +-------+ +-------+ | |
| +----------------------------------------------------------+ |
| |
| Features: |
| - Private fiber network |
| - Redundant paths |
| - Low-latency inter-region connectivity |
| - Automatic failover |
+------------------------------------------------------------------+

Global Services (No Region Selection Required)

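+-------------------+--------------------------------------------------------+
| Service           | Purpose                                                |
+-------------------+--------------------------------------------------------+
| IAM               | Identity and Access Management                         |
| Route 53          | DNS Service                                            |
| CloudFront        | Content Delivery Network                               |
| WAF               | Web Application Firewall (global only with CloudFront) |
| AWS Organizations | Multi-account management                               |
| AWS Shield        | DDoS protection                                        |
+-------------------+--------------------------------------------------------+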

Regional Services (Region Selection Required)

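+---------+----------------------------------------+
| Service | Purpose                                |
+---------+----------------------------------------+
| EC2     | Virtual Machines                       |
| S3      | Object Storage (with regional buckets) |
| RDS     | Relational Databases                   |
| Lambda  | Serverless Computing                   |
| VPC     | Virtual Private Cloud                  |
+---------+----------------------------------------+
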
Service Scope Diagram
+------------------------------------------------------------------+
| |
| Global Services Regional Services |
| +----------------+ +----------------+ |
| | | | | |
| | +----------+ | | Region A | |
| | | IAM | | | +----------+ | |
| | +----------+ | | | EC2 | | |
| | +----------+ | | +----------+ | |
| | | Route 53 | | | +----------+ | |
| | +----------+ | | | RDS | | |
| | +----------+ | | +----------+ | |
| | |CloudFront| | | | |
| | +----------+ | | Region B | |
| | | | +----------+ | |
| | Replicated | | | EC2 | | |
| | Globally | | +----------+ | |
| | | | +----------+ | |
| +----------------+ | | RDS | | |
| | +----------+ | |
| | | |
| +----------------+ |
| |
+------------------------------------------------------------------+

Region Selection Decision Tree
+------------------------------------------------------------------+
| |
| Start: Choose Region |
| | |
| v |
| +---------------------+ |
| | Compliance Required?| |
| +----------+----------+ |
| | |
| +------------+------------+ |
| | | |
| v v |
| (Yes) (No) |
| | | |
| v v |
| +------------------+ +---------------------+ |
| | Select compliant | | Latency Critical? | |
| | region (e.g., | +----------+----------+ |
| | EU for GDPR) | | |
| +------------------+ +---------+---------+ |
| | | |
| v v |
| (Yes) (No) |
| | | |
| v v |
| +------------------+ +------------------+ |
| | Select closest | | Cost Primary | |
| | region to users | | Factor? | |
| +------------------+ +--------+---------+ |
| | |
| +---------+---------+ |
| | | |
| v v |
| (Yes) (No) |
| | | |
| v v |
| +---------------+ +-------------+ |
| | US-East-1 | | Service | |
| | (often lowest)| | Available? | |
| +---------------+ +------+------+ |
| | |
| +------+------+ |
| | | |
| v v |
| (Yes) (No)|
| | | |
| v v |
| +----------+ +----------+
| | Any | | Check |
| | Region | | Service |
| +----------+ | Page |
| +----------+
+------------------------------------------------------------------+
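
The decision tree above can be read as a chain of early returns, checked in the same order as the diagram. The following is a minimal Python sketch of that reading. The `Workload` fields and the `choose_region` function are hypothetical names introduced for illustration, and the returned strings simply echo the leaf labels in the diagram.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Workload:
    """Inputs to the decision tree; all field names are illustrative."""
    compliance_region: Optional[str] = None  # e.g. "eu-west-1" when GDPR applies
    latency_critical: bool = False
    closest_region: Optional[str] = None     # region nearest to most users
    cost_is_primary: bool = False
    service_available_everywhere: bool = True


def choose_region(w: Workload) -> str:
    """Walk the tree top to bottom and return the first matching leaf."""
    if w.compliance_region:
        return w.compliance_region
    if w.latency_critical:
        if w.closest_region is None:
            raise ValueError("latency-critical workloads need closest_region set")
        return w.closest_region
    if w.cost_is_primary:
        return "us-east-1"  # "often lowest" per the diagram, not guaranteed
    if w.service_available_everywhere:
        return "any region"
    return "check the service availability page"


if __name__ == "__main__":
    print(choose_region(Workload(compliance_region="eu-west-1")))
    print(choose_region(Workload(latency_critical=True, closest_region="ap-south-1")))
```

Compliance is checked first on purpose: a residency requirement overrides latency and cost, which is why a workload with both set still lands in the compliant region.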

Data Center Physical Security
+------------------------------------------------------------------+
| |
| Layer 1: Perimeter Security |
| +----------------------------------------------------------+ |
| | - Fencing and barriers | |
| | - Security patrols | |
| | - Video surveillance | |
| +----------------------------------------------------------+ |
| | |
| v |
| Layer 2: Building Access |
| +----------------------------------------------------------+ |
| | - Badge readers | |
| | - Biometric scanners | |
| | - Security personnel | |
| +----------------------------------------------------------+ |
| | |
| v |
| Layer 3: Data Center Floor |
| +----------------------------------------------------------+ |
| | - Mantraps (one person at a time) | |
| | - Additional authentication | |
| | - Motion sensors | |
| +----------------------------------------------------------+ |
| | |
| v |
| Layer 4: Equipment Access |
| +----------------------------------------------------------+ |
| | - Locked cabinets | |
| | - Cage enclosures | |
| | - Audit logging | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+

1.7 High Availability Architecture Patterns

Multi-AZ Architecture
+------------------------------------------------------------------+
| |
| Internet |
| | |
| v |
| +---------------+ |
| | Route 53 | |
| +---------------+ |
| | |
| v |
| +---------------+ |
| | CloudFront | |
| +---------------+ |
| | |
| v |
| +-----------------------------------+ |
| | Application Load Balancer | |
| +-----------------------------------+ |
| | | | |
| v v v |
| +----------+ +----------+ +----------+ |
| | AZ-A | | AZ-B | | AZ-C | |
| | | | | | | |
| | +------+ | | +------+ | | +------+ | |
| | | EC2 | | | | EC2 | | | | EC2 | | |
| | +------+ | | +------+ | | +------+ | |
| | | | | | | |
| | +------+ | | +------+ | | +------+ | |
| | | RDS | | | | RDS | | | | RDS | | |
| | |(Main)| | | |(Stand| | | |(Stand| | |
| | +------+ | | | by) | | | | by) | | |
| | | | +------+ | | +------+ | |
| +----------+ +----------+ +----------+ |
| |
| SLA: 99.99% availability |
+------------------------------------------------------------------+
Multi-Region Architecture
+------------------------------------------------------------------+
| |
| Internet |
| | |
| v |
| +---------------+ |
| | Route 53 | |
| | (Latency-based| |
| | Routing) | |
| +---------------+ |
| / \ |
| / \ |
| v v |
| +---------------+ +---------------+ |
| | US-EAST-1 | | EU-WEST-1 | |
| | (Primary) | | (Secondary) | |
| +---------------+ +---------------+ |
| | | |
| v v |
| +---------------+ +---------------+ |
| | ALB | | ALB | |
| +---------------+ +---------------+ |
| | | |
| v v |
| +---------------+ +---------------+ |
| | EC2 Fleet | | EC2 Fleet | |
| +---------------+ +---------------+ |
| | | |
| v v |
| +---------------+ +---------------+ |
| | RDS Primary | | RDS Read | |
| | | | Replica | |
| +---------------+ +---------------+ |
| | | |
| +--------+-----------+ |
| | |
| v |
| +---------------+ |
| | S3 Cross- | |
| | Region Repl. | |
| +---------------+ |
| |
| SLA: 99.999% availability |
+------------------------------------------------------------------+
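
One way to reason about why a second region raises the availability ceiling is the standard formula for serial and redundant components. The sketch below assumes failures are independent and ignores failover time, DNS propagation, and replication lag, so the result is a theoretical upper bound rather than an SLA. The function names and the sample figures are illustrative.

```python
from math import prod
from typing import Iterable


def serial_availability(components: Iterable[float]) -> float:
    """Availability of a chain in which every component must be up."""
    return prod(components)


def parallel_availability(replicas: Iterable[float]) -> float:
    """Availability of redundant replicas where any single one is enough."""
    return 1 - prod(1 - a for a in replicas)


if __name__ == "__main__":
    # Hypothetical single-region stack: compute tier and database tier in series.
    one_region = serial_availability([0.9999, 0.9995])
    # Two such regions in parallel, assuming independent failures.
    two_regions = parallel_availability([one_region, one_region])
    print(f"one region:  {one_region:.6f}")
    print(f"two regions: {two_regions:.9f}")
```

In practice the routing and replication layers sit in series with both regions, so the achieved figure is lower than the parallel formula alone suggests.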

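+--------------+--------------------+-------------------------+
| Service      | Monthly Uptime SLA | Annual Downtime Allowed |
+--------------+--------------------+-------------------------+
| EC2          | 99.99%             | ~52 minutes             |
| S3           | 99.9%              | ~8.7 hours              |
| RDS Multi-AZ | 99.95%             | ~4.4 hours              |
| Lambda       | 99.95%             | ~4.4 hours              |
| CloudFront   | 99.9%              | ~8.7 hours              |
+--------------+--------------------+-------------------------+
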
Availability Calculation:
+------------------------------------------------------------------+
| |
| Availability = (Total Time - Downtime) / Total Time |
| |
| Example: 99.99% availability |
| |
| Monthly: 30 days × 24 hours × 60 minutes = 43,200 minutes |
| Allowed Downtime: 43,200 × (1 - 0.9999) = 4.32 minutes |
| |
| Availability Tiers: |
| +--------+----------+------------------+ |
| | Nines | Uptime | Annual Downtime | |
| +--------+----------+------------------+ |
| | 2 | 99% | 3.65 days | |
| | 3 | 99.9% | 8.77 hours | |
| | 4 | 99.99% | 52.60 minutes | |
| | 5 | 99.999% | 5.26 minutes | |
| +--------+----------+------------------+ |
| |
+------------------------------------------------------------------+
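
The arithmetic in the box above is easy to script. This is a small sketch of the same formula; `allowed_downtime_minutes` is a hypothetical helper name, and the annual figures assume a 365.25-day year, which is what the tier table's values correspond to.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(availability_pct: float, days: float) -> float:
    """Minutes of downtime permitted over `days` at the given availability."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return days * MINUTES_PER_DAY * (1 - availability_pct / 100)


if __name__ == "__main__":
    # Monthly example from the box above: a 30-day month at 99.99%.
    print(f"monthly at 99.99%: {allowed_downtime_minutes(99.99, 30):.2f} min")
    # Annual tiers, assuming a 365.25-day year.
    for pct in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(pct, 365.25)
        print(f"{pct}% -> {minutes:.2f} min/year")
```

Passing the period length explicitly keeps the monthly and annual calculations on the same code path, which helps when an SLA is defined per billing month but reported per year.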

# List all available regions
aws ec2 describe-regions --query 'Regions[*].RegionName' --output table
# List Availability Zones in a region
aws ec2 describe-availability-zones \
--region us-east-1 \
--query 'AvailabilityZones[*].ZoneName' \
--output table
# Get current region
aws configure get region
# Set default region
aws configure set region us-west-2
# List the origin domain names of your CloudFront distributions
# (this shows configured origins, not edge locations)
aws cloudfront list-distributions --query 'DistributionList.Items[*].Origins.Items[*].DomainName'

import boto3

# List all regions
ec2 = boto3.client('ec2', region_name='us-east-1')
regions = ec2.describe_regions()
for region in regions['Regions']:
    print(f"Region: {region['RegionName']}, Endpoint: {region['Endpoint']}")

# List AZs in a specific region
ec2_us_east_1 = boto3.client('ec2', region_name='us-east-1')
azs = ec2_us_east_1.describe_availability_zones()
for az in azs['AvailabilityZones']:
    print(f"AZ: {az['ZoneName']}, State: {az['State']}")

AWS Infrastructure Best Practices
+------------------------------------------------------------------+
| |
| 1. Always deploy across multiple Availability Zones |
| +----------------------------------------------+ |
| | Region | |
| | +--------+ +--------+ +--------+ | |
| | | AZ-A | | AZ-B | | AZ-C | | |
| | | EC2 | | EC2 | | EC2 | | |
| | +--------+ +--------+ +--------+ | |
| +----------------------------------------------+ |
| |
| 2. Choose regions based on: |
| - Latency to end users |
| - Compliance requirements |
| - Cost optimization |
| - Service availability |
| |
| 3. Use CloudFront for global content delivery |
| +----------------------------------------------+ |
| | Users -> Edge Location -> CloudFront -> Origin| |
| +----------------------------------------------+ |
| |
| 4. Implement disaster recovery across regions |
| +----------------------------------------------+ |
| | Primary Region -> Backup Region | |
| | (Active) (Active/Passive) | |
| +----------------------------------------------+ |
| |
| 5. Monitor infrastructure health |
| - Use AWS Health Dashboard |
| - Set up CloudWatch alarms |
| - Subscribe to AWS service alerts |
| |
+------------------------------------------------------------------+

Understanding AWS global infrastructure is not just theoretical knowledge — it directly impacts every decision you make as a DevOps engineer or SRE. Here’s why:

Impact on DevOps/SRE Roles
+------------------------------------------------------------------+
| |
| 1. Deployment Strategy |
| +----------------------------------------------------------+ |
| | Your CI/CD pipeline must know which region to deploy to | |
| | Multi-region deploys require region-aware automation | |
| | AZ-aware deployments are critical for HA | |
| +----------------------------------------------------------+ |
| |
| 2. Incident Response |
| +----------------------------------------------------------+ |
| | When an AZ goes down, you need to understand failover | |
| | Region outages require DR activation procedures | |
| | Edge location issues affect CDN and DNS resolution | |
| +----------------------------------------------------------+ |
| |
| 3. Cost Management |
| +----------------------------------------------------------+ |
| | Data transfer between AZs costs money | |
| | Cross-region replication has bandwidth costs | |
| | Region pricing varies — us-east-1 is often cheapest | |
| +----------------------------------------------------------+ |
| |
| 4. Compliance & Data Residency |
| +----------------------------------------------------------+ |
| | GDPR requires EU data stays in EU regions | |
| | Healthcare (HIPAA) needs specific region configurations | |
| | Government workloads may need GovCloud | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+

As a DevOps engineer working from an Arch Linux workstation, here’s how you interact with AWS infrastructure from your terminal:

# Install AWS CLI v2 on Arch Linux
sudo pacman -S aws-cli-v2
# Verify installation
aws --version
# Install additional useful tools
sudo pacman -S jq # JSON processor for AWS CLI output
sudo pacman -S python-boto3 # Python AWS SDK
sudo pacman -S curl # HTTP client for API testing
# Install aws-vault for secure credential management (from AUR)
yay -S aws-vault
# Configure AWS credentials
aws configure
# AWS Access Key ID: AKIAIOSFODNN7EXAMPLE
# AWS Secret Access Key: wJalrXUtnFEMI/...
# Default region name: us-east-1
# Default output format: json
# ~/.aws/config - Managing multiple environments
cat ~/.aws/config
[default]
region = us-east-1
output = json
[profile staging]
region = us-west-2
output = json
[profile production]
region = us-east-1
output = json
role_arn = arn:aws:iam::PROD_ACCOUNT:role/DevOpsRole
source_profile = default
mfa_serial = arn:aws:iam::DEV_ACCOUNT:mfa/your-username
# Use aws-vault for secure credential management
aws-vault exec production -- aws ec2 describe-instances
#!/bin/bash
# Quick region check script (save as ~/bin/aws-region-check.sh)
echo "=== AWS Region Latency Check ==="
for region in us-east-1 us-west-2 eu-west-1 ap-south-1; do
    # time_total is the full request time in seconds as measured by curl
    latency=$(curl -s -o /dev/null -w "%{time_total}" \
        "https://ec2.${region}.amazonaws.com/ping" 2>/dev/null)
    echo "Region: ${region} - Latency: ${latency}s"
done
# /etc/systemd/system/aws-health-check.service
# Use systemd timer for periodic AWS health checks
[Unit]
Description=AWS Infrastructure Health Check
[Service]
Type=oneshot
ExecStart=/usr/local/bin/aws-health-check.sh
User=devops
# /etc/systemd/system/aws-health-check.timer
[Unit]
Description=Run AWS health check every 5 minutes
[Timer]
OnCalendar=*:0/5
Persistent=true
[Install]
WantedBy=timers.target
# Enable the timer
sudo systemctl enable --now aws-health-check.timer
# Check timer status
systemctl list-timers --all | grep aws-health
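
The profile layout from the `~/.aws/config` listing earlier in this section can also be inspected from Python, which is handy when a script needs to know which region each profile targets. This is a minimal sketch using the standard library `configparser`; the `profile_regions` function is a hypothetical helper, and the sample text is a trimmed copy of the config shown above.

```python
import configparser
from typing import Dict, Optional

SAMPLE_CONFIG = """\
[default]
region = us-east-1
output = json

[profile staging]
region = us-west-2
output = json

[profile production]
region = us-east-1
output = json
"""


def profile_regions(config_text: str) -> Dict[str, Optional[str]]:
    """Map each profile name to its configured region (None if unset)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    regions = {}
    for section in parser.sections():
        # Named profiles are stored as "profile <name>"; "default" has no prefix.
        prefix = "profile "
        name = section[len(prefix):] if section.startswith(prefix) else section
        regions[name] = parser[section].get("region")
    return regions


if __name__ == "__main__":
    for name, region in profile_regions(SAMPLE_CONFIG).items():
        print(f"{name}: {region}")
```

To read the real file instead of the sample, the text could be loaded from `~/.aws/config` with `pathlib.Path.home()` before calling the function.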

Useful Linux CLI Patterns for AWS Region Operations

# List all regions with their AZ count (uses the CLI's built-in --query, no jq needed)
aws ec2 describe-regions --query 'Regions[*].RegionName' --output text | \
tr '\t' '\n' | while read region; do
az_count=$(aws ec2 describe-availability-zones \
--region "$region" \
--query 'length(AvailabilityZones)' --output text 2>/dev/null)
echo "$region: $az_count AZs"
done
# Check which services are available in a region
aws ssm get-parameters-by-path \
--path /aws/service/global-infrastructure/regions/us-east-1/services \
--query 'Parameters[*].Value' --output text | tr '\t' '\n' | sort
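# Check outbound network volume in bytes via CloudWatch (a rough proxy for transfer cost;
# NetworkOut does not separate cross-region traffic from other outbound traffic)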
aws cloudwatch get-metric-statistics \
--namespace AWS/EC2 \
--metric-name NetworkOut \
--start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 3600 \
--statistics Sum \
--region us-east-1

Scenario 1: E-Commerce Platform — Region Selection

E-Commerce Multi-Region Setup
+------------------------------------------------------------------+
| |
| Requirements: |
| - Primary customers in India and Southeast Asia |
| - Compliance: Indian data must stay in India (RBI guidelines) |
| - RPO: 15 minutes, RTO: 30 minutes |
| |
| Solution: |
| +----------------------------------------------------------+ |
| | Primary Region: ap-south-1 (Mumbai) | |
| | - All customer data, order processing | |
| | - 3 AZs for high availability | |
| | | |
| | DR Region: ap-southeast-1 (Singapore) | |
| | - Read replicas for databases | |
| | - S3 cross-region replication | |
| | - Warm standby for critical services | |
| | | |
| | Edge: CloudFront with 20+ edge locations in Asia | |
| | - Static assets cached at edge | |
| | - API acceleration via Global Accelerator | |
| +----------------------------------------------------------+ |
| |
| Cost Impact: |
| - Cross-region data transfer: ~$0.09/GB |
| - Multi-AZ RDS: roughly 2x the single-AZ instance price |
| - CloudFront: ~$0.085/GB for first 10TB in India |
| |
+------------------------------------------------------------------+

Scenario 2: AZ Failure — Incident Response

AZ Failure Response Playbook
+------------------------------------------------------------------+
| |
| Detection: |
| +----------------------------------------------------------+ |
| | 1. CloudWatch alarm fires: EC2 health checks failing | |
| | 2. ALB reports unhealthy targets in AZ-a | |
| | 3. AWS Health Dashboard shows AZ degradation | |
| +----------------------------------------------------------+ |
| | |
| v |
| Immediate Response (Automated): |
| +----------------------------------------------------------+ |
| | 1. ALB automatically routes traffic to healthy AZs | |
| | 2. Auto Scaling launches replacements in other AZs | |
| | 3. RDS fails over to standby (if Multi-AZ) | |
| +----------------------------------------------------------+ |
| | |
| v |
| Manual Verification: |
| +----------------------------------------------------------+ |
| | 1. Verify all services are running in remaining AZs | |
| | 2. Check database failover completed successfully | |
| | 3. Monitor error rates and latency | |
| | 4. Communicate status to stakeholders | |
| +----------------------------------------------------------+ |
| | |
| v |
| Post-Incident: |
| +----------------------------------------------------------+ |
| | 1. Wait for AWS to resolve AZ issue | |
| | 2. Verify instances in affected AZ recover | |
| | 3. Rebalance capacity across all AZs | |
| | 4. Write incident postmortem | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
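
The detection step of the playbook boils down to grouping health results by zone and flagging zones where most or all targets fail. The sketch below shows that grouping on plain tuples. The `(az, healthy)` input shape and the `failing_azs` name are assumptions for illustration; in a real setup the data would have to be gathered from load balancer health checks or monitoring.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def failing_azs(targets: Iterable[Tuple[str, bool]], threshold: float = 1.0) -> List[str]:
    """Return AZs whose share of unhealthy targets is at least `threshold`."""
    total: Counter = Counter()
    unhealthy: Counter = Counter()
    for az, healthy in targets:
        total[az] += 1
        if not healthy:
            unhealthy[az] += 1
    return sorted(az for az in total if unhealthy[az] / total[az] >= threshold)


if __name__ == "__main__":
    observed = [
        ("us-east-1a", False), ("us-east-1a", False),
        ("us-east-1b", True), ("us-east-1b", False),
        ("us-east-1c", True),
    ]
    print("fully failing:", failing_azs(observed))
    print("at least half failing:", failing_azs(observed, threshold=0.5))
```

A threshold below 1.0 catches a degrading zone earlier, at the cost of more false alarms when a single instance misbehaves.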

Scenario 3: Global SaaS — Multi-Region Active-Active

Active-Active Multi-Region Architecture
+------------------------------------------------------------------+
| |
| Users Worldwide |
| | |
| v |
| +----------------------------+ |
| | Route 53 (Latency-based) | |
| +----------------------------+ |
| | | | |
| v v v |
| US-EAST-1 EU-WEST-1 AP-NORTHEAST-1 |
| (Virginia) (Ireland) (Tokyo) |
| | | | |
| +---------+ +---------+ +---------+ |
| | DynamoDB| | DynamoDB| | DynamoDB| |
| | Global |<-->| Global |<-->| Global | |
| | Table | | Table | | Table | |
| +---------+ +---------+ +---------+ |
| |
| Key Design Decisions: |
| - DynamoDB Global Tables for multi-master writes |
| - Each region handles its local traffic independently |
| - Conflict resolution via last-writer-wins |
| - Regional S3 buckets with cross-region replication |
| |
+------------------------------------------------------------------+
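
Last-writer-wins, named in the design decisions above, means that when two regions update the same item concurrently, the version with the later timestamp is kept. The following is a simplified sketch of the idea, not DynamoDB's actual implementation. The `Write` type, the `resolve_lww` function, and the region-name tie-break are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    """One version of an item as written in a particular region."""
    value: str
    timestamp: float  # seconds since epoch at write time
    region: str


def resolve_lww(a: Write, b: Write) -> Write:
    """Keep the later write; break exact timestamp ties by region name.

    The tie-break is an illustrative choice that keeps the result
    deterministic regardless of argument order.
    """
    return max(a, b, key=lambda w: (w.timestamp, w.region))


if __name__ == "__main__":
    us = Write("cart=1 item", 100.0, "us-east-1")
    eu = Write("cart=2 items", 105.0, "eu-west-1")
    print(resolve_lww(us, eu).value)
```

The important consequence for application design is that the earlier write is silently discarded, so operations that must not be lost (counters, balances) need a different pattern than overwriting the whole item.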

Regional Capacity Planning
+------------------------------------------------------------------+
| |
| Questions to Answer Before Deployment: |
| |
| 1. How many AZs do we need? |
| - Minimum 2 for HA, recommended 3 for production |
| - Cost: ~10-15% more per additional AZ |
| |
| 2. Do we need multi-region? |
| - Latency requirements (>200ms = consider multi-region) |
| - Compliance requirements |
| - DR requirements (RPO/RTO) |
| |
| 3. What's our data transfer budget? |
| - Same AZ: Free |
| - Cross-AZ: $0.01/GB (each direction) |
| - Cross-region: $0.02-0.09/GB |
| - Internet egress: $0.09/GB (first 10TB) |
| |
| 4. Service limits per region: |
| - EC2: 5 Elastic IPs (default); On-Demand instances capped by vCPU quota |
| - VPC: 5 VPCs per region (default) |
| - Request limit increases BEFORE you need them |
| |
+------------------------------------------------------------------+
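
The data transfer budget in question 3 can be turned into a quick estimate. This sketch uses only the per-GB rates quoted in the checklist above, which vary by region and change over time, and it does not model pricing tiers beyond the first 10 TB of internet egress. `monthly_transfer_cost` is a hypothetical helper name.

```python
from typing import Dict

# USD per GB, copied from the checklist above (assumed, not authoritative).
CROSS_AZ_RATE_EACH_DIRECTION = 0.01
INTERNET_EGRESS_RATE = 0.09  # first 10 TB tier only


def monthly_transfer_cost(cross_az_gb: float,
                          cross_region_gb: float,
                          internet_egress_gb: float,
                          cross_region_rate: float = 0.02) -> Dict[str, float]:
    """Estimate monthly data transfer cost in USD, broken down by traffic type."""
    costs = {
        # Cross-AZ traffic is charged on both the sending and the receiving side.
        "cross_az": cross_az_gb * CROSS_AZ_RATE_EACH_DIRECTION * 2,
        "cross_region": cross_region_gb * cross_region_rate,
        "internet_egress": internet_egress_gb * INTERNET_EGRESS_RATE,
    }
    costs["total"] = sum(costs.values())
    return costs


if __name__ == "__main__":
    estimate = monthly_transfer_cost(cross_az_gb=1000, cross_region_gb=500,
                                     internet_egress_gb=2000)
    for kind, usd in estimate.items():
        print(f"{kind}: ${usd:.2f}")
```

The `cross_region_rate` parameter defaults to the low end of the quoted range; passing 0.09 gives the pessimistic end of the same estimate.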
#!/bin/bash
# /usr/local/bin/aws-infra-check.sh
# Quick diagnostic script for on-call engineers
set -euo pipefail
REGION=${1:-us-east-1}
echo "=== AWS Infrastructure Health Check — Region: $REGION ==="
echo "Timestamp: $(date -u '+%Y-%m-%d %H:%M:%S UTC')"
echo ""
# Check AWS service health
# (the AWS Health API requires a Business, Enterprise On-Ramp, or Enterprise Support plan)
echo "--- Service Health ---"
aws health describe-events \
--filter '{"regions":["'"$REGION"'"],"eventStatusCodes":["open","upcoming"]}' \
--query 'events[*].[service,eventTypeCode,statusCode]' \
--output table --region us-east-1 2>/dev/null || echo "Health API call failed (check Support plan and permissions)"
# Check AZ status
echo ""
echo "--- Availability Zone Status ---"
aws ec2 describe-availability-zones \
--region "$REGION" \
--query 'AvailabilityZones[*].[ZoneName,State,ZoneType]' \
--output table
# Check running instances by AZ
echo ""
echo "--- Instance Distribution by AZ ---"
aws ec2 describe-instances \
--region "$REGION" \
--filters "Name=instance-state-name,Values=running" \
--query 'Reservations[*].Instances[*].[Placement.AvailabilityZone]' \
--output text | sort | uniq -c | sort -rn
# Check recent CloudTrail events for infrastructure changes
echo ""
echo "--- Recent Infrastructure Changes (last 1 hour) ---"
aws cloudtrail lookup-events \
--region "$REGION" \
--start-time $(date -u -d '1 hour ago' '+%Y-%m-%dT%H:%M:%SZ') \
--lookup-attributes AttributeKey=EventName,AttributeValue=RunInstances \
--query 'Events[*].[EventTime,Username,EventName]' \
--output table 2>/dev/null || echo "No recent events"

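+---------------------------------+-----------------------------+---------------------------------------------------------+
| Issue                           | Cause                       | Solution                                                |
+---------------------------------+-----------------------------+---------------------------------------------------------+
| Cannot launch instance in AZ    | AZ capacity constraints     | Try another AZ in the same region                       |
| High latency to region          | Geographic distance         | Use CloudFront or choose closer region                  |
| Service not available in region | Not all services are global | Check service availability page, use a supported region |
| Cross-region replication lag    | Network congestion          | Monitor replication metrics, consider async patterns    |
| Region failover not working     | DNS TTL too high            | Reduce Route 53 TTL before DR events                    |
| Hitting service limits          | Default quotas reached      | Request limit increase via Service Quotas               |
+---------------------------------+-----------------------------+---------------------------------------------------------+
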
# Test connectivity to a specific region endpoint
curl -s -o /dev/null -w "HTTP Status: %{http_code}\nTime: %{time_total}s\n" \
https://ec2.us-east-1.amazonaws.com
# Check DNS resolution for AWS endpoints
dig +short ec2.us-east-1.amazonaws.com
# Test network path to AWS region
traceroute ec2.us-east-1.amazonaws.com
# Check the state of each AZ by name (names not available to this account print N/A)
for az in a b c d e f; do
echo -n "us-east-1${az}: "
aws ec2 describe-availability-zones \
--zone-names "us-east-1${az}" \
--query 'AvailabilityZones[0].State' \
--output text --region us-east-1 2>/dev/null || echo "N/A"
done
# Check your current service quotas for a region
aws service-quotas list-service-quotas \
--service-code ec2 \
--region us-east-1 \
--query 'Quotas[?QuotaName==`Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances`].[QuotaName,Value]' \
--output table

Infrastructure Anti-Patterns
+------------------------------------------------------------------+
| |
| ❌ Mistake 1: Single-AZ Deployments in Production |
| +----------------------------------------------------------+ |
| | Problem: All resources in one AZ | |
| | Impact: Complete outage if AZ fails | |
| | Fix: Always deploy across ≥2 AZs for production | |
| +----------------------------------------------------------+ |
| |
| ❌ Mistake 2: Ignoring Data Transfer Costs |
| +----------------------------------------------------------+ |
| | Problem: Services chatting across AZs/regions | |
| | Impact: Unexpected bills — can be $1000s/month | |
| | Fix: Co-locate dependent services, use VPC endpoints | |
| +----------------------------------------------------------+ |
| |
| ❌ Mistake 3: Not Requesting Limit Increases Early |
| +----------------------------------------------------------+ |
| | Problem: Hit EC2 instance limits during traffic spike | |
| | Impact: Cannot scale when you need it most | |
| | Fix: Request increases 2-4 weeks before expected growth | |
| +----------------------------------------------------------+ |
| |
| ❌ Mistake 4: Hardcoding Region/AZ in Application Code |
| +----------------------------------------------------------+ |
| | Problem: Region references hardcoded in configs | |
| | Impact: Cannot fail over or migrate to another region | |
| | Fix: Use instance metadata service or SSM parameters | |
| +----------------------------------------------------------+ |
| |
| ❌ Mistake 5: Not Testing DR Failover |
| +----------------------------------------------------------+ |
| | Problem: DR plan exists on paper but never tested | |
| | Impact: When disaster strikes, failover doesn't work | |
| | Fix: Schedule quarterly DR drills (Game Days) | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
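
For Mistake 4, the fix named in the diagram is the instance metadata service or SSM parameters. The sketch below shows a simpler variant of the same idea, resolving the region from the standard `AWS_REGION` and `AWS_DEFAULT_REGION` environment variables instead of a literal in code. The `resolve_region` function is a hypothetical helper, and the injectable `env` argument exists only to make it testable.

```python
import os
from typing import Mapping, Optional


def resolve_region(explicit: Optional[str] = None,
                   env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the region from an explicit argument or the environment.

    Fails loudly instead of falling back to a hardcoded default, so a
    missing setting is noticed at startup rather than during failover.
    """
    env = os.environ if env is None else env
    region = explicit or env.get("AWS_REGION") or env.get("AWS_DEFAULT_REGION")
    if not region:
        raise RuntimeError("no region configured; set AWS_REGION or pass one explicitly")
    return region


if __name__ == "__main__":
    print(resolve_region(env={"AWS_REGION": "ap-south-1"}))
```

With the region supplied by the deployment environment, moving a workload to another region becomes a configuration change rather than a code change.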

  1. Q: What’s the difference between a Region, AZ, and Edge Location?

    • A: A Region is a geographic area with multiple AZs. An AZ is one or more data centers with independent power/cooling/networking within a region. Edge Locations are CDN endpoints used by CloudFront and Route 53 for low-latency content delivery; there are 400+ of them, compared with 30+ Regions.
  2. Q: How do you decide which AWS region to deploy in?

    • A: Consider: (1) Latency to end users, (2) Compliance/data residency requirements, (3) Service availability in that region, (4) Cost — prices vary by region, (5) DR strategy — you may need a secondary region.
  3. Q: What happens when an AZ goes down?

    • A: If properly architected (Multi-AZ), the ALB stops routing to unhealthy targets, Auto Scaling launches instances in healthy AZs, and RDS fails over to standby. For poorly architected systems (single-AZ), it’s a full outage.
  1. Q: Your application has users in India and the US. How would you architect it?

    • A: Use Route 53 latency-based routing to direct users to the nearest region (ap-south-1 for India, us-east-1 for US). Deploy identical application stacks in both regions. Use DynamoDB Global Tables or Aurora Global Database for data replication. Cache static content with CloudFront.
  2. Q: You’re getting intermittent 503 errors during peak hours. Your instances are all in one AZ. What do you do?

    • A: Immediate: Scale out within the current AZ. Short-term: Distribute instances across multiple AZs behind an ALB. Long-term: Implement Auto Scaling with multi-AZ deployment, set up CloudWatch alarms for early warning, and request service limit increases.
  3. Q: How would you estimate data transfer costs for a multi-region deployment?

    • A: Map all data flows: (1) Cross-AZ traffic within each region ($0.01/GB), (2) Cross-region replication ($0.02-0.09/GB), (3) Internet egress (~$0.09/GB for first 10TB), (4) CloudFront egress (varies by edge location). Use AWS Cost Explorer or the Pricing Calculator.

Exam Tip

  1. Regions vs AZs: Regions are geographic areas; AZs are isolated locations within regions
  2. Global Services: IAM, Route 53, and CloudFront are global, with no region selection needed; WAF is global only when attached to CloudFront
  3. Multi-AZ: Always use multiple AZs for production workloads
  4. SLA Math: Know how to calculate allowed downtime from availability percentage
  5. Edge Locations: Used by CloudFront and Route 53, not for compute

Last Updated: March 2026