Chapter 6: Amazon EC2 - Deep Dive
Mastering Elastic Compute Cloud for Production Workloads
6.1 Overview
Amazon EC2 (Elastic Compute Cloud) provides resizable compute capacity in the AWS Cloud, letting you launch virtual servers (instances) on demand.
```text
EC2 Core Components

                     +------------------+
                     |    Amazon EC2    |
                     +------------------+
                              |
      +-----------+-----------+------------+-----------+
      |           |           |            |           |
      v           v           v            v           v
 +----------+ +--------+ +----------+ +---------+ +---------+
 | Instance | |  AMI   | | Instance | | Storage | | Network |
 |  Types   | |        | | Profile  | |         | |         |
 +----------+ +--------+ +----------+ +---------+ +---------+

 Instance Types:   Compute optimization options
 AMI:              Machine images for launching instances
 Instance Profile: IAM roles for instances
 Storage:          EBS, Instance Store
 Network:          Security Groups, ENIs, Placement Groups
```
6.2 EC2 Instance Types
Instance Family Overview
| Family | Code | Use case | Example type |
|---|---|---|---|
| General Purpose | T3 | Burstable workloads | t3.micro |
| General Purpose | M5 | Balanced performance | m5.xlarge |
| Compute Optimized | C5 | High-performance computing | c5.2xlarge |
| Compute Optimized | C6g | ARM-based compute | c6g.xlarge |
| Memory Optimized | R5 | In-memory databases | r5.xlarge |
| Memory Optimized | X1e | SAP HANA, large in-memory databases | x1e.xlarge |
| Storage Optimized | I3 | NoSQL, data warehouses | i3.xlarge |
| Storage Optimized | D3 | HDFS, distributed file systems | d3.xlarge |
| Accelerated Computing | P4 | ML training, HPC | p4d.24xlarge |
| Accelerated Computing | G5 | Graphics, video encoding | g5.xlarge |
| Graviton (ARM) | C6g | ARM-based workloads | c6g.xlarge |
| Graviton (ARM) | M6g | General purpose ARM | m6g.xlarge |

Instance Type Naming Convention
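Because the naming convention is regular, a type name can be split mechanically. The following is a minimal Bash sketch, not an AWS tool. `parse_instance_type` is a hypothetical helper that assumes the common shape `<family letters><generation><attribute letters>.<size>`, so irregular names such as `u-6tb1.metal` are rejected.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split an EC2 instance type name into its parts.
# Assumes the shape <family letters><generation digits><attribute letters>.<size>.
parse_instance_type() {
  local type=$1
  local re='^([a-z]+)([0-9]+)([a-z-]*)\.([a-z0-9]+)$'
  if [[ $type =~ $re ]]; then
    # An empty attribute group (e.g. m5.xlarge) is reported as "none"
    printf 'family=%s generation=%s attributes=%s size=%s\n' \
      "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" \
      "${BASH_REMATCH[3]:-none}" "${BASH_REMATCH[4]}"
  else
    echo "unrecognized instance type: $type" >&2
    return 1
  fi
}
```

For example, `parse_instance_type c6gn.16xlarge` prints `family=c generation=6 attributes=gn size=16xlarge`.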
```text
EC2 Instance Naming Convention

Example: m5.xlarge

  m   5   .   xlarge
  |   |       |
  |   |       +-- Size (resource capacity):
  |   |           nano, micro, small, medium, large, xlarge,
  |   |           2xlarge, 4xlarge, 8xlarge, 9xlarge, 12xlarge, ...
  |   |
  |   +---------- Generation (version)
  |
  +-------------- Instance family
                    m = General purpose
                    c = Compute optimized
                    r = Memory optimized
                    i = Storage optimized
                    g = GPU instances (graphics)
                    p = GPU instances (ML training, HPC)

Optional attribute suffixes follow the generation (e.g. m5d, c6gn):
  a - AMD EPYC processor
  g - Graviton (ARM) processor
  n - Network optimized
  d - Local NVMe instance store
  e - Extra memory or storage
  z - High frequency
```
Instance Size Comparison
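From `large` upward, sizes within a general-purpose family double at each step. `large` has 2 vCPUs, `xlarge` has 4, and `<N>xlarge` has 4 × N, with 4 GiB of memory per vCPU. The hypothetical helper below encodes only that rule of thumb. It assumes M-family ratios, so it does not apply to burstable T sizes, compute-optimized types (2 GiB per vCPU), or memory-optimized types.

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimate vCPUs and memory for a general-purpose
# (M-family) size name, assuming the 4 GiB-per-vCPU ratio.
gp_size_resources() {
  local size=$1 vcpus n
  case $size in
    large)  vcpus=2 ;;
    xlarge) vcpus=4 ;;
    *xlarge)
      n=${size%xlarge}                  # e.g. "12xlarge" -> "12"
      if [[ ! $n =~ ^[0-9]+$ ]]; then
        echo "unsupported size: $size" >&2
        return 1
      fi
      vcpus=$(( 4 * 10#$n )) ;;         # 10# avoids octal parsing
    *)
      echo "unsupported size: $size" >&2
      return 1 ;;
  esac
  echo "vcpus=$vcpus memory_gib=$(( vcpus * 4 ))"
}
```

`gp_size_resources 12xlarge` prints `vcpus=48 memory_gib=192`, which matches m5.12xlarge.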
Representative figures for one instance type per size; memory per vCPU and network bandwidth differ between families.

| Size | Example type | vCPUs | Memory (GiB) | Network performance |
|---|---|---|---|---|
| nano | t3.nano | 2 | 0.5 | Up to 5 Gigabit |
| micro | t3.micro | 2 | 1 | Up to 5 Gigabit |
| small | t3.small | 2 | 2 | Up to 5 Gigabit |
| medium | t3.medium | 2 | 4 | Up to 5 Gigabit |
| large | m5.large | 2 | 8 | Up to 10 Gigabit |
| xlarge | m5.xlarge | 4 | 16 | Up to 10 Gigabit |
| 2xlarge | m5.2xlarge | 8 | 32 | Up to 10 Gigabit |
| 4xlarge | m5.4xlarge | 16 | 64 | Up to 10 Gigabit |
| 8xlarge | m5.8xlarge | 32 | 128 | 10 Gigabit |
| 9xlarge | c5.9xlarge | 36 | 72 | 12 Gigabit |
| 12xlarge | m5.12xlarge | 48 | 192 | 10 Gigabit |
| 16xlarge | m5.16xlarge | 64 | 256 | 20 Gigabit |
| 18xlarge | c5.18xlarge | 72 | 144 | 25 Gigabit |
| 24xlarge | m5.24xlarge | 96 | 384 | 25 Gigabit |
| 32xlarge | m6i.32xlarge | 128 | 512 | 50 Gigabit |

6.3 Amazon Machine Images (AMI)
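AMI IDs are Regional and change with every release. Scripts therefore usually resolve them at run time instead of hard-coding values such as the `ami-0c55b159cbfafe1f0` placeholder used in this chapter. AWS publishes the current Amazon Linux 2023 AMI ID as a public Systems Manager parameter. The sketch below reads the x86_64, default-kernel variant; the function name is illustrative.

```bash
#!/usr/bin/env bash
# Look up the latest Amazon Linux 2023 AMI ID in a Region from the public
# SSM parameter that AWS maintains (x86_64, default kernel).
latest_al2023_ami() {
  local region=${1:-us-east-1}
  aws ssm get-parameter \
    --region "$region" \
    --name /aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-x86_64 \
    --query 'Parameter.Value' \
    --output text
}
```

Usage: `aws ec2 run-instances --region eu-west-1 --image-id "$(latest_al2023_ami eu-west-1)" --instance-type t3.micro`.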
AMI Architecture
```text
AMI Components

                    +-----------+
                    |    AMI    |
                    +-----------+
                          |
        +-----------------+-----------------+
        |                 |                 |
        v                 v                 v
  +------------+   +-------------+   +-------------+
  |    Root    |   |   Block     |   |   Launch    |
  |  Snapshot  |   |   Device    |   | Permissions |
  |            |   |   Mapping   |   |             |
  +------------+   +-------------+   +-------------+

Root Snapshot:        EBS snapshot of the root volume (EBS-backed AMIs)
Block Device Mapping: Volumes attached at launch
Launch Permissions:   Which AWS accounts can launch the AMI
```
AMI Sources
AMI Source Options

1. AWS-provided (Quick Start) AMIs
   - Amazon Linux 2023, maintained by AWS
   - Ubuntu, Debian, Red Hat Enterprise Linux, SUSE, published by their vendors
   - Windows Server
   - No AMI fee; licensed operating systems such as Windows include the license cost in the instance price
2. AWS Marketplace AMIs
   - Pre-configured software from third-party vendors
   - Many are paid: a software fee is charged on top of the instance cost
   - Listings are reviewed by AWS
3. Custom AMIs
   - Created from your own instances
   - Pre-installed software
   - Organization-specific configurations
4. Community AMIs
   - Shared publicly by other AWS users
   - Free to use
   - Not vetted: use at your own risk

Creating Custom AMIs
```bash
# Create an AMI from a running instance.
# --no-reboot skips the restart, so the filesystem may not be consistent;
# omit it when the instance can be rebooted.
aws ec2 create-image \
  --instance-id i-1234567890abcdef0 \
  --name "my-custom-ami-v1" \
  --description "Custom AMI with pre-installed software" \
  --no-reboot

# Copy the AMI to another Region (--region is the destination)
aws ec2 copy-image \
  --source-region us-east-1 \
  --source-image-id ami-1234567890abcdef0 \
  --region us-west-2 \
  --name "my-custom-ami-v1-copy"

# Share the AMI with another account
aws ec2 modify-image-attribute \
  --image-id ami-1234567890abcdef0 \
  --launch-permission "Add=[{UserId=123456789012}]"

# Make the AMI public (anyone can launch it; check it for secrets first)
aws ec2 modify-image-attribute \
  --image-id ami-1234567890abcdef0 \
  --launch-permission "Add=[{Group=all}]"

# Deregister the AMI (its EBS snapshots are kept, and billed, until deleted)
aws ec2 deregister-image --image-id ami-1234567890abcdef0
```
6.4 EC2 Instance Lifecycle
Instance States
```text
EC2 Instance Lifecycle

  Launch
     |
     v
+---------+               start
| Pending |<------------------------------------+
+---------+                                     |
     |                                          |
     v                                          |
+---------+  reboot   +-----------+             |
| Running |---------->| Rebooting |             |
|         |<----------|           |             |
+---------+           +-----------+             |
  |    |                                        |
  |    |  stop    +----------+     +---------+  |
  |    +--------->| Stopping |---->| Stopped |--+
  |               +----------+     +---------+
  |                                     |
  | terminate                           | terminate
  v                                     |
+---------------+                       |
| Shutting-down |<----------------------+
+---------------+
        |
        v
  +------------+
  | Terminated |
  +------------+
```
State Transitions and Billing
| State | Description | Instance usage billed | Storage |
|---|---|---|---|
| Pending | Instance launching or starting | No | Preserved |
| Running | Instance active | Yes | Preserved |
| Stopping | Instance shutting down for a stop or hibernation | No (Yes when hibernating) | Preserved |
| Stopped | Instance stopped | No | EBS volumes kept and still billed; instance store data lost |
| Rebooting | Instance restarting on the same host | Yes | Preserved |
| Terminated | Instance permanently deleted | No | Root volume deleted by default; other volumes follow their DeleteOnTermination setting |
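Changing the instance type of an EBS-backed instance walks it through several states from the table: `stopping` and `stopped`, then `pending` and `running` after the restart. A minimal sketch using the AWS CLI waiters follows. `change_instance_type` is a hypothetical helper, and it assumes the new type is compatible with the instance's AMI (same CPU architecture, required ENA/NVMe drivers).

```bash
#!/usr/bin/env bash
# Hypothetical helper: change the type of an EBS-backed instance.
# Stopping discards RAM and instance-store data and releases an
# auto-assigned public IP.
change_instance_type() {
  local instance_id=$1 new_type=$2
  aws ec2 stop-instances --instance-ids "$instance_id" >/dev/null || return
  # Blocks until the instance reaches "stopped"
  aws ec2 wait instance-stopped --instance-ids "$instance_id" || return
  aws ec2 modify-instance-attribute \
    --instance-id "$instance_id" \
    --instance-type "{\"Value\": \"$new_type\"}" || return
  aws ec2 start-instances --instance-ids "$instance_id" >/dev/null || return
  # pending -> running
  aws ec2 wait instance-running --instance-ids "$instance_id"
}
```

Usage: `change_instance_type i-1234567890abcdef0 t3.large`.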
Stop vs Terminate
Stop vs Terminate Comparison

Stop
- Pros:
  - Instance (ID, EBS volumes, private IP) is preserved
  - Can be restarted later
  - No compute charges while stopped
  - Instance type can be changed
- Cons:
  - EBS storage is still billed
  - Auto-assigned public IP is released and changes on restart (Elastic IPs stay)
  - RAM contents and instance store data are lost

Terminate
- Pros:
  - No more instance charges (retained EBS volumes and unassociated Elastic IPs are still billed)
  - Resources are released
- Cons:
  - The instance cannot be recovered
  - Data is lost on every volume with DeleteOnTermination enabled (the root volume, by default)
  - Must launch a new instance to use it again

6.5 EC2 Storage Options
EBS vs Instance Store
```text
EC2 Storage Comparison

Elastic Block Store (EBS)

   Instance                 EBS Volume
  +----------+             +----------+
  |          |   network   |          |
  |          |<----------->|   Data   |
  |          |             |          |
  +----------+             +----------+
  Network attached

  Features:
  - Persistent storage
  - Can be detached and reattached (within the same AZ)
  - Point-in-time snapshots, stored in Amazon S3
  - Optional encryption at rest (can be enabled by default)
  - Can outlive the instance (when DeleteOnTermination is false)

Instance Store

   Instance
  +-----------------------------+
  |          Instance Store     |
  |          +----------+       |
  |          |   Data   |       |
  |          +----------+       |
  +-----------------------------+
  Physically attached to the host

  Features:
  - Ephemeral storage
  - Survives reboot; lost on stop, hibernate, or terminate
  - Very high IOPS
  - Included in the instance price
  - Cannot be detached
```
EBS Volume Types
| Type | Use case | Max IOPS (per volume) | Max throughput |
|---|---|---|---|
| gp3 | General purpose, boot volumes (recommended default) | 16,000 | 1,000 MiB/s |
| io2 Block Express | Critical, high-performance workloads | 256,000 | 4,000 MiB/s |
| io1 | High IOPS | 64,000 | 1,000 MiB/s |
| st1 (HDD) | Throughput-optimized, sequential access | 500 | 500 MiB/s |
| sc1 (HDD) | Cold storage, infrequent access | 250 | 250 MiB/s |

EBS Volume Configuration
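On gp3, IOPS and throughput are provisioned independently (the baseline is 3,000 IOPS and 125 MiB/s), but I/O size links them: throughput ≈ IOPS × I/O size. For example, 3,000 IOPS at 256 KiB per I/O needs about 750 MiB/s, so such a workload hits the 125 MiB/s baseline long before the IOPS limit. The hypothetical helper below does that arithmetic (integer MiB/s, rounded down) before you choose `--iops` and `--throughput` values.

```bash
#!/usr/bin/env bash
# Hypothetical helper: throughput (MiB/s) implied by an IOPS rate and an
# I/O size in KiB, compared with a provisioned gp3 throughput
# (default: the 125 MiB/s baseline).
gp3_throughput_check() {
  local iops=$1 io_kib=$2 provisioned_mibs=${3:-125}
  local needed_mibs=$(( iops * io_kib / 1024 ))
  if (( needed_mibs > provisioned_mibs )); then
    echo "needs ${needed_mibs} MiB/s > ${provisioned_mibs} MiB/s: throughput-bound"
  else
    echo "needs ${needed_mibs} MiB/s <= ${provisioned_mibs} MiB/s: ok"
  fi
}
```

`gp3_throughput_check 3000 256` reports the workload as throughput-bound. `gp3_throughput_check 3000 256 1000` shows that provisioning 1,000 MiB/s covers it.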
```bash
# Create a 100 GiB gp3 volume at baseline performance (3,000 IOPS, 125 MiB/s)
aws ec2 create-volume \
  --size 100 \
  --volume-type gp3 \
  --availability-zone us-east-1a \
  --iops 3000 \
  --throughput 125

# Attach the volume to an instance in the same AZ
# (on Nitro instances it appears as an NVMe device, e.g. /dev/nvme1n1)
aws ec2 attach-volume \
  --volume-id vol-1234567890abcdef0 \
  --instance-id i-1234567890abcdef0 \
  --device /dev/sdf

# Create a snapshot
aws ec2 create-snapshot \
  --volume-id vol-1234567890abcdef0 \
  --description "Daily backup snapshot"

# Copy the snapshot to another Region (--region is the destination)
aws ec2 copy-snapshot \
  --source-region us-east-1 \
  --source-snapshot-id snap-1234567890abcdef0 \
  --region us-west-2

# Modify a volume in place (increase size, change type or IOPS);
# afterwards, grow the partition and filesystem inside the instance
aws ec2 modify-volume \
  --volume-id vol-1234567890abcdef0 \
  --size 200 \
  --volume-type gp3 \
  --iops 5000
```
6.6 EC2 Networking
Security Groups
```text
Security Group Architecture

Security Group = Virtual Firewall for EC2 Instances

Inbound rules:
+-------------+------+--------------+
| Type        | Port | Source       |
+-------------+------+--------------+
| SSH         | 22   | 10.0.0.0/8   |
| HTTP        | 80   | 0.0.0.0/0    |
| HTTPS       | 443  | 0.0.0.0/0    |
| Custom TCP  | 8080 | sg-12345678  |
+-------------+------+--------------+

Outbound rules:
+-------------+------+--------------+
| Type        | Port | Destination  |
+-------------+------+--------------+
| All traffic | All  | 0.0.0.0/0    |
+-------------+------+--------------+

Key characteristics:
- STATEFUL: return traffic is automatically allowed
- Only ALLOW rules (no deny)
- Rules can reference other security groups
- Applied to ENIs, not directly to instances
- Up to 5 security groups per ENI by default (adjustable quota)
```
Security Group vs NACL
| Security Group | Network ACL |
|---|---|
| Instance (ENI) level | Subnet level |
| Stateful | Stateless (return traffic needs explicit rules) |
| Allow rules only | Allow and deny rules |
| All rules evaluated | Rules evaluated in number order; first match wins |
| No rule numbers | Rule numbers (1-32766) |
| Default: deny all inbound, allow all outbound | Default NACL: allow all inbound and outbound; custom NACL: deny all until rules are added |
| Associated with ENIs | Associated with subnets |

Elastic Network Interfaces (ENI)
```text
ENI Architecture

EC2 Instance
  ENI 0 (Primary)           ENI 1 (Secondary)
  +--------------------+    +--------------------+
  | Primary IPv4:      |    | Primary IPv4:      |
  |   10.0.1.10        |    |   10.0.1.20        |
  | Secondary IPs:     |    | Secondary IPs:     |
  |   10.0.1.11        |    |   10.0.1.21        |
  | Elastic IP:        |    | Elastic IP:        |
  |   54.0.1.100       |    |   54.0.1.101       |
  | Security groups:   |    | Security groups:   |
  |   sg-12345         |    |   sg-67890         |
  +--------------------+    +--------------------+

Use cases for multiple ENIs:
- Management network separate from the data network
- Network appliances (firewalls, load balancers)
- Dual-homed instances
- High availability: move a secondary ENI (and its IPs) to a standby instance
```
6.7 Placement Groups
Placement Group Types
```text
EC2 Placement Groups

1. Cluster placement group

   +--------+ +--------+ +--------+ +--------+
   |Instance| |Instance| |Instance| |Instance|
   |   1    | |   2    | |   3    | |   4    |
   +--------+ +--------+ +--------+ +--------+
       |          |          |          |
   ----+----------+----------+----------+----
          Low-latency, high-bandwidth network

   Use cases: HPC applications, big data processing, low-latency requirements
   Benefits:
   - Highest network throughput between instances
   - Lowest latency
   - Instances packed close together within a single AZ

2. Spread placement group

   Rack 1       Rack 2       Rack 3
   +--------+   +--------+   +--------+
   |Instance|   |Instance|   |Instance|
   |   1    |   |   2    |   |   3    |
   +--------+   +--------+   +--------+

   Use cases: critical applications, high availability, small groups of
   instances that must not fail together
   Benefits:
   - Each instance on distinct hardware, isolating hardware failures
   - Limit: 7 running instances per AZ per group

3. Partition placement group

   Partition 1    Partition 2    Partition 3
   +----------+   +----------+   +----------+
   |Instance 1|   |Instance 4|   |Instance 7|
   |Instance 2|   |Instance 5|   |Instance 8|
   |Instance 3|   |Instance 6|   |Instance 9|
   +----------+   +----------+   +----------+

   Use cases: large distributed systems (Hadoop, Cassandra, Kafka)
   Benefits:
   - Up to 7 partitions per AZ
   - Partition-level isolation: partitions do not share racks
```
6.8 EC2 Launch Templates
Launch Template Structure
```json
{
  "LaunchTemplateData": {
    "ImageId": "ami-0c55b159cbfafe1f0",
    "InstanceType": "t3.medium",
    "KeyName": "my-key-pair",
    "IamInstanceProfile": {
      "Name": "EC2InstanceProfile"
    },
    "UserData": "IyEvYmluL2Jhc2gKZWNobyAnSGVsbG8gV29ybGQn",
    "Monitoring": {
      "Enabled": true
    },
    "MetadataOptions": {
      "HttpTokens": "required"
    },
    "BlockDeviceMappings": [
      {
        "DeviceName": "/dev/xvda",
        "Ebs": {
          "VolumeSize": 20,
          "VolumeType": "gp3",
          "DeleteOnTermination": true
        }
      }
    ],
    "TagSpecifications": [
      {
        "ResourceType": "instance",
        "Tags": [
          {"Key": "Name", "Value": "WebServer"},
          {"Key": "Environment", "Value": "Production"}
        ]
      }
    ],
    "NetworkInterfaces": [
      {
        "DeviceIndex": 0,
        "AssociatePublicIpAddress": true,
        "Groups": ["sg-12345678"]
      }
    ]
  }
}
```
The security group is set on the network interface (`Groups`) rather than in a top-level `SecurityGroupIds`, because EC2 rejects launches that specify both. `UserData` must be base64-encoded; here it decodes to `#!/bin/bash` followed by `echo 'Hello World'`. `HttpTokens: required` enforces IMDSv2.

Launch Template vs Launch Configuration
| Feature | Launch Template | Launch Configuration |
|---|---|---|
| Versioning | Yes | No |
| Spot Instances | Yes | Limited |
| Multiple instance types | Yes | No |
| T2/T3 Unlimited | Yes | No |
| Placement groups | Yes | Yes |
| Elastic GPU | Yes | No |
| EBS-optimized | Yes | Yes |
| Recommended | Yes | No (deprecated) |

6.9 EC2 Best Practices
Security Best Practices
EC2 Security Checklist

1. Access control
   - [ ] Use IAM roles (instance profiles) instead of access keys
   - [ ] Implement least privilege
   - [ ] Use Systems Manager Session Manager instead of open SSH ports
   - [ ] Disable password-based SSH
2. Network security
   - [ ] Restrict security group ingress
   - [ ] Use VPC endpoints for AWS services
   - [ ] Enable VPC Flow Logs
   - [ ] Use Network ACLs for additional protection
3. Instance security
   - [ ] Keep OS and packages updated
   - [ ] Use Amazon Inspector for vulnerability scanning
   - [ ] Enable detailed monitoring
   - [ ] Use Systems Manager for patch management
4. Data security
   - [ ] Enable EBS encryption
   - [ ] Use KMS for key management
   - [ ] Encrypt data at rest and in transit
   - [ ] Take regular snapshots

Performance Best Practices
EC2 Performance Optimization

1. Right-sizing
   - Use CloudWatch metrics to analyze utilization
   - Use AWS Compute Optimizer recommendations
   - Consider Graviton instances for better price/performance
2. Storage optimization
   - Choose the appropriate EBS volume type
   - Initialize volumes restored from snapshots before heavy use (or enable Fast Snapshot Restore); new empty volumes need no pre-warming
   - Stripe volumes with RAID 0 for higher aggregate IOPS and throughput
   - Consider instance store for temporary data
3. Network optimization
   - Use Enhanced Networking (ENA)
   - Use placement groups for low latency
   - Consider Elastic Fabric Adapter (EFA) for HPC
4. Monitoring
   - Enable detailed (1-minute) monitoring
   - Set up CloudWatch alarms
   - Use the unified CloudWatch agent (memory and disk metrics)

6.10 Practical Commands
Instance Management
```bash
# Launch instance
aws ec2 run-instances \
  --image-id ami-0c55b159cbfafe1f0 \
  --count 1 \
  --instance-type t3.micro \
  --key-name my-key-pair \
  --security-group-ids sg-12345678 \
  --subnet-id subnet-12345678 \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=WebServer}]'

# Describe instances
aws ec2 describe-instances \
  --filters "Name=tag:Name,Values=WebServer" \
  --query 'Reservations[*].Instances[*].[InstanceId,State.Name,InstanceType]'

# Start instance
aws ec2 start-instances --instance-ids i-1234567890abcdef0

# Stop instance
aws ec2 stop-instances --instance-ids i-1234567890abcdef0

# Reboot instance
aws ec2 reboot-instances --instance-ids i-1234567890abcdef0

# Terminate instance
aws ec2 terminate-instances --instance-ids i-1234567890abcdef0

# Modify instance type (the instance must be stopped)
aws ec2 modify-instance-attribute \
  --instance-id i-1234567890abcdef0 \
  --instance-type '{"Value": "t3.small"}'
```
Security Group Management
```bash
# Create security group
aws ec2 create-security-group \
  --group-name my-security-group \
  --description "My security group" \
  --vpc-id vpc-12345678

# Add inbound rule
aws ec2 authorize-security-group-ingress \
  --group-id sg-12345678 \
  --protocol tcp \
  --port 22 \
  --cidr 10.0.0.0/8

# Add rule referencing another security group
aws ec2 authorize-security-group-ingress \
  --group-id sg-12345678 \
  --protocol tcp \
  --port 8080 \
  --source-group sg-87654321

# Remove rule
aws ec2 revoke-security-group-ingress \
  --group-id sg-12345678 \
  --protocol tcp \
  --port 22 \
  --cidr 10.0.0.0/8
```
6.11 Why This Matters in DevOps/SRE
EC2 is the bread and butter of AWS compute. Even in a serverless world, many production workloads, including the EC2 worker nodes behind many ECS and EKS clusters, still run on instances managed by DevOps teams.
EC2 in DevOps Daily Work

Daily EC2 management tasks:

1. Instance fleet management
   - Right-sizing instances based on CloudWatch metrics
   - Patching AMIs and rolling out updates
   - Managing Spot fleets for CI/CD runners
2. Performance troubleshooting
   - CPU steal time on shared tenancy
   - EBS throughput bottlenecks
   - Instance store vs EBS IOPS comparison
3. Security operations
   - Security group auditing
   - Key pair rotation
   - IMDSv2 enforcement

6.12 Linux Systems Perspective
EC2 Fleet Management from Arch Linux
```bash
# Install EC2 management tools on Arch Linux (fzf is used by ec2connect below)
sudo pacman -S aws-cli-v2 jq openssh fzf
yay -S ssm-session-manager-plugin
```

```bash
#!/bin/bash
# ~/bin/ec2-fleet-status.sh - quick fleet status script
set -euo pipefail

REGION=${1:-us-east-1}
echo "=== EC2 Fleet Status - $REGION ==="
echo ""

# List all running instances with useful info
aws ec2 describe-instances \
  --region "$REGION" \
  --filters Name=instance-state-name,Values=running \
  --query 'Reservations[*].Instances[*].[
      InstanceId,
      InstanceType,
      State.Name,
      PrivateIpAddress,
      PublicIpAddress,
      Tags[?Key==`Name`].Value | [0],
      LaunchTime
  ]' \
  --output table

# Instance type distribution
echo ""
echo "--- Instance Type Distribution ---"
aws ec2 describe-instances \
  --region "$REGION" \
  --filters Name=instance-state-name,Values=running \
  --query 'Reservations[*].Instances[*].InstanceType' \
  --output text | tr '\t' '\n' | sort | uniq -c | sort -rn

# Shell access via SSM (no SSH keys or public IP needed):
# aws ssm start-session --target i-0123456789abcdef0
```

```bash
# Interactive instance selector: add to ~/.bashrc
# (requires fzf and the Session Manager plugin)
ec2connect() {
  local instance id
  instance=$(aws ec2 describe-instances \
    --filters Name=instance-state-name,Values=running \
    --query 'Reservations[*].Instances[*].[InstanceId,Tags[?Key==`Name`].Value|[0]]' \
    --output text | fzf --prompt="Select instance: ") || return   # cancelled
  id=$(awk '{print $1}' <<< "$instance")
  echo "Connecting to $id via SSM..."
  aws ssm start-session --target "$id"
}
```
Automated AMI Baking
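```bash
#!/bin/bash
# /usr/local/bin/bake-ami.sh - Golden AMI pipeline
set -euo pipefail

SOURCE_AMI=${1:?"Usage: $0 <source-ami-id>"}
NAME="golden-ami-$(date +%Y%m%d-%H%M%S)"

# Build settings (placeholders: override via the environment).
# The instance profile must allow SSM, e.g. via AmazonSSMManagedInstanceCore.
BUILD_SG=${BUILD_SG:-sg-build}
BUILD_SUBNET=${BUILD_SUBNET:-subnet-build}
BUILD_PROFILE=${BUILD_PROFILE:-ssm-build-profile}

echo "=== Baking Golden AMI ==="
echo "Source: $SOURCE_AMI"

# Launch temporary instance (no SSH key needed: access is via SSM)
INSTANCE_ID=$(aws ec2 run-instances \
  --image-id "$SOURCE_AMI" \
  --instance-type t3.medium \
  --iam-instance-profile Name="$BUILD_PROFILE" \
  --security-group-ids "$BUILD_SG" \
  --subnet-id "$BUILD_SUBNET" \
  --query 'Instances[0].InstanceId' \
  --output text)
echo "Build instance: $INSTANCE_ID"

# Terminate the build instance on exit, including after a failed step
cleanup() {
  echo "Terminating build instance $INSTANCE_ID"
  aws ec2 terminate-instances --instance-ids "$INSTANCE_ID" >/dev/null
}
trap cleanup EXIT

# Wait for status checks so the SSM agent has time to register
aws ec2 wait instance-status-ok --instance-ids "$INSTANCE_ID"

# Apply patches via SSM Run Command (yum package names assume Amazon Linux)
COMMAND_ID=$(aws ssm send-command \
  --instance-ids "$INSTANCE_ID" \
  --document-name "AWS-RunShellScript" \
  --parameters 'commands=["sudo yum update -y","sudo yum install -y amazon-cloudwatch-agent"]' \
  --comment "Patch golden AMI" \
  --query 'Command.CommandId' \
  --output text)

# Wait for the command instead of sleeping a fixed time. The waiter exits
# non-zero (aborting the script) if the command fails or outlasts its
# polling limit; long patch runs may need a custom polling loop.
aws ssm wait command-executed --command-id "$COMMAND_ID" --instance-id "$INSTANCE_ID"

# Create AMI. Letting EC2 reboot the disposable build instance gives a
# consistent filesystem snapshot (no --no-reboot).
AMI_ID=$(aws ec2 create-image \
  --instance-id "$INSTANCE_ID" \
  --name "$NAME" \
  --query 'ImageId' \
  --output text)
echo "New AMI: $AMI_ID"

# Keep the build instance until the image is ready; the trap then cleans up
aws ec2 wait image-available --image-ids "$AMI_ID"

echo "✅ AMI $AMI_ID created successfully"
```
6.13 Troubleshooting Guide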
| Issue | Cause | Solution |
|---|---|---|
| Instance won’t start | Insufficient capacity in AZ (`InsufficientInstanceCapacity`) | Try a different AZ or instance type |
| SSH connection timeout | Security group / NACL blocking | Check port 22 in the SG and NACL |
| High CPU steal time | Noisy neighbor on shared tenancy, or exhausted CPU credits on burstable (T) instances | Check `CPUCreditBalance`; use a non-burstable, dedicated, or bare metal instance |
| Instance unreachable | Public IP not assigned | Check subnet auto-assign or use an Elastic IP |
| EBS volume slow | Volume or instance EBS IOPS/throughput limit reached | Raise provisioned IOPS/throughput or change volume type (gp2→gp3, io1→io2) |
| IMDSv1 security concern | Legacy metadata service | Enforce IMDSv2 (`HttpTokens=required`) via launch template |
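For instances that are already running, the IMDSv2 row can also be handled with `modify-instance-metadata-options`. The sketch below finds running instances in a Region whose `HttpTokens` setting is still `optional` and switches them to `required`. `enforce_imdsv2` is a hypothetical helper. Test application compatibility first, because software that only speaks IMDSv1 stops receiving metadata and credentials.

```bash
#!/usr/bin/env bash
# Hypothetical helper: require IMDSv2 on every running instance in a
# Region that still allows IMDSv1 (HttpTokens=optional).
enforce_imdsv2() {
  local region=${1:-us-east-1} ids id
  ids=$(aws ec2 describe-instances \
    --region "$region" \
    --filters Name=instance-state-name,Values=running \
    --query "Reservations[].Instances[?MetadataOptions.HttpTokens=='optional'].InstanceId" \
    --output text) || return
  for id in $ids; do
    [[ $id == None ]] && continue
    echo "Requiring IMDSv2 on $id"
    aws ec2 modify-instance-metadata-options \
      --region "$region" \
      --instance-id "$id" \
      --http-tokens required \
      --http-endpoint enabled >/dev/null || return
  done
}
```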
```bash
# Debug EC2 connectivity issues

# Check security group rules
aws ec2 describe-security-groups \
  --group-ids sg-12345678 \
  --query 'SecurityGroups[0].IpPermissions' \
  --output table

# Check system and instance status checks
aws ec2 describe-instance-status \
  --instance-ids i-12345678 \
  --output table

# View console output for boot issues (--latest requires a Nitro-based instance)
aws ec2 get-console-output \
  --instance-id i-12345678 \
  --latest --output text
```
6.14 Common Mistakes & Anti-Patterns
EC2 Anti-Patterns

❌ Mistake 1: Running Everything on EC2
- Problem: Using EC2 for batch jobs, APIs, and cron jobs
- Impact: Overpaying and managing unnecessary infrastructure
- Fix: Use Lambda, Fargate, or Step Functions where appropriate

❌ Mistake 2: Not Using IMDSv2
- Problem: IMDSv1 is vulnerable to SSRF attacks
- Impact: Credential theft via the metadata service
- Fix: Require IMDSv2 with HttpTokens=required

❌ Mistake 3: Manual Instance Configuration
- Problem: SSHing in and installing packages by hand
- Impact: Snowflake servers and configuration drift
- Fix: Use AMIs, user data, or configuration management

❌ Mistake 4: Over-provisioning Instance Types
- Problem: Running m5.2xlarge when t3.medium suffices
- Impact: Roughly 9x the hourly On-Demand cost (us-east-1 list prices) for unused capacity
- Fix: Use Compute Optimizer; monitor CPU and memory (memory metrics require the CloudWatch agent)

6.15 Interview Questions
Conceptual Questions
- Q: Explain the difference between Stop, Hibernate, and Terminate for EC2.
  - A: Stop: The instance shuts down, EBS volumes are preserved, the public IP is released (unless it is an Elastic IP), and there are no compute charges. Hibernate: RAM contents are saved to the (encrypted) EBS root volume for a faster restart; hibernation must be enabled at launch. Terminate: The instance is deleted, the root EBS volume is deleted by default, and its data is lost permanently.
- Q: When would you choose a Spot instance vs On-Demand?
  - A: Spot suits fault-tolerant, flexible workloads: CI/CD builds, batch processing, data analysis, and dev/staging environments. On-Demand suits production workloads that can’t tolerate interruption, and short-term, unpredictable workloads. Spot saves up to 90% compared with On-Demand but can be reclaimed with a 2-minute interruption notice.
- Q: How does IMDSv2 improve security over IMDSv1?
  - A: IMDSv2 requires a session token, obtained with a PUT request and sent as a header on every metadata request. IMDSv1 answers a plain GET, which an attacker can trigger through most SSRF vulnerabilities; those usually cannot issue a PUT with custom headers, so IMDSv2 mitigates them. IMDSv2 also sets a hop limit on the token response (default 1), so a request relayed through an extra network hop, such as a container on a bridged network, cannot obtain a token for the host's metadata.
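From inside an instance, the IMDSv2 flow described in the answer looks like this: a PUT to the token endpoint with a TTL header, then GETs that present the token. The sketch wraps it in a hypothetical `imds_get` function. The endpoint and header names are the documented IMDS ones. The 21600-second (6-hour, maximum) TTL is an arbitrary choice.

```bash
#!/usr/bin/env bash
# Read one instance metadata path using IMDSv2. Must run on an EC2
# instance, since 169.254.169.254 is link-local.
imds_get() {
  local path=$1 token
  # Step 1: PUT returns a session token (TTL in seconds, max 21600)
  token=$(curl -sf -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 21600") || return
  # Step 2: present the token on each metadata GET
  curl -sf -H "X-aws-ec2-metadata-token: $token" \
    "http://169.254.169.254/latest/meta-data/$path"
}
```

Usage on an instance: `imds_get instance-id` or `imds_get placement/availability-zone`.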
Scenario-Based Questions
- Q: Your EC2 instances are experiencing intermittent high latency. How do you investigate?
  - A: (1) Check CloudWatch for CPU, network, and disk metrics; (2) look for CPU steal time (indicates a noisy neighbor); (3) check EBS CloudWatch metrics for IOPS/throughput limits; (4) verify network bandwidth against the instance type's limits; (5) look for micro-bursts that 5-minute averages hide, using detailed (1-minute) monitoring and the ENA driver's allowance-exceeded counters (`ethtool -S`); (6) consider switching to a dedicated or larger instance type.
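Step (2) of the answer can be checked directly on a Linux instance. The aggregate `cpu` line of `/proc/stat` holds cumulative jiffies for user, nice, system, idle, iowait, irq, softirq, and steal. Steal percentage over an interval is Δsteal / Δtotal between two samples. The functions below are a hypothetical sketch of that calculation. `top` (the `st` column) and `mpstat` (`%steal`) report the same figure.

```bash
#!/usr/bin/env bash
# Print "<steal> <total>" jiffies from the aggregate cpu line of a
# /proc/stat-format file. guest/guest_nice are left out of the total
# because the kernel already counts them in user/nice.
read_cpu_counters() {
  awk '$1 == "cpu" { total = 0; for (i = 2; i <= 9; i++) total += $i; print $9, total; exit }' "$1"
}

# Integer steal percentage between two /proc/stat snapshots (files).
steal_percent() {
  local s1 t1 s2 t2
  read -r s1 t1 < <(read_cpu_counters "$1")
  read -r s2 t2 < <(read_cpu_counters "$2")
  (( t2 > t1 )) || { echo "no elapsed time between samples" >&2; return 1; }
  echo $(( 100 * (s2 - s1) / (t2 - t1) ))
}
```

Usage: `cat /proc/stat > /tmp/stat.1; sleep 5; cat /proc/stat > /tmp/stat.2; steal_percent /tmp/stat.1 /tmp/stat.2`.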
6.16 Exam Tips
- Instance Types: Know the families (T, M, C, R, I, G, P) and their use cases
- Storage: EBS is persistent, Instance Store is ephemeral
- Security Groups: Stateful, allow rules only, default deny inbound
- Placement Groups: Cluster (HPC), Spread (HA), Partition (distributed)
- ENI: Can attach multiple ENIs, can migrate between instances
- AMI: Can share across accounts, copy across regions
- Stop vs Terminate: Stop preserves EBS volumes; terminate deletes the root volume by default
- Launch Templates: Preferred over Launch Configurations
Next Chapter
Chapter 7: Auto Scaling & Load Balancing
Last Updated: March 2026