Containers on AWS (ECS, EKS, Fargate)
Chapter 9: AWS Elastic Container Services (ECS/EKS)
Running Containerized Applications at Scale

9.1 Overview

AWS provides several services for running Docker containers at scale: two orchestrators (ECS and EKS) and Fargate, a serverless compute engine that works with both.
AWS Container Services

```text
                    +------------------------+
                    |   Container Services   |
                    +------------------------+
                                |
          +---------------------+---------------------+
          |                     |                     |
          v                     v                     v
    +-----------+         +-----------+         +-----------+
    |    ECS    |         |    EKS    |         |  Fargate  |
    |           |         |           |         |           |
    | Amazon's  |         |  Managed  |         | Serverless|
    |  Native   |         | Kubernetes|         | Container |
    | Container |         |  Service  |         |  Compute  |
    |  Service  |         |           |         |           |
    +-----------+         +-----------+         +-----------+

    ECS:     Simple, AWS-native container orchestration
    EKS:     Kubernetes-compatible, portable workloads
    Fargate: Serverless compute for both ECS and EKS
```

9.2 Amazon ECS Architecture
ECS Components

ECS Core Components

```text
                    +------------------------+
                    |      ECS Cluster       |
                    +------------------------+
                                |
          +---------------------+---------------------+
          |                     |                     |
          v                     v                     v
    +-----------+         +-----------+         +-----------+
    |   Task    |         |  Service  |         | Container |
    | Definition|         |           |         | Instance  |
    +-----------+         +-----------+         +-----------+

    Task Definition:    Blueprint for containers
    Service:            Manages running tasks (scaling, load balancing)
    Container Instance: EC2 instance running ECS agent
```
ECS Launch Types

ECS Launch Types Comparison

EC2 Launch Type

Tasks are placed on EC2 instances registered to the cluster. Each instance runs the ECS agent and can host many containers.

You manage:
- EC2 instances
- Scaling
- Security patches

Fargate Launch Type

Each task runs in its own isolated, AWS-managed environment, with no EC2 instance to administer. A Fargate task can hold one or more containers, just like a task on EC2; the isolation boundary is the task, not the container.

AWS manages:
- Infrastructure
- Scaling of the underlying compute capacity
- Security of the host (patching)

The number of tasks a service runs is still yours to configure on Fargate, for example through Service Auto Scaling.
Task Definition Structure

```json
{
  "family": "web-app-task",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "web-app",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/web-app:latest",
      "essential": true,
      "portMappings": [
        {
          "containerPort": 8080,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {"name": "ENVIRONMENT", "value": "production"}
      ],
      "secrets": [
        {"name": "DB_PASSWORD", "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:db-password"}
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/web-app",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}
```

The `executionRoleArn` value is a placeholder role name. An execution role is required here because both the `secrets` block and the `awslogs` log driver are handled on the task's behalf before the application starts.
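A definition like the one above still has to be registered before a service can use it. The following is a minimal sketch, assuming the JSON has been saved to a local file and the AWS CLI is configured; the helper name `register_task_def` and the default file name `task-def.json` are illustrative assumptions, not part of the chapter.

```bash
# Sketch: register a task definition stored in a local JSON file.
# The helper name and default file name are assumptions for illustration.
register_task_def() {
  local file="${1:-task-def.json}"

  # Fail early with a clear message instead of a CLI parsing error.
  if [ ! -f "$file" ]; then
    echo "task definition file not found: $file" >&2
    return 1
  fi

  # --cli-input-json reads the whole request from the file;
  # the query prints only the ARN of the newly created revision.
  aws ecs register-task-definition \
    --cli-input-json "file://$file" \
    --query 'taskDefinition.taskDefinitionArn' \
    --output text
}
```

Calling `register_task_def task-def.json` would print the ARN of the new revision; each registration of the same family creates a new revision number.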
ECS Service Types

1. Replica Service

Runs a desired number of identical tasks (Task 1, Task 2, Task 3, ...) and places them across the cluster.
- Use Case: Web servers, APIs
- Scaling: Based on CPU, memory, or ALB requests

2. Daemon Service

Runs exactly one task on each container instance (EC2 Instance 1, EC2 Instance 2, EC2 Instance 3, ...). Because it is tied to container instances, this strategy is not available on Fargate.
- Use Case: Logging agents, monitoring agents
- One task per container instance
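The two service types map to the `--scheduling-strategy` option of `aws ecs create-service`. The sketch below is a hedged illustration: `create_ecs_service` is a hypothetical helper, the default count of 2 is an assumption, and a Fargate service would additionally need `--launch-type FARGATE` and `--network-configuration`, which are omitted here.

```bash
# Sketch: create an ECS service with either scheduling strategy.
# create_ecs_service is a hypothetical helper; all names come from the caller.
create_ecs_service() {
  local cluster="$1" name="$2" task_def="$3" strategy="${4:-REPLICA}" count="${5:-2}"

  # Rebuild the positional parameters as the CLI argument list.
  set -- --cluster "$cluster" --service-name "$name" \
         --task-definition "$task_def" --scheduling-strategy "$strategy"

  case "$strategy" in
    REPLICA) set -- "$@" --desired-count "$count" ;;  # fixed number of copies
    DAEMON)  ;;                                       # one task per container instance, no count
    *) echo "unknown strategy: $strategy" >&2; return 1 ;;
  esac

  aws ecs create-service "$@"
}
```

For example, `create_ecs_service production log-agent log-agent-task DAEMON` would request one logging task per EC2 container instance, while omitting the fourth argument falls back to a replica service.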
9.3 Amazon EKS Architecture

EKS Components

EKS Architecture

```text
                    +------------------------+
                    |      EKS Cluster       |
                    +------------------------+
                                |
          +---------------------+---------------------+
          |                     |                     |
          v                     v                     v
    +-----------+         +-----------+         +-----------+
    |  Control  |         |  Worker   |         |  Fargate  |
    |   Plane   |         |   Nodes   |         |  Profile  |
    | (Managed) |         |           |         |           |
    +-----------+         +-----------+         +-----------+

    Control Plane:   Managed by AWS (API server, etcd)
    Worker Nodes:    EC2 instances running Kubernetes
    Fargate Profile: Serverless Kubernetes pods
```
EKS Node Types

EKS Node Options

1. Managed Node Groups

Features:
- Automated provisioning
- Automated updates
- Managed by AWS
- Can use Spot instances

Node Group Configuration:
- Instance types
- AMI version
- Scaling config (min/max/desired)
- Labels and taints

2. Self-Managed Nodes

Features:
- Full control over nodes
- Custom AMI
- Custom bootstrap scripts
- Manual updates

3. Fargate Profiles

Features:
- Serverless pods
- No node management
- Per-namespace selection
- Higher cost but less overhead
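The managed options above can be sketched as AWS CLI calls. This is an illustration under stated assumptions: the helper names, the instance types `m5.large` and `m5a.large`, and the 1/5/2 scaling numbers are made up for the example, and every name, role ARN, and subnet ID is supplied by the caller. Fargate profiles expect private subnets.

```bash
# Sketch: the two AWS-managed EKS capacity options as CLI calls.
# Helper names, instance types, and scaling numbers are illustrative assumptions.

# Managed node group on Spot capacity, scaling between 1 and 5 nodes.
create_spot_nodegroup() {
  local cluster="$1" group="$2" node_role_arn="$3" subnet_a="$4" subnet_b="$5"
  aws eks create-nodegroup \
    --cluster-name "$cluster" \
    --nodegroup-name "$group" \
    --node-role "$node_role_arn" \
    --subnets "$subnet_a" "$subnet_b" \
    --instance-types m5.large m5a.large \
    --capacity-type SPOT \
    --scaling-config minSize=1,maxSize=5,desiredSize=2
}

# Fargate profile: pods created in the given namespace are scheduled onto Fargate.
create_fargate_profile() {
  local cluster="$1" profile="$2" pod_role_arn="$3" namespace="$4" subnet_a="$5" subnet_b="$6"
  aws eks create-fargate-profile \
    --cluster-name "$cluster" \
    --fargate-profile-name "$profile" \
    --pod-execution-role-arn "$pod_role_arn" \
    --subnets "$subnet_a" "$subnet_b" \
    --selectors "namespace=$namespace"
}
```

Self-managed nodes have no equivalent single call, since provisioning, bootstrap, and updates are left to your own tooling.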
EKS Networking

EKS VPC Networking

VPC CNI Plugin Architecture

```text
VPC > Subnet (10.0.1.0/24)

   +-----------+     +-----------+     +-----------+
   |  Pod IP   |     |  Pod IP   |     |  Pod IP   |
   | 10.0.1.10 |     | 10.0.1.11 |     | 10.0.1.12 |
   +-----------+     +-----------+     +-----------+
         |                 |                 |
         +-----------------+-----------------+
                           |
                           v
   +-----------------------------------------------+
   | Worker Node (EC2)                  10.0.1.100 |
   |                                               |
   |   +-------+        +-------+        +-------+ |
   |   | Pod 1 |        | Pod 2 |        | Pod 3 | |
   |   +-------+        +-------+        +-------+ |
   +-----------------------------------------------+
```

Benefits:
- Native VPC networking for pods
- Security groups per pod
- No overlay network overhead
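Because every pod takes a real VPC address from an ENI on the node, the instance type caps how many pods a node can run. A small sketch of the commonly used formula for the default (non prefix-delegation) mode, ENIs × (IPv4 addresses per ENI − 1) + 2; the function name `max_pods` is an assumption, and the ENI figures must come from the EC2 instance-type limits.

```bash
# Sketch: default VPC CNI pod capacity for a node.
# One address per ENI is reserved for the ENI itself; the +2 accounts for
# host-network pods that do not consume a VPC address.
max_pods() {
  local enis="$1" ips_per_eni="$2"
  echo $(( enis * (ips_per_eni - 1) + 2 ))
}

max_pods 3 10   # m5.large: 3 ENIs with 10 IPv4 addresses each, prints 29
```

Small subnets such as the /24 in the figure can run out of addresses well before nodes run out of CPU, which is worth checking when sizing node groups.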
9.4 Amazon ECR

ECR Architecture

Elastic Container Registry

ECR Repository Structure

Repository: my-app

Images:
- `my-app:latest` (sha256:abc123)
- `my-app:v1.0` (sha256:def456)
- `my-app:v1.1` (sha256:ghi789)
- `my-app@sha256:abc123`

Features:
- Private repositories
- Public repositories (ECR Public)
- Image scanning (security)
- Cross-region replication
- Lifecycle policies
ECR Authentication

```bash
# Login to ECR
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin \
    123456789012.dkr.ecr.us-east-1.amazonaws.com

# Create repository
aws ecr create-repository \
  --repository-name my-app \
  --image-scanning-configuration scanOnPush=true

# Build and push image
docker build -t my-app .
docker tag my-app:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest

# Set lifecycle policy
aws ecr put-lifecycle-policy \
  --repository-name my-app \
  --lifecycle-policy-text file://lifecycle-policy.json
```
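The `put-lifecycle-policy` command above reads a `lifecycle-policy.json` file that the chapter does not show. One possible version is sketched below; the retention values (14 days, 10 images), the `v` tag prefix, and the helper name `write_lifecycle_policy` are assumptions chosen for illustration, not recommendations from the source.

```bash
# Sketch: write an example lifecycle policy file for put-lifecycle-policy.
# Retention numbers and the "v" tag prefix are illustrative assumptions.
write_lifecycle_policy() {
  local out="${1:-lifecycle-policy.json}"
  cat > "$out" <<'EOF'
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Expire untagged images after 14 days",
      "selection": {
        "tagStatus": "untagged",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 14
      },
      "action": { "type": "expire" }
    },
    {
      "rulePriority": 2,
      "description": "Keep only the 10 newest images tagged v*",
      "selection": {
        "tagStatus": "tagged",
        "tagPrefixList": ["v"],
        "countType": "imageCountMoreThan",
        "countNumber": 10
      },
      "action": { "type": "expire" }
    }
  ]
}
EOF
}
```

Running `write_lifecycle_policy` with no argument creates `lifecycle-policy.json` in the current directory, ready for the command shown earlier. Rules are evaluated in `rulePriority` order, lowest number first.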
9.5 Container Best Practices

Security Best Practices

Container Security Checklist

1. Image Security
- [ ] Use minimal base images
- [ ] Scan images for vulnerabilities
- [ ] Use specific image tags (not :latest)
- [ ] Sign images

2. Runtime Security
- [ ] Run as non-root user
- [ ] Read-only root filesystem
- [ ] Drop unnecessary capabilities
- [ ] Use security contexts

3. Network Security
- [ ] Use security groups
- [ ] Network policies (EKS)
- [ ] Service mesh (optional)
- [ ] Private subnets

4. Secrets Management
- [ ] Use AWS Secrets Manager
- [ ] Use Parameter Store
- [ ] Don't embed secrets in images
- [ ] Rotate secrets regularly
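The "specific image tags" item can be taken one step further by deploying an image by digest, which cannot be silently repointed the way a tag can. A hedged sketch, assuming the AWS CLI is configured; `image_ref_by_digest` is a hypothetical helper and the registry host is passed in by the caller.

```bash
# Sketch: resolve a mutable tag to an immutable repo@digest reference.
# image_ref_by_digest is a hypothetical helper name.
image_ref_by_digest() {
  local repo="$1" tag="$2" registry="$3"
  local digest

  # Look up the digest currently behind the tag; fail if the lookup fails.
  digest="$(aws ecr describe-images \
    --repository-name "$repo" \
    --image-ids "imageTag=$tag" \
    --query 'imageDetails[0].imageDigest' \
    --output text)" || return 1

  echo "$registry/$repo@$digest"
}
```

The printed reference can be used as the `image` value in a task definition or pod spec so that every task of a deployment runs exactly the same bytes.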
Resource Management

Container Resource Configuration

Task/Container Resources

CPU:
- ECS (Fargate task size): 0.25 - 16 vCPUs, expressed in CPU units (1024 units = 1 vCPU)
- Kubernetes: millicores (100m = 0.1 CPU)

Memory:
- ECS (Fargate task size): 512 MiB - 120 GiB, with the allowed range depending on the CPU value
- Kubernetes: bytes (Mi, Gi)

Best Practices:
- Set requests (guaranteed)
- Set limits (maximum)
- Monitor actual usage
- Right-size based on metrics

Example Task Definition:

```json
{
  "cpu": "512",                    // 0.5 vCPU
  "memory": "1024",                // 1 GB
  "containerDefinitions": [{
    "cpu": 256,                    // Container CPU
    "memory": 512,                 // Container memory (hard limit)
    "memoryReservation": 256       // Soft limit
  }]
}
```

The `//` annotations are explanatory only; JSON does not allow comments, so remove them before registering the definition.
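On Fargate, task-level CPU and memory cannot be combined freely: each CPU value allows only a specific memory range. The sketch below encodes the combinations as I understand them from the Fargate task-size table, so treat the table in the code as an assumption to verify against current AWS documentation. Both function names are hypothetical.

```bash
# Sketch: validate a Fargate task size (CPU units, memory in MiB).
# The allowed combinations are an assumption to check against AWS docs.
valid_fargate_size() {
  local cpu="$1" mem="$2" min max step
  case "$cpu" in
    256)   case "$mem" in 512|1024|2048) return 0 ;; *) return 1 ;; esac ;;
    512)   min=1024;  max=4096;   step=1024 ;;
    1024)  min=2048;  max=8192;   step=1024 ;;
    2048)  min=4096;  max=16384;  step=1024 ;;
    4096)  min=8192;  max=30720;  step=1024 ;;
    8192)  min=16384; max=61440;  step=4096 ;;
    16384) min=32768; max=122880; step=8192 ;;
    *) return 1 ;;
  esac
  # Memory must lie in range and on the increment grid for this CPU value.
  [ "$mem" -ge "$min" ] && [ "$mem" -le "$max" ] && [ $(( (mem - min) % step )) -eq 0 ]
}

# Convert Kubernetes millicores to ECS CPU units (1000m = 1 vCPU = 1024 units).
millicores_to_cpu_units() {
  echo $(( $1 * 1024 / 1000 ))
}
```

For instance, `valid_fargate_size 512 1024` succeeds for the example task above, while `valid_fargate_size 256 4096` fails because 0.25 vCPU tops out at 2 GiB.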
9.6 Practical Configuration

ECS with Terraform

```hcl
# ============================================================
# ECS Cluster
# ============================================================

resource "aws_ecs_cluster" "main" {
  name = "my-cluster"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}

# ============================================================
# CloudWatch Log Group
# ============================================================

resource "aws_cloudwatch_log_group" "ecs" {
  name              = "/ecs/my-app"
  retention_in_days = 30
}

# ============================================================
# ECS Task Definition
# ============================================================

resource "aws_ecs_task_definition" "app" {
  family                   = "my-app"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = 256
  memory                   = 512

  container_definitions = jsonencode([
    {
      name      = "app"
      image     = "${aws_ecr_repository.app.repository_url}:latest"
      essential = true

      portMappings = [
        {
          containerPort = 8080
          protocol      = "tcp"
        }
      ]

      environment = [
        {
          name  = "ENVIRONMENT"
          value = "production"
        }
      ]

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = aws_cloudwatch_log_group.ecs.name
          "awslogs-region"        = var.region
          "awslogs-stream-prefix" = "ecs"
        }
      }
    }
  ])

  execution_role_arn = aws_iam_role.ecs_execution.arn
  task_role_arn      = aws_iam_role.ecs_task.arn
}

# ============================================================
# ECS Service
# ============================================================

resource "aws_ecs_service" "app" {
  name            = "my-app"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 3
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = var.private_subnet_ids
    security_groups  = [aws_security_group.ecs.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.app.arn
    container_name   = "app"
    container_port   = 8080
  }

  depends_on = [aws_lb_listener.https]
}

# ============================================================
# Auto Scaling
# ============================================================

resource "aws_appautoscaling_target" "ecs" {
  max_capacity       = 10
  min_capacity       = 2
  resource_id        = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.app.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

resource "aws_appautoscaling_policy" "cpu" {
  name               = "cpu-scaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.ecs.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs.scalable_dimension
  service_namespace  = aws_appautoscaling_target.ecs.service_namespace

  target_tracking_scaling_policy_configuration {
    target_value = 70

    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
  }
}
```
9.7 Why This Matters in DevOps/SRE

Containers are the standard deployment unit in modern DevOps. Your choice between ECS, EKS, and Fargate directly impacts operational complexity, team skill requirements, and cost.
Container Orchestration Decision Matrix

Decision: ECS vs EKS vs Fargate

ECS + Fargate (Simplest)
- Small teams, no K8s experience
- Simple microservices, internal tools
- Minimal operational overhead

EKS + Managed Nodes (Portable)
- Multi-cloud strategy or K8s expertise on team
- Complex service mesh, advanced networking
- Rich ecosystem (Helm, Argo, Istio)

ECS on EC2 (Maximum Control)
- GPU workloads, custom AMIs
- Cost optimization with Spot/RIs
- High-density task packing
9.8 Linux Systems Perspective

Container Development from Arch Linux

```bash
# Install container tools on Arch Linux
sudo pacman -S docker docker-compose kubectl helm jq
yay -S aws-cli-v2 copilot-cli

# Enable Docker
sudo systemctl enable --now docker
sudo usermod -aG docker "$USER"   # takes effect after logging out and back in

# ECR login helper
ecr-login() {
  local region="${1:-us-east-1}"
  local account
  account="$(aws sts get-caller-identity --query Account --output text)" || return 1
  aws ecr get-login-password --region "$region" | \
    docker login --username AWS --password-stdin \
      "$account.dkr.ecr.$region.amazonaws.com"
}

# Build, tag, push workflow
deploy-container() {
  local app="$1" tag="${2:-latest}"
  local account repo
  account="$(aws sts get-caller-identity --query Account --output text)" || return 1
  repo="$account.dkr.ecr.us-east-1.amazonaws.com/$app"

  # Stop at the first failing step so a broken build is never deployed
  ecr-login || return 1
  docker build -t "$app:$tag" . || return 1
  docker tag "$app:$tag" "$repo:$tag" || return 1
  docker push "$repo:$tag" || return 1

  # Force new deployment (tasks re-pull the tag the task definition already references)
  aws ecs update-service \
    --cluster production \
    --service "$app" \
    --force-new-deployment || return 1

  echo "✅ Deployed $app:$tag"
}

# ECS service status
ecs-status() {
  local cluster="${1:-production}"
  # The ARN list is left unquoted on purpose so each ARN becomes its own argument;
  # describe-services accepts at most 10 services per call.
  aws ecs describe-services \
    --cluster "$cluster" \
    --services $(aws ecs list-services --cluster "$cluster" --query 'serviceArns[*]' --output text) \
    --query 'services[*].{Name:serviceName,Desired:desiredCount,Running:runningCount,Status:status}' \
    --output table
}
```
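`update-service` returns as soon as a deployment has been started, not when the new tasks are healthy. A hedged sketch of a follow-up step that blocks until the service settles, using the AWS CLI waiter `aws ecs wait services-stable`; the helper name `wait_for_stable` and the default cluster name are assumptions.

```bash
# Sketch: block until an ECS service reaches a steady state.
# wait_for_stable is a hypothetical helper; the waiter polls on its own
# schedule and exits non-zero if the service never stabilizes.
wait_for_stable() {
  local service="$1" cluster="${2:-production}"

  if aws ecs wait services-stable --cluster "$cluster" --services "$service"; then
    echo "stable: $service"
  else
    echo "not stable after waiting: $service" >&2
    return 1
  fi
}
```

Calling `wait_for_stable my-app` right after a deployment gives scripts and CI jobs a clear success or failure signal instead of assuming the rollout worked.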
9.9 Troubleshooting Guide

| Issue | Cause | Solution |
|---|---|---|
| Task fails to start | Image pull error | Check ECR permissions, verify image URI |
| Task keeps restarting | Container exits immediately | Check logs: `aws logs tail /ecs/my-app` |
| Container can’t reach internet | Missing NAT Gateway | Ensure private subnet has NAT GW route |
| Health check failing | Wrong health check path/port | Verify ALB target group health check config |
| Out of memory (OOM kill) | Memory limit too low | Increase task/container memory allocation |
| EKS pods stuck Pending | Insufficient node resources | Scale node group or right-size pods |
```bash
# Debug ECS task failures
# Check stopped task reason
aws ecs describe-tasks \
  --cluster production \
  --tasks "$(aws ecs list-tasks --cluster production --service-name my-app --desired-status STOPPED --query 'taskArns[0]' --output text)" \
  --query 'tasks[0].{StopCode:stopCode,StoppedReason:stoppedReason,Containers:containers[*].{Name:name,ExitCode:exitCode,Reason:reason}}'
```
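The `ExitCode` values returned by the query above follow the usual Linux convention that codes above 128 mean "terminated by signal (code minus 128)". The sketch below turns a code into a first hint; the function name and the wording of the hints are illustrative assumptions, and the hints are starting points rather than diagnoses.

```bash
# Sketch: translate a container exit code into a troubleshooting hint.
# explain_exit_code is a hypothetical helper; hints are heuristics only.
explain_exit_code() {
  local code="$1"
  case "$code" in
    0)   echo "exited cleanly; check whether the process is meant to keep running" ;;
    1)   echo "application error; check the container logs" ;;
    137) echo "SIGKILL (128+9); often an out-of-memory kill" ;;
    139) echo "SIGSEGV (128+11); segmentation fault" ;;
    143) echo "SIGTERM (128+15); stopped by the scheduler or a deployment" ;;
    *)
      # Any other code above 128 is 128 plus the terminating signal number.
      if [ "$code" -gt 128 ] 2>/dev/null; then
        echo "killed by signal $(( code - 128 ))"
      else
        echo "application-defined exit code $code"
      fi ;;
  esac
}
```

An exit code of 137 together with a stopped reason mentioning memory matches the "Out of memory" row in the table above.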
9.10 Interview Questions

Conceptual Questions

- Q: When would you choose ECS over EKS?
- A: ECS when: (1) team lacks K8s expertise, (2) simpler workloads, (3) tight AWS integration preferred, (4) lower operational overhead desired. EKS when: (1) multi-cloud portability needed, (2) team has K8s skills, (3) need K8s ecosystem (Helm, service mesh), (4) complex networking requirements.
- Q: Explain the difference between task role and execution role in ECS.
- A: The execution role is used by the ECS agent to pull images from ECR, write logs to CloudWatch, and fetch secrets referenced in the task definition. The task role is used by the application code running inside the container to access AWS services (like S3, DynamoDB). Separate roles enforce least privilege.
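The split between the two roles can be sketched with the AWS CLI. Both roles trust the same service principal, `ecs-tasks.amazonaws.com`; what differs is the permissions attached afterwards. The helper names, role names, and file name below are assumptions for illustration; the managed policy ARN is the standard AWS one for task execution.

```bash
# Sketch: create an execution role and a task role for ECS.
# Helper names and the default file name are illustrative assumptions.
write_ecs_trust_policy() {
  cat > "${1:-ecs-trust-policy.json}" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ecs-tasks.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

create_ecs_roles() {
  local exec_role="$1" task_role="$2" trust="${3:-ecs-trust-policy.json}"
  write_ecs_trust_policy "$trust"

  # Execution role: image pulls and log delivery via the AWS managed policy.
  aws iam create-role --role-name "$exec_role" \
    --assume-role-policy-document "file://$trust" || return 1
  aws iam attach-role-policy --role-name "$exec_role" \
    --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy || return 1

  # Task role: starts with no permissions; attach only what the application needs.
  aws iam create-role --role-name "$task_role" \
    --assume-role-policy-document "file://$trust"
}
```

Reading secrets from Secrets Manager or Parameter Store needs additional permissions on the execution role beyond the managed policy; those are left out of this sketch.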
Scenario-Based Questions
- Q: Design a CI/CD pipeline for containerized microservices on ECS.
- A: GitHub push → CodeBuild builds Docker image → pushes to ECR → updates ECS task definition with new image tag → CodeDeploy/ECS rolling update deploys new tasks → ALB shifts traffic gradually → CloudWatch monitors error rate → auto-rollback if errors spike.
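The build-and-deploy part of such a pipeline could look roughly like the script below, for example as the body of a build job. It is a sketch under assumptions: `run` and `deploy_stage` are hypothetical helpers, the task definition file is assumed to already reference the new image tag, and monitoring and rollback are not shown. Setting `DRY_RUN=1` prints each command instead of executing it.

```bash
# Sketch of a pipeline deploy stage. With DRY_RUN=1 each command is printed
# instead of executed. Helper names are hypothetical.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

deploy_stage() {
  local repo_uri="$1" tag="$2" cluster="$3" service="$4" family="$5" task_def_file="$6"
  local image="$repo_uri:$tag"

  # Build and publish the image under an immutable, commit-based tag.
  run docker build -t "$image" . || return 1
  run docker push "$image" || return 1

  # Register a new revision; the file is assumed to reference $image already.
  run aws ecs register-task-definition --cli-input-json "file://$task_def_file" || return 1

  # Passing only the family selects its latest ACTIVE revision.
  run aws ecs update-service --cluster "$cluster" --service "$service" \
    --task-definition "$family" || return 1

  # Fail the stage if the service does not settle.
  run aws ecs wait services-stable --cluster "$cluster" --services "$service"
}
```

With `DRY_RUN=1` set, calling `deploy_stage` with a repository URI, a commit hash, and the cluster, service, family, and file names lists the five commands in order, which is a cheap way to review the stage before it touches a real cluster.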
9.11 Exam Tips
- ECS vs EKS: ECS is AWS-native, EKS is Kubernetes-compatible
- Fargate: Serverless compute for both ECS and EKS
- Task Definition: Blueprint for containers (CPU, memory, ports)
- Service: Manages tasks, handles scaling and load balancing
- ECR: Container registry with image scanning
- VPC CNI: Pods get real IP addresses in VPC
- IAM Roles: Task roles for AWS API access
- Load Balancing: ALB for ECS services, Ingress for EKS
Next Chapter
Chapter 10: AWS Elastic Beanstalk & App Runner
Last Updated: March 2026