
AWS Well-Architected Framework

Building Secure, High-Performing, Resilient, and Efficient Infrastructure


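The AWS Well-Architected Framework helps you understand the trade-offs of the decisions you make while building systems on AWS, and gives you a consistent way to evaluate architectures. The framework defines six pillars: Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and Sustainability. This chapter covers the last five.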

Well-Architected Framework Pillars
+------------------------------------------------------------------+
|                                                                  |
|                    +------------------------+                    |
|                    |    Well-Architected    |                    |
|                    |       Framework        |                    |
|                    +------------------------+                    |
|                                |                                 |
|      +------------+------------+------------+------------+       |
|      |            |            |            |            |       |
|      v            v            v            v            v       |
| +----------+ +----------+ +----------+ +----------+ +----------+ |
| | Security | | Reliabil-| | Perform- | | Cost     | | Sustain- | |
| |          | | ity      | | ance     | | Optimiz- | | ability  | |
| |          | |          | |          | | ation    | |          | |
| +----------+ +----------+ +----------+ +----------+ +----------+ |
|                                                                  |
|  1. Security            2. Reliability       3. Performance      |
|  4. Cost Optimization   5. Sustainability                        |
|                                                                  |
+------------------------------------------------------------------+

Security Pillar Principles
+------------------------------------------------------------------+
| |
| 1. Implement a Strong Identity Foundation |
| +----------------------------------------------------------+ |
| | - Centralize identity management | |
| | - Use IAM for access control | |
| | - Implement least privilege | |
| | - Enforce MFA | |
| +----------------------------------------------------------+ |
| |
| 2. Enable Traceability |
| +----------------------------------------------------------+ |
| | - Monitor and log all actions | |
| | - Use CloudTrail for API auditing | |
| | - Implement alerting | |
| +----------------------------------------------------------+ |
| |
| 3. Apply Security at All Layers |
| +----------------------------------------------------------+ |
| | - Defense in depth | |
| | - Network security (VPC, NACLs, SGs) | |
| | - Application security | |
| | - Data encryption | |
| +----------------------------------------------------------+ |
| |
| 4. Automate Security Best Practices |
| +----------------------------------------------------------+ |
| | - Use managed services | |
| | - Automated patching | |
| | - Security as code | |
| +----------------------------------------------------------+ |
| |
| 5. Protect Data in Transit and at Rest |
| +----------------------------------------------------------+ |
| | - Encryption everywhere | |
| | - TLS for transit | |
| | - KMS for key management | |
| +----------------------------------------------------------+ |
| |
| 6. Prepare for Security Events |
| +----------------------------------------------------------+ |
| | - Incident response plan | |
| | - Automated detection and response (GuardDuty) | |
| | - Regular security testing | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
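
The "Implement least privilege" bullet under principle 1 can be checked mechanically. The following is a minimal sketch, assuming a policy in the standard IAM JSON shape (`Statement`, `Effect`, `Action`, `Resource`). The function name `find_wildcard_statements` and the sample policy are hypothetical, and the check only catches a bare `*`; service-level wildcards such as `s3:*` would need an extra rule.

```python
from typing import Any


def _as_list(value: Any) -> list:
    """IAM accepts a single value or a list for these fields; normalize to a list."""
    return value if isinstance(value, list) else [value]


def find_wildcard_statements(policy: dict) -> list[dict]:
    """Return Allow statements that grant every action or apply to every resource."""
    flagged = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        if "*" in actions or "*" in resources:
            flagged.append(stmt)
    return flagged


# Hypothetical policy: the first statement is scoped, the second is overly broad.
sample_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
        {"Effect": "Allow", "Action": "*", "Resource": "*"},
    ],
}

broad = find_wildcard_statements(sample_policy)
print(f"{len(broad)} statement(s) violate least privilege")
```

A check like this fits naturally into a CI step, which is one way to treat "security as code" (principle 4).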
Security Defense in Depth
+------------------------------------------------------------------+
| |
| Layer 1: Edge Security |
| +----------------------------------------------------------+ |
| | +----------+ +----------+ +----------+ | |
| | |CloudFront| | WAF | | Shield | | |
| | | (CDN) | |(Firewall)| | (DDoS) | | |
| | +----------+ +----------+ +----------+ | |
| +----------------------------------------------------------+ |
| | |
| v |
| Layer 2: Network Security |
| +----------------------------------------------------------+ |
| | +----------+ +----------+ +----------+ | |
| | | VPC | | NACLs | |Security | | |
| | | | | | | Groups | | |
| | +----------+ +----------+ +----------+ | |
| +----------------------------------------------------------+ |
| | |
| v |
| Layer 3: Compute Security |
| +----------------------------------------------------------+ |
| | +----------+ +----------+ +----------+ | |
| | | EC2 | | Systems | | Guard | | |
| | | (IAM) | | Manager | | Duty | | |
| | +----------+ +----------+ +----------+ | |
| +----------------------------------------------------------+ |
| | |
| v |
| Layer 4: Data Security |
| +----------------------------------------------------------+ |
| | +----------+ +----------+ +----------+ | |
| | | KMS | | S3 | | RDS | | |
| | |(Encrypt) | |(Encrypt) | |(Encrypt) | | |
| | +----------+ +----------+ +----------+ | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
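+-------------------------------------+-----------------------------------+--------------------------+
| Question                            | Best Practice                     | AWS Service              |
+-------------------------------------+-----------------------------------+--------------------------+
| How are you managing identities?    | Centralized IAM, SSO              | IAM, IAM Identity Center |
| How are you controlling access?     | Least privilege, MFA              | IAM Policies             |
| How are you protecting the network? | VPC, Security Groups              | VPC, NACLs, SGs          |
| How are you encrypting data?        | Encryption at rest and in transit | KMS, ACM                 |
| How are you monitoring?             | Logging, alerting                 | CloudTrail, CloudWatch   |
| How are you responding?             | Automated response                | GuardDuty, Security Hub  |
+-------------------------------------+-----------------------------------+--------------------------+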
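
For the network layer, a common review item is a security group that exposes administrative ports to the whole internet. The sketch below assumes a security group dictionary shaped like the `IpPermissions` structure returned by the EC2 `DescribeSecurityGroups` API. The function name, the choice of ports 22 and 3389 as "sensitive", and the sample group are illustrative assumptions.

```python
def open_ingress_ports(security_group: dict, sensitive_ports=(22, 3389)) -> list[int]:
    """Return the sensitive ports that the group exposes to 0.0.0.0/0."""
    exposed = set()
    for perm in security_group.get("IpPermissions", []):
        open_to_world = any(
            r.get("CidrIp") == "0.0.0.0/0" for r in perm.get("IpRanges", [])
        )
        if not open_to_world:
            continue
        if perm.get("IpProtocol") == "-1":
            # "-1" means all protocols and all ports.
            exposed.update(sensitive_ports)
            continue
        low, high = perm.get("FromPort"), perm.get("ToPort")
        if low is None or high is None:
            continue
        exposed.update(p for p in sensitive_ports if low <= p <= high)
    return sorted(exposed)


# Hypothetical group: HTTPS open to the world is expected, SSH is not.
web_sg = {
    "GroupId": "sg-example",
    "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ],
}

print(open_ingress_ports(web_sg))
```

The sketch only inspects IPv4 ranges; a fuller audit would also look at `Ipv6Ranges`.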

Reliability Pillar Principles
+------------------------------------------------------------------+
| |
| 1. Automatically Recover from Failure |
| +----------------------------------------------------------+ |
| | - Implement self-healing | |
| | - Use Auto Scaling | |
| | - Multi-AZ deployments | |
| | - Health checks | |
| +----------------------------------------------------------+ |
| |
| 2. Test Recovery Procedures |
| +----------------------------------------------------------+ |
| | - Regular disaster recovery testing | |
| | - Chaos engineering | |
| | - Game days | |
| +----------------------------------------------------------+ |
| |
| 3. Scale Horizontally |
| +----------------------------------------------------------+ |
| | - Distribute load across resources | |
| | - Avoid single points of failure | |
| | - Use load balancers | |
| +----------------------------------------------------------+ |
| |
| 4. Stop Guessing Capacity |
| +----------------------------------------------------------+ |
| | - Use Auto Scaling | |
| | - Serverless where possible | |
| | - Monitor and adjust | |
| +----------------------------------------------------------+ |
| |
| 5. Automate Change Management |
| +----------------------------------------------------------+ |
| | - Infrastructure as Code | |
| | - Automated deployments | |
| | - Blue/green deployments | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
Multi-AZ High Availability
+------------------------------------------------------------------+
|                                                                  |
|                            Internet                              |
|                                |                                 |
|                                v                                 |
|                        +---------------+                         |
|                        |   Route 53    |                         |
|                        |     (DNS)     |                         |
|                        +---------------+                         |
|                                |                                 |
|                                v                                 |
|                        +---------------+                         |
|                        |  CloudFront   |                         |
|                        |     (CDN)     |                         |
|                        +---------------+                         |
|                                |                                 |
|                                v                                 |
|          +-------------------------------------------+           |
|          |         Application Load Balancer         |           |
|          |                (Multi-AZ)                 |           |
|          +-------------------------------------------+           |
|              |                 |                 |               |
|              v                 v                 v               |
|       +-------------+   +-------------+   +-------------+        |
|       |    AZ-A     |   |    AZ-B     |   |    AZ-C     |        |
|       |             |   |             |   |             |        |
|       | +---------+ |   | +---------+ |   | +---------+ |        |
|       | |   EC2   | |   | |   EC2   | |   | |   EC2   | |        |
|       | |  Fleet  | |   | |  Fleet  | |   | |  Fleet  | |        |
|       | +---------+ |   | +---------+ |   | +---------+ |        |
|       |             |   |             |   |             |        |
|       | +---------+ |   | +---------+ |   | +---------+ |        |
|       | |   RDS   | |   | |   RDS   | |   | |   RDS   | |        |
|       | | Primary | |   | | Replica | |   | | Replica | |        |
|       | +---------+ |   | +---------+ |   | +---------+ |        |
|       +-------------+   +-------------+   +-------------+        |
|                                                                  |
|  Availability: 99.99% (52.6 min downtime/year)                   |
|                                                                  |
+------------------------------------------------------------------+
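
The downtime figure in the diagram follows directly from the availability percentage. The sketch below shows that arithmetic, plus the textbook formula for redundant zones. The second function assumes zone failures are independent, which is an idealization: real outages can be correlated, so treat its result as an upper bound. Both function names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 (leap years ignored)


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Expected yearly downtime for a given availability percentage."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


def parallel_availability(per_zone_pct: float, zones: int) -> float:
    """Availability when the workload is up as long as any one zone is up.

    Assumes zones fail independently of each other.
    """
    all_zones_down = (1 - per_zone_pct / 100) ** zones
    return (1 - all_zones_down) * 100


for pct in (99.9, 99.99, 99.999):
    print(f"{pct}% -> {downtime_minutes_per_year(pct):.1f} min/year")

print(f"3 zones at 99% each -> {parallel_availability(99.0, 3):.4f}%")
```

Each extra "nine" cuts the downtime budget by a factor of ten, which is why the jump from 99.9% to 99.99% usually requires multi-AZ redundancy rather than a more reliable single instance.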
Disaster Recovery Strategies
+------------------------------------------------------------------+
| |
| Strategy 1: Backup & Restore |
| +----------------------------------------------------------+ |
| | RPO: Hours RTO: Hours | |
| | | |
| | Primary Region Backup Region | |
| | +----------+ +----------+ | |
| | | App | | S3 | | |
| | | | --backup----> | Backups | | |
| | | DB | | | | |
| | +----------+ +----------+ | |
| | | | |
| | v (restore) | |
| | +----------+ | |
| | | App | | |
| | | DB | | |
| | +----------+ | |
| +----------------------------------------------------------+ |
| |
| Strategy 2: Pilot Light |
| +----------------------------------------------------------+ |
| | RPO: Minutes RTO: Tens of minutes | |
| | | |
| | Primary Region DR Region | |
| | +----------+ +----------+ | |
| | | App | | DB | | |
| | | | --repl------> |(Standby) | | |
| | | DB | | | | |
| | +----------+ +----------+ | |
| | | | |
| | v (scale up) | |
| | +----------+ | |
| | | App | | |
| | +----------+ | |
| +----------------------------------------------------------+ |
| |
| Strategy 3: Warm Standby |
| +----------------------------------------------------------+ |
| | RPO: Minutes RTO: Minutes | |
| | | |
| | Primary Region DR Region | |
| | +----------+ +----------+ | |
| | | App | | App | | |
| | | (Full) | --repl------> |(Scaled- | | |
| | | DB | | down) | | |
| | +----------+ | DB | | |
| | +----------+ | |
| | | | |
| | v (scale up) | |
| | +----------+ | |
| | | App | | |
| | | (Full) | | |
| | +----------+ | |
| +----------------------------------------------------------+ |
| |
| Strategy 4: Multi-Region Active-Active |
| +----------------------------------------------------------+ |
| | RPO: Real-time RTO: Real-time | |
| | | |
| | Region A Region B | |
| | +----------+ +----------+ | |
| | | App | | App | | |
| | | (Active) | <---sync---> | (Active) | | |
| | | DB | | DB | | |
| | +----------+ +----------+ | |
| | | |
| | Route 53 routes traffic to both regions | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
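+--------------+------------------------------------------+------------------+
| Metric       | Definition                               | Target           |
+--------------+------------------------------------------+------------------+
| RPO          | Recovery Point Objective - max data loss | Minutes to hours |
| RTO          | Recovery Time Objective - max downtime   | Minutes to hours |
| MTTR         | Mean Time To Recovery                    | Minimize         |
| MTTF         | Mean Time To Failure                     | Maximize         |
| Availability | Uptime percentage                        | 99.9% - 99.999%  |
+--------------+------------------------------------------+------------------+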
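
RPO and RTO targets drive the choice of DR strategy: pick the least expensive option that still meets both. The sketch below illustrates that selection logic. The numeric RPO/RTO values per strategy are placeholder assumptions chosen only to make the example concrete; they are not AWS figures, and real values depend on your replication setup and runbooks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DrStrategy:
    name: str
    rpo_minutes: float  # assumed worst-case data loss
    rto_minutes: float  # assumed worst-case time to recover


# Ordered from lowest to highest relative cost.
# The numbers are illustrative placeholders, not published values.
STRATEGIES = [
    DrStrategy("Backup & Restore", 24 * 60, 24 * 60),
    DrStrategy("Pilot Light", 15, 60),
    DrStrategy("Warm Standby", 5, 15),
    DrStrategy("Multi-Region Active-Active", 1, 1),
]


def cheapest_strategy(target_rpo_minutes: float,
                      target_rto_minutes: float) -> Optional[DrStrategy]:
    """Return the first (cheapest) strategy that meets both targets, if any."""
    for strategy in STRATEGIES:
        if (strategy.rpo_minutes <= target_rpo_minutes
                and strategy.rto_minutes <= target_rto_minutes):
            return strategy
    return None


choice = cheapest_strategy(target_rpo_minutes=60, target_rto_minutes=120)
print(choice.name if choice else "No listed strategy meets the targets")
```

Tightening either target moves the answer down the list, which mirrors the cost-versus-recovery trade-off between the four strategies.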

Performance Pillar Principles
+------------------------------------------------------------------+
| |
| 1. Democratize Advanced Technologies |
| +----------------------------------------------------------+ |
| | - Use managed services | |
| | - Let AWS handle complexity | |
| | - Focus on business logic | |
| +----------------------------------------------------------+ |
| |
| 2. Go Global in Minutes |
| +----------------------------------------------------------+ |
| | - Deploy to multiple regions | |
| | - Use CloudFront for global reach | |
| | - Edge locations for low latency | |
| +----------------------------------------------------------+ |
| |
| 3. Use Serverless Architectures |
| +----------------------------------------------------------+ |
| | - Lambda for compute | |
| | - DynamoDB for database | |
| | - S3 for storage | |
| | - No server management | |
| +----------------------------------------------------------+ |
| |
| 4. Experiment More Often |
| +----------------------------------------------------------+ |
| | - Quick provisioning | |
| | - Test different configurations | |
| | - A/B testing | |
| +----------------------------------------------------------+ |
| |
| 5. Consider Mechanical Sympathy |
| +----------------------------------------------------------+ |
| | - Choose right instance types | |
| | - Optimize for workload | |
| | - Use appropriate storage types | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
Performance Optimization Layers
+------------------------------------------------------------------+
| |
| Layer 1: Caching |
| +----------------------------------------------------------+ |
| | | |
| | Client Cache -> CDN Cache -> App Cache -> DB Cache | |
| | | | | | | |
| | v v v v | |
| | Browser CloudFront ElastiCache RDS/DB | |
| | | |
| | Benefits: | |
| | - Reduced latency | |
| | - Lower database load | |
| | - Better user experience | |
| +----------------------------------------------------------+ |
| |
| Layer 2: Compute Optimization |
| +----------------------------------------------------------+ |
| | | |
| | Workload Type Recommended Service | |
| | +----------------+-------------------+ | |
| | | Web Servers | EC2, ALB, ASG | | |
| | | API Backend | Lambda, API GW | | |
| | | Batch Jobs | Batch, Lambda | | |
| | | Containers | ECS, EKS, Fargate| | |
| | | ML/AI | SageMaker | | |
| | +----------------+-------------------+ | |
| +----------------------------------------------------------+ |
| |
| Layer 3: Database Optimization |
| +----------------------------------------------------------+ |
| | | |
| | Data Pattern Recommended Database | |
| | +----------------+-------------------+ | |
| | | Relational | RDS, Aurora | | |
| | | Key-Value | DynamoDB | | |
| | | Document | DocumentDB | | |
| | | Graph | Neptune | | |
| | | Time Series | Timestream | | |
| | | In-Memory | ElastiCache | | |
| | +----------------+-------------------+ | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
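
The application-cache step in Layer 1 is often implemented with the cache-aside pattern: check the cache first, and only go to the database on a miss. The following is a minimal in-process sketch of that pattern with a time-to-live. In practice the store would usually be ElastiCache rather than a Python dictionary; the class name `TtlCache` and the `load_from_db` stand-in are hypothetical.

```python
import time
from typing import Callable


class TtlCache:
    """Cache-aside helper: serve fresh entries, reload expired or missing ones."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: dict[str, tuple[float, object]] = {}

    def get_or_load(self, key: str, loader: Callable[[str], object]) -> object:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit
        value = loader(key)  # cache miss: read from the backing store
        self._store[key] = (now + self._ttl, value)
        return value


db_calls = []


def load_from_db(key: str) -> str:
    """Stand-in for a database query; records each call."""
    db_calls.append(key)
    return f"row-for-{key}"


cache = TtlCache(ttl_seconds=60)
cache.get_or_load("user:1", load_from_db)
cache.get_or_load("user:1", load_from_db)  # served from the cache
print(f"database calls: {len(db_calls)}")
```

The TTL bounds how stale a cached value can get, so it is the knob that trades database load against freshness.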
Performance Monitoring Stack
+------------------------------------------------------------------+
| |
| +------------------------+ |
| | CloudWatch Dashboard | |
| +------------------------+ |
| | |
| +---------------------+---------------------+ |
| | | | |
| v v v |
| +----------+ +----------+ +----------+ |
| | Metrics | | Logs | | Traces | |
| | | | | | | |
| |CloudWatch| |CloudWatch| | X-Ray | |
| | Metrics | | Logs | | | |
| +----------+ +----------+ +----------+ |
| | | | |
| v v v |
| +----------+ +----------+ +----------+ |
| | Alarms | | Insights | | Service | |
| | | | | | Map | |
| +----------+ +----------+ +----------+ |
| |
| Key Metrics to Monitor: |
| +----------------------------------------------------------+ |
| | - CPU Utilization | |
| | - Memory Utilization | |
| | - Disk I/O | |
| | - Network Throughput | |
| | - Request Latency | |
| | - Error Rates | |
| | - Queue Depth | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
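
Two of the key metrics above, request latency and error rate, are usually tracked as a percentile and a ratio rather than as averages. The sketch below shows one way to compute them from raw samples, using the nearest-rank percentile definition and counting HTTP 5xx responses as errors. Both choices, and the sample data, are assumptions for illustration; CloudWatch computes its own percentile statistics.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def error_rate(status_codes: list[int]) -> float:
    """Fraction of responses with a 5xx status code."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if code >= 500)
    return errors / len(status_codes)


# Hypothetical samples: most requests are fast, two are slow outliers.
latencies_ms = [12, 15, 11, 240, 14, 13, 16, 12, 18, 900]
statuses = [200] * 8 + [500, 503]

print("p50:", percentile(latencies_ms, 50))
print("p90:", percentile(latencies_ms, 90))
print("error rate:", error_rate(statuses))
```

The gap between the p50 and p90 values in this sample is the reason to alarm on high percentiles: an average would hide the slow tail.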

Cost Optimization Pillar Principles
+------------------------------------------------------------------+
| |
| 1. Implement Cloud Financial Management |
| +----------------------------------------------------------+ |
| | - Establish cost awareness | |
| | - Set budgets and alerts | |
| | - Regular cost reviews | |
| | - FinOps practices | |
| +----------------------------------------------------------+ |
| |
| 2. Adopt a Consumption Model |
| +----------------------------------------------------------+ |
| | - Pay for what you use | |
| | - Scale up and down | |
| | - No upfront commitments for variable workloads | |
| +----------------------------------------------------------+ |
| |
| 3. Measure Overall Efficiency |
| +----------------------------------------------------------+ |
| | - Track business metrics | |
| | - Cost per transaction | |
| | - Cost per customer | |
| +----------------------------------------------------------+ |
| |
| 4. Stop Spending Money on Undifferentiated Heavy Lifting |
| +----------------------------------------------------------+ |
| | - Use managed services | |
| | - Focus on competitive advantage | |
| | - Let AWS manage infrastructure | |
| +----------------------------------------------------------+ |
| |
| 5. Analyze and Attribute Expenditure |
| +----------------------------------------------------------+ |
| | - Tag resources | |
| | - Cost allocation | |
| | - Chargeback/showback | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
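
Principles 3 and 5 both come down to simple arithmetic once resources are tagged. The sketch below groups cost line items by a tag and derives a cost per transaction. The line-item shape (`service`, `cost_usd`, `tags`) is a hypothetical simplification and not the format of any AWS billing export.

```python
from collections import defaultdict


def cost_by_tag(line_items: list[dict], tag_key: str) -> dict[str, float]:
    """Sum cost per tag value; items without the tag land in 'untagged'."""
    totals: dict[str, float] = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, "untagged")
        totals[owner] += item["cost_usd"]
    return dict(totals)


def cost_per_transaction(total_cost_usd: float, transactions: int) -> float:
    """Unit cost: spend divided by the business metric it supports."""
    if transactions <= 0:
        raise ValueError("transactions must be positive")
    return total_cost_usd / transactions


# Hypothetical monthly line items.
line_items = [
    {"service": "EC2", "cost_usd": 120.0, "tags": {"team": "checkout"}},
    {"service": "RDS", "cost_usd": 80.0, "tags": {"team": "checkout"}},
    {"service": "S3", "cost_usd": 50.0, "tags": {"team": "search"}},
    {"service": "Lambda", "cost_usd": 10.0, "tags": {}},
]

totals = cost_by_tag(line_items, "team")
print(totals)
print(cost_per_transaction(totals["checkout"], transactions=40_000))
```

A growing `untagged` bucket is a useful signal in its own right, since untagged spend cannot be charged back or shown back to a team.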
Cost Optimization Techniques
+------------------------------------------------------------------+
| |
| Technique 1: Right-Sizing |
| +----------------------------------------------------------+ |
| | | |
| | Over-provisioned Right-sized | |
| | +----------------+ +----------------+ | |
| | | m5.2xlarge | | m5.large | | |
| | | CPU: 15% | --> | CPU: 60% | | |
| | | Memory: 20% | | Memory: 80% | | |
| | | Cost: $280/mo | | Cost: $70/mo | | |
| | +----------------+ +----------------+ | |
| | | |
| | Tools: Compute Optimizer, Cost Explorer | |
| +----------------------------------------------------------+ |
| |
| Technique 2: Reserved Capacity |
| +----------------------------------------------------------+ |
| | | |
| | Pricing Model Discount Commitment | |
| | +----------------+----------------+-------------+ | |
| | | On-Demand | 0% | None | | |
| | | RI (1 year) | 30-40% | 1 year | | |
| | | RI (3 year) | 50-60% | 3 years | | |
| | | Savings Plans | Up to 72% | 1-3 years | | |
| | +----------------+----------------+-------------+ | |
| +----------------------------------------------------------+ |
| |
| Technique 3: Spot Instances |
| +----------------------------------------------------------+ |
| | | |
| | Use Case Spot Discount | |
| | +----------------+----------------+ | |
| | | Batch Jobs | Up to 90% off | | |
| | | CI/CD | Up to 90% off | | |
| | | Big Data | Up to 90% off | | |
| | | Containerized | Up to 90% off | | |
| | +----------------+----------------+ | |
| +----------------------------------------------------------+ |
| |
| Technique 4: Storage Tiering |
| +----------------------------------------------------------+ |
| | | |
| | Data Age Storage Tier Cost | |
| | +----------------+----------------+-------------+ | |
| | | Hot (0-30 days) | S3 Standard | $0.023/GB | | |
| | | Warm (30-90) | S3 Standard-IA | $0.0125/GB | | |
| | | Cold (90-180) | S3 Glacier | $0.004/GB | | |
| | | Archive (180+) | S3 Glacier Deep | $0.00099/GB | | |
| | +----------------+----------------+-------------+ | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
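
The right-sizing and storage-tiering figures above can be turned into a quick estimate. The sketch below reuses the prices from the storage tiering table and the monthly costs from the right-sizing example. It assumes each age boundary belongs to the colder tier (day 30 counts as Standard-IA) and ignores retrieval fees, request charges, and minimum storage durations, so it is a rough estimate only.

```python
# (upper age bound in days, tier name, price in USD per GB per month),
# copied from the storage tiering table. None means "no upper bound".
TIERS = [
    (30, "S3 Standard", 0.023),
    (90, "S3 Standard-IA", 0.0125),
    (180, "S3 Glacier", 0.004),
    (None, "S3 Glacier Deep Archive", 0.00099),
]


def tier_for_age(age_days: int) -> tuple[str, float]:
    """Pick the tier whose age range contains age_days."""
    for max_age, name, price in TIERS:
        if max_age is None or age_days < max_age:
            return name, price
    raise AssertionError("unreachable: the last tier has no upper bound")


def monthly_storage_cost(gb_by_age: dict[int, float]) -> float:
    """Total monthly cost for data volumes keyed by their age in days."""
    return sum(gb * tier_for_age(age)[1] for age, gb in gb_by_age.items())


def savings_pct(before_usd: float, after_usd: float) -> float:
    """Percentage saved when moving from one monthly cost to another."""
    return (before_usd - after_usd) / before_usd * 100


print(f"Right-sizing saves {savings_pct(280, 70):.0f}%")
print(f"Tiered cost: ${monthly_storage_cost({10: 1000, 60: 1000, 400: 1000}):.2f}/month")
print(f"All in Standard: ${3000 * 0.023:.2f}/month")
```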

Sustainability Pillar Principles
+------------------------------------------------------------------+
| |
| 1. Understand Your Impact |
| +----------------------------------------------------------+ |
| | - Measure sustainability metrics | |
| | - Track carbon footprint | |
| | - Set improvement goals | |
| +----------------------------------------------------------+ |
| |
| 2. Establish Sustainability Goals |
| +----------------------------------------------------------+ |
| | - Define targets | |
| | - Align with business objectives | |
| | - Regular reviews | |
| +----------------------------------------------------------+ |
| |
| 3. Maximize Utilization |
| +----------------------------------------------------------+ |
| | - Right-size resources | |
| | - Use serverless | |
| | - Optimize workload scheduling | |
| +----------------------------------------------------------+ |
| |
| 4. Anticipate and Adopt New Hardware |
| +----------------------------------------------------------+ |
| | - Use latest instance generations | |
| | - Leverage AWS efficiency improvements | |
| | - Migrate to more efficient services | |
| +----------------------------------------------------------+ |
| |
| 5. Use Managed Services |
| +----------------------------------------------------------+ |
| | - AWS manages at scale | |
| | - Higher efficiency | |
| | - Shared infrastructure | |
| +----------------------------------------------------------+ |
| |
| 6. Reduce Downstream Impact |
| +----------------------------------------------------------+ |
| | - Optimize data transfer | |
| | - Reduce storage requirements | |
| | - Efficient algorithms | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
Sustainability Optimization
+------------------------------------------------------------------+
| |
| Compute Optimization |
| +----------------------------------------------------------+ |
| | - Use Graviton (Arm) instances - up to 60% less energy | |
| | - Opt for serverless (Lambda, Fargate) | |
| | - Use Spot instances for batch workloads | |
| | - Implement auto-scaling | |
| +----------------------------------------------------------+ |
| |
| Storage Optimization |
| +----------------------------------------------------------+ |
| | - Use S3 Intelligent-Tiering | |
| | - Implement lifecycle policies | |
| | - Compress data before storage | |
| | - Delete unused snapshots | |
| +----------------------------------------------------------+ |
| |
| Network Optimization |
| +----------------------------------------------------------+ |
| | - Use CloudFront to reduce origin requests | |
| | - Implement caching | |
| | - Use VPC endpoints | |
| | - Optimize data transfer patterns | |
| +----------------------------------------------------------+ |
| |
| Region Selection |
| +----------------------------------------------------------+ |
| | - Choose regions with lower carbon intensity | |
| | - Consider regions powered by renewable energy | |
| | - Balance latency with sustainability | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
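
The "Implement lifecycle policies" bullet can be expressed as configuration. The sketch below builds a lifecycle rule dictionary in the shape the S3 API expects, using the 30/90/180-day thresholds from the storage tiering table in the cost section. The rule ID and function name are hypothetical, and the commented-out boto3 call shows where the dictionary would go; it is not executed here and assumes a bucket you own.

```python
def lifecycle_configuration(prefix: str = "") -> dict:
    """Build an S3 lifecycle configuration that moves objects to colder tiers by age."""
    return {
        "Rules": [
            {
                "ID": "tier-by-age",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # empty prefix matches every object
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                    {"Days": 180, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }


config = lifecycle_configuration(prefix="logs/")
for step in config["Rules"][0]["Transitions"]:
    print(f"after {step['Days']} days -> {step['StorageClass']}")

# Applying it would look roughly like this (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=config)
```

When access patterns are unpredictable, S3 Intelligent-Tiering (the first bullet in the storage box) is the alternative to fixed age thresholds like these.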

Well-Architected Tool Workflow
+------------------------------------------------------------------+
| |
| Step 1: Define Workload |
| +----------------------------------------------------------+ |
| | - Name your workload | |
| | - Select region | |
| | - Define scope | |
| +----------------------------------------------------------+ |
| | |
| v |
| Step 2: Answer Questions |
| +----------------------------------------------------------+ |
| | - Answer questions for each pillar | |
| | - Provide evidence | |
| | - Note risks and improvements | |
| +----------------------------------------------------------+ |
| | |
| v |
| Step 3: Review Results |
| +----------------------------------------------------------+ |
| | | |
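| | Illustrative pillar scores (the tool itself reports | |
| | high- and medium-risk issue counts, not numeric scores): | |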
| | +----------------+--------+ | |
| | | Security | 85/100 | | |
| | | Reliability | 72/100 | <-- Needs improvement | |
| | | Performance | 90/100 | | |
| | | Cost | 65/100 | <-- Needs improvement | |
| | | Sustainability | 78/100 | | |
| | +----------------+--------+ | |
| +----------------------------------------------------------+ |
| | |
| v |
| Step 4: Create Improvement Plan |
| +----------------------------------------------------------+ |
| | - Prioritize high-risk items | |
| | - Create milestones | |
| | - Track progress | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
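+----------------+-------------------------------------------------+
| Pillar         | Sample Question                                 |
+----------------+-------------------------------------------------+
| Security       | How are you protecting access to your workload? |
| Reliability    | How does your workload handle failure?          |
| Performance    | How do you select your compute solution?        |
| Cost           | Do you have cost controls in place?             |
| Sustainability | How do you track and measure sustainability?    |
+----------------+-------------------------------------------------+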
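
Step 4 of the workflow, prioritizing high-risk items, amounts to a filter and a sort over the review findings. The sketch below shows that step on hypothetical findings. The `HIGH` / `MEDIUM` / `NONE` labels and the dictionary shape are assumptions for illustration, not the output format of the Well-Architected Tool.

```python
RISK_PRIORITY = {"HIGH": 0, "MEDIUM": 1, "NONE": 2}


def improvement_plan(findings: list[dict]) -> list[dict]:
    """Return findings that need work, highest risk first.

    sorted() is stable, so findings with equal risk keep their input order.
    """
    actionable = [f for f in findings if f["risk"] != "NONE"]
    return sorted(actionable, key=lambda f: RISK_PRIORITY[f["risk"]])


# Hypothetical review results, one per sample question.
findings = [
    {"pillar": "Security", "risk": "MEDIUM",
     "question": "How are you protecting access to your workload?"},
    {"pillar": "Reliability", "risk": "HIGH",
     "question": "How does your workload handle failure?"},
    {"pillar": "Performance", "risk": "NONE",
     "question": "How do you select your compute solution?"},
    {"pillar": "Cost", "risk": "HIGH",
     "question": "Do you have cost controls in place?"},
]

for item in improvement_plan(findings):
    print(item["risk"], item["pillar"])
```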

Architecture Decision Record Template
+------------------------------------------------------------------+
| |
| ADR-001: Use Multi-AZ RDS for Database High Availability |
| |
| Status: Accepted |
| |
| Context: |
| +----------------------------------------------------------+ |
| | - Application requires 99.95% availability | |
| | - Database is a critical component | |
| | - Single-AZ RDS deployment carries a 99.5% SLA | |
| +----------------------------------------------------------+ |
| |
| Decision: |
| +----------------------------------------------------------+ |
| | - Deploy RDS in Multi-AZ configuration | |
| | - Use synchronous replication | |
| | - Automatic failover enabled | |
| +----------------------------------------------------------+ |
| |
| Consequences: |
| +----------------------------------------------------------+ |
| | Positive: | |
| | - Higher availability (99.95% SLA) | |
| | - Automatic failover | |
| | - No manual intervention | |
| | | |
| | Negative: | |
| | - Higher cost (~2x single AZ) | |
| | - Slight write latency increase | |
| +----------------------------------------------------------+ |
| |
| Alternatives Considered: |
| +----------------------------------------------------------+ |
| | 1. Single AZ with read replicas - Lower availability | |
| | 2. Self-managed database - Higher operational overhead | |
| | 3. Multi-region - Higher cost, complexity | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
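
Teams that keep ADRs in version control sometimes generate them from structured data so every record has the same sections. The sketch below is one possible way to do that, reusing a few entries from the ADR above. The `DecisionRecord` class and its field names are hypothetical and not part of any ADR standard or AWS tool.

```python
from dataclasses import dataclass, field


@dataclass
class DecisionRecord:
    number: int
    title: str
    status: str
    context: list[str] = field(default_factory=list)
    decision: list[str] = field(default_factory=list)
    positive: list[str] = field(default_factory=list)
    negative: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the record as plain text with one section per heading."""
        lines = [f"ADR-{self.number:03d}: {self.title}", f"Status: {self.status}"]
        sections = (
            ("Context", self.context),
            ("Decision", self.decision),
            ("Positive consequences", self.positive),
            ("Negative consequences", self.negative),
        )
        for heading, items in sections:
            lines.append(f"{heading}:")
            lines.extend(f"  - {item}" for item in items)
        return "\n".join(lines)


adr = DecisionRecord(
    number=1,
    title="Use Multi-AZ RDS for Database High Availability",
    status="Accepted",
    context=["Database is critical component"],
    decision=["Deploy RDS in Multi-AZ configuration", "Automatic failover enabled"],
    positive=["Automatic failover", "No manual intervention"],
    negative=["Higher cost (~2x single AZ)", "Slight write latency increase"],
)
print(adr.render())
```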

Well-Architected Best Practices
+------------------------------------------------------------------+
| |
| 1. Regular Reviews |
| +----------------------------------------------------------+ |
| | - Conduct Well-Architected reviews quarterly | |
| | - Use AWS Well-Architected Tool | |
| | - Document and track improvements | |
| +----------------------------------------------------------+ |
| |
| 2. Balance Pillars |
| +----------------------------------------------------------+ |
| | - Trade-offs between pillars are normal | |
| | - Document decisions | |
| | - Align with business requirements | |
| +----------------------------------------------------------+ |
| |
| 3. Iterate |
| +----------------------------------------------------------+ |
| | - Architecture evolves over time | |
| | - Continuous improvement | |
| | - Learn from incidents | |
| +----------------------------------------------------------+ |
| |
| 4. Automate |
| +----------------------------------------------------------+ |
| | - Infrastructure as Code | |
| | - Automated testing | |
| | - Automated deployments | |
| +----------------------------------------------------------+ |
| |
| 5. Measure |
| +----------------------------------------------------------+ |
| | - Define metrics for each pillar | |
| | - Set up monitoring and alerting | |
| | - Regular reporting | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+

Exam Tip

  1. Six Pillars: Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, Sustainability (this chapter covers the last five)
  2. Trade-offs: Understand how decisions affect multiple pillars
  3. Design Principles: Know the principles for each pillar
  4. Well-Architected Tool: Use for architecture reviews
  5. RPO/RTO: Know the difference and how they affect DR strategy
  6. Right-Sizing: Key for both cost and performance optimization
  7. Defense in Depth: Security approach with multiple layers
  8. Serverless: Often a strong choice for performance and cost, especially for variable or spiky workloads



Last Updated: February 2026