Chapter 5: AWS Well-Architected Framework
Building Secure, High-Performing, Resilient, and Efficient Infrastructure
5.1 Overview
The AWS Well-Architected Framework helps you understand the pros and cons of decisions you make while building systems on AWS, providing a consistent approach to evaluate architectures.
Well-Architected Framework Pillars

This chapter covers five pillars of the framework:

1. Security
2. Reliability
3. Performance Efficiency
4. Cost Optimization
5. Sustainability

The framework also defines a sixth pillar, Operational Excellence, which this chapter does not cover.

5.2 Pillar 1: Security
Security Design Principles

Security Pillar Principles:

1. Implement a Strong Identity Foundation
   - Centralize identity management
   - Use IAM for access control
   - Implement least privilege
   - Enforce MFA
2. Enable Traceability
   - Monitor and log all actions
   - Use CloudTrail for API auditing
   - Implement alerting
3. Apply Security at All Layers
   - Defense in depth
   - Network security (VPC, NACLs, SGs)
   - Application security
   - Data encryption
4. Automate Security Best Practices
   - Use managed services
   - Automated patching
   - Security as code
5. Protect Data in Transit and at Rest
   - Encryption everywhere
   - TLS for transit
   - KMS for key management
6. Prepare for Security Events
   - Incident response plan
   - Automated response (GuardDuty)
   - Regular security testing

Security Architecture
Security Defense in Depth

Requests pass through four layers, from the edge inward:

1. Layer 1: Edge Security
   - CloudFront (CDN)
   - WAF (firewall)
   - Shield (DDoS protection)
2. Layer 2: Network Security
   - VPC
   - NACLs
   - Security Groups
3. Layer 3: Compute Security
   - EC2 (IAM)
   - Systems Manager
   - GuardDuty
4. Layer 4: Data Security
   - KMS (encryption)
   - S3 (encryption)
   - RDS (encryption)

Security Checklist
| Question | Best Practice | AWS Service |
|---|---|---|
| How are you managing identities? | Centralized IAM, SSO | IAM, AWS IAM Identity Center (formerly AWS SSO) |
| How are you controlling access? | Least privilege, MFA | IAM Policies |
| How are you protecting your network? | VPC, Security Groups | VPC, NACLs, SGs |
| How are you encrypting data? | Encryption at rest and in transit | KMS, ACM |
| How are you monitoring? | Logging, alerting | CloudTrail, CloudWatch |
| How are you responding? | Automated response | GuardDuty, Security Hub |
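To make the least-privilege row of the checklist concrete, here is a minimal sketch in Python. It builds a read-only IAM policy document scoped to one S3 prefix and flags statements that grant wildcards. The bucket name, prefix, and helper names (`read_only_s3_policy`, `find_wildcards`) are hypothetical and chosen only for illustration; the JSON layout follows the standard IAM policy grammar (`Version`, `Statement`, `Effect`, `Action`, `Resource`, `Condition`).

```python
import json


def read_only_s3_policy(bucket: str, prefix: str) -> dict:
    """Build an IAM policy allowing read access to one prefix of one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListPrefixOnly",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                # Restrict listing to keys under the given prefix.
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                "Sid": "ReadObjectsUnderPrefix",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            },
        ],
    }


def find_wildcards(policy: dict) -> list[str]:
    """Return the Sids of Allow statements that grant '*' actions or resources."""
    flagged = []
    for stmt in policy["Statement"]:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        # IAM accepts either a single string or a list for both fields.
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        too_broad = any(a == "*" or a.endswith(":*") for a in actions)
        if too_broad or "*" in resources:
            flagged.append(stmt.get("Sid", "<no Sid>"))
    return flagged


if __name__ == "__main__":
    # Hypothetical bucket and prefix, for illustration only.
    policy = read_only_s3_policy("example-reports-bucket", "team-a")
    print(json.dumps(policy, indent=2))
    print("Wildcard statements:", find_wildcards(policy))
```

A check like `find_wildcards` is only a rough lint: it catches the most obvious over-grants, while a real review would also look at conditions, principals, and permission boundaries.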
5.3 Pillar 2: Reliability
Reliability Design Principles

Reliability Pillar Principles:

1. Automatically Recover from Failure
   - Implement self-healing
   - Use Auto Scaling
   - Multi-AZ deployments
   - Health checks
2. Test Recovery Procedures
   - Regular disaster recovery testing
   - Chaos engineering
   - Game days
3. Scale Horizontally
   - Distribute load across resources
   - Avoid single points of failure
   - Use load balancers
4. Stop Guessing Capacity
   - Use Auto Scaling
   - Serverless where possible
   - Monitor and adjust
5. Automate Change Management
   - Infrastructure as Code
   - Automated deployments
   - Blue/green deployments

High Availability Architecture
Multi-AZ High Availability

Request path, from the internet to the data tier:

1. Internet
2. Route 53 (DNS)
3. CloudFront (CDN)
4. Application Load Balancer (Multi-AZ)
5. Three Availability Zones, each receiving traffic from the load balancer

| Availability Zone | Compute | Database |
|---|---|---|
| AZ-A | EC2 fleet | RDS primary |
| AZ-B | EC2 fleet | RDS replica |
| AZ-C | EC2 fleet | RDS replica |

Availability: 99.99% (52.6 min downtime/year)

Disaster Recovery Patterns
Disaster Recovery Strategies

| Strategy | RPO | RTO | Primary Region | Recovery Region | Recovery action |
|---|---|---|---|---|---|
| 1. Backup & Restore | Hours | Hours | App and DB | Backups in S3 | Restore the app and DB from backups |
| 2. Pilot Light | Tens of minutes | Tens of minutes | App and DB | Replicated standby DB only | Scale up: launch the app tier |
| 3. Warm Standby | Minutes | Minutes | App (full) and DB | Scaled-down app and DB, kept current by replication | Scale up the app to full capacity |
| 4. Multi-Region Active-Active | Near real-time | Near real-time | App (active) and DB | App (active) and DB, synchronized in both directions | None; Route 53 routes traffic to both regions |

Reliability Metrics
| Metric | Definition | Target |
|---|---|---|
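| RPO | Recovery Point Objective: maximum acceptable data loss, measured as time since the last recoverable copy | Minutes to hours |
| RTO | Recovery Time Objective: maximum acceptable time to restore service after a failure | Minutes to hours |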
| MTTR | Mean Time To Recovery | Minimize |
| MTTF | Mean Time To Failure | Maximize |
| Availability | Uptime percentage | 99.9% - 99.999% |
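The metrics in this table are linked by simple arithmetic, which a short Python sketch can make explicit. It assumes steady-state availability of MTTF / (MTTF + MTTR), independent failures between components, and a 365.25-day year; the function names are illustrative, not part of any AWS tool.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes, averaging in leap years


def availability_from_mttf_mttr(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability as a fraction: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Expected yearly downtime implied by an availability percentage."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


def serial(*availabilities: float) -> float:
    """Availability of a chain where every component must work (fractions)."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def redundant(availability: float, copies: int) -> float:
    """Availability of N independent copies where any single copy suffices."""
    return 1 - (1 - availability) ** copies


if __name__ == "__main__":
    for pct in (99.9, 99.99, 99.999):
        print(f"{pct}% -> {downtime_minutes_per_year(pct):.1f} min/year")
    # Two redundant copies of a 99% component behave like a 99.99% component.
    print(f"{redundant(0.99, 2):.4f}")
```

For 99.99% this gives about 52.6 minutes of downtime per year. The `serial` and `redundant` helpers show why the principles above favor removing single points of failure: chaining components lowers availability, while redundancy across Availability Zones raises it.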
5.4 Pillar 3: Performance Efficiency
Performance Design Principles

Performance Pillar Principles:

1. Democratize Advanced Technologies
   - Use managed services
   - Let AWS handle complexity
   - Focus on business logic
2. Go Global in Minutes
   - Deploy to multiple regions
   - Use CloudFront for global reach
   - Edge locations for low latency
3. Use Serverless Architectures
   - Lambda for compute
   - DynamoDB for database
   - S3 for storage
   - No server management
4. Experiment More Often
   - Quick provisioning
   - Test different configurations
   - A/B testing
5. Consider Mechanical Sympathy
   - Choose right instance types
   - Optimize for workload
   - Use appropriate storage types

Performance Architecture Patterns
Performance Optimization Layers

Layer 1: Caching

A request checks each cache in order: client, CDN, app, then database.

| Cache layer | Where it lives |
|---|---|
| Client cache | Browser |
| CDN cache | CloudFront |
| App cache | ElastiCache |
| DB cache | RDS/DB |

Benefits:
- Reduced latency
- Lower database load
- Better user experience

Layer 2: Compute Optimization

| Workload Type | Recommended Service |
|---|---|
| Web Servers | EC2, ALB, ASG |
| API Backend | Lambda, API GW |
| Batch Jobs | Batch, Lambda |
| Containers | ECS, EKS, Fargate |
| ML/AI | SageMaker |

Layer 3: Database Optimization

| Data Pattern | Recommended Database |
|---|---|
| Relational | RDS, Aurora |
| Key-Value | DynamoDB |
| Document | DocumentDB |
| Graph | Neptune |
| Time Series | Timestream |
| In-Memory | ElastiCache |

Performance Monitoring
Performance Monitoring Stack

A CloudWatch dashboard brings together three kinds of signal:

| Signal | Service | Feeds into |
|---|---|---|
| Metrics | CloudWatch Metrics | Alarms |
| Logs | CloudWatch Logs | Insights |
| Traces | X-Ray | Service Map |

Key Metrics to Monitor:
- CPU Utilization
- Memory Utilization
- Disk I/O
- Network Throughput
- Request Latency
- Error Rates
- Queue Depth

5.5 Pillar 4: Cost Optimization
Cost Design Principles

Cost Optimization Pillar Principles:

1. Implement Cloud Financial Management
   - Establish cost awareness
   - Set budgets and alerts
   - Regular cost reviews
   - FinOps practices
2. Adopt a Consumption Model
   - Pay for what you use
   - Scale up and down
   - No upfront commitments for variable workloads
3. Measure Overall Efficiency
   - Track business metrics
   - Cost per transaction
   - Cost per customer
4. Stop Spending Money on Undifferentiated Heavy Lifting
   - Use managed services
   - Focus on competitive advantage
   - Let AWS manage infrastructure
5. Analyze and Attribute Expenditure
   - Tag resources
   - Cost allocation
   - Chargeback/showback

Cost Optimization Strategies
Cost Optimization Techniques

Technique 1: Right-Sizing

| | Over-provisioned | Right-sized |
|---|---|---|
| Instance type | m5.2xlarge | m5.large |
| CPU utilization | 15% | 60% |
| Memory utilization | 20% | 70% |
| Cost | $280/mo | $70/mo |

Tools: Compute Optimizer, Cost Explorer

Technique 2: Reserved Capacity

| Pricing Model | Discount | Commitment |
|---|---|---|
| On-Demand | 0% | None |
| RI (1 year) | 30-40% | 1 year |
| RI (3 year) | 50-60% | 3 years |
| Savings Plans | Up to 72% | 1 or 3 years |

Technique 3: Spot Instances

| Use Case | Spot Discount |
|---|---|
| Batch Jobs | Up to 90% off |
| CI/CD | Up to 90% off |
| Big Data | Up to 90% off |
| Containerized | Up to 90% off |

Technique 4: Storage Tiering

| Data Age | Storage Tier | Cost (per GB-month) |
|---|---|---|
| Hot (0-30 days) | S3 Standard | $0.023 |
| Warm (30-90 days) | S3 Standard-IA | $0.0125 |
| Cold (90-180 days) | S3 Glacier | $0.004 |
| Archive (180+ days) | S3 Glacier Deep Archive | $0.00099 |

5.6 Pillar 5: Sustainability
Sustainability Design Principles

Sustainability Pillar Principles:

1. Understand Your Impact
   - Measure sustainability metrics
   - Track carbon footprint
   - Set improvement goals
2. Establish Sustainability Goals
   - Define targets
   - Align with business objectives
   - Regular reviews
3. Maximize Utilization
   - Right-size resources
   - Use serverless
   - Optimize workload scheduling
4. Anticipate and Adopt New Hardware
   - Use latest instance generations
   - Leverage AWS efficiency improvements
   - Migrate to more efficient services
5. Use Managed Services
   - AWS manages at scale
   - Higher efficiency
   - Shared infrastructure
6. Reduce Downstream Impact
   - Optimize data transfer
   - Reduce storage requirements
   - Efficient algorithms

Sustainability Best Practices
Sustainability Optimization

Compute Optimization
- Use Graviton (ARM) instances, which AWS states use up to 60% less energy for comparable performance
- Opt for serverless (Lambda, Fargate)
- Use Spot instances for batch workloads
- Implement auto-scaling

Storage Optimization
- Use S3 Intelligent-Tiering
- Implement lifecycle policies
- Compress data before storage
- Delete unused snapshots

Network Optimization
- Use CloudFront to reduce origin requests
- Implement caching
- Use VPC endpoints
- Optimize data transfer patterns

Region Selection
- Choose regions with lower carbon intensity
- Consider regions powered by renewable energy
- Balance latency with sustainability

5.7 Well-Architected Tool
Using the AWS Well-Architected Tool

Well-Architected Tool Workflow

Step 1: Define Workload
- Name your workload
- Select region
- Define scope

Step 2: Answer Questions
- Answer questions for each pillar
- Provide evidence
- Note risks and improvements

Step 3: Review Results

Pillar scores (illustrative; the tool itself reports counts of high- and medium-risk issues per pillar rather than numeric scores):

| Pillar | Score | Note |
|---|---|---|
| Security | 85/100 | |
| Reliability | 72/100 | Needs improvement |
| Performance | 90/100 | |
| Cost | 65/100 | Needs improvement |
| Sustainability | 78/100 | |

Step 4: Create Improvement Plan
- Prioritize high-risk items
- Create milestones
- Track progress

Sample Questions by Pillar
| Pillar | Sample Question |
|---|---|
| Security | How are you protecting access to your workload? |
| Reliability | How does your workload handle failure? |
| Performance | How do you select your compute solution? |
| Cost | Do you have cost controls in place? |
| Sustainability | How do you track and measure sustainability? |
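Steps 3 and 4 of the workflow (review results, then prioritize high-risk items) can be sketched as a small data model. This is an illustration only: the `Finding` class, the risk labels, and the sample answers are assumptions made for the example, and the code does not call the Well-Architected Tool API.

```python
from collections import Counter
from dataclasses import dataclass

RISK_ORDER = {"HIGH": 0, "MEDIUM": 1, "NONE": 2}  # lower value sorts first


@dataclass(frozen=True)
class Finding:
    pillar: str
    question: str
    risk: str  # assumed labels: "HIGH", "MEDIUM", or "NONE"


def risk_summary(findings: list[Finding]) -> dict[str, Counter]:
    """Count findings per pillar and risk level."""
    summary: dict[str, Counter] = {}
    for f in findings:
        summary.setdefault(f.pillar, Counter())[f.risk] += 1
    return summary


def improvement_plan(findings: list[Finding]) -> list[Finding]:
    """Keep only risky findings, highest risk first (input order kept within a level)."""
    risky = [f for f in findings if f.risk != "NONE"]
    return sorted(risky, key=lambda f: RISK_ORDER[f.risk])


if __name__ == "__main__":
    # Hypothetical review answers using the sample questions above.
    review = [
        Finding("Security", "How are you protecting access to your workload?", "NONE"),
        Finding("Cost", "Do you have cost controls in place?", "MEDIUM"),
        Finding("Reliability", "How does your workload handle failure?", "HIGH"),
    ]
    for item in improvement_plan(review):
        print(f"[{item.risk}] {item.pillar}: {item.question}")
```

Sorting by risk level rather than by pillar mirrors the workflow's advice to address high-risk items first, whichever pillar they belong to.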
5.8 Architecture Decision Records
Documenting Architecture Decisions

Architecture Decision Record Template

ADR-001: Use Multi-AZ RDS for Database High Availability

Status: Accepted

Context:
- Application requires 99.95% database availability
- Database is a critical component
- A Single-AZ RDS deployment carries a 99.5% availability commitment under the Amazon RDS SLA

Decision:
- Deploy RDS in Multi-AZ configuration
- Use synchronous replication
- Automatic failover enabled

Consequences:
- Positive:
  - Higher availability (99.95% under the Amazon RDS SLA)
  - Automatic failover
  - No manual intervention
- Negative:
  - Higher cost (~2x Single-AZ)
  - Slight write latency increase

Alternatives Considered:
1. Single-AZ with read replicas: lower availability
2. Self-managed database: higher operational overhead
3. Multi-region: higher cost and complexity

5.9 Best Practices Summary
Well-Architected Best Practices

1. Regular Reviews
   - Conduct Well-Architected reviews quarterly
   - Use AWS Well-Architected Tool
   - Document and track improvements
2. Balance Pillars
   - Trade-offs between pillars are normal
   - Document decisions
   - Align with business requirements
3. Iterate
   - Architecture evolves over time
   - Continuous improvement
   - Learn from incidents
4. Automate
   - Infrastructure as Code
   - Automated testing
   - Automated deployments
5. Measure
   - Define metrics for each pillar
   - Set up monitoring and alerting
   - Regular reporting

5.10 Exam Tips
- Pillars: The framework defines six (Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, Sustainability); this chapter covers the last five
- Trade-offs: Understand how decisions affect multiple pillars
- Design Principles: Know the principles for each pillar
- Well-Architected Tool: Use for architecture reviews
- RPO/RTO: Know the difference and how they affect DR strategy
- Right-Sizing: Key for both cost and performance optimization
- Defense in Depth: Security approach with multiple layers
- Serverless: Often a strong choice for performance and cost, especially for variable workloads where capacity and billing should follow actual usage
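The RPO/RTO tip can be sketched as a lookup over the four disaster recovery strategies from section 5.3. The numeric thresholds below are illustrative placeholders for "hours", "minutes", and "near real-time", not AWS-defined limits, and `choose_strategy` is a hypothetical helper; the ordering from cheapest to most expensive strategy follows the order in which the chapter presents them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DrStrategy:
    name: str
    rpo_minutes: float  # assumed achievable recovery point
    rto_minutes: float  # assumed achievable recovery time


# Ordered from lowest to highest running cost. The numbers are assumptions
# chosen for illustration, not guarantees of any AWS service.
STRATEGIES = [
    DrStrategy("Backup & Restore", rpo_minutes=240, rto_minutes=480),
    DrStrategy("Pilot Light", rpo_minutes=10, rto_minutes=30),
    DrStrategy("Warm Standby", rpo_minutes=5, rto_minutes=10),
    DrStrategy("Multi-Region Active-Active", rpo_minutes=1, rto_minutes=1),
]


def choose_strategy(
    target_rpo_minutes: float, target_rto_minutes: float
) -> Optional[DrStrategy]:
    """Return the cheapest strategy that meets both targets, or None."""
    for strategy in STRATEGIES:
        meets_rpo = strategy.rpo_minutes <= target_rpo_minutes
        meets_rto = strategy.rto_minutes <= target_rto_minutes
        if meets_rpo and meets_rto:
            return strategy
    return None


if __name__ == "__main__":
    # (target RPO, target RTO) in minutes
    for rpo, rto in [(1440, 1440), (15, 60), (5, 15), (1, 1)]:
        chosen = choose_strategy(rpo, rto)
        print(f"RPO {rpo} min, RTO {rto} min -> {chosen.name if chosen else 'none'}")
```

The point of the sketch is the shape of the decision: RPO limits how stale the recovered data may be, RTO limits how long recovery may take, and the tighter of the two usually determines the strategy and its cost.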
Next Chapter
Chapter 6: Amazon EC2 - Deep Dive
Last Updated: February 2026