Chapter 5: AWS Well-Architected Framework

Building Secure, High-Performing, Resilient, and Efficient Infrastructure

5.1 Overview

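The AWS Well-Architected Framework describes the trade-offs behind the design decisions you make when building systems on AWS and gives you a consistent way to evaluate an architecture against established best practices. AWS defines six pillars: the five shown below plus Operational Excellence.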

                    Well-Architected Framework Pillars
+------------------------------------------------------------------+
|                                                                   |
|                    +------------------------+                     |
|                    |  Well-Architected      |                     |
|                    |      Framework         |                     |
|                    +------------------------+                     |
|                                |                                  |
|        +-----------+-----------+-----------+-----------+          |
|        |           |           |           |           |          |
|        v           v           v           v           v          |
|    +--------+  +--------+  +--------+  +--------+  +--------+     |
|    |Security|  |Reliabi-|  |Perform-|  |Cost    |  |Sustain-|     |
|    |        |  |lity    |  |ance    |  |Optimiz-|  |ability |     |
|    |        |  |        |  |        |  |ation   |  |        |     |
|    +--------+  +--------+  +--------+  +--------+  +--------+     |
|                                                                   |
|    1. Security         2. Reliability      3. Performance        |
|    4. Cost Optimization                   5. Sustainability     |
|                                                                   |
+------------------------------------------------------------------+

5.2 Pillar 1: Security

Security Design Principles

                    Security Pillar Principles
+------------------------------------------------------------------+
|                                                                   |
|    1. Implement a Strong Identity Foundation                      |
|    +----------------------------------------------------------+   |
|    |  - Centralize identity management                         |   |
|    |  - Use IAM for access control                             |   |
|    |  - Implement least privilege                              |   |
|    |  - Enforce MFA                                            |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    2. Enable Traceability                                        |
|    +----------------------------------------------------------+   |
|    |  - Monitor and log all actions                            |   |
|    |  - Use CloudTrail for API auditing                        |   |
|    |  - Implement alerting                                     |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    3. Apply Security at All Layers                               |
|    +----------------------------------------------------------+   |
|    |  - Defense in depth                                       |   |
|    |  - Network security (VPC, NACLs, SGs)                     |   |
|    |  - Application security                                   |   |
|    |  - Data encryption                                        |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    4. Automate Security Best Practices                           |
|    +----------------------------------------------------------+   |
|    |  - Use managed services                                   |   |
|    |  - Automated patching                                     |   |
|    |  - Security as code                                       |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    5. Protect Data in Transit and at Rest                        |
|    +----------------------------------------------------------+   |
|    |  - Encryption everywhere                                  |   |
|    |  - TLS for transit                                        |   |
|    |  - KMS for key management                                 |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    6. Prepare for Security Events                                |
|    +----------------------------------------------------------+   |
|    |  - Incident response plan                                 |   |
|    |  - Automated threat detection (GuardDuty)                 |   |
|    |  - Regular security testing                               |   |
|    +----------------------------------------------------------+   |
|                                                                   |
+------------------------------------------------------------------+
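
The least-privilege principle in the box above can be made concrete with a policy document. The following is a minimal sketch, not an official AWS tool: it writes an IAM-style policy as a plain Python dict and flags Allow statements that use wildcards. The bucket name `example-reports-bucket` and the helper `find_wildcards` are hypothetical names chosen for illustration; only the policy grammar (`Version`, `Statement`, `Effect`, `Action`, `Resource`) is standard IAM JSON.

```python
import json

# Hypothetical read-only policy scoped to one prefix of one bucket.
READ_REPORTS_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": ["arn:aws:s3:::example-reports-bucket/reports/*"],
        }
    ],
}


def _as_list(value):
    """IAM accepts a single string or a list; normalize to a list."""
    return [value] if isinstance(value, str) else list(value)


def find_wildcards(policy):
    """Return (statement_index, field, value) for overly broad Allow entries."""
    statements = policy["Statement"]
    if isinstance(statements, dict):  # a single statement is also valid
        statements = [statements]
    findings = []
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        for action in _as_list(stmt.get("Action", [])):
            # "*" or "s3:*" grants more than a single named operation.
            if action == "*" or action.endswith(":*"):
                findings.append((index, "Action", action))
        for resource in _as_list(stmt.get("Resource", [])):
            if resource == "*":
                findings.append((index, "Resource", resource))
    return findings


if __name__ == "__main__":
    print(json.dumps(READ_REPORTS_POLICY, indent=2))
    print(find_wildcards(READ_REPORTS_POLICY))  # empty list: nothing flagged
```

A check like this is deliberately simple. It ignores conditions, `NotAction`, and partial wildcards such as `s3:Get*`, so treat it as a starting point for "security as code" rather than a complete policy linter.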

Security Architecture

                    Security Defense in Depth
+------------------------------------------------------------------+
|                                                                   |
|    Layer 1: Edge Security                                         |
|    +----------------------------------------------------------+   |
|    |  +----------+  +----------+  +----------+                |   |
|    |  |CloudFront|  |   WAF    |  |  Shield  |                |   |
|    |  |  (CDN)   |  |(Firewall)|  |  (DDoS)  |                |   |
|    |  +----------+  +----------+  +----------+                |   |
|    +----------------------------------------------------------+   |
|                              |                                    |
|                              v                                    |
|    Layer 2: Network Security                                      |
|    +----------------------------------------------------------+   |
|    |  +----------+  +----------+  +----------+                |   |
|    |  |   VPC    |  |  NACLs   |  |Security  |                |   |
|    |  |          |  |          |  | Groups   |                |   |
|    |  +----------+  +----------+  +----------+                |   |
|    +----------------------------------------------------------+   |
|                              |                                    |
|                              v                                    |
|    Layer 3: Compute Security                                      |
|    +----------------------------------------------------------+   |
|    |  +----------+  +----------+  +----------+                |   |
|    |  |   EC2    |  | Systems  |  |  Guard   |                |   |
|    |  |  (IAM)   |  | Manager  |  |  Duty    |                |   |
|    |  +----------+  +----------+  +----------+                |   |
|    +----------------------------------------------------------+   |
|                              |                                    |
|                              v                                    |
|    Layer 4: Data Security                                         |
|    +----------------------------------------------------------+   |
|    |  +----------+  +----------+  +----------+                |   |
|    |  |   KMS    |  |   S3     |  |   RDS    |                |   |
|    |  |(Encrypt) |  |(Encrypt) |  |(Encrypt) |                |   |
|    |  +----------+  +----------+  +----------+                |   |
|    +----------------------------------------------------------+   |
|                                                                   |
+------------------------------------------------------------------+

Security Checklist

Question	Best Practice	AWS Service
How are you managing identities?	Centralized identity management, single sign-on	IAM, IAM Identity Center (formerly AWS SSO)
How are you controlling access?	Least privilege, MFA	IAM policies
How are you protecting the network?	VPC isolation, security groups, network ACLs	VPC, NACLs, SGs
How are you encrypting data?	Encryption at rest and in transit	KMS, ACM
How are you monitoring?	Logging, alerting	CloudTrail, CloudWatch
How are you responding to incidents?	Threat detection, automated response	GuardDuty, Security Hub

5.3 Pillar 2: Reliability

Reliability Design Principles

                    Reliability Pillar Principles
+------------------------------------------------------------------+
|                                                                   |
|    1. Automatically Recover from Failure                          |
|    +----------------------------------------------------------+   |
|    |  - Implement self-healing                                 |   |
|    |  - Use Auto Scaling                                       |   |
|    |  - Multi-AZ deployments                                   |   |
|    |  - Health checks                                          |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    2. Test Recovery Procedures                                    |
|    +----------------------------------------------------------+   |
|    |  - Regular disaster recovery testing                      |   |
|    |  - Chaos engineering                                      |   |
|    |  - Game days                                              |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    3. Scale Horizontally                                          |
|    +----------------------------------------------------------+   |
|    |  - Distribute load across resources                       |   |
|    |  - Avoid single points of failure                         |   |
|    |  - Use load balancers                                     |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    4. Stop Guessing Capacity                                      |
|    +----------------------------------------------------------+   |
|    |  - Use Auto Scaling                                       |   |
|    |  - Serverless where possible                              |   |
|    |  - Monitor and adjust                                     |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    5. Automate Change Management                                  |
|    +----------------------------------------------------------+   |
|    |  - Infrastructure as Code                                 |   |
|    |  - Automated deployments                                  |   |
|    |  - Blue/green deployments                                 |   |
|    +----------------------------------------------------------+   |
|                                                                   |
+------------------------------------------------------------------+
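
The first principle, automatic recovery driven by health checks, can be illustrated with a small simulation. This is a sketch of the general reconcile idea only, assuming a fixed desired capacity. The `Fleet` and `Instance` classes are hypothetical and do not call any AWS API, and a real Auto Scaling group adds cooldowns, AZ balancing, and lifecycle hooks that are not modelled here.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Instance:
    instance_id: str
    healthy: bool = True


@dataclass
class Fleet:
    desired_capacity: int
    instances: list = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1), repr=False)

    def launch(self):
        """Add one new healthy instance with the next sequential id."""
        instance = Instance(f"i-{next(self._ids):04d}")
        self.instances.append(instance)
        return instance

    def reconcile(self):
        """Drop unhealthy instances, then launch until capacity is met.

        Returns the ids of the instances that were replaced.
        """
        replaced = [i.instance_id for i in self.instances if not i.healthy]
        self.instances = [i for i in self.instances if i.healthy]
        while len(self.instances) < self.desired_capacity:
            self.launch()
        return replaced


if __name__ == "__main__":
    fleet = Fleet(desired_capacity=3)
    fleet.reconcile()                    # initial launch of three instances
    fleet.instances[1].healthy = False   # simulate a failed health check
    print(fleet.reconcile())             # ['i-0002']
    print([i.instance_id for i in fleet.instances])
```

The design point is that recovery is expressed as "converge on the desired state" rather than "react to a specific failure", which is why the same loop handles both the initial launch and later replacements.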

High Availability Architecture

                    Multi-AZ High Availability
+------------------------------------------------------------------+
|                                                                   |
|                         Internet                                  |
|                            |                                      |
|                            v                                      |
|                    +---------------+                              |
|                    |  Route 53     |                              |
|                    |  (DNS)        |                              |
|                    +---------------+                              |
|                            |                                      |
|                            v                                      |
|                    +---------------+                              |
|                    |  CloudFront   |                              |
|                    |  (CDN)        |                              |
|                    +---------------+                              |
|                            |                                      |
|                            v                                      |
|            +-----------------------------------+                  |
|            |      Application Load Balancer    |                  |
|            |         (Multi-AZ)               |                  |
|            +-----------------------------------+                  |
|                |               |               |                  |
|                v               v               v                  |
|         +-----------+   +-----------+   +-----------+             |
|         |   AZ-A    |   |   AZ-B    |   |   AZ-C    |             |
|         |           |   |           |   |           |             |
|         | +-------+ |   | +-------+ |   | +-------+ |             |
|         | |  EC2  | |   | |  EC2  | |   | |  EC2  | |             |
|         | | Fleet | |   | | Fleet | |   | | Fleet | |             |
|         | +-------+ |   | +-------+ |   | +-------+ |             |
|         |           |   |           |   |           |             |
|         | +-------+ |   | +-------+ |   | +-------+ |             |
|         | |  RDS  | |   | |  RDS  | |   | |  RDS  | |             |
|         | |Primary| |   | |Replica| |   | |Replica| |             |
|         | +-------+ |   | +-------+ |   | +-------+ |             |
|         +-----------+   +-----------+   +-----------+             |
|                                                                   |
|        Availability: 99.99% (52.6 min downtime/year)             |
|                                                                   |
+------------------------------------------------------------------+
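
The availability figure in the diagram follows from simple arithmetic, sketched below. The helper names are illustrative, and the `parallel` formula assumes that redundant copies fail independently, which real Availability Zones only approximate.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored


def downtime_minutes_per_year(availability):
    """Expected yearly downtime for an availability given as a fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def parallel(availability, copies):
    """Availability of N redundant copies, assuming independent failures."""
    return 1.0 - (1.0 - availability) ** copies


def serial(*availabilities):
    """Availability of a chain in which every component must be up."""
    result = 1.0
    for value in availabilities:
        result *= value
    return result


if __name__ == "__main__":
    print(round(downtime_minutes_per_year(0.9999), 1))  # 52.6
    print(parallel(0.99, 3))       # three 99% copies, roughly 0.999999
    print(serial(0.9999, 0.9999))  # two hard dependencies, roughly 0.9998
```

Two consequences are worth noting: redundancy across AZs raises availability quickly, while every additional hard dependency in the request path lowers it.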

Disaster Recovery Patterns

                    Disaster Recovery Strategies
+------------------------------------------------------------------+
|                                                                   |
|    Strategy 1: Backup & Restore                                   |
|    +----------------------------------------------------------+   |
|    |  RPO: Hours              RTO: Hours                       |   |
|    |                                                          |   |
|    |  Primary Region              Backup Region                |   |
|    |  +----------+               +----------+                 |   |
|    |  |   App    |               |   S3     |                 |   |
|    |  |          | --backup----> | Backups  |                 |   |
|    |  |   DB     |               |          |                 |   |
|    |  +----------+               +----------+                 |   |
|    |                                   |                       |   |
|    |                                   v (restore)             |   |
|    |                            +----------+                   |   |
|    |                            |   App    |                   |   |
|    |                            |   DB     |                   |   |
|    |                            +----------+                   |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    Strategy 2: Pilot Light                                        |
|    +----------------------------------------------------------+   |
|    |  RPO: Tens of minutes    RTO: Tens of minutes             |   |
|    |                                                          |   |
|    |  Primary Region              DR Region                    |   |
|    |  +----------+               +----------+                 |   |
|    |  |   App    |               |   DB     |                 |   |
|    |  |          | --repl------> |(Standby) |                 |   |
|    |  |   DB     |               |          |                 |   |
|    |  +----------+               +----------+                 |   |
|    |                                   |                       |   |
|    |                                   v (scale up)            |   |
|    |                            +----------+                   |   |
|    |                            |   App    |                   |   |
|    |                            +----------+                   |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    Strategy 3: Warm Standby                                       |
|    +----------------------------------------------------------+   |
|    |  RPO: Minutes            RTO: Minutes                     |   |
|    |                                                          |   |
|    |  Primary Region              DR Region                    |   |
|    |  +----------+               +----------+                 |   |
|    |  |   App    |               |   App    |                 |   |
|    |  | (Full)   | --repl------> |(Scaled-  |                 |   |
|    |  |   DB     |               | down)    |                 |   |
|    |  +----------+               |   DB     |                 |   |
|    |                             +----------+                 |   |
|    |                                   |                       |   |
|    |                                   v (scale up)            |   |
|    |                            +----------+                   |   |
|    |                            |   App    |                   |   |
|    |                            | (Full)   |                   |   |
|    |                            +----------+                   |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    Strategy 4: Multi-Region Active-Active                        |
|    +----------------------------------------------------------+   |
|    |  RPO: Real-time         RTO: Real-time                    |   |
|    |                                                          |   |
|    |  Region A                    Region B                     |   |
|    |  +----------+               +----------+                 |   |
|    |  |   App    |               |   App    |                 |   |
|    |  | (Active) | <---sync---> | (Active) |                 |   |
|    |  |   DB     |               |   DB     |                 |   |
|    |  +----------+               +----------+                 |   |
|    |                                                          |   |
|    |  Route 53 routes traffic to both regions                 |   |
|    +----------------------------------------------------------+   |
|                                                                   |
+------------------------------------------------------------------+
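
Choosing among the four strategies usually comes down to picking the least expensive one that still meets the workload's RPO and RTO. The sketch below encodes that rule. The numeric bounds in `STRATEGIES` are placeholder assumptions chosen to mirror the "hours / minutes / real-time" labels in the diagram, not AWS guarantees, and the cost ordering is the conventional one (backup cheapest, active-active most expensive).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    rpo_minutes: float  # illustrative worst-case data loss
    rto_minutes: float  # illustrative worst-case time to restore


# Ordered from lowest to highest running cost. All numbers are assumptions.
STRATEGIES = [
    Strategy("Backup & Restore", 24 * 60, 24 * 60),
    Strategy("Pilot Light", 30, 60),
    Strategy("Warm Standby", 5, 15),
    Strategy("Multi-Region Active-Active", 0, 0),
]


def cheapest_strategy(rpo_target_minutes, rto_target_minutes):
    """Return the first (cheapest) strategy that meets both targets."""
    for strategy in STRATEGIES:
        if (strategy.rpo_minutes <= rpo_target_minutes
                and strategy.rto_minutes <= rto_target_minutes):
            return strategy
    raise ValueError("no strategy meets the requested targets")


if __name__ == "__main__":
    print(cheapest_strategy(48 * 60, 48 * 60).name)  # Backup & Restore
    print(cheapest_strategy(10, 30).name)            # Warm Standby
```

In practice the bounds should come from your own recovery tests, which is the point of the "Test Recovery Procedures" principle earlier in this section.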

Reliability Metrics

Metric	Definition	Target
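RPO	Recovery Point Objective: maximum acceptable data loss, measured as time since the last recoverable copy	Minutes to hours
RTO	Recovery Time Objective: maximum acceptable time to restore service after an outage	Minutes to hours
MTTR	Mean Time To Recovery: average time to restore service after a failure	Minimize
MTTF	Mean Time To Failure: average operating time before a failure	Maximize
Availability	Percentage of time the workload is usable	99.9% - 99.999%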
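
MTTF, MTTR, and availability are linked by the standard steady-state relation availability = MTTF / (MTTF + MTTR). The sketch below applies it in both directions. The function names are illustrative, and the formula assumes failures and repairs alternate with stable long-run averages.

```python
def availability(mttf_hours, mttr_hours):
    """Steady-state availability = MTTF / (MTTF + MTTR)."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def max_mttr_for_target(mttf_hours, target_availability):
    """Largest MTTR that still meets the target, for a given MTTF."""
    if not 0 < target_availability <= 1:
        raise ValueError("target must be in (0, 1]")
    return mttf_hours * (1 - target_availability) / target_availability


if __name__ == "__main__":
    # A component that runs 999 hours between failures and takes
    # 1 hour to repair is available 99.9% of the time.
    print(availability(999, 1))
    # Repair-time budget for "three nines" at that failure rate.
    print(max_mttr_for_target(999, 0.999))
```

This is why the table lists "minimize MTTR" alongside "maximize MTTF": halving recovery time improves availability about as much as doubling the time between failures.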

5.4 Pillar 3: Performance Efficiency

Performance Design Principles

                    Performance Pillar Principles
+------------------------------------------------------------------+
|                                                                   |
|    1. Democratize Advanced Technologies                           |
|    +----------------------------------------------------------+   |
|    |  - Use managed services                                   |   |
|    |  - Let AWS handle complexity                              |   |
|    |  - Focus on business logic                                |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    2. Go Global in Minutes                                        |
|    +----------------------------------------------------------+   |
|    |  - Deploy to multiple regions                             |   |
|    |  - Use CloudFront for global reach                        |   |
|    |  - Edge locations for low latency                         |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    3. Use Serverless Architectures                               |
|    +----------------------------------------------------------+   |
|    |  - Lambda for compute                                     |   |
|    |  - DynamoDB for database                                  |   |
|    |  - S3 for storage                                         |   |
|    |  - No server management                                   |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    4. Experiment More Often                                        |
|    +----------------------------------------------------------+   |
|    |  - Quick provisioning                                     |   |
|    |  - Test different configurations                          |   |
|    |  - A/B testing                                            |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    5. Consider Mechanical Sympathy                                |
|    +----------------------------------------------------------+   |
|    |  - Choose right instance types                            |   |
|    |  - Optimize for workload                                  |   |
|    |  - Use appropriate storage types                          |   |
|    +----------------------------------------------------------+   |
|                                                                   |
+------------------------------------------------------------------+

Performance Architecture Patterns

                    Performance Optimization Layers
+------------------------------------------------------------------+
|                                                                   |
|    Layer 1: Caching                                               |
|    +----------------------------------------------------------+   |
|    |                                                          |   |
|    |  Client Cache -> CDN Cache -> App Cache -> DB Cache       |   |
|    |      |           |            |            |              |   |
|    |      v           v            v            v              |   |
|    |  Browser    CloudFront    ElastiCache   RDS/DB            |   |
|    |                                                          |   |
|    |  Benefits:                                                |   |
|    |    - Reduced latency                                      |   |
|    |    - Lower database load                                  |   |
|    |    - Better user experience                               |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    Layer 2: Compute Optimization                                  |
|    +----------------------------------------------------------+   |
|    |                                                          |   |
|    |  Workload Type          Recommended Service              |   |
|    |  +----------------+-------------------+                  |   |
|    |  | Web Servers    | EC2, ALB, ASG     |                  |   |
|    |  | API Backend    | Lambda, API GW    |                  |   |
|    |  | Batch Jobs     | Batch, Lambda     |                  |   |
|    |  | Containers     | ECS, EKS, Fargate |                  |   |
|    |  | ML/AI          | SageMaker         |                  |   |
|    |  +----------------+-------------------+                  |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    Layer 3: Database Optimization                                 |
|    +----------------------------------------------------------+   |
|    |                                                          |   |
|    |  Data Pattern            Recommended Database            |   |
|    |  +----------------+-------------------+                  |   |
|    |  | Relational     | RDS, Aurora       |                  |   |
|    |  | Key-Value      | DynamoDB          |                  |   |
|    |  | Document       | DocumentDB        |                  |   |
|    |  | Graph          | Neptune           |                  |   |
|    |  | Time Series    | Timestream        |                  |   |
|    |  | In-Memory      | ElastiCache       |                  |   |
|    |  +----------------+-------------------+                  |   |
|    +----------------------------------------------------------+   |
|                                                                   |
+------------------------------------------------------------------+
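
One common way to use an application cache such as ElastiCache in Layer 1 is the cache-aside pattern: read from the cache first, fall back to the database on a miss, then populate the cache. The sketch below shows that flow with an in-process dictionary standing in for the cache service. `TTLCache`, `get_product`, and the loader callback are hypothetical names, and a real deployment would use a Redis or Memcached client instead.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for an external cache with expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)


def get_product(product_id, cache, load_from_db):
    """Cache-aside read: cache first, database on a miss, then populate.

    Note: a value of None is indistinguishable from a miss in this sketch.
    """
    cached = cache.get(product_id)
    if cached is not None:
        return cached
    value = load_from_db(product_id)
    cache.set(product_id, value)
    return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=60)
    loader = lambda pid: {"id": pid, "name": "example"}  # pretend DB query
    print(get_product("p1", cache, loader))  # miss: loads and caches
    print(get_product("p1", cache, loader))  # hit: served from cache
```

The TTL is the main tuning knob: a longer value lowers database load but lets readers see stale data for longer.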

Performance Monitoring

                    Performance Monitoring Stack
+------------------------------------------------------------------+
|                                                                   |
|                    +------------------------+                     |
|                    |   CloudWatch Dashboard  |                     |
|                    +------------------------+                     |
|                              |                                    |
|        +---------------------+---------------------+              |
|        |                     |                     |              |
|        v                     v                     v              |
|  +----------+          +----------+          +----------+         |
|  | Metrics  |          |  Logs    |          |  Traces  |         |
|  |          |          |          |          |          |         |
|  |CloudWatch|          |CloudWatch|          |  X-Ray   |         |
|  | Metrics  |          |  Logs    |          |          |         |
|  +----------+          +----------+          +----------+         |
|        |                     |                     |              |
|        v                     v                     v              |
|  +----------+          +----------+          +----------+         |
|  | Alarms   |          | Insights |          | Service  |         |
|  |          |          |          |          |   Map    |         |
|  +----------+          +----------+          +----------+         |
|                                                                   |
|    Key Metrics to Monitor:                                        |
|    +----------------------------------------------------------+   |
|    |  - CPU Utilization                                        |   |
|    |  - Memory Utilization                                     |   |
|    |  - Disk I/O                                               |   |
|    |  - Network Throughput                                     |   |
|    |  - Request Latency                                        |   |
|    |  - Error Rates                                            |   |
|    |  - Queue Depth                                            |   |
|    +----------------------------------------------------------+   |
|                                                                   |
+------------------------------------------------------------------+
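
Request latency and error rate from the list above are usually tracked as percentiles and ratios rather than averages, and alarms typically fire only when several recent datapoints breach a threshold. The sketch below shows those three calculations in plain Python. The function names and sample values are illustrative, and `alarm_state` is a simplified model of "M out of N" alarm evaluation that ignores missing-data handling.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile; pct is in (0, 100]."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def error_rate(status_codes):
    """Fraction of responses with a 5xx status code."""
    if not status_codes:
        raise ValueError("status_codes must not be empty")
    errors = sum(1 for code in status_codes if code >= 500)
    return errors / len(status_codes)


def alarm_state(datapoints, threshold, datapoints_to_alarm):
    """Return ALARM when at least M of the given N datapoints breach."""
    breaching = sum(1 for value in datapoints if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


if __name__ == "__main__":
    latencies_ms = [120, 95, 110, 400, 105, 98, 130, 101, 99, 2500]
    print(percentile(latencies_ms, 50))         # 105
    print(percentile(latencies_ms, 99))         # 2500
    print(error_rate([200, 200, 500, 200]))     # 0.25
    print(alarm_state([120, 900, 950], 500, 2)) # ALARM
```

The example data shows why percentiles matter: the median is about 105 ms, but one slow request dominates the tail that users actually notice.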

5.5 Pillar 4: Cost Optimization

Cost Design Principles

                    Cost Optimization Pillar Principles
+------------------------------------------------------------------+
|                                                                   |
|    1. Implement Cloud Financial Management                        |
|    +----------------------------------------------------------+   |
|    |  - Establish cost awareness                               |   |
|    |  - Set budgets and alerts                                 |   |
|    |  - Regular cost reviews                                   |   |
|    |  - FinOps practices                                       |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    2. Adopt a Consumption Model                                   |
|    +----------------------------------------------------------+   |
|    |  - Pay for what you use                                   |   |
|    |  - Scale up and down                                      |   |
|    |  - No upfront commitments for variable workloads          |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    3. Measure Overall Efficiency                                  |
|    +----------------------------------------------------------+   |
|    |  - Track business metrics                                 |   |
|    |  - Cost per transaction                                   |   |
|    |  - Cost per customer                                      |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    4. Stop Spending Money on Undifferentiated Heavy Lifting       |
|    +----------------------------------------------------------+   |
|    |  - Use managed services                                   |   |
|    |  - Focus on competitive advantage                         |   |
|    |  - Let AWS manage infrastructure                          |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    5. Analyze and Attribute Expenditure                          |
|    +----------------------------------------------------------+   |
|    |  - Tag resources                                          |   |
|    |  - Cost allocation                                        |   |
|    |  - Chargeback/showback                                    |   |
|    +----------------------------------------------------------+   |
|                                                                   |
+------------------------------------------------------------------+
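
Principles 3 and 5 both reduce to simple aggregation once resources are tagged. The sketch below groups spend by a tag and derives a unit cost. The line items, the `team` tag key, and the function names are hypothetical, and real input would come from a billing export rather than a hard-coded list.

```python
from collections import defaultdict

# Hypothetical billing line items, in dollars for one month.
LINE_ITEMS = [
    {"service": "EC2", "cost": 120.0, "tags": {"team": "checkout"}},
    {"service": "RDS", "cost": 80.0, "tags": {"team": "checkout"}},
    {"service": "S3", "cost": 30.0, "tags": {"team": "search"}},
    {"service": "EC2", "cost": 20.0, "tags": {}},  # untagged spend
]


def cost_by_tag(line_items, tag_key, untagged_label="(untagged)"):
    """Sum cost per value of tag_key; untagged items get their own bucket."""
    totals = defaultdict(float)
    for item in line_items:
        owner = item["tags"].get(tag_key, untagged_label)
        totals[owner] += item["cost"]
    return dict(totals)


def cost_per_transaction(total_cost, transactions):
    """Unit cost: total spend divided by the number of business transactions."""
    if transactions <= 0:
        raise ValueError("transactions must be positive")
    return total_cost / transactions


if __name__ == "__main__":
    by_team = cost_by_tag(LINE_ITEMS, "team")
    print(by_team)
    print(cost_per_transaction(by_team["checkout"], 50_000))  # 0.004
```

Keeping an explicit untagged bucket makes gaps in tagging visible, which is usually the first thing a showback report needs to surface.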

Cost Optimization Strategies

                    Cost Optimization Techniques
+------------------------------------------------------------------+
|                                                                   |
|    Technique 1: Right-Sizing                                      |
|    +----------------------------------------------------------+   |
|    |                                                          |   |
|    |  Over-provisioned         Right-sized                    |   |
|    |  +----------------+       +----------------+             |   |
|    |  | m5.2xlarge     |       | m5.large       |             |   |
|    |  | CPU: 15%       |  -->  | CPU: 60%       |             |   |
|    |  | Memory: 20%    |       | Memory: 80%    |             |   |
|    |  | Cost: $280/mo  |       | Cost: $70/mo   |             |   |
|    |  +----------------+       +----------------+             |   |
|    |                                                          |   |
|    |  Tools: Compute Optimizer, Cost Explorer                 |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    Technique 2: Reserved Capacity                                 |
|    +----------------------------------------------------------+   |
|    |                                                          |   |
|    |    Pricing Model     Discount         Commitment         |   |
|    |  +-----------------+----------------+--------------+     |   |
|    |  | On-Demand       | 0%             | None         |     |   |
|    |  | RI (1 year)     | Up to ~40%     | 1 year       |     |   |
|    |  | RI (3 year)     | Up to 72%      | 3 years      |     |   |
|    |  | Savings Plans   | Up to 72%      | 1 or 3 years |     |   |
|    |  +-----------------+----------------+--------------+     |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    Technique 3: Spot Instances                                   |
|    +----------------------------------------------------------+   |
|    |                                                          |   |
|    |  Use Case               Spot Discount                    |   |
|    |  +----------------+----------------+                     |   |
|    |  | Batch Jobs     | Up to 90% off  |                     |   |
|    |  | CI/CD          | Up to 90% off  |                     |   |
|    |  | Big Data       | Up to 90% off  |                     |   |
|    |  | Containerized  | Up to 90% off  |                     |   |
|    |  +----------------+----------------+                     |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    Technique 4: Storage Tiering                                   |
|    +----------------------------------------------------------+   |
|    |                                                          |   |
|    |    Data Age          Storage Tier           $/GB-month   |   |
|    |  +-----------------+----------------------+----------+   |   |
|    |  | Hot (0-30 days) | S3 Standard          | $0.023   |   |   |
|    |  | Warm (30-90)    | S3 Standard-IA       | $0.0125  |   |   |
|    |  | Cold (90-180)   | S3 Glacier Instant   | $0.004   |   |   |
|    |  | Archive (180+)  | Glacier Deep Archive | $0.00099 |   |   |
|    |  +-----------------+----------------------+----------+   |   |
|    |  Approx. us-east-1 list prices, per GB-month             |   |
|    +----------------------------------------------------------+   |
|                                                                   |
+------------------------------------------------------------------+
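
The arithmetic behind right-sizing and storage tiering can be checked with a few lines of Python. This is a rough sketch under stated assumptions: it reuses the monthly instance costs and per-GB-month prices from the diagram, the 10,000 GB data set and its age distribution are hypothetical, and it ignores retrieval fees, request charges, and minimum storage durations.

```python
def savings_pct(before, after):
    """Percentage saved when a monthly cost drops from before to after."""
    return (before - after) / before * 100


# Right-sizing figures from the diagram: m5.2xlarge -> m5.large.
right_sizing = savings_pct(280.0, 70.0)

# (upper age bound in days, tier, USD per GB-month); None means open-ended.
TIERS = [
    (30, "S3 Standard", 0.023),
    (90, "S3 Standard-IA", 0.0125),
    (180, "S3 Glacier Instant Retrieval", 0.004),
    (None, "S3 Glacier Deep Archive", 0.00099),
]


def tier_for_age(age_days):
    """Return (tier name, price per GB-month) for data of the given age."""
    for max_age, name, price in TIERS:
        if max_age is None or age_days < max_age:
            return name, price
    raise ValueError("TIERS must end with an open-ended tier")


def monthly_storage_cost(gb_by_age):
    """gb_by_age maps data age in days to the GB stored at that age."""
    return sum(gb * tier_for_age(age)[1] for age, gb in gb_by_age.items())


# Hypothetical 10,000 GB data set spread across the four age bands.
dataset = {10: 1000, 60: 2000, 120: 3000, 400: 4000}
tiered = monthly_storage_cost(dataset)
all_standard = sum(dataset.values()) * 0.023

print(f"Right-sizing saves {right_sizing:.0f}%")
print(f"Tiered: ${tiered:.2f}/mo vs all S3 Standard: ${all_standard:.2f}/mo")
```

In practice an S3 lifecycle policy performs the age-based transitions automatically; the function above only models the resulting bill.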

5.6 Pillar 5: Sustainability

Sustainability Design Principles

                    Sustainability Pillar Principles
+------------------------------------------------------------------+
|                                                                   |
|    1. Understand Your Impact                                       |
|    +----------------------------------------------------------+   |
|    |  - Measure sustainability metrics                         |   |
|    |  - Track carbon footprint                                 |   |
|    |  - Set improvement goals                                  |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    2. Establish Sustainability Goals                              |
|    +----------------------------------------------------------+   |
|    |  - Define targets                                         |   |
|    |  - Align with business objectives                         |   |
|    |  - Regular reviews                                        |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    3. Maximize Utilization                                         |
|    +----------------------------------------------------------+   |
|    |  - Right-size resources                                   |   |
|    |  - Use serverless                                         |   |
|    |  - Optimize workload scheduling                           |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    4. Anticipate and Adopt New Hardware                           |
|    +----------------------------------------------------------+   |
|    |  - Use latest instance generations                        |   |
|    |  - Leverage AWS efficiency improvements                   |   |
|    |  - Migrate to more efficient services                     |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    5. Use Managed Services                                         |
|    +----------------------------------------------------------+   |
|    |  - AWS manages at scale                                   |   |
|    |  - Higher efficiency                                      |   |
|    |  - Shared infrastructure                                  |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    6. Reduce Downstream Impact                                     |
|    +----------------------------------------------------------+   |
|    |  - Optimize data transfer                                |   |
|    |  - Reduce storage requirements                            |   |
|    |  - Efficient algorithms                                   |   |
|    +----------------------------------------------------------+   |
|                                                                   |
+------------------------------------------------------------------+

Sustainability Best Practices

                    Sustainability Optimization
+------------------------------------------------------------------+
|                                                                   |
|    Compute Optimization                                           |
|    +----------------------------------------------------------+   |
|    |  - Use Graviton (ARM): up to 60% less energy (per AWS)   |   |
|    |  - Opt for serverless (Lambda, Fargate)                   |   |
|    |  - Use Spot instances for batch workloads                 |   |
|    |  - Implement auto-scaling                                 |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    Storage Optimization                                           |
|    +----------------------------------------------------------+   |
|    |  - Use S3 Intelligent-Tiering                            |   |
|    |  - Implement lifecycle policies                           |   |
|    |  - Compress data before storage                           |   |
|    |  - Delete unused snapshots                                |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    Network Optimization                                           |
|    +----------------------------------------------------------+   |
|    |  - Use CloudFront to reduce origin requests               |   |
|    |  - Implement caching                                      |   |
|    |  - Use VPC endpoints                                      |   |
|    |  - Optimize data transfer patterns                        |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    Region Selection                                               |
|    +----------------------------------------------------------+   |
|    |  - Choose regions with lower carbon intensity             |   |
|    |  - Consider regions powered by renewable energy           |   |
|    |  - Balance latency with sustainability                    |   |
|    +----------------------------------------------------------+   |
|                                                                   |
+------------------------------------------------------------------+
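
"Balance latency with sustainability" can be read as a constrained choice: among the regions that meet a latency budget, pick the one with the lowest carbon intensity. The sketch below shows that reading only. The region names and all numbers are placeholders, not measurements, and a real decision would also weigh data residency, service availability, and price.

```python
# Hypothetical candidates: (name, grid carbon intensity in gCO2e/kWh,
# median latency to users in ms). Placeholder values for illustration.
CANDIDATES = [
    ("region-a", 450, 20),
    ("region-b", 120, 45),
    ("region-c", 30, 95),
]


def pick_region(candidates, max_latency_ms):
    """Lowest-carbon region that still meets the latency budget, or None."""
    eligible = [c for c in candidates if c[2] <= max_latency_ms]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c[1])[0]


print(pick_region(CANDIDATES, max_latency_ms=50))
print(pick_region(CANDIDATES, max_latency_ms=100))
```

Loosening the latency budget from 50 ms to 100 ms changes the answer, which is the trade-off the last bullet describes.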

5.7 Well-Architected Tool

Using the AWS Well-Architected Tool

                    Well-Architected Tool Workflow
+------------------------------------------------------------------+
|                                                                   |
|    Step 1: Define Workload                                        |
|    +----------------------------------------------------------+   |
|    |  - Name your workload                                     |   |
|    |  - Select region                                          |   |
|    |  - Define scope                                           |   |
|    +----------------------------------------------------------+   |
|                              |                                    |
|                              v                                    |
|    Step 2: Answer Questions                                       |
|    +----------------------------------------------------------+   |
|    |  - Answer questions for each pillar                       |   |
|    |  - Provide evidence                                       |   |
|    |  - Note risks and improvements                            |   |
|    +----------------------------------------------------------+   |
|                              |                                    |
|                              v                                    |
|    Step 3: Review Results                                         |
|    +----------------------------------------------------------+   |
|    |                                                          |   |
|    |  Pillar scores (illustrative; the Tool itself reports    |   |
|    |  counts of high- and medium-risk issues per pillar):     |   |
|    |  +----------------+--------+                             |   |
|    |  | Security       | 85/100 |                             |   |
|    |  | Reliability    | 72/100 |  <-- Needs improvement     |   |
|    |  | Performance    | 90/100 |                             |   |
|    |  | Cost           | 65/100 |  <-- Needs improvement     |   |
|    |  | Sustainability | 78/100 |                             |   |
|    |  +----------------+--------+                             |   |
|    +----------------------------------------------------------+   |
|                              |                                    |
|                              v                                    |
|    Step 4: Create Improvement Plan                               |
|    +----------------------------------------------------------+   |
|    |  - Prioritize high-risk items                            |   |
|    |  - Create milestones                                      |   |
|    |  - Track progress                                         |   |
|    +----------------------------------------------------------+   |
|                                                                   |
+------------------------------------------------------------------+
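
Steps 3 and 4 amount to filtering the review results and ordering what is left. A minimal sketch of that prioritization follows, using the illustrative scores from the diagram. The threshold of 75 is an assumption chosen so that the two pillars marked "Needs improvement" are the ones flagged; the diagram does not state a cut-off.

```python
# Illustrative scores copied from the Step 3 diagram.
PILLAR_SCORES = {
    "Security": 85,
    "Reliability": 72,
    "Performance": 90,
    "Cost": 65,
    "Sustainability": 78,
}

THRESHOLD = 75  # assumed cut-off; not stated in the diagram


def improvement_plan(scores, threshold):
    """Pillars scoring below threshold, weakest first."""
    flagged = [(name, score) for name, score in scores.items() if score < threshold]
    return sorted(flagged, key=lambda item: item[1])


plan = improvement_plan(PILLAR_SCORES, THRESHOLD)
for rank, (pillar, score) in enumerate(plan, start=1):
    print(f"{rank}. {pillar} ({score}/100)")
```

Sorting weakest-first mirrors the "prioritize high-risk items" bullet: the lowest-scoring pillar becomes the first milestone.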

Sample Questions by Pillar

Pillar	Sample Question
Security	How are you protecting access to your workload?
Reliability	How does your workload handle failure?
Performance	How do you select your compute solution?
Cost	Do you have cost controls in place?
Sustainability	How do you track and measure sustainability?

5.8 Architecture Decision Records

Documenting Architecture Decisions

                    Architecture Decision Record Template
+------------------------------------------------------------------+
|                                                                   |
|    ADR-001: Use Multi-AZ RDS for Database High Availability       |
|                                                                   |
|    Status: Accepted                                               |
|                                                                   |
|    Context:                                                       |
|    +----------------------------------------------------------+   |
|    |  - Application requires 99.95% availability              |   |
|    |  - Database is a critical component                      |   |
|    |  - Single-AZ RDS SLA is 99.5% monthly uptime             |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    Decision:                                                      |
|    +----------------------------------------------------------+   |
|    |  - Deploy RDS in Multi-AZ configuration                  |   |
|    |  - Use synchronous replication                           |   |
|    |  - Automatic failover enabled                            |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    Consequences:                                                  |
|    +----------------------------------------------------------+   |
|    |  Positive:                                               |   |
|    |    - Higher availability (99.95% SLA)                    |   |
|    |    - Automatic failover                                   |   |
|    |    - No manual intervention                               |   |
|    |                                                          |   |
|    |  Negative:                                               |   |
|    |    - Higher cost (~2x single AZ)                          |   |
|    |    - Slight write latency increase                        |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    Alternatives Considered:                                       |
|    +----------------------------------------------------------+   |
|    |  1. Single AZ with read replicas - Lower availability     |   |
|    |  2. Self-managed database - Higher operational overhead   |   |
|    |  3. Multi-region - Higher cost, complexity               |   |
|    +----------------------------------------------------------+   |
|                                                                   |
+------------------------------------------------------------------+
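
Availability percentages in an ADR are easier to compare once they are converted into permitted downtime. The sketch below does that conversion for a 30-day month; the month length and the three sample targets are assumptions chosen for illustration.

```python
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes


def allowed_downtime_minutes(availability_pct):
    """Minutes of downtime per 30-day month permitted by an availability target."""
    return MINUTES_PER_30_DAY_MONTH * (1 - availability_pct / 100)


for target in (99.5, 99.95, 99.99):
    print(f"{target}% -> {allowed_downtime_minutes(target):.1f} min/month")
```

The gap between 99.5% and 99.95% is roughly 216 minutes versus 21.6 minutes per month, which is the kind of concrete figure that belongs in the Context section of a record like ADR-001.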

5.9 Best Practices Summary

                    Well-Architected Best Practices
+------------------------------------------------------------------+
|                                                                   |
|    1. Regular Reviews                                             |
|    +----------------------------------------------------------+   |
|    |  - Conduct Well-Architected reviews quarterly            |   |
|    |  - Use AWS Well-Architected Tool                         |   |
|    |  - Document and track improvements                       |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    2. Balance Pillars                                             |
|    +----------------------------------------------------------+   |
|    |  - Trade-offs between pillars are normal                 |   |
|    |  - Document decisions                                     |   |
|    |  - Align with business requirements                       |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    3. Iterate                                                     |
|    +----------------------------------------------------------+   |
|    |  - Architecture evolves over time                         |   |
|    |  - Continuous improvement                                 |   |
|    |  - Learn from incidents                                   |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    4. Automate                                                    |
|    +----------------------------------------------------------+   |
|    |  - Infrastructure as Code                                 |   |
|    |  - Automated testing                                       |   |
|    |  - Automated deployments                                   |   |
|    +----------------------------------------------------------+   |
|                                                                   |
|    5. Measure                                                     |
|    +----------------------------------------------------------+   |
|    |  - Define metrics for each pillar                         |   |
|    |  - Set up monitoring and alerting                         |   |
|    |  - Regular reporting                                       |   |
|    +----------------------------------------------------------+   |
|                                                                   |
+------------------------------------------------------------------+

5.10 Exam Tips

Six Pillars: Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, Sustainability (this chapter covers the last five)
Trade-offs: Understand how decisions affect multiple pillars
Design Principles: Know the principles for each pillar
Well-Architected Tool: Use for architecture reviews
RPO/RTO: Know the difference and how they affect DR strategy
Right-Sizing: Key for both cost and performance optimization
Defense in Depth: Security approach with multiple layers
Serverless: Often a strong choice for variable or spiky workloads, because you pay per use and scaling is managed for you
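
For the RPO/RTO tip, one way to keep the two apart is that RPO bounds how much data you can lose and RTO bounds how long recovery may take. The sketch below encodes that check. The `DrPlan` class, both plans, and the 15-minute and 60-minute targets are hypothetical, and treating the backup interval as worst-case data loss is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class DrPlan:
    name: str
    backup_interval_min: float  # assumed worst-case data loss
    restore_time_min: float     # time to bring the workload back


def meets_objectives(plan, rpo_min, rto_min):
    """RPO bounds data loss; RTO bounds time to recover."""
    return plan.backup_interval_min <= rpo_min and plan.restore_time_min <= rto_min


# Hypothetical plans and targets, for illustration only.
nightly = DrPlan("nightly backup and restore", 24 * 60, 8 * 60)
standby = DrPlan("warm standby with replication", 1, 15)

for plan in (nightly, standby):
    ok = meets_objectives(plan, rpo_min=15, rto_min=60)
    print(f"{plan.name}: {'meets' if ok else 'misses'} RPO 15 min / RTO 60 min")
```

Tighter RPO and RTO targets push the design toward standby capacity and continuous replication, which is why they drive both the DR strategy and its cost.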

Next Chapter

Chapter 6: Amazon EC2 - Deep Dive

Last Updated: February 2026