Advanced Cost Optimization
Chapter 47: Cost Optimization & FinOps
Managing and Optimizing AWS Costs
47.1 Overview
Cost optimization is a continuous process of reducing AWS spending while maintaining performance, reliability, and security. FinOps brings financial accountability to cloud spending.
Cost Optimization Overview

The four main cost optimization levers:

- Right Sizing: CPU, memory, storage
- Reserved Instances: RIs, Savings Plans
- Spot Instances: up to 90% discount, batch workloads
- Cost Monitoring: budgets, alerts, reports

Key Concepts

| Concept | Description |
|---|---|
| FinOps | Financial operations - cloud cost management framework |
| TCO | Total Cost of Ownership - all costs, including hidden ones |
| Unit Economics | Cost per business metric (cost per transaction) |
| Showback | Show costs to teams without charging |
| Chargeback | Actually charge teams for their usage |
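
Unit economics comes down to one division, which makes it easy to track from month to month. The following is a minimal sketch, assuming you already have the total cost for a period and the matching business metric; the function name `cost_per_unit` and the sample figures are illustrative and do not come from any AWS API.

```python
def cost_per_unit(total_cost_usd: float, units: int) -> float:
    """Return cost per business unit (for example, per transaction)."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_cost_usd / units


# Hypothetical month: $12,000 of spend serving 4,000,000 transactions.
monthly_unit_cost = cost_per_unit(12_000.0, 4_000_000)
print(f"${monthly_unit_cost:.4f} per transaction")
```

Tracking this number alongside total spend shows whether a rising bill reflects growth (flat unit cost) or inefficiency (rising unit cost).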
47.2 FinOps Framework
FinOps Phases

FinOps Lifecycle

The lifecycle is a continuous loop through three phases: See, Optimize, and Operate.

See phase (called Inform in the FinOps Foundation framework):
- Allocate costs to teams
- Understand cloud usage
- Benchmark against KPIs

Optimize phase:
- Right-size resources
- Use commitment discounts (Reserved Instances, Savings Plans)
- Eliminate waste

Operate phase:
- Implement automation
- Monitor and measure
- Continuous improvement

Cost Allocation
Cost Allocation Strategy

Tagging strategy, required tags:
- Environment (dev/staging/prod)
- Owner (team or individual)
- Project (application/service)
- CostCenter (billing code)

Optional tags:
- Application
- Version
- Compliance
- Backup

47.3 AWS Cost Management Tools
AWS Cost Explorer

Cost Explorer Features

| Feature | Details |
|---|---|
| Cost Analysis | By service, region, tag, account |
| Forecasting | Predict costs, trend analysis, budget planning |
| Reports | Daily/monthly, custom, scheduled |
| Reserved Instance Utilization | Coverage %, utilization %, cost savings |
| Savings Plans Utilization | Coverage %, utilization %, cost savings |
| RI Recommendations | Right-size, purchase recommendations, historical |

AWS Budgets
    # AWS Budget Configuration
    Resources:
      # Monthly cost budget
      MonthlyBudget:
        Type: AWS::Budgets::Budget
        Properties:
          Budget:
            BudgetName: MonthlyCostBudget
            BudgetLimit:
              Amount: 10000
              Unit: USD
            TimeUnit: MONTHLY
            BudgetType: COST
            CostFilters:
              Service:
                - Amazon Elastic Compute Cloud - Compute
                - Amazon Relational Database Service
            CostTypes:
              IncludeTax: true
              IncludeSupport: true
              IncludeDiscount: true
              IncludeRefund: true
              IncludeCredit: true
              IncludeUpfront: true
              IncludeRecurring: true
              IncludeOtherSubscription: true
              IncludeSubscription: true
            # Actual and forecast spend are reported by AWS, not set in the template
          NotificationsWithSubscribers:
            # Warn at 80% of the limit
            - Notification:
                NotificationType: ACTUAL
                ComparisonOperator: GREATER_THAN
                Threshold: 80
                ThresholdType: PERCENTAGE
              Subscribers:
                - Address: ops@example.com
                  Type: EMAIL
            # Escalate at 100% of the limit
            - Notification:
                NotificationType: ACTUAL
                ComparisonOperator: GREATER_THAN
                Threshold: 100
                ThresholdType: PERCENTAGE
              Subscribers:
                - Address: ops@example.com
                  Type: EMAIL
                # SNS subscribers take a topic ARN (placeholder shown), not a webhook URL;
                # forward from the topic to Slack separately
                - Address: arn:aws:sns:us-east-1:123456789012:budget-alerts
                  Type: SNS

      # RI utilization budget
      RIUtilizationBudget:
        Type: AWS::Budgets::Budget
        Properties:
          Budget:
            BudgetName: RIUtilizationBudget
            BudgetLimit:
              Amount: 80
              Unit: PERCENTAGE
            TimeUnit: MONTHLY
            BudgetType: RI_UTILIZATION
            # RI budgets are scoped to a single service
            CostFilters:
              Service:
                - Amazon Elastic Compute Cloud - Compute
          NotificationsWithSubscribers:
            - Notification:
                NotificationType: ACTUAL
                ComparisonOperator: LESS_THAN
                Threshold: 80
                ThresholdType: PERCENTAGE
              Subscribers:
                - Address: ops@example.com
                  Type: EMAIL

Cost Anomaly Detection
    # Cost Anomaly Detection
    Resources:
      AnomalyMonitor:
        Type: AWS::CE::AnomalyMonitor
        Properties:
          MonitorName: ServiceCostMonitor
          # A DIMENSIONAL monitor tracks every AWS service. Tag-scoped monitoring
          # (for example Environment=production) needs a separate CUSTOM monitor
          # that uses MonitorSpecification instead of MonitorDimension.
          MonitorType: DIMENSIONAL
          MonitorDimension: SERVICE

      AnomalySubscription:
        Type: AWS::CE::AnomalySubscription
        Properties:
          SubscriptionName: CostAnomalyAlerts
          Threshold: 100  # Alert on anomalies > $100 (newer templates use ThresholdExpression)
          Frequency: DAILY
          Subscribers:
            - Address: ops@example.com
              Type: EMAIL
          MonitorArnList:
            - !GetAtt AnomalyMonitor.MonitorArn

47.4 Right-Sizing
EC2 Right-Sizing

EC2 Right-Sizing Analysis: Metrics to Monitor

CPU utilization:
- Average < 40%: consider downsizing
- Average > 80%: consider upsizing
- Spikes > 90%: may need a larger instance

Memory utilization:
- Average < 50%: consider downsizing
- Average > 85%: consider upsizing
- Requires the CloudWatch agent

Network utilization:
- Low throughput: consider a smaller instance
- High throughput: consider a larger instance
- Burst vs. enhanced networking

Right-Sizing Recommendations
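
The CPU and memory thresholds from the right-sizing analysis can be folded into a simple first-pass recommendation. This is a sketch only: the function name `rightsizing_hint` is hypothetical, and the way the signals are combined (any pressure signal wins, and a downsize requires both CPU and memory to be low) is an assumption rather than an AWS rule.

```python
from typing import Optional


def rightsizing_hint(
    cpu_avg: float,
    cpu_peak: float,
    mem_avg: Optional[float] = None,
) -> str:
    """Classify an instance as 'upsize', 'downsize', or 'keep'.

    All inputs are utilization percentages. mem_avg is None when the
    CloudWatch agent is not installed and memory data is unavailable.
    """
    # Any sign of pressure wins: never downsize a saturated instance.
    if cpu_avg > 80 or cpu_peak > 90 or (mem_avg is not None and mem_avg > 85):
        return "upsize"
    # Downsize only when CPU is low and memory is known to be low as well.
    if cpu_avg < 40 and mem_avg is not None and mem_avg < 50:
        return "downsize"
    return "keep"


print(rightsizing_hint(cpu_avg=22, cpu_peak=55, mem_avg=35))  # downsize
print(rightsizing_hint(cpu_avg=22, cpu_peak=95, mem_avg=35))  # upsize
print(rightsizing_hint(cpu_avg=22, cpu_peak=55))              # keep
```

Returning "keep" when memory data is missing is deliberately conservative: low CPU alone says nothing about a memory-bound workload.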
    # Get right-sizing recommendations using the AWS CLI
    aws ce get-rightsizing-recommendation \
      --service AmazonEC2 \
      --configuration RecommendationTarget=SAME_INSTANCE_FAMILY,BenefitsConsidered=true

    # Simplified output example (the real response nests these fields more deeply):
    {
      "RightsizingRecommendations": [
        {
          "CurrentInstance": {
            "InstanceId": "i-1234567890abcdef0",
            "InstanceType": "m5.xlarge",
            "Region": "us-east-1"
          },
          "RightsizingType": "MODIFY",
          "ModifyRecommendation": {
            "TargetInstance": {
              "InstanceType": "m5.large",
              "EstimatedMonthlySavings": 35.00
            }
          }
        }
      ]
    }

Instance Type Selection
Instance Type Selection Guide

| Category | Series | Characteristics |
|---|---|---|
| General Purpose | M5/M6g | Balanced, production, general workloads |
| General Purpose | T3/T4g | Burstable, dev/test, variable load |
| General Purpose | A1 | ARM-based, cost-effective, ARM workloads |
| Compute Optimized | C5/C6g | High CPU, gaming, HPC |
| Compute Optimized | HPC instances | Batch, scientific, ML training |
| Memory Optimized | R5/R6g | Databases, big data, analytics |
| Memory Optimized | X1/X2 | In-memory DB, SAP HANA, large datasets |
| Memory Optimized | z1d | High memory, high CPU, enterprise |

47.5 Reserved Instances & Savings Plans
Reserved Instances

Reserved Instance Types

Standard Reserved Instances:
- Term: 1 year or 3 years
- Payment: All Upfront, Partial Upfront, No Upfront
- Discount: typically around 40% (1 year) or 60% (3 years); AWS advertises a maximum of up to 72% off On-Demand
- Flexibility: can change Availability Zone and instance size within the same family

Convertible Reserved Instances:
- Term: 1 year or 3 years
- Payment: All Upfront, Partial Upfront, No Upfront
- Discount: typically around 30% (1 year) or 45% (3 years), lower than a Standard RI for the same term
- Flexibility: can be exchanged for a different family, OS, or tenancy

Scheduled Reserved Instances:
- Term: 1 year
- Schedule: recurring daily or weekly schedule
- Use case: predictable recurring workloads
- Note: AWS no longer offers Scheduled Reserved Instances for new purchases; On-Demand Capacity Reservations are the suggested alternative

Savings Plans
Savings Plans Types

Compute Savings Plan:
- Commitment: $/hour for 1 or 3 years
- Discount: up to 66%
- Applies to: EC2 instances (any family, size, region, OS), Fargate, Lambda
- Flexibility: highest

EC2 Instance Savings Plan:
- Commitment: $/hour for 1 or 3 years
- Discount: up to 72%
- Applies to: EC2 instances within one instance family in one Region
- Flexibility: size, OS, and tenancy within the family

SageMaker Savings Plan:
- Commitment: $/hour for 1 or 3 years
- Discount: up to 64%
- Applies to: SageMaker ML instances

Savings Plan Configuration
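
Before fixing an hourly commitment, it helps to see how a Savings Plan behaves when usage rises above or falls below it. The sketch below is a simplified model, assuming a single blended discount across all eligible usage (real discounts vary by instance type, Region, and term); the function name and all figures are hypothetical.

```python
from typing import Iterable


def savings_plan_cost(
    hourly_on_demand_usage: Iterable[float],
    commitment: float,
    discount: float,
) -> float:
    """Total cost over the given hours with a Savings Plan in place.

    hourly_on_demand_usage: eligible usage per hour, priced at On-Demand rates.
    commitment: committed spend in $/hour, charged whether it is used or not.
    discount: Savings Plan discount as a fraction (0.30 means 30% off).
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    # On-Demand-equivalent usage that the commitment can absorb each hour.
    covered_capacity = commitment / (1 - discount)
    total = 0.0
    for usage in hourly_on_demand_usage:
        overflow = max(0.0, usage - covered_capacity)  # billed at On-Demand rates
        total += commitment + overflow
    return total


# Hypothetical workload: two busy hours and two quiet hours, 50% discount.
usage = [20.0, 20.0, 5.0, 5.0]
print(savings_plan_cost(usage, commitment=10.0, discount=0.5))  # with the plan
print(sum(usage))                                               # pure On-Demand
```

In the quiet hours the full commitment is still charged even though only part of it is used, which is why commitments are usually sized to the steady baseline rather than to peak usage.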
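The template below describes the purchase parameters. Treat it as illustrative: `AWS::SavingsPlans::SavingsPlan` does not appear to be a native CloudFormation resource type, and purchases are normally made in the Billing console or with `aws savingsplans create-savings-plan`.

    # Savings Plan Purchase
    Resources:
      ComputeSavingsPlan:
        Type: AWS::SavingsPlans::SavingsPlan
        Properties:
          SavingsPlanType: COMPUTE
          Commitment: 10.00  # $10/hour commitment
          Term: THREE_YEAR
          PaymentOption: NO_UPFRONT
          Tags:
            - Key: Environment
              Value: production
            - Key: Owner
              Value: platform-team

47.6 Spot Instances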
Spot Instance Strategy

Use cases

Ideal for:
- Batch processing
- CI/CD pipelines
- Big data analytics
- Containerized workloads
- Stateless applications
- Image/video processing

Not recommended for:
- Databases
- Stateful applications
- Long-running jobs without checkpointing

Spot best practices:
1. Use multiple instance types
2. Use multiple Availability Zones
3. Implement graceful shutdown
4. Use Spot interruption notices
5. Combine with On-Demand for critical capacity

Spot Fleet Configuration
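
Combining an On-Demand base with Spot capacity trades some savings for availability. A rough sketch of the blended hourly cost follows; the function name, the $0.10 On-Demand price, and the 70% Spot discount are all hypothetical, and real Spot prices fluctuate by instance type and Availability Zone.

```python
def blended_hourly_cost(
    total_instances: int,
    on_demand_instances: int,
    on_demand_price: float,
    spot_discount: float,
) -> float:
    """Hourly cost of a fleet mixing On-Demand and Spot capacity."""
    if not 0 <= on_demand_instances <= total_instances:
        raise ValueError("on_demand_instances must be between 0 and total_instances")
    if not 0 <= spot_discount <= 1:
        raise ValueError("spot_discount must be in [0, 1]")
    spot_instances = total_instances - on_demand_instances
    spot_price = on_demand_price * (1 - spot_discount)
    return on_demand_instances * on_demand_price + spot_instances * spot_price


# Hypothetical fleet: 10 instances, 2 of them On-Demand (a 20% base).
mixed = blended_hourly_cost(10, 2, on_demand_price=0.10, spot_discount=0.70)
all_on_demand = blended_hourly_cost(10, 10, on_demand_price=0.10, spot_discount=0.70)
print(f"mixed: ${mixed:.2f}/h, all On-Demand: ${all_on_demand:.2f}/h")
```

Under these assumed prices, the 20% On-Demand base still leaves the fleet at less than half the all-On-Demand cost while keeping two instances safe from interruption.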
    # Spot Fleet Configuration
    Resources:
      SpotFleetRole:
        Type: AWS::IAM::Role
        Properties:
          AssumeRolePolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Principal:
                  Service: spotfleet.amazonaws.com
                Action: sts:AssumeRole
          ManagedPolicyArns:
            - arn:aws:iam::aws:policy/service-role/AmazonEC2SpotFleetTaggingRole

      SpotFleet:
        Type: AWS::EC2::SpotFleet
        Properties:
          SpotFleetRequestConfigData:
            IamFleetRole: !GetAtt SpotFleetRole.Arn
            AllocationStrategy: capacityOptimized
            TargetCapacity: 10
            OnDemandTargetCapacity: 2  # 20% On-Demand
            InstanceInterruptionBehavior: terminate
            LaunchTemplateConfigs:
              # LaunchTemplate is assumed to be defined elsewhere in the template
              - LaunchTemplateSpecification:
                  LaunchTemplateId: !Ref LaunchTemplate
                  Version: !GetAtt LaunchTemplate.LatestVersionNumber
                # Several instance types across two AZs widen the Spot capacity pools
                Overrides:
                  - InstanceType: m5.large
                    SubnetId: subnet-az-a
                  - InstanceType: m5.xlarge
                    SubnetId: subnet-az-a
                  - InstanceType: m5.large
                    SubnetId: subnet-az-b
                  - InstanceType: m5.xlarge
                    SubnetId: subnet-az-b
                  - InstanceType: c5.large
                    SubnetId: subnet-az-a
                  - InstanceType: c5.large
                    SubnetId: subnet-az-b

Spot Instance Interruption Handling
    import json
    import logging

    import boto3

    logger = logging.getLogger()
    logger.setLevel(logging.INFO)


    def lambda_handler(event, context):
        """Handle Spot instance interruption notices delivered by EventBridge."""
        ec2 = boto3.client('ec2')
        asg = boto3.client('autoscaling')

        # Parse the interruption notice
        detail = event.get('detail', {})
        instance_id = detail.get('instance-id')
        action = detail.get('instance-action')

        if instance_id and action == 'terminate':
            logger.info(f"Spot interruption notice for {instance_id}")

            # Get instance details
            instance = ec2.describe_instances(InstanceIds=[instance_id])
            tags = instance['Reservations'][0]['Instances'][0].get('Tags', [])

            # Find ASG from tags
            asg_name = None
            for tag in tags:
                if tag['Key'] == 'aws:autoscaling:groupName':
                    asg_name = tag['Value']
                    break

            if asg_name:
                # Detach without decrementing desired capacity, so the ASG
                # launches a replacement instance immediately
                asg.detach_instances(
                    AutoScalingGroupName=asg_name,
                    InstanceIds=[instance_id],
                    ShouldDecrementDesiredCapacity=False
                )
                logger.info(f"Detached {instance_id} from {asg_name}")

            # Graceful shutdown tasks
            # - Save state to S3/DynamoDB
            # - Complete in-progress work
            # - Notify other services

        return {
            'statusCode': 200,
            'body': json.dumps({
                'message': 'Spot interruption handled',
                'instance_id': instance_id
            })
        }

47.7 Storage Cost Optimization
S3 Cost Optimization

S3 Storage Classes

| Storage Class | Use Case | Cost |
|---|---|---|
| S3 Standard | Frequently accessed | $$$$ |
| S3 Intelligent-Tiering | Unknown patterns | $$$ |
| S3 Standard-IA | Infrequent access | $$ |
| S3 One Zone-IA | Infrequent, non-critical | $ |
| S3 Glacier Instant | Archive, instant access | $ |
| S3 Glacier Flexible | Archive, hours access | $ |
| S3 Glacier Deep Archive | Long-term archive | $ |

S3 Lifecycle Policies
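
A lifecycle rule is essentially a mapping from object age to storage class. The sketch below mirrors a rule that moves objects to Standard-IA at 30 days, to Glacier at 90 days, and expires them at 365 days (the values used for the `logs/` prefix in this section). The function name is hypothetical, and the model is idealized: S3 applies lifecycle actions asynchronously, so real transitions can lag behind these ages.

```python
from typing import Optional

# (minimum age in days, storage class), ascending by age.
TRANSITIONS = [(30, "STANDARD_IA"), (90, "GLACIER")]
EXPIRATION_DAYS = 365


def storage_class_for_age(age_days: int) -> Optional[str]:
    """Return the storage class for an object of this age, or None once expired."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days >= EXPIRATION_DAYS:
        return None
    current = "STANDARD"
    for min_age, storage_class in TRANSITIONS:
        if age_days >= min_age:
            current = storage_class
    return current


for age in (10, 45, 120, 400):
    print(age, storage_class_for_age(age))
```

Running a sample of object ages through such a function gives a quick estimate of how much data each class will hold before the rule is deployed.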
    # S3 Lifecycle Configuration
    Resources:
      DataBucket:
        Type: AWS::S3::Bucket
        Properties:
          BucketName: data-bucket  # Placeholder; bucket names must be globally unique
          VersioningConfiguration:
            Status: Enabled
          LifecycleConfiguration:
            Rules:
              # Transition to IA after 30 days, Glacier after 90, expire after 365
              - Id: TransitionToIA
                Status: Enabled
                Prefix: logs/
                Transitions:
                  - TransitionInDays: 30
                    StorageClass: STANDARD_IA
                  - TransitionInDays: 90
                    StorageClass: GLACIER
                ExpirationInDays: 365

              # Intelligent-Tiering for unknown access patterns
              - Id: IntelligentTiering
                Status: Enabled
                Prefix: uploads/
                Transitions:
                  - TransitionInDays: 0  # A transition needs an age or a date
                    StorageClass: INTELLIGENT_TIERING

              # Non-current version expiration
              - Id: NonCurrentVersionExpiration
                Status: Enabled
                NoncurrentVersionExpiration:
                  NoncurrentDays: 30
                  NewerNoncurrentVersions: 5

              # Delete incomplete multipart uploads
              - Id: MultipartUploadCleanup
                Status: Enabled
                AbortIncompleteMultipartUpload:
                  DaysAfterInitiation: 7

EBS Cost Optimization
    # EBS Volume Optimization
    Resources:
      OptimizedVolume:
        Type: AWS::EC2::Volume
        Properties:
          AvailabilityZone: us-east-1a
          Size: 100
          VolumeType: gp3  # Most cost-effective general purpose
          Iops: 3000  # Baseline included
          Throughput: 125  # MB/s baseline included
          Encrypted: true
          KmsKeyId: !Ref EBSKMSKey  # Key assumed to be defined elsewhere
          Tags:
            - Key: Name
              Value: optimized-volume

      # Snapshot lifecycle
      SnapshotPolicy:
        Type: AWS::DLM::LifecyclePolicy
        Properties:
          Description: Daily snapshot policy
          State: ENABLED
          # ExecutionRoleArn needs the role ARN; !Ref on an IAM role returns its name
          ExecutionRoleArn: !GetAtt DLMPolicyRole.Arn
          PolicyDetails:
            PolicyType: EBS_SNAPSHOT_MANAGEMENT
            ResourceTypes:
              - VOLUME
            TargetTags:
              - Key: Backup
                Value: daily
            Schedules:
              - Name: DailySnapshots
                CreateRule:
                  Interval: 24
                  IntervalUnit: HOURS
                  Times:
                    - "05:00"
                RetainRule:
                  Count: 7  # Keep one week of daily snapshots
                CopyTags: true
                TagsToAdd:
                  - Key: SnapshotType
                    Value: automated

47.8 Data Transfer Optimization
Data Transfer Costs

The rates below are indicative US-region list prices; they vary by Region and change over time.

Inbound data transfer:
- Free: all data transfer into AWS

Outbound data transfer (to the internet):
- First 100 GB/month: free
- Up to 10 TB/month: $0.09/GB
- Next 40 TB/month: $0.085/GB
- Next 100 TB/month: $0.07/GB
- Over 150 TB/month: $0.05/GB; contact AWS for very large volumes

Inter-Region and intra-Region data transfer:
- Between regions: $0.02-$0.14/GB
- Same region, same Availability Zone (private IP): free
- Same region, across Availability Zones: about $0.01/GB in each direction

Cost optimization strategies:
- Use CloudFront for content delivery
- Use VPC endpoints for AWS services
- Compress data before transfer
- Use Direct Connect for high volume

CloudFront for Cost Optimization
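
To judge how much a CDN or compression needs to save, the outbound tiers listed under Data Transfer Costs can be turned into a small calculator. This is a sketch under stated assumptions: 1 TB is taken as 1,024 GB, the free 100 GB is applied before the first paid tier, the prices are the indicative figures from this section, and volumes beyond roughly 150 TB are deliberately not modelled. The function name is hypothetical.

```python
# (tier size in GB, price per GB), in billing order.
TIERS = [
    (100, 0.0),          # free allowance
    (10 * 1024, 0.09),   # up to 10 TB
    (40 * 1024, 0.085),  # next 40 TB
    (100 * 1024, 0.07),  # next 100 TB
]


def outbound_transfer_cost(gb: float) -> float:
    """Monthly internet egress cost in USD for `gb` gigabytes."""
    if gb < 0:
        raise ValueError("gb must be non-negative")
    remaining = gb
    cost = 0.0
    for tier_size, price in TIERS:
        billed = min(remaining, tier_size)
        cost += billed * price
        remaining -= billed
        if remaining <= 0:
            return cost
    raise ValueError("volume exceeds the modelled tiers (over roughly 150 TB)")


# Hypothetical month: 5,000 GB raw versus 2,000 GB after compression.
print(f"raw:        ${outbound_transfer_cost(5000):.2f}")
print(f"compressed: ${outbound_transfer_cost(2000):.2f}")
```

Because the first paid tier is also the most expensive one, reductions in egress volume pay off at the full $0.09/GB rate for most small and mid-sized workloads.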
    # CloudFront Distribution
    Resources:
      CloudFrontDistribution:
        Type: AWS::CloudFront::Distribution
        Properties:
          DistributionConfig:
            Enabled: true
            PriceClass: PriceClass_100  # Cheapest class: North America and Europe edge locations
            Origins:
              - DomainName: !GetAtt OriginBucket.RegionalDomainName
                Id: S3Origin
                S3OriginConfig:
                  # The OAI must be passed in path form, not as a bare ID
                  OriginAccessIdentity: !Sub origin-access-identity/cloudfront/${CloudFrontOAI}
            DefaultCacheBehavior:
              TargetOriginId: S3Origin
              ViewerProtocolPolicy: redirect-to-https
              CachePolicyId: 658327ea-f89d-4fab-a63d-7e88639d58f6  # CachingOptimized
              Compress: true  # Enable compression
            CacheBehaviors:
              - PathPattern: /static/*
                TargetOriginId: S3Origin
                ViewerProtocolPolicy: redirect-to-https
                CachePolicyId: 658327ea-f89d-4fab-a63d-7e88639d58f6
                Compress: true
                # TTLs come from the cache policy. The legacy DefaultTTL (1 day),
                # MaxTTL (1 year), and MinTTL (0) fields cannot be combined with
                # CachePolicyId; use a custom cache policy to set them.

47.9 Cost Governance
Tagging Policy
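
Before enforcing tags across an organization, it can help to check existing resources against the required tags from the cost allocation strategy (Environment, Owner, Project, CostCenter). A minimal sketch, assuming tags have already been fetched into a plain dictionary; the function name and sample values are hypothetical.

```python
from typing import Dict, List

REQUIRED_TAGS = ("Environment", "Owner", "Project", "CostCenter")
ALLOWED_ENVIRONMENTS = {"dev", "staging", "prod"}


def tag_violations(tags: Dict[str, str]) -> List[str]:
    """Return human-readable tag problems (an empty list means compliant)."""
    problems = [f"missing tag: {key}" for key in REQUIRED_TAGS if not tags.get(key)]
    env = tags.get("Environment")
    if env and env not in ALLOWED_ENVIRONMENTS:
        problems.append(f"invalid Environment value: {env}")
    return problems


compliant = {
    "Environment": "prod",
    "Owner": "platform-team",
    "Project": "checkout",
    "CostCenter": "CC-1001",
}
print(tag_violations(compliant))
print(tag_violations({"Environment": "qa", "Owner": "platform-team"}))
```

A report built from such a check shows how much cleanup is needed before a tag policy or SCP starts rejecting requests.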
    # AWS Tag Policy
    Resources:
      TagPolicy:
        Type: AWS::Organizations::Policy
        Properties:
          Name: RequiredTagsPolicy
          Description: Enforce required tags
          Type: TAG_POLICY
          Content: |
            {
              "tags": {
                "Environment": {
                  "tag_key": { "@@assign": "Environment" },
                  "tag_value": { "@@assign": ["dev", "staging", "prod"] },
                  "enforced_for": {
                    "@@assign": ["ec2:instance", "ec2:volume", "s3:bucket", "rds:db"]
                  }
                },
                "Owner": {
                  "tag_key": { "@@assign": "Owner" },
                  "tag_value": { "@@assign": "*" },
                  "enforced_for": {
                    "@@assign": ["ec2:instance", "s3:bucket"]
                  }
                },
                "CostCenter": {
                  "tag_key": { "@@assign": "CostCenter" },
                  "tag_value": { "@@assign": "*" }
                }
              }
            }

Service Control Policies for Cost
Test this policy in a sandbox OU first: the `Null` condition denies any request that carries no Environment tag, including API calls that cannot pass tags at creation time.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "DenyExpensiveInstanceTypes",
          "Effect": "Deny",
          "Action": "ec2:RunInstances",
          "Resource": "arn:aws:ec2:*:*:instance/*",
          "Condition": {
            "StringLike": {
              "ec2:InstanceType": [
                "*.8xlarge",
                "*.12xlarge",
                "*.16xlarge",
                "*.24xlarge",
                "*.metal"
              ]
            }
          }
        },
        {
          "Sid": "DenyUntaggedResources",
          "Effect": "Deny",
          "Action": [
            "ec2:RunInstances",
            "s3:CreateBucket",
            "rds:CreateDBInstance"
          ],
          "Resource": [
            "arn:aws:ec2:*:*:instance/*",
            "arn:aws:s3:::*",
            "arn:aws:rds:*:*:db:*"
          ],
          "Condition": {
            "Null": {
              "aws:RequestTag/Environment": "true"
            }
          }
        }
      ]
    }

47.10 Cost Monitoring Automation
Automated Cost Reporting
    import json
    import logging
    from datetime import datetime, timedelta

    import boto3
    from dateutil.relativedelta import relativedelta

    logger = logging.getLogger()
    logger.setLevel(logging.INFO)


    def lambda_handler(event, context):
        """Generate and send a weekly cost report."""
        ce = boto3.client('ce')
        sns = boto3.client('sns')

        # Date range: the last 7 days (Cost Explorer treats End as exclusive)
        end_date = datetime.now()
        start_date = end_date - timedelta(days=7)

        # Get cost and usage
        response = ce.get_cost_and_usage(
            TimePeriod={
                'Start': start_date.strftime('%Y-%m-%d'),
                'End': end_date.strftime('%Y-%m-%d')
            },
            Granularity='DAILY',
            Metrics=['UnblendedCost'],
            GroupBy=[
                {'Type': 'DIMENSION', 'Key': 'SERVICE'},
                {'Type': 'DIMENSION', 'Key': 'LINKED_ACCOUNT'}
            ]
        )

        # Process results (pagination via NextPageToken is omitted for brevity)
        total_cost = 0
        service_costs = {}
        for result in response['ResultsByTime']:
            for group in result['Groups']:
                service = group['Keys'][0]  # Keys[1] holds the linked account
                cost = float(group['Metrics']['UnblendedCost']['Amount'])
                total_cost += cost
                service_costs[service] = service_costs.get(service, 0) + cost

        # Sort by cost
        sorted_services = sorted(
            service_costs.items(),
            key=lambda x: x[1],
            reverse=True
        )

        # Build report
        report = (
            "Weekly AWS Cost Report\n"
            "======================\n"
            f"Period: {start_date.strftime('%Y-%m-%d')} to {end_date.strftime('%Y-%m-%d')}\n"
            f"Total Cost: ${total_cost:.2f}\n\n"
            "Top 10 Services by Cost:"
        )
        for service, cost in sorted_services[:10]:
            report += f"\n  - {service}: ${cost:.2f}"

        # Get forecast for the coming month
        forecast = ce.get_cost_forecast(
            TimePeriod={
                'Start': end_date.strftime('%Y-%m-%d'),
                'End': (end_date + relativedelta(months=1)).strftime('%Y-%m-%d')
            },
            Metric='UNBLENDED_COST',
            Granularity='MONTHLY'
        )
        # Total covers the whole period; ForecastResultsByTime[0] would only
        # cover the first (partial) calendar month
        forecast_amount = float(forecast['Total']['Amount'])
        report += f"\n\nMonthly Forecast: ${forecast_amount:.2f}"

        # Send notification
        sns.publish(
            TopicArn='arn:aws:sns:us-east-1:123456789012:cost-reports',
            Subject='Weekly AWS Cost Report',
            Message=report
        )

        return {
            'statusCode': 200,
            'body': json.dumps({
                'total_cost': total_cost,
                'forecast': forecast_amount
            })
        }

Automated Resource Cleanup
    import json
    import logging
    from datetime import datetime, timedelta, timezone

    import boto3
    from botocore.exceptions import ClientError

    logger = logging.getLogger()
    logger.setLevel(logging.INFO)


    def lambda_handler(event, context):
        """Clean up unused resources."""
        ec2 = boto3.client('ec2')
        now = datetime.now(timezone.utc)  # boto3 timestamps are timezone-aware

        results = {
            'volumes_deleted': [],
            'snapshots_deleted': []
        }

        # 1. Delete unattached EBS volumes older than 7 days
        volume_pages = ec2.get_paginator('describe_volumes').paginate(
            Filters=[{'Name': 'status', 'Values': ['available']}]
        )
        for page in volume_pages:
            for volume in page['Volumes']:
                if now - volume['CreateTime'] <= timedelta(days=7):
                    continue
                # The KeepAlive=true tag protects a volume from deletion
                tags = {t['Key']: t['Value'] for t in volume.get('Tags', [])}
                if tags.get('KeepAlive', 'false').lower() != 'true':
                    ec2.delete_volume(VolumeId=volume['VolumeId'])
                    results['volumes_deleted'].append(volume['VolumeId'])
                    logger.info(f"Deleted unattached volume: {volume['VolumeId']}")

        # 2. Delete old snapshots (older than 90 days)
        snapshot_pages = ec2.get_paginator('describe_snapshots').paginate(
            OwnerIds=['self']
        )
        for page in snapshot_pages:
            for snapshot in page['Snapshots']:
                if now - snapshot['StartTime'] <= timedelta(days=90):
                    continue
                # The KeepForever=true tag protects a snapshot from deletion
                tags = {t['Key']: t['Value'] for t in snapshot.get('Tags', [])}
                if tags.get('KeepForever', 'false').lower() == 'true':
                    continue
                try:
                    ec2.delete_snapshot(SnapshotId=snapshot['SnapshotId'])
                except ClientError as err:
                    # For example, the snapshot still backs a registered AMI
                    logger.warning(f"Skipped {snapshot['SnapshotId']}: {err}")
                    continue
                results['snapshots_deleted'].append(snapshot['SnapshotId'])
                logger.info(f"Deleted old snapshot: {snapshot['SnapshotId']}")

        # 3. Find RDS instances without recent connections
        #    (This would require CloudWatch metrics analysis)

        return {
            'statusCode': 200,
            'body': json.dumps(results)
        }

47.12 Why This Matters in DevOps/SRE
Cost optimization is essential for operational efficiency and budget management. SREs balance reliability with cost-effectiveness.

Cost Optimization in DevOps/SRE: SRE Financial Responsibility

1. Right-Sizing = Reliability + Cost Savings
   - Overprovisioned resources waste money
   - Right-sized resources meet SLOs efficiently
   - Monitor utilization and adjust

2. Spot for Fault Tolerance
   - Spot instances for stateless workloads
   - Auto Scaling handles interruptions
   - Significant savings (60-90%)

3. Cost as a Reliability Metric
   - Include cost in SLO decisions
   - Right instance types for the right workload
   - Automate idle resource termination

47.13 Linux Systems Perspective
Cost Monitoring CLI

    # Get cost by service
    aws ce get-cost-and-usage \
      --time-period Start=2024-01-01,End=2024-02-01 \
      --granularity MONTHLY \
      --metrics UnblendedCost \
      --group-by Type=DIMENSION,Key=SERVICE

    # Get EC2 right-sizing recommendations
    aws ce get-rightsizing-recommendation \
      --service AmazonEC2

47.14 Common Mistakes & Anti-Patterns
Cost Optimization Anti-Patterns

❌ Mistake 1: Not Using Right-Sizing
- Problem: running oversized instances
- Impact: 40-60% wasted spend
- Fix: use Cost Explorer rightsizing recommendations

❌ Mistake 2: Buying RIs Without Analysis
- Problem: Reserved Instances for variable workloads
- Impact: wasted commitment charges
- Fix: analyze usage patterns before committing

❌ Mistake 3: Ignoring Idle Resources
- Problem: dev/test environments running 24/7
- Impact: 65%+ wasted on non-production
- Fix: scheduled start/stop with Lambda

❌ Mistake 4: Not Using Spot for Fault-Tolerant Workloads
- Problem: paying full price for batch jobs
- Impact: missed savings opportunity
- Fix: use Spot Fleet or an ASG with Spot

47.15 Interview Questions
Conceptual Questions
- Q: When should you use Savings Plans vs Reserved Instances?
  - A: Savings Plans offer flexibility: a Compute Savings Plan applies to eligible usage across instance families, Regions, Fargate, and Lambda. Reserved Instances scoped to an Availability Zone (zonal RIs) also reserve capacity. Use Compute Savings Plans for flexible usage, and zonal RIs when you need guaranteed capacity.
- Q: How do you handle cost allocation in a large organization?
  - A: Use AWS Organizations with linked accounts. Enable Cost Explorer. Use tag-based allocation. Set up budgets and alerts. Create chargeback reports.
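
Tag-based allocation for a showback report comes down to grouping cost line items by an ownership tag. The sketch below assumes line items are already available as `(tags, cost)` pairs, for example parsed from a billing export; the function name, the `Owner` tag choice, and the sample figures are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

UNALLOCATED = "unallocated"


def showback_by_team(
    line_items: Iterable[Tuple[Dict[str, str], float]],
) -> Dict[str, float]:
    """Sum cost per Owner tag; untagged spend goes to an 'unallocated' bucket."""
    totals: Dict[str, float] = defaultdict(float)
    for tags, cost in line_items:
        owner = tags.get("Owner") or UNALLOCATED
        totals[owner] += cost
    return dict(totals)


items = [
    ({"Owner": "payments"}, 120.0),
    ({"Owner": "search"}, 80.0),
    ({}, 25.5),
    ({"Owner": "payments"}, 30.0),
]
print(showback_by_team(items))
```

Keeping an explicit unallocated bucket makes tagging gaps visible instead of silently dropping that spend from the report.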
Scenario-Based Questions
- Q: Your monthly bill increased 40%. How would you investigate?
  - A: Use Cost Explorer to identify the spike. Check new resources. Review CloudTrail for unauthorized usage. Analyze rightsizing recommendations. Check for abandoned resources.
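
The first step of that investigation, finding which services drive the increase, can be sketched as a comparison of two monthly per-service totals. The function name, service labels, and figures below are hypothetical; in practice the inputs would come from Cost Explorer grouped by service.

```python
from typing import Dict, List, Tuple


def top_cost_increases(
    previous: Dict[str, float],
    current: Dict[str, float],
    limit: int = 5,
) -> List[Tuple[str, float]]:
    """Return services ranked by absolute month-over-month cost increase."""
    services = set(previous) | set(current)
    # A service missing from a month is treated as costing zero in that month.
    deltas = [(s, current.get(s, 0.0) - previous.get(s, 0.0)) for s in services]
    increases = [(s, d) for s, d in deltas if d > 0]
    # Largest increase first; service name breaks ties for a stable order.
    return sorted(increases, key=lambda item: (-item[1], item[0]))[:limit]


last_month = {"EC2": 5000.0, "S3": 800.0, "RDS": 1200.0}
this_month = {"EC2": 5200.0, "S3": 790.0, "RDS": 1250.0, "NAT Gateway": 2500.0}
for service, delta in top_cost_increases(last_month, this_month):
    print(f"{service}: +${delta:.2f}")
```

In this made-up example a service that did not exist last month accounts for most of the jump, which is a common pattern when a new resource is launched without a cost review.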
47.16 Key Takeaways
Cost Optimization Checklist

    # Cost Optimization Checklist

    ## Compute
    - [ ] Right-size EC2 instances based on utilization
    - [ ] Use Reserved Instances or Savings Plans for steady workloads
    - [ ] Use Spot Instances for flexible workloads
    - [ ] Implement Auto Scaling
    - [ ] Schedule non-production instances to stop after hours

    ## Storage
    - [ ] Use S3 lifecycle policies
    - [ ] Use appropriate storage classes
    - [ ] Delete unattached EBS volumes
    - [ ] Use EBS gp3 for better price/performance
    - [ ] Implement snapshot lifecycle policies

    ## Network
    - [ ] Use CloudFront for content delivery
    - [ ] Use VPC endpoints for AWS services
    - [ ] Minimize inter-region data transfer
    - [ ] Compress data before transfer

    ## Database
    - [ ] Right-size RDS instances
    - [ ] Use Aurora Serverless for variable workloads
    - [ ] Implement read replicas for read-heavy workloads
    - [ ] Use ElastiCache for caching

    ## Governance
    - [ ] Implement tagging strategy
    - [ ] Set up AWS Budgets
    - [ ] Enable Cost Anomaly Detection
    - [ ] Regular cost reviews
    - [ ] Implement showback/chargeback

47.17 Key Takeaways
| Topic | Key Points |
|---|---|
| FinOps | Implement continuous cost management cycle |
| Right-Sizing | Monitor utilization and adjust resources |
| Commitment | Use RIs and Savings Plans for steady workloads |
| Spot | Use Spot for flexible, fault-tolerant workloads |
| Storage | Use lifecycle policies and appropriate storage classes |
| Governance | Implement tagging, budgets, and policies |
47.18 References
47.19 Exam Tips
Key Exam Points

1. Cost Optimization pillars: right-size, reserved capacity, Spot Instances, storage lifecycle
2. AWS Cost Explorer: visualize and analyze spending
3. Reserved Instances: for steady-state workloads
4. Savings Plans: flexible RI alternative
5. Spot Instances: for fault-tolerant, flexible workloads
6. S3 Intelligent-Tiering: auto-optimize storage costs
7. AWS Budgets: set alerts for cost thresholds
8. Cost Allocation Tags: track resource costs
9. AWS Compute Optimizer: right-size recommendations
10. FinOps: culture of cost awareness across teams

47.20 Summary
Chapter 47 Summary

Cost Optimization & FinOps:
- Right-sizing: match resources to actual needs
- Reserved capacity: save on steady workloads
- Spot Instances: optimize flexible workloads
- Storage lifecycle: move data to cheaper tiers

Key strategies:
- Right-size: use CloudWatch and Compute Optimizer
- Reserved Instances: 1-3 year commitments
- Spot: interruptible workloads, up to 90% savings
- S3 lifecycle: Standard → IA → Glacier

Best practices:
- Enable Cost Explorer
- Set budgets and alerts
- Implement a tagging strategy
- Use Cost Allocation Tags

Next Chapter: Chapter 48 - Multi-Region & Multi-Account Strategies