
Chapter 47: Cost Optimization & FinOps

Managing and Optimizing AWS Costs

47.1 Overview

Cost optimization is a continuous process of reducing AWS spending while maintaining performance, reliability, and security. FinOps brings financial accountability to cloud spending.

                    Cost Optimization Overview
+------------------------------------------------------------------+
|                                                                   |
|                    +------------------------+                     |
|                    |    Cost Optimization   |                     |
|                    +------------------------+                     |
|                              |                                    |
|        +---------------------+---------------------+              |
|        |            |            |            |                  |
|        v            v            v            v                  |
|  +----------+ +----------+ +----------+ +----------+            |
|  | Right    | | Reserved | | Spot     | | Cost     |            |
|  | Sizing   | | Instances| | Instances| | Monitoring|            |
|  |          | |          | |          | |          |            |
|  | - CPU    | | - RI     | | - 90%    | | - Budgets|            |
|  | - Memory | | - Savings| |   discount| | - Alerts |            |
|  | - Storage| | - Plans  | | - Batch  | | - Reports|            |
|  +----------+ +----------+ +----------+ +----------+            |
|                                                                   |
+------------------------------------------------------------------+

Key Concepts

Concept	Description
FinOps	Financial operations - cloud cost management framework
TCO	Total Cost of Ownership - all costs including hidden
Unit Economics	Cost per business metric (cost per transaction)
Showback	Show costs to teams without charging
Chargeback	Actually charge teams for their usage
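
As a rough illustration of unit economics and showback, the sketch below computes a cost per transaction and each team's share of a bill. The function names, team names, and figures are hypothetical; real inputs would come from billing data such as Cost Explorer exports.

```python
from decimal import Decimal


def cost_per_unit(total_cost: Decimal, units: int) -> Decimal:
    """Unit economics: cost per business metric (e.g. per transaction)."""
    if units <= 0:
        raise ValueError("units must be positive")
    return (total_cost / Decimal(units)).quantize(Decimal("0.0001"))


def showback_shares(costs_by_team: dict) -> dict:
    """Showback: report each team's percentage of total spend (no charging)."""
    total = sum(costs_by_team.values(), Decimal("0"))
    if total <= 0:
        raise ValueError("total cost must be positive")
    return {
        team: (cost / total * 100).quantize(Decimal("0.1"))
        for team, cost in costs_by_team.items()
    }


if __name__ == "__main__":
    # Hypothetical month: $1,250 of spend serving 500,000 transactions
    print(cost_per_unit(Decimal("1250.00"), 500_000))
    print(showback_shares({"payments": Decimal("600"), "search": Decimal("400")}))
```

Chargeback would use the same per-team figures but post them to each team's budget instead of only reporting them.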

47.2 FinOps Framework

FinOps Phases

                    FinOps Lifecycle
+------------------------------------------------------------------+
|                                                                   |
|                        +-----------+                             |
|                        |  Operate  |                             |
|                        +-----------+                             |
|                             ^                                    |
|                            / \                                   |
|                           /   \                                  |
|                          /     \                                 |
|            +-----------+         +-----------+                   |
|            |  Inform   |-------->|  Optimize |                   |
|            +-----------+         +-----------+                   |
|                 ^                     |                         |
|                 |                     |                         |
|                 +---------------------+                         |
|                                                                   |
|  Inform Phase:                                                   |
|  +--------------------------------------------------------+      |
|  | - Allocate costs to teams                               |      |
|  | - Understand cloud usage                                |      |
|  | - Benchmark against KPIs                                |      |
|  +--------------------------------------------------------+      |
|                                                                   |
|  Optimize Phase:                                                 |
|  +--------------------------------------------------------+      |
|  | - Right-size resources                                  |      |
|  | - Use commitment discounts (RIs, Savings Plans)         |      |
|  | - Eliminate waste                                       |      |
|  +--------------------------------------------------------+      |
|                                                                   |
|  Operate Phase:                                                  |
|  +--------------------------------------------------------+      |
|  | - Implement automation                                  |      |
|  | - Monitor and measure                                   |      |
|  | - Continuous improvement                                |      |
|  +--------------------------------------------------------+      |
|                                                                   |
+------------------------------------------------------------------+

Cost Allocation

                    Cost Allocation Strategy
+------------------------------------------------------------------+
|                                                                   |
|  Tagging Strategy                                                |
|  +----------------------------------------------------------+   |
|  |                                                           |   |
|  |  Required Tags:                                           |   |
|  |  +----------------------------------------------------+  |   |
|  |  | - Environment (dev/staging/prod)                    |  |   |
|  |  | - Owner (team or individual)                        |  |   |
|  |  | - Project (application/service)                     |  |   |
|  |  | - CostCenter (billing code)                         |  |   |
|  |  +----------------------------------------------------+  |   |
|  |                                                           |   |
|  |  Optional Tags:                                           |   |
|  |  +----------------------------------------------------+  |   |
|  |  | - Application                                      |  |   |
|  |  | - Version                                          |  |   |
|  |  | - Compliance                                       |  |   |
|  |  | - Backup                                           |  |   |
|  |  +----------------------------------------------------+  |   |
|  |                                                           |   |
|  +----------------------------------------------------------+   |
|                                                                   |
+------------------------------------------------------------------+
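
The required tags above can be checked before a resource is created. The following is a minimal sketch of such a check, assuming tags arrive as a plain dictionary; the function name and sample values are hypothetical, and the allowed Environment values are taken from the diagram.

```python
REQUIRED_TAGS = ("Environment", "Owner", "Project", "CostCenter")
ALLOWED_ENVIRONMENTS = {"dev", "staging", "prod"}


def tag_violations(tags: dict) -> list:
    """Return a list of human-readable problems; an empty list means compliant."""
    problems = [
        f"missing tag: {key}"
        for key in REQUIRED_TAGS
        if not tags.get(key, "").strip()
    ]
    env = tags.get("Environment", "").strip()
    if env and env not in ALLOWED_ENVIRONMENTS:
        problems.append(f"invalid Environment: {env}")
    return problems


if __name__ == "__main__":
    # Hypothetical resource with two missing tags and a bad Environment value
    print(tag_violations({"Environment": "qa", "Owner": "platform-team"}))
```

A check like this could run in a CI pipeline or a scheduled audit; it complements, rather than replaces, enforcement through tag policies.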

47.3 AWS Cost Management Tools

AWS Cost Explorer

                    Cost Explorer Features
+------------------------------------------------------------------+
|                                                                   |
|  +------------------+  +------------------+  +------------------+ |
|  | Cost Analysis    |  | Forecasting      |  | Reports          | |
|  |                  |  |                  |  |                  | |
|  | - By Service     |  | - Predict costs  |  | - Daily/Monthly  | |
|  | - By Region      |  | - Trend analysis |  | - Custom        | |
|  | - By Tag         |  | - Budget planning|  | - Scheduled     | |
|  | - By Account     |  |                  |  |                  | |
|  +------------------+  +------------------+  +------------------+ |
|                                                                   |
|  +------------------+  +------------------+  +------------------+ |
|  | Reserved Instance|  | Savings Plans    |  | RI Recommendations| |
|  | Utilization      |  | Utilization      |  |                  | |
|  |                  |  |                  |  |                  | |
|  | - Coverage %     |  | - Coverage %     |  | - Right-size     | |
|  | - Utilization %  |  | - Utilization %  |  | - Purchase recs  | |
|  | - Cost savings   |  | - Cost savings   |  | - Historical     | |
|  +------------------+  +------------------+  +------------------+ |
|                                                                   |
+------------------------------------------------------------------+

AWS Budgets

# AWS Budget Configuration
Resources:
  # Monthly cost budget
  MonthlyBudget:
    Type: AWS::Budgets::Budget
    Properties:
      Budget:
        BudgetName: MonthlyCostBudget
        BudgetLimit:
          Amount: 10000
          Unit: USD
        TimeUnit: MONTHLY
        BudgetType: COST
        CostFilters:
          Service:
            - Amazon Elastic Compute Cloud - Compute
            - Amazon Relational Database Service
        CostTypes:
          IncludeTax: true
          IncludeSupport: true
          IncludeDiscount: true
          IncludeRefund: true
          IncludeCredit: true
          IncludeUpfront: true
          IncludeRecurring: true
          IncludeOtherSubscription: true
          IncludeSubscription: true
      NotificationsWithSubscribers:
        - Notification:
            NotificationType: ACTUAL
            ComparisonOperator: GREATER_THAN
            Threshold: 80
            ThresholdType: PERCENTAGE
          Subscribers:
            - Address: ops@example.com
              SubscriptionType: EMAIL
        - Notification:
            NotificationType: ACTUAL
            ComparisonOperator: GREATER_THAN
            Threshold: 100
            ThresholdType: PERCENTAGE
          Subscribers:
            - Address: ops@example.com
              SubscriptionType: EMAIL
            # SNS subscribers take a topic ARN (placeholder below), not a webhook URL.
            # To reach Slack, subscribe a forwarder to this topic.
            - Address: arn:aws:sns:us-east-1:123456789012:budget-alerts
              SubscriptionType: SNS

  # RI utilization budget
  RIUtilizationBudget:
    Type: AWS::Budgets::Budget
    Properties:
      Budget:
        BudgetName: RIUtilizationBudget
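        # Utilization budgets accept only a limit of 100 (percent);
        # the alert level is set by the notification threshold below.
        BudgetLimit:
          Amount: 100
          Unit: PERCENTAGE
        TimeUnit: MONTHLY
        BudgetType: RI_UTILIZATION
        # RI budgets are evaluated for one service at a time
        CostFilters:
          Service:
            - Amazon Elastic Compute Cloud - Compute
      NotificationsWithSubscribers:
        - Notification:
            NotificationType: ACTUAL
            ComparisonOperator: LESS_THAN
            Threshold: 80
            ThresholdType: PERCENTAGE
          Subscribers:
            - Address: ops@example.com
              SubscriptionType: EMAIL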
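
To show how the percentage thresholds in the cost budget behave, here is a small sketch that mirrors the GREATER_THAN comparison for the 80% and 100% notifications. It is an illustration of the arithmetic only, not how AWS Budgets is implemented; the function name is hypothetical.

```python
def triggered_thresholds(actual_spend, budget_limit, thresholds_pct=(80, 100)):
    """Return the percentage thresholds that actual spend has exceeded."""
    if budget_limit <= 0:
        raise ValueError("budget_limit must be positive")
    # Compare actual/limit > pct/100 without dividing, to avoid rounding surprises
    return [pct for pct in thresholds_pct if actual_spend * 100 > pct * budget_limit]


if __name__ == "__main__":
    # With the $10,000 monthly budget above, $8,500 of spend crosses only the 80% alert
    print(triggered_thresholds(8_500, 10_000))
```

Because the operator is strictly greater than, spend of exactly $8,000 or exactly $10,000 does not yet cross the corresponding threshold.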

Cost Anomaly Detection

# Cost Anomaly Detection
Resources:
  AnomalyMonitor:
    Type: AWS::CE::AnomalyMonitor
    Properties:
      MonitorName: ServiceCostMonitor
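      # A DIMENSIONAL monitor tracks every AWS service and takes no extra filter.
      # To scope by tag instead, use MonitorType: CUSTOM with a MonitorSpecification
      # (a JSON expression string) and omit MonitorDimension.
      MonitorType: DIMENSIONAL
      MonitorDimension: SERVICE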

  AnomalySubscription:
    Type: AWS::CE::AnomalySubscription
    Properties:
      SubscriptionName: CostAnomalyAlerts
      Threshold: 100  # Alert on anomalies > $100
      Frequency: DAILY
      Subscribers:
        - Address: ops@example.com
          Type: EMAIL
      MonitorArnList:
        - !GetAtt AnomalyMonitor.MonitorArn

47.4 Right-Sizing

EC2 Right-Sizing

                    EC2 Right-Sizing Analysis
+------------------------------------------------------------------+
|                                                                   |
|  Metrics to Monitor                                              |
|  +----------------------------------------------------------+   |
|  |                                                           |   |
|  |  CPU Utilization                                          |   |
|  |  +----------------------------------------------------+  |   |
|  |  | - Average < 40%: Consider downsizing               |  |   |
|  |  | - Average > 80%: Consider upsizing                  |  |   |
|  |  | - Spikes > 90%: May need larger instance            |  |   |
|  |  +----------------------------------------------------+  |   |
|  |                                                           |   |
|  |  Memory Utilization                                       |   |
|  |  +----------------------------------------------------+  |   |
|  |  | - Average < 50%: Consider downsizing               |  |   |
|  |  | - Average > 85%: Consider upsizing                  |  |   |
|  |  | - Requires CloudWatch agent                          |  |   |
|  |  +----------------------------------------------------+  |   |
|  |                                                           |   |
|  |  Network Utilization                                      |   |
|  |  +----------------------------------------------------+  |   |
|  |  | - Low throughput: Consider smaller instance         |  |   |
|  |  | - High throughput: Consider larger instance         |  |   |
|  |  | - Burst vs. Enhanced networking                      |  |   |
|  |  +----------------------------------------------------+  |   |
|  |                                                           |   |
|  +----------------------------------------------------------+   |
|                                                                   |
+------------------------------------------------------------------+
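
The thresholds in the diagram can be turned into a simple first-pass classifier. This is a sketch under the assumption that average CPU, average memory, and peak CPU percentages have already been collected (memory only where the CloudWatch agent is installed); the function name and return labels are hypothetical.

```python
def rightsizing_hint(avg_cpu, avg_mem=None, peak_cpu=None):
    """Classify an instance using the CPU and memory thresholds listed above.

    All inputs are percentages (0-100). avg_mem is None when the
    CloudWatch agent is not reporting memory.
    """
    # Any sign of pressure wins over a downsizing signal
    if avg_cpu > 80:
        return "upsize"
    if avg_mem is not None and avg_mem > 85:
        return "upsize"
    if peak_cpu is not None and peak_cpu > 90:
        return "upsize"

    if avg_cpu < 40:
        if avg_mem is None:
            # CPU is low, but memory is unknown: do not downsize blindly
            return "check-memory"
        if avg_mem < 50:
            return "downsize"
    return "keep"


if __name__ == "__main__":
    print(rightsizing_hint(avg_cpu=25, avg_mem=30))
    print(rightsizing_hint(avg_cpu=25))
```

Treating any upsize signal as overriding is a conservative design choice; a real review would also look at network throughput and at the period over which the averages were taken.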

Right-Sizing Recommendations

# Get right-sizing recommendations using AWS CLI
# --service is required; AmazonEC2 is the supported value, so no extra service filter is needed
aws ce get-rightsizing-recommendation \
  --service AmazonEC2 \
  --configuration '{
    "RecommendationTarget": "SAME_INSTANCE_FAMILY",
    "BenefitsConsidered": true
  }'

# Output example (abridged):
{
  "RightsizingRecommendations": [
    {
      "CurrentInstance": {
        "ResourceId": "i-1234567890abcdef0",
        "ResourceDetails": {
          "EC2ResourceDetails": {
            "InstanceType": "m5.xlarge",
            "Region": "us-east-1"
          }
        }
      },
      "RightsizingType": "MODIFY",
      "ModifyRecommendationDetail": {
        "TargetInstances": [
          {
            "EstimatedMonthlySavings": "35.00",
            "ResourceDetails": {
              "EC2ResourceDetails": {
                "InstanceType": "m5.large"
              }
            }
          }
        ]
      }
    }
  ]
}

Instance Type Selection

                    Instance Type Selection Guide
+------------------------------------------------------------------+
|                                                                   |
|  General Purpose                                                 |
|  +------------------+  +------------------+  +------------------+ |
|  | M5/M6g Series    |  | T3/T4g Series    |  | A1 Series        | |
|  |                  |  |                  |  |                  | |
|  | - Balanced       |  | - Burstable      |  | - ARM-based      | |
|  | - Production     |  | - Dev/Test       |  | - Cost-effective | |
|  | - General workloads| | - Variable load |  | - ARM workloads  | |
|  +------------------+  +------------------+  +------------------+ |
|                                                                   |
|  Compute Optimized                                               |
|  +------------------+  +------------------+                      |
|  | C5/C6g Series    |  | HPC Instances    |                      |
|  |                  |  |                  |                      |
|  | - High CPU       |  | - Batch          |                      |
|  | - Gaming         |  | - Scientific     |                      |
|  | - HPC            |  | - ML Training    |                      |
|  +------------------+  +------------------+                      |
|                                                                   |
|  Memory Optimized                                                |
|  +------------------+  +------------------+  +------------------+ |
|  | R5/R6g Series    |  | X1/X2 Series     |  | Z1D Series       | |
|  |                  |  |                  |  |                  | |
|  | - Databases      |  | - In-memory DB   |  | - High memory    | |
|  | - Big Data       |  | - SAP HANA       |  | - High CPU       | |
|  | - Analytics      |  | - Large datasets |  | - Enterprise     | |
|  +------------------+  +------------------+  +------------------+ |
|                                                                   |
+------------------------------------------------------------------+

47.5 Reserved Instances & Savings Plans

Reserved Instances

                    Reserved Instance Types
+------------------------------------------------------------------+
|                                                                   |
|  Standard Reserved Instances                                     |
|  +----------------------------------------------------------+   |
|  |                                                           |   |
|  |  Term: 1 year or 3 years                                  |   |
|  |  Payment: All upfront, Partial upfront, No upfront        |   |
|  |  Discount: Up to 40% (1 year), 60% (3 years)             |   |
|  |  Flexibility: Can change AZ, size within family           |   |
|  |                                                           |   |
|  +----------------------------------------------------------+   |
|                                                                   |
|  Convertible Reserved Instances                                  |
|  +----------------------------------------------------------+   |
|  |                                                           |   |
|  |  Term: 1 year or 3 years                                  |   |
|  |  Payment: All upfront, Partial upfront, No upfront        |   |
|  |  Discount: Up to 30% (1 year), 45% (3 years)             |   |
|  |  Flexibility: Can change family, OS, tenancy              |   |
|  |                                                           |   |
|  +----------------------------------------------------------+   |
|                                                                   |
|  Scheduled Reserved Instances                                    |
|  +----------------------------------------------------------+   |
|  |                                                           |   |
|  |  Term: 1 year                                             |   |
|  |  Schedule: Recurring daily/weekly schedule                |   |
|  |  Use case: Predictable recurring workloads                |   |
|  |  Status: No longer offered for new purchases              |   |
|  |                                                           |   |
|  +----------------------------------------------------------+   |
|                                                                   |
+------------------------------------------------------------------+

Savings Plans

                    Savings Plans Types
+------------------------------------------------------------------+
|                                                                   |
|  Compute Savings Plan                                           |
|  +----------------------------------------------------------+   |
|  |                                                           |   |
|  |  Commitment: $/hour for 1 or 3 years                      |   |
|  |  Discount: Up to 66%                                      |   |
|  |  Applies to:                                              |   |
|  |  - EC2 instances (any family, size, region, OS)           |   |
|  |  - Fargate                                                |   |
|  |  - Lambda                                                 |   |
|  |  Flexibility: Highest                                      |   |
|  |                                                           |   |
|  +----------------------------------------------------------+   |
|                                                                   |
|  EC2 Instance Savings Plan                                       |
|  +----------------------------------------------------------+   |
|  |                                                           |   |
|  |  Commitment: $/hour for 1 or 3 years                      |   |
|  |  Discount: Up to 72%                                      |   |
|  |  Applies to:                                              |   |
|  |  - EC2 instances within family in a region                 |   |
|  |  Flexibility: Size, OS, tenancy within family             |   |
|  |                                                           |   |
|  +----------------------------------------------------------+   |
|                                                                   |
|  SageMaker Savings Plan                                         |
|  +----------------------------------------------------------+   |
|  |                                                           |   |
|  |  Commitment: $/hour for 1 or 3 years                      |   |
|  |  Discount: Up to 64%                                      |   |
|  |  Applies to: SageMaker ML instances                       |   |
|  |                                                           |   |
|  +----------------------------------------------------------+   |
|                                                                   |
+------------------------------------------------------------------+
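
Because a Savings Plan is a fixed $/hour commitment, it only saves money when usage is high enough to consume it. The sketch below models this with a single uniform discount, which is a simplifying assumption (real discounts vary by instance type, region, and term); the function name and figures are hypothetical.

```python
def hourly_cost_with_savings_plan(on_demand_equiv, commitment, discount):
    """Estimate the hourly bill under a Savings Plan.

    on_demand_equiv: usage valued at On-Demand rates, in $/hour
    commitment:      committed spend, in $/hour (paid whether used or not)
    discount:        assumed uniform discount as a fraction, e.g. 0.5 for 50%
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    # On-Demand value of usage that the commitment can absorb
    covered = commitment / (1 - discount)
    # Usage beyond the commitment is billed at On-Demand rates
    overflow = max(0.0, on_demand_equiv - covered)
    return commitment + overflow


if __name__ == "__main__":
    # Hypothetical $10/hour commitment at an assumed 50% discount
    for usage in (5.0, 20.0, 30.0):
        print(usage, hourly_cost_with_savings_plan(usage, 10.0, 0.5))
```

With these numbers the commitment is fully used at $20/hour of On-Demand-equivalent usage; at $5/hour the plan costs more than On-Demand would, which is why commitments are usually sized to the steady baseline rather than the peak.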

Savings Plan Configuration

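# Savings Plan Purchase
# CloudFormation does not appear to provide a Savings Plans resource type,
# so the purchase is shown with the AWS CLI instead.

# 1. Look up a Compute Savings Plan offering: 3-year term (94608000 seconds), No Upfront
aws savingsplans describe-savings-plans-offerings \
  --plan-types Compute \
  --durations 94608000 \
  --payment-options "No Upfront"

# 2. Purchase with the offeringId returned above ($10/hour commitment)
aws savingsplans create-savings-plan \
  --savings-plan-offering-id "$OFFERING_ID" \
  --commitment 10.00 \
  --tags Environment=production,Owner=platform-team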

47.6 Spot Instances

Spot Instance Strategy

                    Spot Instance Strategy
+------------------------------------------------------------------+
|                                                                   |
|  Use Cases                                                       |
|  +----------------------------------------------------------+   |
|  |                                                           |   |
|  |  Ideal for:                                                |   |
|  |  - Batch processing                                        |   |
|  |  - CI/CD pipelines                                         |   |
|  |  - Big data analytics                                      |   |
|  |  - Containerized workloads                                 |   |
|  |  - Stateless applications                                  |   |
|  |  - Image/video processing                                  |   |
|  |                                                           |   |
|  |  Not recommended for:                                      |   |
|  |  - Databases                                               |   |
|  |  - Stateful applications                                   |   |
|  |  - Long-running jobs without checkpointing                |   |
|  |                                                           |   |
|  +----------------------------------------------------------+   |
|                                                                   |
|  Spot Best Practices                                             |
|  +----------------------------------------------------------+   |
|  |                                                           |   |
|  |  1. Use multiple instance types                           |   |
|  |  2. Use multiple Availability Zones                       |   |
|  |  3. Implement graceful shutdown                           |   |
|  |  4. Use Spot interruption notices                         |   |
|  |  5. Combine with On-Demand for critical capacity           |   |
|  |                                                           |   |
|  +----------------------------------------------------------+   |
|                                                                   |
+------------------------------------------------------------------+

Spot Fleet Configuration

# Spot Fleet Configuration
Resources:
  SpotFleetRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: spotfleet.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonEC2SpotFleetTaggingRole

  SpotFleet:
    Type: AWS::EC2::SpotFleet
    Properties:
      SpotFleetRequestConfigData:
        IamFleetRole: !GetAtt SpotFleetRole.Arn
        AllocationStrategy: capacityOptimized
        TargetCapacity: 10
        OnDemandTargetCapacity: 2  # 20% On-Demand
        InstanceInterruptionBehavior: terminate
        LaunchTemplateConfigs:
          - LaunchTemplateSpecification:
              LaunchTemplateId: !Ref LaunchTemplate
              Version: !GetAtt LaunchTemplate.LatestVersionNumber
            Overrides:
              - InstanceType: m5.large
                SubnetId: subnet-az-a
              - InstanceType: m5.xlarge
                SubnetId: subnet-az-a
              - InstanceType: m5.large
                SubnetId: subnet-az-b
              - InstanceType: m5.xlarge
                SubnetId: subnet-az-b
              - InstanceType: c5.large
                SubnetId: subnet-az-a
              - InstanceType: c5.large
                SubnetId: subnet-az-b

Spot Instance Interruption Handling

import boto3
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Create clients once so warm Lambda invocations can reuse them
ec2 = boto3.client('ec2')
asg = boto3.client('autoscaling')

def lambda_handler(event, context):
    """
    Handle Spot instance interruption notices delivered by EventBridge
    """

    # Parse the interruption notice
    detail = event.get('detail', {})
    instance_id = detail.get('instance-id')
    action = detail.get('instance-action')

    if not instance_id or action != 'terminate':
        # Stop/hibernate notices and malformed events need no handling here
        logger.info(f"Ignoring event with action {action!r}")
        return {
            'statusCode': 200,
            'body': json.dumps({'message': 'No action taken'})
        }

    logger.info(f"Spot interruption notice for {instance_id}")

    # Get instance details
    instance = ec2.describe_instances(InstanceIds=[instance_id])
    tags = instance['Reservations'][0]['Instances'][0].get('Tags', [])

    # Find ASG from tags
    asg_name = None
    for tag in tags:
        if tag['Key'] == 'aws:autoscaling:groupName':
            asg_name = tag['Value']
            break

    if asg_name:
        # Detach without decrementing desired capacity, so the ASG
        # launches a replacement before this instance is reclaimed
        asg.detach_instances(
            AutoScalingGroupName=asg_name,
            InstanceIds=[instance_id],
            ShouldDecrementDesiredCapacity=False
        )
        logger.info(f"Detached {instance_id} from {asg_name}")

    # Graceful shutdown tasks
    # - Save state to S3/DynamoDB
    # - Complete in-progress work
    # - Notify other services

    return {
        'statusCode': 200,
        'body': json.dumps({
            'message': 'Spot interruption handled',
            'instance_id': instance_id
        })
    }
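
The handler above depends on the shape of the EventBridge event. As a sketch of that shape, the helper below extracts the instance ID and action and can be exercised locally without AWS access. The helper name and the sample instance ID are hypothetical; the `detail-type` string and the `detail` keys are the ones EC2 uses for Spot interruption warnings.

```python
SPOT_WARNING = "EC2 Spot Instance Interruption Warning"


def parse_spot_interruption(event):
    """Return (instance_id, action) for a Spot interruption warning, else None."""
    if event.get("source") != "aws.ec2" or event.get("detail-type") != SPOT_WARNING:
        return None
    detail = event.get("detail") or {}
    instance_id = detail.get("instance-id")
    action = detail.get("instance-action")
    if not instance_id or not action:
        return None
    return instance_id, action


if __name__ == "__main__":
    sample_event = {
        "version": "0",
        "detail-type": SPOT_WARNING,
        "source": "aws.ec2",
        "region": "us-east-1",
        "detail": {
            "instance-id": "i-1234567890abcdef0",
            "instance-action": "terminate",
        },
    }
    print(parse_spot_interruption(sample_event))
```

Keeping the parsing in a pure function makes it easy to unit test the handler's branching separately from the boto3 calls.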

47.7 Storage Cost Optimization

S3 Cost Optimization

                    S3 Storage Classes
+------------------------------------------------------------------+
|                                                                   |
|  Storage Class         | Use Case              | Cost            |
| -----------------------+----------------------+----------------- |
|  S3 Standard          | Frequently accessed   | $$$$            |
|  S3 Intelligent-Tiering| Unknown patterns     | $$$             |
|  S3 Standard-IA       | Infrequent access     | $$             |
|  S3 One Zone-IA       | Infrequent, non-critical| $             |
|  S3 Glacier Instant   | Archive, instant access| $              |
|  S3 Glacier Flexible  | Archive, hours access | $              |
|  S3 Glacier Deep Archive| Long-term archive   | $              |
|                                                                   |
+------------------------------------------------------------------+

S3 Lifecycle Policies

# S3 Lifecycle Configuration
Resources:
  DataBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: data-bucket
      VersioningConfiguration:
        Status: Enabled
      LifecycleConfiguration:
        Rules:
          # Transition to IA after 30 days
          - Id: TransitionToIA
            Status: Enabled
            Prefix: logs/
            Transitions:
              - TransitionInDays: 30
                StorageClass: STANDARD_IA
              - TransitionInDays: 90
                StorageClass: GLACIER
            ExpirationInDays: 365

          # Intelligent Tiering for unknown patterns
          - Id: IntelligentTiering
            Status: Enabled
            Prefix: uploads/
            Transitions:
              - TransitionInDays: 0
                StorageClass: INTELLIGENT_TIERING

          # Non-current version expiration
          - Id: NonCurrentVersionExpiration
            Status: Enabled
            NoncurrentVersionExpiration:
              NoncurrentDays: 30
              NewerNoncurrentVersions: 5

          # Delete incomplete multipart uploads
          - Id: MultipartUploadCleanup
            Status: Enabled
            AbortIncompleteMultipartUpload:
              DaysAfterInitiation: 7
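
To make the effect of the logs/ rule concrete, this sketch maps an object's age to the storage class the rule would place it in (30 days to STANDARD_IA, 90 days to GLACIER, expiry at 365 days). It is a simplified model: S3 applies lifecycle actions asynchronously, so real transitions can lag the configured day. The function name is hypothetical.

```python
from typing import Optional

# (minimum age in days, storage class), taken from the logs/ rule above
LOG_TRANSITIONS = [(30, "STANDARD_IA"), (90, "GLACIER")]
LOG_EXPIRATION_DAYS = 365


def storage_class_at(age_days: int) -> Optional[str]:
    """Return the expected storage class at a given age, or None once expired."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days >= LOG_EXPIRATION_DAYS:
        return None  # object has been expired (deleted) by the rule
    current = "STANDARD"
    for min_age, storage_class in LOG_TRANSITIONS:
        if age_days >= min_age:
            current = storage_class
    return current


if __name__ == "__main__":
    for age in (0, 30, 90, 365):
        print(age, storage_class_at(age))
```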

EBS Cost Optimization

# EBS Volume Optimization
Resources:
  OptimizedVolume:
    Type: AWS::EC2::Volume
    Properties:
      AvailabilityZone: us-east-1a
      Size: 100
      VolumeType: gp3  # Most cost-effective general purpose
      Iops: 3000       # Baseline included
      Throughput: 125  # MB/s baseline included
      Encrypted: true
      KmsKeyId: !Ref EBSKMSKey
      Tags:
        - Key: Name
          Value: optimized-volume

  # Snapshot lifecycle
  SnapshotPolicy:
    Type: AWS::DLM::LifecyclePolicy
    Properties:
      Description: Daily snapshot policy
      State: ENABLED
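      # Assumes DLMPolicyRole is an AWS::IAM::Role in this template; !Ref would return its name, not its ARN
      ExecutionRoleArn: !GetAtt DLMPolicyRole.Arn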
      PolicyDetails:
        PolicyType: EBS_SNAPSHOT_MANAGEMENT
        ResourceTypes:
          - VOLUME
        TargetTags:
          - Key: Backup
            Value: daily
        Schedules:
          - Name: DailySnapshots
            CreateRule:
              Interval: 24
              IntervalUnit: HOURS
              Times:
                - "05:00"
            RetainRule:
              Count: 7
            CopyTags: true
            TagsToAdd:
              - Key: SnapshotType
                Value: automated

47.8 Data Transfer Optimization

Data Transfer Costs

                    Data Transfer Costs
+------------------------------------------------------------------+
|                                                                   |
|  Inbound Data Transfer                                           |
|  +----------------------------------------------------------+   |
|  | - Free: All data transfer into AWS                        |   |
|  +----------------------------------------------------------+   |
|                                                                   |
|  Outbound Data Transfer                                          |
|  +----------------------------------------------------------+   |
|  | - First 100 GB/month: Free                                |   |
|  | - Up to 10 TB/month: $0.09/GB                            |   |
|  | - Next 40 TB/month: $0.085/GB                            |   |
|  | - Next 100 TB/month: $0.07/GB                            |   |
|  | - Over 150 TB/month: $0.05/GB                            |   |
|  +----------------------------------------------------------+   |
|                                                                   |
|  Inter-Region Data Transfer                                      |
|  +----------------------------------------------------------+   |
|  | - Between regions: $0.02-$0.14/GB                         |   |
|  | - Same region, across AZs: $0.01/GB each way             |   |
|  | - Same AZ (private IP): Free                             |   |
|  +----------------------------------------------------------+   |
|                                                                   |
|  Cost Optimization Strategies                                    |
|  +----------------------------------------------------------+   |
|  | - Use CloudFront for content delivery                      |   |
|  | - Use VPC endpoints for AWS services                       |   |
|  | - Compress data before transfer                            |   |
|  | - Use Direct Connect for high volume                       |   |
|  +----------------------------------------------------------+   |
|                                                                   |
+------------------------------------------------------------------+
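
The outbound tiers in the table can be applied with a short calculator. This is a sketch using the rates listed above, which vary by region and change over time. Three things are assumptions rather than facts from the table: 1 TB is treated as 1024 GB, the free 100 GB is counted inside the first 10 TB, and usage beyond 150 TB is priced at a placeholder rate of $0.05/GB.

```python
GB_PER_TB = 1024  # assumption: binary terabytes

# (cumulative upper bound in GB, $/GB), following the table above
EGRESS_TIERS = [
    (100, 0.0),                # first 100 GB free
    (10 * GB_PER_TB, 0.09),    # up to 10 TB
    (50 * GB_PER_TB, 0.085),   # next 40 TB
    (150 * GB_PER_TB, 0.07),   # next 100 TB
]
OVERFLOW_RATE = 0.05  # assumed placeholder rate beyond 150 TB


def egress_cost(gb: float) -> float:
    """Estimate the monthly outbound data transfer cost in USD."""
    if gb < 0:
        raise ValueError("gb must be non-negative")
    cost = 0.0
    lower = 0
    for upper, rate in EGRESS_TIERS:
        if gb <= lower:
            break
        # Charge only the slice of usage that falls inside this tier
        cost += (min(gb, upper) - lower) * rate
        lower = upper
    if gb > lower:
        cost += (gb - lower) * OVERFLOW_RATE
    return round(cost, 2)


if __name__ == "__main__":
    for usage_gb in (100, 1_100, 10 * GB_PER_TB):
        print(usage_gb, egress_cost(usage_gb))
```

Running the same usage figure through the calculator before and after adding CloudFront or compression gives a quick estimate of what each strategy is worth.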

CloudFront for Cost Optimization

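# CloudFront Distribution
# Assumes OriginBucket (AWS::S3::Bucket) and CloudFrontOAI
# (AWS::CloudFront::CloudFrontOriginAccessIdentity) are defined elsewhere in the template.
Resources:
  CloudFrontDistribution:
    Type: AWS::CloudFront::Distribution
    Properties:
      DistributionConfig:
        Enabled: true
        PriceClass: PriceClass_100  # Lowest-cost class: North America and Europe edge locations
        Origins:
          - DomainName: !GetAtt OriginBucket.RegionalDomainName
            Id: S3Origin
            S3OriginConfig:
              # Must be the full path form, not the bare OAI ID
              OriginAccessIdentity: !Sub origin-access-identity/cloudfront/${CloudFrontOAI}
        DefaultCacheBehavior:
          TargetOriginId: S3Origin
          ViewerProtocolPolicy: redirect-to-https
          CachePolicyId: 658327ea-f89d-4fab-a63d-7e88639d58f6  # Managed CachingOptimized policy
          Compress: true  # Enable compression
        CacheBehaviors:
          - PathPattern: /static/*
            TargetOriginId: S3Origin
            ViewerProtocolPolicy: redirect-to-https
            CachePolicyId: 658327ea-f89d-4fab-a63d-7e88639d58f6
            Compress: true
        # TTLs are not DistributionConfig properties. With CachePolicyId set,
        # the cache policy controls them; the managed CachingOptimized policy
        # already uses a 1-day default TTL and a 1-year maximum TTL.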

47.9 Cost Governance

Tagging Policy

# AWS Tag Policy
Resources:
  TagPolicy:
    Type: AWS::Organizations::Policy
    Properties:
      Name: RequiredTagsPolicy
      Description: Enforce required tags
      Type: TAG_POLICY
      Content: |
        {
          "tags": {
            "Environment": {
              "tag_key": {
                "@@assign": "Environment"
              },
              "tag_value": {
                "@@assign": ["dev", "staging", "prod"]
              },
              "enforced_for": {
                "@@assign": [
                  "ec2:instance",
                  "ec2:volume",
                  "s3:bucket",
                  "rds:db"
                ]
              }
            },
            "Owner": {
              "tag_key": {
                "@@assign": "Owner"
              },
              "tag_value": {
                "@@assign": "*"
              },
              "enforced_for": {
                "@@assign": [
                  "ec2:instance",
                  "s3:bucket"
                ]
              }
            },
            "CostCenter": {
              "tag_key": {
                "@@assign": "CostCenter"
              },
              "tag_value": {
                "@@assign": "*"
              }
            }
          }
        }
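
The rules in this policy can also be mirrored in a local audit script, for example to flag resources that are missing a tag entirely. A minimal sketch follows, assuming the same three tag keys and allowed values as the policy above; the names `REQUIRED_TAGS` and `tag_violations` are hypothetical and not part of any AWS API.

```python
# Rules mirroring the tag policy: a set lists the allowed values,
# None means any non-empty value is accepted.
REQUIRED_TAGS = {
    "Environment": {"dev", "staging", "prod"},
    "Owner": None,
    "CostCenter": None,
}


def tag_violations(tags, rules=REQUIRED_TAGS):
    """Return a sorted list of violations for one resource's tag dict."""
    problems = []
    for key, allowed in rules.items():
        value = tags.get(key)
        if not value:
            # Covers both a missing key and an empty value.
            problems.append(f"missing tag: {key}")
        elif allowed is not None and value not in allowed:
            problems.append(f"invalid value for {key}: {value}")
    return sorted(problems)


if __name__ == "__main__":
    resource_tags = {"Environment": "qa", "Owner": "data-team"}
    for problem in tag_violations(resource_tags):
        print(problem)
```

The input is a plain `{key: value}` dict, so the same check can be fed from any API that returns resource tags once they are converted to that shape.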

Service Control Policies for Cost

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyExpensiveInstanceTypes",
      "Effect": "Deny",
      "Action": "ec2:RunInstances",
      "Resource": "arn:aws:ec2:*:*:instance/*",
      "Condition": {
        "StringLike": {
          "ec2:InstanceType": [
            "*.8xlarge",
            "*.12xlarge",
            "*.16xlarge",
            "*.24xlarge",
            "*.metal"
          ]
        }
      }
    },
    {
      "Sid": "DenyUntaggedResources",
      "Effect": "Deny",
      "Action": [
        "ec2:RunInstances",
        "s3:CreateBucket",
        "rds:CreateDBInstance"
      ],
      "Resource": [
        "arn:aws:ec2:*:*:instance/*",
        "arn:aws:s3:::*",
        "arn:aws:rds:*:*:db:*"
      ],
      "Condition": {
        "Null": {
          "aws:RequestTag/Environment": "true"
        }
      }
    }
  ]
}
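
Before attaching a deny list like this, it helps to check which instance types the wildcard patterns actually cover. The sketch below approximates the policy's wildcard matching with Python's `fnmatchcase`; this is an assumption for illustration only (`fnmatch` also understands `[...]` sets, which IAM wildcards do not), and `is_denied` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase

# Patterns copied from the DenyExpensiveInstanceTypes statement.
DENIED_PATTERNS = [
    "*.8xlarge",
    "*.12xlarge",
    "*.16xlarge",
    "*.24xlarge",
    "*.metal",
]


def is_denied(instance_type, patterns=DENIED_PATTERNS):
    """Return True if the instance type matches any denied pattern."""
    return any(fnmatchcase(instance_type, p) for p in patterns)


if __name__ == "__main__":
    for itype in ["m5.large", "m5.8xlarge", "c5.18xlarge", "x1.32xlarge", "c5.metal"]:
        print(itype, "denied" if is_denied(itype) else "allowed")
```

Running a check like this shows that sizes outside the list, such as `18xlarge` or `32xlarge`, are not matched by these patterns, so the list may need extending if the intent is to block every large size.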

47.10 Cost Monitoring Automation

Automated Cost Reporting

import boto3
import json
import logging
from datetime import datetime, timedelta
from dateutil.relativedelta import relativedelta

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    """
    Generate and send cost reports
    """

    ce = boto3.client('ce')
    sns = boto3.client('sns')

    # Get date range
    end_date = datetime.now()
    start_date = end_date - timedelta(days=7)

    # Get cost and usage, following NextPageToken until every page is read
    total_cost = 0
    service_costs = {}

    request = {
        'TimePeriod': {
            'Start': start_date.strftime('%Y-%m-%d'),
            'End': end_date.strftime('%Y-%m-%d')  # End date is exclusive
        },
        'Granularity': 'DAILY',
        'Metrics': ['UnblendedCost'],
        'GroupBy': [
            {'Type': 'DIMENSION', 'Key': 'SERVICE'},
            {'Type': 'DIMENSION', 'Key': 'LINKED_ACCOUNT'}
        ]
    }

    while True:
        response = ce.get_cost_and_usage(**request)

        # Process results
        for result in response['ResultsByTime']:
            for group in result['Groups']:
                service = group['Keys'][0]  # Keys[1] is the linked account ID
                cost = float(group['Metrics']['UnblendedCost']['Amount'])

                total_cost += cost
                service_costs[service] = service_costs.get(service, 0) + cost

        token = response.get('NextPageToken')
        if not token:
            break
        request['NextPageToken'] = token

    # Sort by cost
    sorted_services = sorted(
        service_costs.items(),
        key=lambda x: x[1],
        reverse=True
    )

    # Build report
    report = f"""
    Weekly AWS Cost Report
    ======================
    Period: {start_date.strftime('%Y-%m-%d')} to {end_date.strftime('%Y-%m-%d')}
    Total Cost: ${total_cost:.2f}

    Top 10 Services by Cost:
    """

    for service, cost in sorted_services[:10]:
        report += f"\n    - {service}: ${cost:.2f}"

    # Get forecast for the coming month. The period can span two calendar
    # months, so read the period total rather than only the first monthly bucket.
    forecast = ce.get_cost_forecast(
        TimePeriod={
            'Start': end_date.strftime('%Y-%m-%d'),
            'End': (end_date + relativedelta(months=1)).strftime('%Y-%m-%d')
        },
        Metric='UNBLENDED_COST',
        Granularity='MONTHLY'
    )

    forecast_amount = float(forecast['Total']['Amount'])

    report += f"\n\n    Forecast for the next month: ${forecast_amount:.2f}"

    # Send notification
    sns.publish(
        TopicArn='arn:aws:sns:us-east-1:123456789012:cost-reports',
        Subject='Weekly AWS Cost Report',
        Message=report
    )

    return {
        'statusCode': 200,
        'body': json.dumps({
            'total_cost': total_cost,
            'forecast': forecast_amount
        })
    }
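
The report handler groups results by service and linked account but only totals costs per service. For showback, the same response shape can be aggregated per account instead. A minimal sketch, assuming the `get_cost_and_usage` response layout used above with `SERVICE` and `LINKED_ACCOUNT` as the two group keys, in that order; `costs_by_account` is a hypothetical helper name.

```python
from collections import defaultdict


def costs_by_account(response, metric="UnblendedCost"):
    """Sum costs per linked account from a get_cost_and_usage response.

    Assumes GroupBy was [SERVICE, LINKED_ACCOUNT], so Keys[1] is the account.
    """
    totals = defaultdict(float)
    for result in response.get("ResultsByTime", []):
        for group in result.get("Groups", []):
            _service, account = group["Keys"]
            totals[account] += float(group["Metrics"][metric]["Amount"])
    return dict(totals)


if __name__ == "__main__":
    # Hand-written sample in the same shape as the API response.
    sample = {
        "ResultsByTime": [
            {"Groups": [
                {"Keys": ["Amazon EC2", "111111111111"],
                 "Metrics": {"UnblendedCost": {"Amount": "1.50", "Unit": "USD"}}},
                {"Keys": ["Amazon S3", "222222222222"],
                 "Metrics": {"UnblendedCost": {"Amount": "4.00", "Unit": "USD"}}},
            ]},
            {"Groups": [
                {"Keys": ["Amazon S3", "111111111111"],
                 "Metrics": {"UnblendedCost": {"Amount": "2.25", "Unit": "USD"}}},
            ]},
        ]
    }
    for account, cost in sorted(costs_by_account(sample).items()):
        print(f"{account}: ${cost:.2f}")
```

Because the function takes the response dict as input, it can be unit tested without calling Cost Explorer.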

Automated Resource Cleanup

import boto3
import json
import logging
from datetime import datetime, timedelta

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    """
    Clean up unused resources
    """

    ec2 = boto3.client('ec2')
    rds = boto3.client('rds')
    s3 = boto3.client('s3')

    results = {
        'volumes_deleted': [],
        'snapshots_deleted': [],
        'old_snapshots': []
    }

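    # 1. Delete unattached EBS volumes (paginated so every volume is scanned)
    volume_pages = ec2.get_paginator('describe_volumes').paginate(
        Filters=[
            {'Name': 'status', 'Values': ['available']}
        ]
    )

    for page in volume_pages:
        for volume in page['Volumes']:
            # Check if volume is old enough (7 days since creation, not since detach)
            create_time = volume['CreateTime']  # timezone-aware datetime
            if datetime.now(create_time.tzinfo) - create_time <= timedelta(days=7):
                continue
            # Check for tags that prevent deletion
            tags = {t['Key']: t['Value'] for t in volume.get('Tags', [])}
            if tags.get('KeepAlive', 'false').lower() == 'true':
                continue
            try:
                ec2.delete_volume(VolumeId=volume['VolumeId'])
            except ec2.exceptions.ClientError as err:
                # For example, the volume was attached or deleted in the meantime
                logger.warning(f"Could not delete volume {volume['VolumeId']}: {err}")
                continue
            results['volumes_deleted'].append(volume['VolumeId'])
            logger.info(f"Deleted unattached volume: {volume['VolumeId']}")

    # 2. Delete old snapshots (older than 90 days)
    snapshot_pages = ec2.get_paginator('describe_snapshots').paginate(
        OwnerIds=['self']
    )

    for page in snapshot_pages:
        for snapshot in page['Snapshots']:
            start_time = snapshot['StartTime']  # timezone-aware datetime
            if datetime.now(start_time.tzinfo) - start_time <= timedelta(days=90):
                continue
            # Check for tags
            tags = {t['Key']: t['Value'] for t in snapshot.get('Tags', [])}
            if tags.get('KeepForever', 'false').lower() == 'true':
                continue
            try:
                ec2.delete_snapshot(SnapshotId=snapshot['SnapshotId'])
            except ec2.exceptions.ClientError as err:
                # For example, the snapshot is still in use by a registered AMI
                logger.warning(f"Could not delete snapshot {snapshot['SnapshotId']}: {err}")
                continue
            results['snapshots_deleted'].append(snapshot['SnapshotId'])
            logger.info(f"Deleted old snapshot: {snapshot['SnapshotId']}")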

    # 3. Find RDS instances without recent connections
    # (This would require CloudWatch metrics analysis)

    return {
        'statusCode': 200,
        'body': json.dumps(results)
    }
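
Because the cleanup handler deletes resources, it can be useful to separate the selection logic from the deletion so that candidates can be reviewed or tested first. A minimal sketch, assuming volume dicts shaped like the `describe_volumes` output used above; `deletion_candidates` is a hypothetical helper and performs no AWS calls.

```python
from datetime import datetime, timedelta, timezone


def deletion_candidates(volumes, now, min_age_days=7, keep_tag="KeepAlive"):
    """Return IDs of volumes older than min_age_days without a keep tag.

    `volumes` is a list of dicts with VolumeId, CreateTime (timezone-aware)
    and an optional Tags list, as returned by describe_volumes.
    """
    cutoff = now - timedelta(days=min_age_days)
    candidates = []
    for volume in volumes:
        tags = {t["Key"]: t["Value"] for t in volume.get("Tags", [])}
        if tags.get(keep_tag, "false").lower() == "true":
            continue  # explicitly protected
        if volume["CreateTime"] < cutoff:
            candidates.append(volume["VolumeId"])
    return candidates


if __name__ == "__main__":
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)
    sample = [
        {"VolumeId": "vol-old", "CreateTime": now - timedelta(days=30)},
        {"VolumeId": "vol-new", "CreateTime": now - timedelta(days=2)},
        {"VolumeId": "vol-kept", "CreateTime": now - timedelta(days=30),
         "Tags": [{"Key": "KeepAlive", "Value": "true"}]},
    ]
    print(deletion_candidates(sample, now))
```

Passing `now` in as a parameter keeps the function deterministic, and logging the returned IDs gives a dry-run mode before any delete call is made.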

47.11 Best Practices

Cost Optimization Checklist

# Cost Optimization Checklist

## Compute
- [ ] Right-size EC2 instances based on utilization
- [ ] Use Reserved Instances or Savings Plans for steady workloads
- [ ] Use Spot Instances for flexible workloads
- [ ] Implement Auto Scaling
- [ ] Schedule non-production instances to stop after hours

## Storage
- [ ] Use S3 lifecycle policies
- [ ] Use appropriate storage classes
- [ ] Delete unattached EBS volumes
- [ ] Use EBS gp3 for better price/performance
- [ ] Implement snapshot lifecycle policies

## Network
- [ ] Use CloudFront for content delivery
- [ ] Use VPC endpoints for AWS services
- [ ] Minimize inter-region data transfer
- [ ] Compress data before transfer

## Database
- [ ] Right-size RDS instances
- [ ] Use Aurora Serverless for variable workloads
- [ ] Implement read replicas for read-heavy workloads
- [ ] Use ElastiCache for caching

## Governance
- [ ] Implement tagging strategy
- [ ] Set up AWS Budgets
- [ ] Enable Cost Anomaly Detection
- [ ] Hold regular cost reviews
- [ ] Implement showback/chargeback
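
The scheduling item in the Compute list lends itself to a quick estimate. The sketch below computes the fraction of instance-hours avoided by running non-production instances only during working hours. The function name `schedule_savings_fraction` and the 12-hour, 5-day schedule are illustrative assumptions, and the result applies to hourly compute charges only, since attached EBS storage is still billed while an instance is stopped.

```python
HOURS_PER_WEEK = 24 * 7


def schedule_savings_fraction(hours_per_day, days_per_week):
    """Return the fraction of weekly instance-hours avoided by a schedule."""
    if not 0 <= hours_per_day <= 24:
        raise ValueError("hours_per_day must be between 0 and 24")
    if not 0 <= days_per_week <= 7:
        raise ValueError("days_per_week must be between 0 and 7")
    running_hours = hours_per_day * days_per_week
    return 1 - running_hours / HOURS_PER_WEEK


if __name__ == "__main__":
    # Example schedule: 12 hours a day, Monday to Friday.
    fraction = schedule_savings_fraction(12, 5)
    print(f"Instance-hours avoided: {fraction:.1%}")
```

With the example schedule, 60 of 168 weekly hours are billed, so roughly 64% of on-demand instance-hours are avoided.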

47.12 Key Takeaways

Topic	Key Points
FinOps	Implement continuous cost management cycle
Right-Sizing	Monitor utilization and adjust resources
Commitment	Use RIs and Savings Plans for steady workloads
Spot	Use Spot for flexible, fault-tolerant workloads
Storage	Use lifecycle policies and appropriate storage classes
Governance	Implement tagging, budgets, and policies

Next Chapter: Chapter 48 - Multi-Region & Multi-Account Strategies