# Kubernetes Horizontal Pod Autoscaling (HPA)
## Overview
Horizontal Pod Autoscaling (HPA) automatically adjusts the number of pod replicas in a Deployment, ReplicaSet, or StatefulSet based on observed CPU utilization, memory usage, or custom metrics. This enables applications to handle increased load automatically and reduces costs during low-traffic periods.
## How HPA Works
```
┌───────────────────────────────────────────────────────────────┐
│                        HPA Controller                         │
│                                                               │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐     │
│  │   Metrics    │───▶│   Decision   │───▶│    Scale     │     │
│  │   Server     │    │    Engine    │    │    Action    │     │
│  └──────────────┘    └──────────────┘    └──────────────┘     │
│         │                                       │             │
│         ▼                                       ▼             │
│  ┌──────────────┐                        ┌──────────────┐     │
│  │ Pod Metrics  │                        │  ReplicaSet  │     │
│  │  (CPU/Mem)   │                        │    Update    │     │
│  └──────────────┘                        └──────────────┘     │
└───────────────────────────────────────────────────────────────┘
```

The HPA controller:
- Polls Metrics Server at regular intervals (default: 15 seconds)
- Calculates desired replica count based on metrics
- Updates the scale sub-resource of the target resource
- Continuously monitors to adjust as needed
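The desired replica count follows the proportional formula used by the Kubernetes autoscaling algorithm:

```
desiredReplicas = ceil( currentReplicas × ( currentMetricValue / desiredMetricValue ) )
```

A small tolerance around the ratio (0.1 by default) suppresses scaling when the observed metric is already close to the target.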
## Installing Metrics Server
The Metrics Server must be installed for HPA to work:
```shell
# Install using kubectl
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Or install via Helm
# (--kubelet-insecure-tls is typically only needed on dev clusters
# such as kind or minikube, where kubelets use self-signed certificates)
helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server
helm install metrics-server metrics-server/metrics-server \
  --namespace kube-system \
  --set args[0]="--kubelet-insecure-tls"
```

Verify installation:

```shell
# Check metrics-server pod
kubectl get pods -n kube-system -l k8s-app=metrics-server

# Test metrics API
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
kubectl get --raw /apis/metrics.k8s.io/v1beta1/pods
```

## Basic HPA Configuration
Create an HPA for a Deployment:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max
```

Key fields:
- scaleTargetRef: The target to scale (Deployment, ReplicaSet, or StatefulSet)
- minReplicas/maxReplicas: Bounds for the replica count
- metrics: Metrics to base scaling decisions on
- behavior: Configures scaling behavior for stability
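For the common CPU-only case, the same kind of HPA can also be created imperatively (this assumes the `myapp-deployment` from the example above exists; `behavior` and multi-metric settings still require a manifest):

```shell
kubectl autoscale deployment myapp-deployment --min=2 --max=10 --cpu-percent=70
```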
## CPU-Based Autoscaling
The most common autoscaling metric:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  minReplicas: 1
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```

This maintains average CPU utilization at 50% of the pods' requested CPU.
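Scaling is proportional to the ratio of observed to target utilization. A worked example against this 50% target:

```
4 replicas at 80% CPU, target 50%:  ceil(4 × 80 / 50) = ceil(6.4) = 7 replicas
4 replicas at 20% CPU, target 50%:  ceil(4 × 20 / 50) = ceil(1.6) = 2 replicas
```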
Create and test:
```shell
# Create HPA
kubectl apply -f hpa.yaml

# View HPA status
kubectl get hpa

# Detailed information
kubectl describe hpa myapp-hpa
```

## Memory-Based Autoscaling
Scale based on memory usage:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: memory-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
```

Note: Memory-based scaling works best when memory usage rises and falls with load; applications that cache aggressively or leak memory may hold replicas at the maximum.
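Utilization targets require memory requests to be set on the pods. Where an absolute threshold is more natural, the target can instead be expressed as an AverageValue of raw usage; a sketch, with the 400Mi figure purely illustrative:

```yaml
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 400Mi  # scale out when average usage per pod exceeds ~400Mi
```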
## Multiple Metrics
Combine multiple metrics for better scaling decisions:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: multi-metric-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp-deployment
  minReplicas: 2
  maxReplicas: 15
  metrics:
  # CPU metric
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  # Memory metric
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
```

When several metrics are specified, HPA computes a desired replica count for each metric and scales to the largest of them; this is built-in behavior and needs no extra configuration.

## Custom Metrics Autoscaling
Scale based on application-specific metrics:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metric-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  # Custom metric from Prometheus
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
  # CPU as fallback
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

You need a custom metrics adapter (such as the Prometheus Adapter) for this to work.
## Scaling Behavior Configuration
Control how aggressively HPA scales:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: controlled-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp-deployment
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  behavior:
    # Scale-down behavior
    scaleDown:
      # Wait 5 minutes before scaling down
      stabilizationWindowSeconds: 300
      policies:
      # Remove at most 10% of pods per minute...
      - type: Percent
        value: 10
        periodSeconds: 60
      # ...or at most 4 pods per minute
      - type: Pods
        value: 4
        periodSeconds: 60
      # Use the policy that allows the larger change
      selectPolicy: Max
    # Scale-up behavior
    scaleUp:
      # No stabilization - scale up immediately
      stabilizationWindowSeconds: 0
      policies:
      # Double the pod count every 15 seconds...
      - type: Percent
        value: 100
        periodSeconds: 15
      # ...or add 4 pods every 15 seconds
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max
```

## Vertical Pod Autoscaling (VPA)
VPA adjusts resource requests (CPU/memory) for pods:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: myapp-deployment
  updatePolicy:
    updateMode: "Auto"  # Auto, Recreate, Initial, Off
  resourcePolicy:
    containerPolicies:
    - containerName: app
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 4
        memory: 8Gi
      controlledResources: ["cpu", "memory"]
```

Important: Don't run HPA and VPA on the same pods using the same metrics (CPU or memory) simultaneously; their decisions conflict and can cause unstable scaling.
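One common way to combine the two safely is to let HPA own the replica count while VPA runs in recommendation-only mode, so it suggests resource requests without evicting pods. A minimal sketch, assuming the same Deployment:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa-recommender
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp-deployment
  updatePolicy:
    updateMode: "Off"  # report recommendations only; never evict pods
```

The recommendations then appear in the VPA's status (visible via `kubectl describe vpa myapp-vpa-recommender`) and can be applied to the Deployment manually.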
## Best Practices
### 1. Set Appropriate Resource Requests
Always set resource requests; utilization-based HPA metrics are calculated as a percentage of the requested value:
```yaml
containers:
- name: app
  image: myapp:latest
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 512Mi
```

### 2. Set Appropriate Min/Max Replicas
- minReplicas: At least 2 for production to ensure availability during updates
- maxReplicas: Cap based on cluster capacity and cost budget
### 3. Use Appropriate Metrics
- CPU: Good for stateless, CPU-intensive workloads
- Memory: Good for workloads with consistent memory patterns
- Custom: For queue depth, request latency, etc.
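For queue-depth style signals that are not attached to any Kubernetes object, the External metric type applies. This sketch assumes an external metrics adapter exposing a hypothetical `queue_messages_ready` metric; the metric name, label, and threshold are all illustrative:

```yaml
  metrics:
  - type: External
    external:
      metric:
        name: queue_messages_ready   # hypothetical metric served by an adapter
        selector:
          matchLabels:
            queue: worker_tasks      # illustrative label selector
      target:
        type: AverageValue
        averageValue: "30"           # target ~30 messages per replica
```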
### 4. Configure Stabilization Windows
Prevent "flapping" (rapid scaling up and down):
```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300  # 5 minutes
  scaleUp:
    stabilizationWindowSeconds: 0    # Immediate scale-up
```

### 5. Test Autoscaling
```shell
# Generate load to test
kubectl run -it --rm load-generator --image=busybox -- /bin/sh

# Then inside the container:
while true; do wget -q -O- http://myapp-service; done
```
```shell
# Monitor HPA
kubectl get hpa myapp-hpa --watch
```

## Troubleshooting
Common issues and solutions:
```shell
# HPA not creating pods - inspect conditions and events
kubectl describe hpa <name>

# Check metrics availability
kubectl top pods
kubectl top nodes

# Check HPA events
kubectl get events --field-selector involvedObject.name=<hpa-name>
```

Common causes:

- Missing metrics-server
- Missing resource requests in pods
- HPA targeting the wrong resource
- Cluster at capacity (maximum nodes reached)

## Summary
Horizontal Pod Autoscaling is essential for:
- Cost optimization: Scale down during low traffic
- Availability: Scale up during high traffic
- Resilience: Handle traffic spikes automatically
Key considerations:
- Always set resource requests
- Choose appropriate metrics for your workload
- Configure scaling behavior for stability
- Test autoscaling under load
- Monitor and adjust based on real-world behavior