
🚀 What is HPA (Horizontal Pod Autoscaler) in Kubernetes?
Horizontal Pod Autoscaler (HPA) is a Kubernetes feature that automatically adjusts the number of pod replicas in a Deployment, ReplicaSet, or StatefulSet based on observed CPU utilization, memory usage, or custom metrics.
How Does HPA Work?
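HPA periodically compares the observed resource usage of the target's pods with the target value you configure and adjusts the number of replicas to close the gap, so the application can handle varying load efficiently.
- If average CPU/memory usage rises above the target, HPA adds pods (up to the configured maximum).
- If average CPU/memory usage falls below the target, HPA removes pods (down to the configured minimum).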
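The scaling decision above can be sketched as a small calculation. The Kubernetes documentation gives the core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The shell function below only illustrates that rule: the name desired_replicas is made up for this post, and it leaves out the tolerance band, the min/max clamp, and the stabilization window that the real controller also applies.

```shell
#!/bin/sh
# Illustrative only: desired_replicas is a hypothetical helper, not a kubectl command.
# Usage: desired_replicas CURRENT_REPLICAS CURRENT_UTILIZATION TARGET_UTILIZATION
desired_replicas() {
  current_replicas=$1
  current_util=$2   # observed average utilization, in percent
  target_util=$3    # target average utilization, in percent
  # Integer ceiling of (current_replicas * current_util / target_util)
  echo $(( (current_replicas * current_util + target_util - 1) / target_util ))
}

desired_replicas 2 100 50   # 2 pods at 100% with a 50% target -> prints 4
desired_replicas 4 20 50    # 4 pods at 20% with a 50% target  -> prints 2
```

Because the ratio is rounded up, HPA leans toward keeping slightly more capacity than the raw arithmetic suggests.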
Example: Implementing HPA in Kubernetes
We’ll create a deployment, expose it via a service, and apply HPA to auto-scale based on CPU usage.
Step 1: Enable Metrics Server
HPA requires a Metrics Server to monitor resource usage. If it’s not installed, install it using:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Verify installation:
kubectl get deployment metrics-server -n kube-system
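A running deployment does not by itself guarantee that metrics are being served, so a quick extra check can help. The sketch below assumes the standard Metrics Server install, which registers the v1beta1.metrics.k8s.io APIService; metrics_api_available is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the resource metrics API reports Available=True.
# Assumes Metrics Server registered the v1beta1.metrics.k8s.io APIService.
metrics_api_available() {
  status=$(kubectl get apiservice v1beta1.metrics.k8s.io \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}')
  [ "$status" = "True" ]
}

# Example use (needs a cluster):
#   metrics_api_available && kubectl top pods
```

If kubectl top pods prints CPU and memory figures, HPA should be able to read them too.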
Step 2: Create a Deployment
Save the following YAML as deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "100m"   # HPA computes CPU utilization relative to this request
          limits:
            cpu: "200m"
Apply the deployment:
kubectl apply -f deployment.yaml
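The CPU request matters for autoscaling because a Utilization target is a percentage of the request, not of the limit or of node capacity. Here is a rough sketch of that arithmetic in millicores, using a made-up helper name (cpu_utilization_percent):

```shell
#!/bin/sh
# Hypothetical helper: CPU utilization as a percentage of the pod's CPU request.
# Usage: cpu_utilization_percent USAGE_MILLICORES REQUEST_MILLICORES
cpu_utilization_percent() {
  usage_m=$1
  request_m=$2
  echo $(( usage_m * 100 / request_m ))   # integer division, rounds down
}

cpu_utilization_percent 50 100    # 50m used of a 100m request    -> prints 50
cpu_utilization_percent 200 100   # pod running at its 200m limit -> prints 200
```

With the manifest above, a pod pinned at its 200m limit reports about 200% utilization, because the figure is measured against the 100m request. Pods with no CPU request give HPA nothing to compute a percentage from.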
Step 3: Expose the Deployment as a Service
kubectl expose deployment my-app --type=LoadBalancer --name=my-service --port=80
Verify the service:
kubectl get services
Step 4: Create an HPA Resource
Save the following YAML as hpa.yaml:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
Apply the HPA configuration:
kubectl apply -f hpa.yaml
Check the HPA status:
kubectl get hpa
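For quick experiments, roughly the same autoscaler can be created without a YAML file using kubectl autoscale. This is a sketch, wrapped in a hypothetical function (create_cpu_hpa) so the parameters are easy to see. The --cpu-percent flag is the long-standing form, and newer kubectl releases may steer you toward a different flag, so check kubectl autoscale --help for your version.

```shell
#!/bin/sh
# Hypothetical wrapper around `kubectl autoscale`; use it instead of hpa.yaml, not in addition.
# Usage: create_cpu_hpa DEPLOYMENT CPU_PERCENT MIN_REPLICAS MAX_REPLICAS
create_cpu_hpa() {
  kubectl autoscale deployment "$1" \
    --name="$1-hpa" \
    --cpu-percent="$2" \
    --min="$3" \
    --max="$4"
}

# Example use (needs a cluster):
#   create_cpu_hpa my-app 50 1 10
```

The YAML route is usually the better fit for anything you want to keep in version control. The one-liner is handy for a throwaway test.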
Step 5: Simulate High Load
Run a load test using:
kubectl run -i --tty load-generator --image=busybox -- sh
Inside the pod, execute:
while true; do wget -q -O- http://my-service; done
In a separate terminal, check whether the HPA is scaling pods (it can take a minute or so before the replica count increases):
kubectl get hpa
kubectl get pods
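Rather than re-running these commands by hand, you can read the replica counts straight from the HPA status. The sketch below assumes the HPA is named my-app-hpa as in Step 4. The helper name hpa_replicas is hypothetical, and the fields it reads are .status.currentReplicas and .status.desiredReplicas.

```shell
#!/bin/sh
# Hypothetical helper: prints "current/desired" replica counts for an HPA.
# Usage: hpa_replicas HPA_NAME
hpa_replicas() {
  current=$(kubectl get hpa "$1" -o jsonpath='{.status.currentReplicas}')
  desired=$(kubectl get hpa "$1" -o jsonpath='{.status.desiredReplicas}')
  echo "${current}/${desired}"
}

# Example use (needs a cluster): poll every 15 seconds until interrupted.
#   while true; do hpa_replicas my-app-hpa; sleep 15; done
```

If you do not need the scripting, kubectl get hpa my-app-hpa --watch gives a similar live view.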
Step 6: Cleanup
Once done, delete all resources:
kubectl delete -f hpa.yaml
kubectl delete -f deployment.yaml
kubectl delete service my-service
kubectl delete pod load-generator
🎯 Key Takeaways:
✅ HPA scales pods automatically based on CPU or memory usage.
✅ It requires a metrics server to monitor resource utilization.
✅ Load testing helps verify auto-scaling behavior.
🚀 Next Steps:
- Use custom metrics for scaling (e.g., requests per second).
- Implement VPA (Vertical Pod Autoscaler) to adjust pods' CPU and memory requests and limits.
#Kubernetes, #HPA, #Autoscaling, #DevOps, #CloudComputing, #K8s, #Scalability, #KubernetesTutorial, #InfrastructureAutomation