🚀 What is HPA (Horizontal Pod Autoscaler) in Kubernetes?

Horizontal Pod Autoscaler (HPA) is a Kubernetes feature that automatically scales the number of pods in a Deployment, ReplicaSet, or StatefulSet based on observed CPU utilization, memory usage, or custom metrics.

How Does HPA Work?

HPA runs as a control loop: it periodically checks the resource utilization of the target pods (every 15 seconds by default) and adjusts the number of replicas so the application can handle varying load efficiently.

  • If CPU/memory usage increases, HPA adds more pods.
  • If CPU/memory usage decreases, HPA removes extra pods.
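
Under the hood, the HPA controller derives the desired replica count from the ratio between the current and target metric values (this is the algorithm described in the Kubernetes documentation):

desiredReplicas = ceil[ currentReplicas × ( currentMetricValue / desiredMetricValue ) ]

For example, if 2 replicas are averaging 90% CPU utilization against a 50% target, the HPA scales to ceil[2 × (90 / 50)] = 4 replicas.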

Example: Implementing HPA in Kubernetes

We’ll create a deployment, expose it via a service, and apply HPA to auto-scale based on CPU usage.

Step 1: Enable Metrics Server

HPA requires a Metrics Server to monitor resource usage. If it’s not installed, install it using:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Verify installation:

kubectl get deployment metrics-server -n kube-system
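
Once the Metrics Server is up, resource metrics should be queryable through kubectl. If the following commands return CPU and memory figures, HPA will be able to read them:

kubectl top nodes
kubectl top pods -n kube-system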

Step 2: Create a Deployment

Save the following YAML as deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: registry.k8s.io/hpa-example
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: "100m"
            limits:
              cpu: "200m"

Apply the deployment:

kubectl apply -f deployment.yaml
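
Before wiring up the autoscaler, it is worth confirming that the pod is running and that the CPU request is in place, because HPA computes utilization as a percentage of the requested CPU:

kubectl get pods -l app=my-app
kubectl describe deployment my-app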

Step 3: Expose the Deployment as a Service

kubectl expose deployment my-app --type=LoadBalancer --name=my-service --port=80

Verify the service:

kubectl get services
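
If you are running a local cluster (e.g., minikube or kind) without a load-balancer integration, the EXTERNAL-IP will stay in the pending state. That is fine for this walkthrough, since the load test in Step 5 runs inside the cluster; a plain ClusterIP service works just as well:

kubectl expose deployment my-app --type=ClusterIP --name=my-service --port=80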

Step 4: Create an HPA Resource

Save the following YAML as hpa.yaml:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50

Apply the HPA configuration:

kubectl apply -f hpa.yaml

Check the HPA status:

kubectl get hpa
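
As an alternative to the YAML manifest, the same autoscaler can be created imperatively:

kubectl autoscale deployment my-app --cpu-percent=50 --min=1 --max=10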

Step 5: Simulate High Load

Start a temporary load-generator pod (--rm removes it automatically when you exit the shell):

kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- sh

Inside the pod, execute:

while true; do wget -q -O- http://my-service; done

Check if the HPA is scaling pods:

kubectl get hpa
kubectl get pods
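
Within a minute or two the TARGETS column should climb above the 50% threshold and REPLICAS should start increasing. Watching the HPA makes this easier to follow:

kubectl get hpa my-app-hpa --watch

After you stop the load (Ctrl+C, then exit the load-generator shell), the HPA scales back down; by default it waits around five minutes (the downscale stabilization window) before removing pods.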

Step 6: Cleanup

Once done, delete all resources:

kubectl delete -f hpa.yaml
kubectl delete -f deployment.yaml
kubectl delete service my-service

🎯 Key Takeaways:

✅ HPA scales pods automatically based on CPU or memory usage.
✅ It requires a metrics server to monitor resource utilization.
✅ Load testing helps verify auto-scaling behavior.

🚀 Next Steps:

  • Use custom metrics for scaling (e.g., requests per second); a rough sketch follows below.
  • Implement VPA (Vertical Pod Autoscaler) for scaling resource limits.
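
As a rough sketch of the custom-metrics idea: an autoscaling/v2 HPA can target a Pods metric instead of a resource metric. This assumes a custom metrics adapter (such as Prometheus Adapter) is installed and exposes the metric; the name http_requests_per_second below is purely illustrative. The metrics section of hpa.yaml would change to:

  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second   # assumed metric exposed by the adapter
        target:
          type: AverageValue
          averageValue: "100"              # target average of 100 req/s per pod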

#Kubernetes, #HPA, #Autoscaling, #DevOps, #CloudComputing, #K8s, #Scalability, #KubernetesTutorial, #InfrastructureAutomation
