Advanced Kubernetes Troubleshooting Questions & Solutions

Let's dive deeper into real-world Kubernetes troubleshooting scenarios with detailed step-by-step solutions. These questions will help you debug cluster issues like a pro! 🚀


1️⃣ How do you troubleshoot a pod stuck in Terminating state?

βœ… Possible Causes & Fixes:

πŸ”Ή Check if the pod is stuck due to finalizers:

kubectl get pod <pod-name> -n <namespace> -o json | jq .metadata.finalizers

πŸ”Ή Force delete the pod:

kubectl delete pod <pod-name> --grace-period=0 --force -n <namespace>

πŸ”Ή Check if the node is unresponsive:

kubectl get nodes

πŸ”Ή If the node is down, drain and remove it:

kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
kubectl delete node <node-name>
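
To see at a glance which pods are stuck and what is holding them, the checks above can be combined into one helper. This is a minimal sketch, not an official tool: the function name list_terminating_pods is made up for illustration, and it assumes kubectl is already pointed at the right cluster.

```shell
# Hypothetical helper: list every pod that has a deletionTimestamp set
# (i.e. is Terminating) together with its finalizers.
# custom-columns prints "<none>" for fields that are not set.
list_terminating_pods() {
  kubectl get pods --all-namespaces --no-headers \
    -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,DELETED:.metadata.deletionTimestamp,FINALIZERS:.metadata.finalizers' \
    | awk '$3 != "<none>" {
        out = $1 "/" $2                       # namespace/name
        for (i = 4; i <= NF; i++) out = out " " $i   # finalizer list
        print out
      }'
}
```

Call it as list_terminating_pods. A pod that shows finalizers here is waiting on a controller, so force deleting it is unlikely to be the right first step.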

2️⃣ How do you debug ErrImagePull and ImagePullBackOff issues?

βœ… Possible Causes & Fixes:

πŸ”Ή Check pod events for details:

kubectl describe pod <pod-name>

πŸ”Ή Ensure the image exists and is accessible:

docker pull <image-name>

πŸ”Ή Check for missing authentication (private registry):

imagePullSecrets:
  - name: my-docker-secret

πŸ”Ή If using a private registry, verify the secret exists:

kubectl get secrets -n <namespace>

πŸ”Ή Manually delete and recreate the pod:

kubectl delete pod <pod-name>
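
If the secret turns out to be missing, it can be created with kubectl create secret docker-registry. The wrapper below is only a sketch: the function name create_pull_secret and the REGISTRY_PASSWORD variable are assumptions chosen for this example, and the registry details are placeholders you would replace with your own.

```shell
# Hypothetical helper: create a registry pull secret in the pod's namespace.
# Arguments: namespace, secret name, registry server, username.
# The password is read from the REGISTRY_PASSWORD environment variable
# (an assumed name) so that it is not typed on the command line.
create_pull_secret() {
  local ns="$1" name="$2" server="$3" user="$4"
  kubectl create secret docker-registry "$name" \
    --namespace "$ns" \
    --docker-server="$server" \
    --docker-username="$user" \
    --docker-password="${REGISTRY_PASSWORD:?set REGISTRY_PASSWORD first}"
}
```

Example: create_pull_secret <namespace> my-docker-secret <registry-host> <username>. The secret name must match the one listed under imagePullSecrets.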

3️⃣ How do you check why a pod is evicted?

βœ… Possible Causes & Fixes:

πŸ”Ή List evicted pods:

kubectl get pods --field-selector=status.phase=Failed

πŸ”Ή Check eviction reason:

kubectl describe pod <evicted-pod-name>

πŸ”Ή If caused by memory pressure, increase memory limits:

resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "1Gi"

πŸ”Ή Manually remove evicted pods:

kubectl delete pod <pod-name>
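
Because the Failed phase also includes pods that were not evicted, it helps to filter on the status reason before cleaning up. The following is a rough sketch with made-up function names (list_evicted_pods, delete_evicted_pods); it relies only on standard kubectl output options and awk.

```shell
# Hypothetical helper: print the names of pods in a namespace
# whose status.reason is "Evicted".
list_evicted_pods() {
  kubectl get pods -n "$1" --no-headers \
    --field-selector=status.phase=Failed \
    -o custom-columns='NAME:.metadata.name,REASON:.status.reason' \
    | awk '$2 == "Evicted" { print $1 }'
}

# Hypothetical helper: delete every evicted pod in a namespace, one at a time.
delete_evicted_pods() {
  list_evicted_pods "$1" | while read -r pod; do
    kubectl delete pod "$pod" -n "$1"
  done
}
```

Run list_evicted_pods <namespace> first to review the list, then delete_evicted_pods <namespace> once you have noted the eviction reasons.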

4️⃣ How do you troubleshoot slow pod scheduling?

βœ… Possible Causes & Fixes:

πŸ”Ή Check pending pods:

kubectl get pods --field-selector=status.phase=Pending

πŸ”Ή Check if the cluster is out of resources:

kubectl describe node <node-name>

πŸ”Ή Check pod scheduling events:

kubectl get events --sort-by=.metadata.creationTimestamp

πŸ”Ή Ensure node taints/tolerations allow scheduling:

kubectl describe node <node-name> | grep -i taint

πŸ”Ή Verify affinity/anti-affinity settings:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: "disktype"
              operator: In
              values:
                - ssd
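
Two quick checks narrow this down: what the scheduler itself reported, and whether any node actually carries the label the affinity rule asks for. The helpers below are an illustrative sketch; their names are invented for this article.

```shell
# Hypothetical helper: show only scheduler failures in a namespace,
# sorted by the time they were last seen.
scheduling_failures() {
  kubectl get events -n "$1" \
    --field-selector reason=FailedScheduling \
    --sort-by=.lastTimestamp
}

# Hypothetical helper: count the nodes that carry a given label,
# for example disktype=ssd.
count_nodes_with_label() {
  kubectl get nodes -l "$1" --no-headers | wc -l | tr -d ' '
}
```

If count_nodes_with_label disktype=ssd prints 0, the required node affinity can never be satisfied until a node is labeled accordingly.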

5️⃣ How do you fix CrashLoopBackOff due to liveness probe failures?

βœ… Possible Causes & Fixes:

πŸ”Ή Check logs for errors:

kubectl logs <pod-name>

πŸ”Ή Check liveness probe configuration:

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5

πŸ”Ή Test if the probe endpoint is accessible from within the pod:

kubectl exec -it <pod-name> -- curl localhost:8080/healthz

πŸ”Ή If needed, disable liveness probe temporarily:

livenessProbe: null
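
One way to drop the probe without editing the manifest by hand is a JSON patch against the Deployment. This is a sketch under the assumption that the pod is managed by a Deployment; the function name remove_liveness_probe is hypothetical.

```shell
# Hypothetical helper: remove the liveness probe from one container
# of a Deployment using a JSON patch.
# Arguments: namespace, deployment name, container index (defaults to 0).
remove_liveness_probe() {
  local ns="$1" deploy="$2" idx="${3:-0}"
  kubectl patch deployment "$deploy" -n "$ns" --type=json \
    -p "[{\"op\":\"remove\",\"path\":\"/spec/template/spec/containers/${idx}/livenessProbe\"}]"
}
```

The patch changes the pod template, so it triggers a new rollout. Remember to restore the probe once the root cause is fixed.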

6️⃣ How do you fix Node Not Ready issues?

βœ… Possible Causes & Fixes:

πŸ”Ή Check node status:

kubectl get nodes

πŸ”Ή Inspect Kubelet logs:

journalctl -u kubelet -f

πŸ”Ή Restart the Kubelet service:

systemctl restart kubelet

πŸ”Ή Check disk space:

df -h

πŸ”Ή Verify node taints are not preventing scheduling:

kubectl describe node <node-name> | grep -i taint
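
On larger clusters it is easier to filter the node list than to scan it by eye. A small sketch, with an invented function name, that prints only the nodes whose status does not start with Ready:

```shell
# Hypothetical helper: print name and status of every node that is not Ready.
# "Ready,SchedulingDisabled" (a cordoned but healthy node) is treated as Ready.
list_not_ready_nodes() {
  kubectl get nodes --no-headers | awk '$2 !~ /^Ready/ { print $1, $2 }'
}
```

Each node it prints is a candidate for the kubelet log and disk space checks above.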

7️⃣ How do you fix a failing Ingress?

βœ… Possible Causes & Fixes:

πŸ”Ή Check Ingress resources:

kubectl get ingress -n <namespace>

πŸ”Ή Describe the Ingress to check for errors:

kubectl describe ingress <ingress-name> -n <namespace>

πŸ”Ή Ensure the correct backend service exists:

kubectl get svc -n <namespace>

πŸ”Ή Check if Ingress Controller is running:

kubectl get pods -n kube-system | grep ingress

πŸ”Ή Verify DNS resolution:

nslookup my-app.example.com
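
A Service can exist and still have nothing behind it, for example when its selector matches no ready pods. The sketch below checks for that; the function name service_has_endpoints is an assumption for this example.

```shell
# Hypothetical helper: succeed only if the Service has at least one
# ready endpoint address. Arguments: namespace, service name.
service_has_endpoints() {
  local ips
  ips=$(kubectl get endpoints "$2" -n "$1" -o jsonpath='{.subsets[*].addresses[*].ip}')
  [ -n "$ips" ]
}
```

Usage: service_has_endpoints <namespace> <service-name> || echo "no ready backends". An empty result points at the Service selector or pod readiness rather than the Ingress itself.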

8️⃣ How do you troubleshoot Kubernetes persistent volume (PV) issues?

βœ… Possible Causes & Fixes:

πŸ”Ή Check PV status:

kubectl get pv

πŸ”Ή Check Persistent Volume Claim (PVC) status:

kubectl get pvc -n <namespace>

πŸ”Ή Describe the PVC for errors:

kubectl describe pvc <pvc-name> -n <namespace>

πŸ”Ή Ensure the storage class is available:

kubectl get storageclass

πŸ”Ή If using AWS EBS, verify the disk is attached:

aws ec2 describe-volumes --filters Name=tag:KubernetesCluster,Values=my-cluster
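
To find problem claims across the whole cluster in one pass, the PVC list can be filtered by status. A minimal sketch with an invented function name:

```shell
# Hypothetical helper: list every PVC whose status is not "Bound".
# With --all-namespaces the columns are NAMESPACE, NAME, STATUS, ...
list_unbound_pvcs() {
  kubectl get pvc --all-namespaces --no-headers \
    | awk '$3 != "Bound" { print $1 "/" $2, $3 }'
}
```

Each claim it prints is worth a kubectl describe pvc to read the provisioning events.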

9️⃣ How do you fix high CPU/memory usage in Kubernetes?

βœ… Possible Causes & Fixes:

πŸ”Ή Check pod resource usage:

kubectl top pod -n <namespace>

πŸ”Ή Check node resource usage:

kubectl top node

πŸ”Ή Increase CPU/memory limits:

resources:
  requests:
    cpu: "250m"
    memory: "512Mi"
  limits:
    cpu: "500m"
    memory: "1Gi"

🔹 If using Horizontal Pod Autoscaler (HPA), scale based on CPU. Utilization is measured against the pods' CPU requests, so the target Deployment must set them:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
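
Before changing limits or adding an HPA, it is worth confirming which pods are actually the heavy consumers. The helper below is a sketch with a made-up name; it uses the --sort-by option of kubectl top and assumes metrics-server is available.

```shell
# Hypothetical helper: show the N pods using the most memory in a namespace.
# Arguments: namespace, number of pods to show (defaults to 5).
top_memory_pods() {
  kubectl top pod -n "$1" --no-headers --sort-by=memory | head -n "${2:-5}"
}
```

Example: top_memory_pods <namespace> 3. Swap --sort-by=memory for --sort-by=cpu to rank by CPU instead.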

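🔟 How do you restart all pods in a namespace?

✅ Solution:

Delete every pod at once. Pods owned by a Deployment, StatefulSet, or DaemonSet are recreated by their controller, but all replicas go down together and standalone pods are not recreated:

kubectl delete pods --all -n <namespace>

Or, to avoid downtime, do a rolling restart of a Deployment, which replaces its pods gradually (omit <deployment-name> to restart every Deployment in the namespace):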

kubectl rollout restart deployment <deployment-name> -n <namespace>
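
A rolling restart returns immediately, so a script usually needs to wait for the rollouts to finish. The following is one possible sketch: the function name restart_deployments and the 120 second timeout are arbitrary choices for this example.

```shell
# Hypothetical helper: restart every Deployment in a namespace,
# then wait for each rollout to complete.
restart_deployments() {
  local ns="$1" name
  kubectl rollout restart deployment -n "$ns"
  # "-o name" prints entries such as deployment.apps/my-app
  kubectl get deployments -n "$ns" -o name | while read -r name; do
    kubectl rollout status "$name" -n "$ns" --timeout=120s
  done
}
```

Call it as restart_deployments <namespace>. StatefulSets and DaemonSets support kubectl rollout restart as well and could be handled the same way.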

🔥 Summary

✔ Use kubectl describe & kubectl logs for debugging.
✔ Check node, pod, service, and network issues.
✔ Restart pods, nodes, or Ingress controllers if necessary.
✔ Monitor performance using kubectl top and HPA.


About Anant 441 Articles
Senior technical writer