Kubernetes Troubleshooting Questions & Answers for Beginners to Experts

Troubleshooting Kubernetes can be tricky, but mastering it is essential for DevOps engineers and cloud professionals. Here are some common Kubernetes troubleshooting questions, along with solutions and best practices.


1️⃣ How do you check if a pod is running properly?

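✅ Solution:

Run:

kubectl get pods -n <namespace>

Look at the STATUS column. If it says CrashLoopBackOff, Pending, or Error, there's a problem. Also check the READY column: a pod can be Running while its containers are not yet ready.

For more detail, describe the pod and read its logs: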

kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace>
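
In a busy namespace it can help to filter the list down to the pods that need attention. The following is a minimal sketch, assuming the default kubectl get pods column layout (NAME, READY, STATUS, RESTARTS, AGE); the function name unhealthy_pods is made up for this example.

```shell
# Print the name and status of pods whose STATUS is neither Running nor Completed.
# Expects `kubectl get pods --no-headers` output on stdin
# (assumed columns: NAME READY STATUS RESTARTS AGE).
unhealthy_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}
```

Usage:

kubectl get pods -n <namespace> --no-headers | unhealthy_pods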

2️⃣ What should you do if a pod is stuck in Pending state?

✅ Possible Causes & Solutions:

🔹 Insufficient resources → Check node capacity:

kubectl describe node <node-name>

🔹 Failed scheduling → Check events:

kubectl get events -n <namespace> --sort-by=.metadata.creationTimestamp

🔹 Affinity or taints/tolerations issue → Verify pod spec:

kubectl describe pod <pod-name> -n <namespace>

🔹 Network issues → Check CNI plugin logs. (A pod that is scheduled but waiting on networking usually shows ContainerCreating rather than Pending.)
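
The event list for a whole namespace can be noisy. As a sketch, the checks above can be narrowed to the scheduler's failure events for a single pod by using an event field selector; the function name scheduling_failures is hypothetical. The event message typically names the reason, such as insufficient CPU or a taint the pod does not tolerate.

```shell
# Show scheduler failure events for one pod, oldest first.
# $1 = namespace, $2 = pod name
scheduling_failures() {
  kubectl get events -n "$1" \
    --field-selector "involvedObject.name=$2,reason=FailedScheduling" \
    --sort-by=.metadata.creationTimestamp
}
```

Usage:

scheduling_failures <namespace> <pod-name>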


3️⃣ How do you troubleshoot a pod stuck in CrashLoopBackOff?

✅ Possible Causes & Fixes:

🔹 Application crash → Check logs, including those of the previous (crashed) container:

kubectl logs <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> --previous

🔹 Configuration issue → Inspect pod details:

kubectl describe pod <pod-name> -n <namespace>

🔹 Liveness probe failure → Review health check settings:

kubectl get pod <pod-name> -n <namespace> -o yaml | grep -i -A 10 "livenessProbe"

🔹 OOMKilled (Out of Memory) → Increase memory requests/limits:

resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"
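
To tell an out-of-memory kill apart from an ordinary application crash, it helps to read the last termination reason and exit code of each container. The sketch below is one way to do that; both function names are made up for this example, and the exit-code interpretation relies on the common 128+N convention for signal deaths (137 is 128 + 9, SIGKILL, which is what an OOM kill usually produces).

```shell
# Print each container's last termination reason and exit code.
# $1 = namespace, $2 = pod name
last_termination() {
  kubectl get pod "$2" -n "$1" -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}

# Interpret an exit code, assuming the 128+N convention for signals.
# $1 = exit code
describe_exit_code() {
  if [ "$1" -eq 0 ]; then
    echo "exited cleanly"
  elif [ "$1" -gt 128 ]; then
    echo "killed by signal $(( $1 - 128 ))"
  else
    echo "application error (exit code $1)"
  fi
}
```

Usage:

last_termination <namespace> <pod-name>
describe_exit_code 137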

4️⃣ What if a service is not accessible?

✅ Step-by-Step Troubleshooting:

🔹 Check if the service exists:

kubectl get svc -n <namespace>

🔹 Verify service endpoints (an empty list means no ready pod matches the service selector):

kubectl get endpoints <service-name> -n <namespace>

🔹 Ensure the correct port is exposed:

kubectl describe svc <service-name> -n <namespace>

🔹 Check if the service responds from inside the cluster:

kubectl exec -it <pod-name> -n <namespace> -- curl <service-name>:<port>

🔹 Verify network policies are not blocking access.
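
The endpoints check above is often the quickest way to spot a selector mismatch. A minimal sketch that turns it into a pass/fail check is shown here; the function name service_has_endpoints is hypothetical.

```shell
# Report whether a Service has any ready endpoint addresses.
# $1 = namespace, $2 = service name
# Returns non-zero when no ready address is found.
service_has_endpoints() {
  ips=$(kubectl get endpoints "$2" -n "$1" \
    -o jsonpath='{.subsets[*].addresses[*].ip}')
  if [ -n "$ips" ]; then
    echo "ready endpoints: $ips"
  else
    echo "no ready endpoints: check the Service selector and pod readiness"
    return 1
  fi
}
```

Usage:

service_has_endpoints <namespace> <service-name>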


5️⃣ How to debug a failing deployment?

✅ Step-by-Step Guide:

🔹 Check deployment rollout status:

kubectl rollout status deployment <deployment-name> -n <namespace>

🔹 Describe the deployment to check for issues:

kubectl describe deployment <deployment-name> -n <namespace>

🔹 Look for failing pods:

kubectl get pods --selector=app=<app-name> -n <namespace>

🔹 Roll back a failing deployment:

kubectl rollout undo deployment <deployment-name> -n <namespace>
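
The status check and the rollback can be chained so that a rollout that does not finish in time is undone automatically. This is only a sketch: the function name rollout_or_rollback and the 120s default are assumptions, and an automatic rollback may not suit every pipeline.

```shell
# Wait for a rollout to finish; roll back if it fails or times out.
# $1 = namespace, $2 = deployment name, $3 = timeout (optional, default 120s)
rollout_or_rollback() {
  if kubectl rollout status "deployment/$2" -n "$1" --timeout="${3:-120s}"; then
    echo "rollout succeeded"
  else
    echo "rollout failed, rolling back" >&2
    kubectl rollout undo "deployment/$2" -n "$1"
  fi
}
```

Usage:

rollout_or_rollback <namespace> <deployment-name> 180s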

6️⃣ How do you troubleshoot DNS issues in Kubernetes?

✅ Possible Causes & Fixes:

🔹 Check if CoreDNS is running:

kubectl get pods -n kube-system | grep coredns

🔹 Test DNS resolution inside a pod:

kubectl exec -it <pod-name> -- nslookup google.com

🔹 Restart CoreDNS if necessary:

kubectl delete pod -n kube-system -l k8s-app=kube-dns
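
Resolving an in-cluster name and an external name side by side helps narrow the problem down. If only the external name fails, the upstream resolver is the more likely culprit; if both fail, look at CoreDNS or the pod's network. The sketch below assumes the pod image ships nslookup; the function name dns_check is made up for this example.

```shell
# Resolve one in-cluster name and one external name from inside a pod.
# $1 = namespace, $2 = pod name (its image must contain nslookup)
dns_check() {
  for name in kubernetes.default google.com; do
    if kubectl exec "$2" -n "$1" -- nslookup "$name" >/dev/null 2>&1; then
      echo "$name: ok"
    else
      echo "$name: FAILED"
    fi
  done
}
```

Usage:

dns_check <namespace> <pod-name>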

7️⃣ How do you debug ImagePullBackOff errors?

✅ Possible Causes & Fixes:

🔹 Incorrect image name/tag → Verify image correctness:

kubectl describe pod <pod-name> -n <namespace>

🔹 Authentication issues → Ensure the correct secret is used:

imagePullSecrets:
  - name: my-secret

🔹 Check container runtime logs:

sudo journalctl -u containerd -f

🔹 Manually pull the image on the node to check errors (docker on Docker hosts, crictl on containerd nodes):

docker pull <image-name>
sudo crictl pull <image-name>
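
The secret named under imagePullSecrets must exist in the same namespace as the pod. A minimal sketch of creating one with kubectl is shown below; the function name create_pull_secret is hypothetical, and the registry address and credentials are placeholders you supply. Passing a password on the command line can leave it in shell history, so treat this as an illustration rather than a recommended practice.

```shell
# Create a registry credential secret for use in imagePullSecrets.
# $1 = namespace, $2 = secret name, $3 = registry server,
# $4 = username, $5 = password or access token
create_pull_secret() {
  kubectl create secret docker-registry "$2" -n "$1" \
    --docker-server="$3" \
    --docker-username="$4" \
    --docker-password="$5"
}
```

Usage:

create_pull_secret <namespace> my-secret <registry-server> <username> <password>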

8️⃣ How do you troubleshoot network connectivity issues between pods?

✅ Possible Causes & Fixes:

🔹 Check if the pod has the correct IP:

kubectl get pods -o wide

🔹 Use ping or curl to test connectivity:

kubectl exec -it <pod-name> -- ping <target-pod-ip>

🔹 Check CNI plugin logs:

journalctl -u kubelet | grep -i cni

🔹 Ensure Network Policies are not blocking traffic:

kubectl get networkpolicy -A
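
Many minimal images do not ship ping, and ICMP says little about whether an application port is open. As a sketch, a short curl with a timeout tests the actual port instead; it assumes the source pod's image contains curl, and the function name pod_can_reach is made up for this example.

```shell
# Test connectivity from a pod to a URL with a 5-second limit.
# $1 = namespace, $2 = source pod, $3 = target URL (for example http://<target-pod-ip>:<port>)
# Returns non-zero when the target cannot be reached.
pod_can_reach() {
  if kubectl exec "$2" -n "$1" -- curl -sS -m 5 -o /dev/null "$3"; then
    echo "reachable: $3"
  else
    echo "unreachable: $3"
    return 1
  fi
}
```

Usage:

pod_can_reach <namespace> <pod-name> http://<target-pod-ip>:<port>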

9️⃣ What should you do if a node becomes NotReady?

✅ Possible Causes & Fixes:

🔹 Check node status:

kubectl get nodes -o wide

🔹 Inspect kubelet logs on the node:

journalctl -u kubelet -f

🔹 Verify disk space:

df -h

🔹 Restart the kubelet service (or reboot the node if that does not help):

sudo systemctl restart kubelet

🔹 Check if the node is tainted:

kubectl describe node <node-name> | grep -i taint
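
The node's own conditions usually point at the cause before you log in to the machine. The sketch below prints them as Type=Status lines and then keeps only the worrying ones: Ready when it is not True, and any other condition (such as MemoryPressure or DiskPressure) when it is True. Both function names are hypothetical.

```shell
# Print a node's conditions as "Type=Status" lines.
# $1 = node name
node_conditions() {
  kubectl get node "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
}

# Keep only conditions that indicate a problem.
# Reads "Type=Status" lines on stdin.
problem_conditions() {
  awk -F= '($1 == "Ready" && $2 != "True") || ($1 != "Ready" && $2 == "True")'
}
```

Usage:

node_conditions <node-name> | problem_conditions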

🔟 How do you fix a stuck Kubernetes job?

✅ Possible Causes & Fixes:

🔹 Check job status:

kubectl get jobs -n <namespace>

🔹 Check logs:

kubectl logs job/<job-name> -n <namespace>

🔹 If the job is stuck, delete and recreate it:

kubectl delete job <job-name> -n <namespace>

🔹 Increase backoffLimit in the job spec to allow retries:

backoffLimit: 5
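
Before deleting a job, it is worth confirming whether it has actually failed or is still working. A rough sketch using kubectl wait is shown below; the function name job_state is made up, and the one-second timeout simply makes each check return quickly. A job reported as failed has typically exhausted its retries, which is the case where raising backoffLimit may help.

```shell
# Classify a job as complete, failed, or neither.
# $1 = namespace, $2 = job name
job_state() {
  if kubectl wait --for=condition=complete "job/$2" -n "$1" --timeout=1s >/dev/null 2>&1; then
    echo "complete"
  elif kubectl wait --for=condition=failed "job/$2" -n "$1" --timeout=1s >/dev/null 2>&1; then
    echo "failed"
  else
    echo "still running or stuck"
  fi
}
```

Usage:

job_state <namespace> <job-name>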

🎯 Summary

✅ Use kubectl describe to inspect resources.
✅ Check logs with kubectl logs.
✅ Verify network issues with kubectl get svc and kubectl get endpoints.
✅ Restart kube-proxy, kubelet, or CoreDNS if needed.
✅ Monitor events with kubectl get events --sort-by=.metadata.creationTimestamp.

About Anant 441 Articles
Senior technical writer