
🔥 Kubernetes Troubleshooting Questions & Answers for Beginners to Experts
Troubleshooting Kubernetes can be tricky, but mastering it is essential for DevOps engineers and cloud professionals. Here are some common Kubernetes troubleshooting questions, along with solutions and best practices.
1️⃣ How do you check whether a pod is running properly?
✅ Solution:
Run:
kubectl get pods -n <namespace>
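Look at the STATUS and READY columns. If STATUS shows CrashLoopBackOff, ImagePullBackOff, Error, or a Pending that doesn't clear, or READY shows fewer ready containers than expected (for example 0/1), there's a problem.
For details, describe the pod and read its logs: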
kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace>
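To scan many pods at once, a small script can read the JSON that kubectl already produces. This is a minimal sketch, assuming you save the output of kubectl get pods -n <namespace> -o json to a file; the function names and the PROBLEM_REASONS set are illustrative choices, not part of kubectl.

```python
import json
import sys

# Waiting reasons treated as problems; an illustrative, non-exhaustive choice.
PROBLEM_REASONS = {
    "CrashLoopBackOff", "ImagePullBackOff", "ErrImagePull",
    "CreateContainerConfigError", "RunContainerError",
}


def pod_problem(pod):
    """Return a short problem description for one pod, or None if it looks healthy."""
    status = pod.get("status", {})
    statuses = status.get("containerStatuses", [])
    # A container waiting with a known bad reason is the most specific signal.
    for cs in statuses:
        reason = (cs.get("state", {}).get("waiting") or {}).get("reason")
        if reason in PROBLEM_REASONS:
            return reason
    phase = status.get("phase", "Unknown")
    if phase in ("Pending", "Failed", "Unknown"):
        return phase
    # A Running pod can still have containers that are not ready.
    if phase == "Running":
        for cs in statuses:
            if not cs.get("ready", False):
                return f"container {cs['name']} not ready"
    return None


def unhealthy_pods(pod_list):
    """Return (pod_name, problem) pairs from `kubectl get pods -o json` output."""
    results = []
    for pod in pod_list.get("items", []):
        problem = pod_problem(pod)
        if problem:
            results.append((pod["metadata"]["name"], problem))
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        for name, problem in unhealthy_pods(json.load(fh)):
            print(f"{name}: {problem}")
```

Usage (the script and file names are hypothetical):
kubectl get pods -n <namespace> -o json > pods.json
python3 check_pods.py pods.json
Container waiting reasons are checked before the phase because a pod in ImagePullBackOff still reports the Pending phase.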
2️⃣ What should you do if a pod is stuck in the Pending state?
✅ Possible Causes & Solutions:
🔹 Insufficient resources → Check the node's allocatable and already-allocated resources:
kubectl describe node <node-name>
🔹 Failed scheduling → Check the events:
kubectl get events -n <namespace> --sort-by=.metadata.creationTimestamp
🔹 Affinity or taints/tolerations issue → Verify the pod spec and its FailedScheduling events:
kubectl describe pod <pod-name> -n <namespace>
🔹 Network (CNI) issues, typically a pod stuck in ContainerCreating with FailedCreatePodSandBox events → Check the CNI plugin logs.
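When the scheduler cannot place a pod, it records why in the pod's PodScheduled condition (reason Unschedulable) as well as in a FailedScheduling event. The sketch below reads that condition from kubectl get pod <pod-name> -n <namespace> -o json and maps keywords in the message to the causes listed above. The keyword table is a heuristic assumption about typical scheduler wording, not a stable API, so always read the full message too; scheduling_hints is a hypothetical name.

```python
import json
import sys

# Keyword -> likely cause. A heuristic assumption about scheduler messages.
HINTS = [
    ("insufficient", "resources"),
    ("taint", "taints/tolerations"),
    ("affinity", "affinity/nodeSelector"),
    ("selector", "affinity/nodeSelector"),
]


def scheduling_hints(pod):
    """Return (likely_causes, message) for a pod whose PodScheduled condition is False."""
    for cond in pod.get("status", {}).get("conditions", []):
        if cond.get("type") == "PodScheduled" and cond.get("status") == "False":
            message = cond.get("message", "")
            lowered = message.lower()
            causes = []
            for keyword, cause in HINTS:
                if keyword in lowered and cause not in causes:
                    causes.append(cause)
            return causes or ["unknown"], message
    # No failed PodScheduled condition: the pod was scheduled (or has no status yet).
    return [], ""


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        causes, message = scheduling_hints(json.load(fh))
    print("likely causes:", ", ".join(causes) or "pod is scheduled")
    print(message)
```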
3️⃣ How do you troubleshoot a pod stuck in CrashLoopBackOff?
✅ Possible Causes & Fixes:
🔹 Application crash → Check the logs, including those of the previous (crashed) container instance:
kubectl logs <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> --previous
🔹 Configuration issue → Inspect the pod details and events:
kubectl describe pod <pod-name> -n <namespace>
🔹 Liveness probe failure → Review the health check settings:
kubectl get pod <pod-name> -n <namespace> -o yaml | grep -i -A 10 livenessProbe
🔹 OOMKilled (Out of Memory) → Increase the container's memory request and limit:
resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"
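Each entry in a pod's containerStatuses records why that container last terminated (lastState.terminated), which separates OOM kills from ordinary crashes. A minimal sketch, assuming the input is kubectl get pod <pod-name> -n <namespace> -o json saved to a file; crash_diagnosis and its hint strings are hypothetical. An OOMKilled container typically shows exit code 137 (128 + 9, the SIGKILL signal number).

```python
import json
import sys


def crash_diagnosis(pod):
    """Map container name -> (reason, exit_code, restarts, hint) from lastState.terminated."""
    findings = {}
    for cs in pod.get("status", {}).get("containerStatuses", []):
        last = (cs.get("lastState") or {}).get("terminated")
        if not last:
            continue  # no previous termination recorded for this container
        reason = last.get("reason", "")
        exit_code = last.get("exitCode")
        if reason == "OOMKilled":
            hint = "raise the memory limit or reduce the app's memory use"
        elif exit_code == 0:
            hint = "process exited on its own; check the container command/args"
        else:
            hint = "read `kubectl logs --previous` for the crash output"
        findings[cs["name"]] = (reason, exit_code, cs.get("restartCount", 0), hint)
    return findings


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        for name, info in crash_diagnosis(json.load(fh)).items():
            print(name, *info, sep=" | ")
```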
4️⃣ What if a service is not accessible?
✅ Step-by-Step Troubleshooting:
🔹 Check that the service exists:
kubectl get svc -n <namespace>
🔹 Verify the service endpoints (none usually means the selector matches no ready pods):
kubectl get endpoints <service-name> -n <namespace>
🔹 Ensure the correct port and targetPort are exposed:
kubectl describe svc <service-name> -n <namespace>
🔹 Check whether the service responds from inside the cluster (the container image must include curl):
kubectl exec -it <pod-name> -n <namespace> -- curl <service-name>:<port>
🔹 Verify that network policies are not blocking access.
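Most inaccessible services come down to one of two cases: a selector that matches no pods, or matching pods that are not ready. This sketch checks those cases in order, assuming you export the service, the pods, and the endpoints from the same namespace as JSON; the function and file names are illustrative.

```python
import json
import sys


def matches_selector(selector, labels):
    """True if every key/value pair of a service selector is present in the pod labels."""
    # A service without a selector gets no automatic endpoints, so treat it as no match.
    return bool(selector) and all(labels.get(k) == v for k, v in selector.items())


def ready_endpoint_ips(endpoints):
    """Collect ready addresses from `kubectl get endpoints <service-name> -o json`."""
    return [
        addr["ip"]
        for subset in endpoints.get("subsets") or []
        for addr in subset.get("addresses", [])
    ]


def diagnose_service(service, pods, endpoints):
    """Return a hint about the most likely reason a service has no working backends."""
    selector = service.get("spec", {}).get("selector", {})
    matching = [
        p["metadata"]["name"]
        for p in pods.get("items", [])
        if matches_selector(selector, p["metadata"].get("labels", {}))
    ]
    if not matching:
        return "selector matches no pods: compare spec.selector with the pod labels"
    if not ready_endpoint_ips(endpoints):
        return "matching pods exist but none are ready: check readiness probes"
    return "endpoints exist: check port/targetPort and network policies"


if __name__ == "__main__" and len(sys.argv) > 3:
    docs = []
    for path in sys.argv[1:4]:
        with open(path) as fh:
            docs.append(json.load(fh))
    print(diagnose_service(*docs))
```

Usage (script and file names are hypothetical):
kubectl get svc <service-name> -n <namespace> -o json > svc.json
kubectl get pods -n <namespace> -o json > pods.json
kubectl get endpoints <service-name> -n <namespace> -o json > ep.json
python3 check_service.py svc.json pods.json ep.json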
5️⃣ How do you debug a failing deployment?
✅ Step-by-Step Guide:
🔹 Check the rollout status:
kubectl rollout status deployment <deployment-name> -n <namespace>
🔹 Describe the deployment to check its conditions and events:
kubectl describe deployment <deployment-name> -n <namespace>
🔹 Look for failing pods (assuming they are labeled app=<app-name>):
kubectl get pods --selector=app=<app-name> -n <namespace>
🔹 Roll back a failing deployment to the previous revision:
kubectl rollout undo deployment <deployment-name> -n <namespace>
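kubectl rollout status decides whether a rollout is finished by comparing a deployment's spec with its status fields. The sketch below approximates those checks on kubectl get deployment <deployment-name> -n <namespace> -o json output. It is a simplified illustration, not kubectl's actual code, and rollout_state is a hypothetical name.

```python
import json
import sys


def rollout_state(deploy):
    """Summarise a deployment's rollout, loosely following `kubectl rollout status` checks."""
    meta, spec, status = deploy.get("metadata", {}), deploy.get("spec", {}), deploy.get("status", {})
    if meta.get("generation", 0) > status.get("observedGeneration", 0):
        return "waiting: the controller has not observed the latest spec yet"
    for cond in status.get("conditions", []):
        if cond.get("type") == "Progressing" and cond.get("reason") == "ProgressDeadlineExceeded":
            return "failed: progress deadline exceeded"
    desired = spec.get("replicas", 1)  # spec.replicas defaults to 1
    updated = status.get("updatedReplicas", 0)
    available = status.get("availableReplicas", 0)
    if updated < desired:
        return f"in progress: {updated}/{desired} replicas updated"
    if status.get("replicas", 0) > updated:
        return "in progress: old replicas are pending termination"
    if available < updated:
        return f"in progress: {available}/{updated} updated replicas available"
    return "complete"


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        print(rollout_state(json.load(fh)))
```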
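6️⃣ How do you troubleshoot DNS issues in Kubernetes?
✅ Possible Causes & Fixes:
🔹 Check that CoreDNS is running:
kubectl get pods -n kube-system | grep coredns
🔹 Test DNS resolution from inside a pod (the image must include nslookup), for both a cluster-internal and an external name:
kubectl exec -it <pod-name> -n <namespace> -- nslookup kubernetes.default
kubectl exec -it <pod-name> -n <namespace> -- nslookup google.com
🔹 Restart CoreDNS if necessary (in most clusters its pods carry the k8s-app=kube-dns label):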
kubectl delete pod -n kube-system -l k8s-app=kube-dns
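If the pod image includes Python, the same checks can be scripted: read /etc/resolv.conf (with the default ClusterFirst DNS policy, the nameserver should be the cluster DNS service IP) and resolve a few names. A sketch, assuming python3 exists in the container; the script path and function names are hypothetical.

```python
import socket
import sys


def parse_resolv_conf(text):
    """Extract nameservers, search domains and options from resolv.conf text."""
    conf = {"nameservers": [], "search": [], "options": []}
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith(("#", ";")):
            continue  # skip blank lines and comments
        key, values = parts[0], parts[1:]
        if key == "nameserver":
            conf["nameservers"].extend(values)
        elif key in ("search", "domain"):
            conf["search"] = values  # the last search/domain line wins
        elif key == "options":
            conf["options"].extend(values)
    return conf


def resolve(name):
    """Return sorted unique IPs for name, or the resolver error text."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror as exc:
        return f"error: {exc}"
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__" and len(sys.argv) > 1:
    with open("/etc/resolv.conf") as fh:
        print(parse_resolv_conf(fh.read()))
    for name in sys.argv[1:]:
        print(name, "->", resolve(name))
```

Usage (kubectl cp needs tar in the container image):
kubectl cp dns_check.py <namespace>/<pod-name>:/tmp/dns_check.py
kubectl exec -it <pod-name> -n <namespace> -- python3 /tmp/dns_check.py kubernetes.default google.com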
7️⃣ How do you debug ImagePullBackOff errors?
✅ Possible Causes & Fixes:
🔹 Incorrect image name/tag → Check the image and the pull error in the pod's events:
kubectl describe pod <pod-name> -n <namespace>
🔹 Authentication issues → Ensure the pod spec references the correct pull secret:
imagePullSecrets:
  - name: my-secret
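🔹 Check the container runtime logs on the node (containerd shown here):
sudo journalctl -u containerd -f
🔹 Manually pull the image to see the exact error (use crictl on nodes without Docker):
docker pull <image-name>
sudo crictl pull <image-name>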
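A typo in the registry, repository, or tag is a frequent cause. The sketch below splits an image reference following the usual Docker-style normalisation (default registry docker.io, a library/ prefix for single-name Docker Hub images, latest when neither a tag nor a digest is given), so you can compare each part with what actually exists in the registry. It is a simplified illustration that does not validate the reference syntax; parse_image is a hypothetical name.

```python
import sys


def parse_image(ref):
    """Split an image reference into registry, repository, tag and digest."""
    digest = None
    if "@" in ref:
        ref, digest = ref.split("@", 1)
    first, sep, rest = ref.partition("/")
    # The first path component is a registry host only if it looks like one.
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    else:
        registry, remainder = "docker.io", ref
    # With the host (and any :port) removed, a remaining ':' separates the tag.
    if ":" in remainder:
        repository, tag = remainder.rsplit(":", 1)
    else:
        repository, tag = remainder, None
    if registry == "docker.io" and "/" not in repository:
        repository = "library/" + repository  # official Docker Hub images
    if tag is None and digest is None:
        tag = "latest"  # what gets pulled when no tag is given
    return {"registry": registry, "repository": repository, "tag": tag, "digest": digest}


if __name__ == "__main__":
    for image in sys.argv[1:]:
        print(image, "->", parse_image(image))
```

Usage (script name hypothetical):
python3 parse_image.py nginx registry.example.com:5000/team/app:v1.2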
8️⃣ How do you troubleshoot network connectivity issues between pods?
✅ Possible Causes & Fixes:
🔹 Check that each pod has an IP and runs on the expected node:
kubectl get pods -o wide -n <namespace>
🔹 Use ping or curl to test connectivity (the tool must exist in the container image):
kubectl exec -it <pod-name> -n <namespace> -- ping <target-pod-ip>
🔹 Check the kubelet logs on the node for CNI errors:
journalctl -u kubelet | grep CNI
🔹 Ensure network policies are not blocking traffic:
kubectl get networkpolicy -A
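Many images ship without ping, and ICMP results don't always reflect the TCP traffic your app actually uses, so a TCP connect test is often more telling. This sketch assumes python3 is available in the source pod; tcp_check is a hypothetical helper. As a rule of thumb, a refused connection means the target was reached but nothing listens on that port, while a timeout points to dropped packets (routing, CNI, or a network policy).

```python
import socket
import sys


def tcp_check(host, port, timeout=3.0):
    """Attempt a TCP connection; return (True, "connected") or (False, error text)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:
        # ConnectionRefusedError: host reachable, nothing listening on the port.
        # Timeout: packets were probably dropped along the way.
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__" and len(sys.argv) == 3:
    ok, detail = tcp_check(sys.argv[1], int(sys.argv[2]))
    print(detail)
    sys.exit(0 if ok else 1)
```

Usage (script path hypothetical):
kubectl exec -it <pod-name> -n <namespace> -- python3 /tmp/tcp_check.py <target-pod-ip> <port>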
9️⃣ What should you do if a node becomes NotReady?
✅ Possible Causes & Fixes:
🔹 Check the node status:
kubectl get nodes -o wide
🔹 Inspect the kubelet logs (run this and the next two commands on the node itself):
journalctl -u kubelet -f
🔹 Verify disk space:
df -h
🔹 Restart the kubelet service, or reboot the node if that doesn't help:
sudo systemctl restart kubelet
🔹 Check whether the node is tainted:
kubectl describe node <node-name> | grep -i taint
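The node's status conditions often show why it is NotReady without logging in to it. A minimal sketch, assuming the output of kubectl get node <node-name> -o json saved to a file; node_report and its output keys are hypothetical. A Ready status of Unknown usually means the control plane has stopped hearing from the kubelet, which points back to the kubelet logs.

```python
import json
import sys

# Conditions that signal trouble when their status is "True".
PRESSURE_CONDITIONS = ("MemoryPressure", "DiskPressure", "PIDPressure", "NetworkUnavailable")


def node_report(node):
    """Summarise readiness, pressure conditions and taints from `kubectl get node -o json`."""
    status = node.get("status", {})
    conditions = {c["type"]: c for c in status.get("conditions", [])}
    ready = conditions.get("Ready", {})
    return {
        "ready": ready.get("status") == "True",
        "ready_reason": ready.get("reason", ""),
        "pressure": [t for t in PRESSURE_CONDITIONS if conditions.get(t, {}).get("status") == "True"],
        "taints": [f"{t['key']}:{t['effect']}" for t in node.get("spec", {}).get("taints", [])],
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        print(node_report(json.load(fh)))
```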
🔟 How do you fix a stuck Kubernetes job?
✅ Possible Causes & Fixes:
🔹 Check the job status:
kubectl get jobs -n <namespace>
🔹 Check the logs:
kubectl logs job/<job-name> -n <namespace>
🔹 If the job is stuck, delete it and recreate it from its manifest:
kubectl delete job <job-name> -n <namespace>
kubectl apply -f <job-manifest>.yaml -n <namespace>
🔹 Raise backoffLimit above its default of 6 in the job spec to allow more retries before the job is marked Failed:
spec:
  backoffLimit: 10
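A job's status distinguishes the common ways it gets "stuck": it has already completed or failed (Complete/Failed conditions), it is suspended, or it still has active pods using up its backoffLimit. A sketch, assuming the output of kubectl get job <job-name> -n <namespace> -o json saved to a file; job_state is a hypothetical name, and 6 is the Kubernetes default for backoffLimit.

```python
import json
import sys


def job_state(job):
    """Classify a job from `kubectl get job <job-name> -o json` output."""
    spec, status = job.get("spec", {}), job.get("status", {})
    for cond in status.get("conditions", []):
        if cond.get("status") != "True":
            continue
        if cond.get("type") == "Complete":
            return "complete"
        if cond.get("type") == "Failed":
            return f"failed: {cond.get('reason', 'unknown')}"
    if spec.get("suspend"):
        return "suspended"  # spec.suspend=true stops the job from creating pods
    backoff_limit = spec.get("backoffLimit", 6)  # Kubernetes default is 6
    failed, active = status.get("failed", 0), status.get("active", 0)
    if active:
        return f"running: {active} active pod(s), {failed}/{backoff_limit} failures"
    return f"no active pods: {failed} failure(s) so far, inspect the job's pods"


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        print(job_state(json.load(fh)))
```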
🎯 Summary
✅ Use kubectl describe to inspect resources.
✅ Check logs with kubectl logs.
✅ Verify network issues with kubectl get svc & kubectl get endpoints.
✅ Restart kube-proxy, kubelet, or CoreDNS if needed.
✅ Monitor events with kubectl get events --sort-by=.metadata.creationTimestamp.
Want more Kubernetes troubleshooting tips? Let us know! 🔥
Kubernetes, Troubleshooting, DevOps, CloudComputing, kube-proxy, Containers, Microservices, K8s, Networking, ClusterManagement, Debugging