
🔥 Advanced Kubernetes Troubleshooting Questions & Solutions
Let's dive deeper into real-world Kubernetes troubleshooting scenarios with detailed step-by-step solutions. These questions will help you debug cluster issues like a pro! 🚀
1️⃣ How do you troubleshoot a pod stuck in the Terminating state?
✅ Possible Causes & Fixes:
🔹 Check if the pod is stuck due to finalizers (a patch sketch follows this list):
kubectl get pod <pod-name> -n <namespace> -o json | jq .metadata.finalizers
🔹 Force delete the pod (use with caution; the container may still be running on the node):
kubectl delete pod <pod-name> --grace-period=0 --force -n <namespace>
🔹 Check if the node is unresponsive:
kubectl get nodes
🔹 If the node is down, drain and remove it:
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
kubectl delete node <node-name>
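If a stuck finalizer is the culprit, you can clear it with a merge patch. A minimal sketch; clearing all finalizers is safe only when you are sure no controller still needs to run cleanup for this pod:
kubectl patch pod <pod-name> -n <namespace> --type=merge -p '{"metadata":{"finalizers":null}}'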
2️⃣ How do you debug ErrImagePull and ImagePullBackOff issues?
✅ Possible Causes & Fixes:
🔹 Check pod events for details:
kubectl describe pod <pod-name>
🔹 Ensure the image exists and is accessible:
docker pull <image-name>
🔹 Check for missing authentication (private registry) by referencing a pull secret in the pod spec:
spec:
  imagePullSecrets:
    - name: my-docker-secret
🔹 If using a private registry, verify the secret exists (a secret-creation sketch follows this list):
kubectl get secrets -n <namespace>
🔹 Manually delete and recreate the pod:
kubectl delete pod <pod-name>
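If the pull secret is missing, create one for your registry. A minimal sketch; the server, username, and password values are placeholders you must fill in:
kubectl create secret docker-registry my-docker-secret \
  --docker-server=<registry-url> \
  --docker-username=<username> \
  --docker-password=<password> \
  -n <namespace>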
3️⃣ How do you check why a pod is evicted?
✅ Possible Causes & Fixes:
🔹 List evicted pods:
kubectl get pods --field-selector=status.phase=Failed
🔹 Check the eviction reason:
kubectl describe pod <evicted-pod-name>
🔹 If caused by memory pressure, increase memory requests/limits:
resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "1Gi"
🔹 Manually remove evicted pods (a bulk-cleanup one-liner follows this list):
kubectl delete pod <pod-name>
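To clean up every evicted pod in a namespace at once, you can combine a field selector with xargs. A sketch; note it matches all Failed pods, not only evictions, and assumes GNU xargs (where -r skips the delete when nothing matches):
kubectl get pods -n <namespace> --field-selector=status.phase=Failed -o name | xargs -r kubectl delete -n <namespace>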
4️⃣ How do you troubleshoot slow pod scheduling?
✅ Possible Causes & Fixes:
🔹 Check pending pods:
kubectl get pods --field-selector=status.phase=Pending
🔹 Check if the cluster is out of resources:
kubectl describe node <node-name>
🔹 Check pod scheduling events:
kubectl get events --sort-by=.metadata.creationTimestamp
🔹 Ensure node taints/tolerations allow scheduling (a toleration sketch follows this list):
kubectl describe node <node-name> | grep -i taint
🔹 Verify affinity/anti-affinity settings:
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: "disktype"
              operator: In
              values:
                - ssd
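If a taint is blocking the pod, add a matching toleration to the pod spec. A minimal sketch; the dedicated=gpu key/value pair is a placeholder for whatever taint your node actually carries:
tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"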
5️⃣ How do you fix CrashLoopBackOff due to liveness probe failures?
✅ Possible Causes & Fixes:
🔹 Check logs for errors:
kubectl logs <pod-name>
🔹 Check the liveness probe configuration:
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
🔹 Test if the probe endpoint is accessible from within the pod:
kubectl exec -it <pod-name> -- curl localhost:8080/healthz
🔹 If needed, disable the liveness probe temporarily by setting it to null and re-applying the manifest (or use the patch sketch after this list):
livenessProbe: null
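Alternatively, strip the probe from a live Deployment with a JSON patch. A sketch; it assumes the probe sits on the first container (index 0), and my-app is a placeholder Deployment name:
kubectl patch deployment my-app -n <namespace> --type=json \
  -p '[{"op": "remove", "path": "/spec/template/spec/containers/0/livenessProbe"}]'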
6️⃣ How do you fix Node Not Ready issues?
✅ Possible Causes & Fixes:
🔹 Check node status (a node-conditions query follows this list):
kubectl get nodes
🔹 Inspect kubelet logs (run on the affected node):
journalctl -u kubelet -f
🔹 Restart the kubelet service:
systemctl restart kubelet
🔹 Check disk space on the node:
df -h
🔹 Verify node taints are not preventing scheduling:
kubectl describe node <node-name> | grep -i taint
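A NotReady node usually reports the reason in its conditions (MemoryPressure, DiskPressure, PIDPressure, Ready). A sketch that prints each condition with its status and message via jsonpath:
kubectl get node <node-name> -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.message}{"\n"}{end}'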
7️⃣ How do you fix a failing Ingress?
✅ Possible Causes & Fixes:
🔹 Check Ingress resources:
kubectl get ingress -n <namespace>
🔹 Describe the Ingress to check for errors:
kubectl describe ingress <ingress-name> -n <namespace>
🔹 Ensure the correct backend service exists (a port-forward test follows this list):
kubectl get svc -n <namespace>
🔹 Check if the Ingress controller is running (the namespace varies by installation; ingress-nginx is common):
kubectl get pods -n kube-system | grep ingress
🔹 Verify DNS resolution:
nslookup my-app.example.com
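To rule out the Ingress layer entirely, port-forward to the backend service and hit it directly. A sketch; the service name and ports are placeholders, and the curl runs in a second terminal while the port-forward stays open:
kubectl port-forward svc/<service-name> 8080:80 -n <namespace>
curl http://localhost:8080/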
8️⃣ How do you troubleshoot Kubernetes persistent volume (PV) issues?
✅ Possible Causes & Fixes:
🔹 Check PV status:
kubectl get pv
🔹 Check PersistentVolumeClaim (PVC) status:
kubectl get pvc -n <namespace>
🔹 Describe the PVC for errors (an event-filtering sketch follows this list):
kubectl describe pvc <pvc-name> -n <namespace>
🔹 Ensure the storage class is available:
kubectl get storageclass
🔹 If using AWS EBS, verify the disk is attached:
aws ec2 describe-volumes --filters Name=tag:KubernetesCluster,Values=my-cluster
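Provisioning and attach failures usually surface as events on the PVC. A sketch that filters namespace events down to a single claim:
kubectl get events -n <namespace> --field-selector involvedObject.kind=PersistentVolumeClaim,involvedObject.name=<pvc-name>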
9️⃣ How do you fix high CPU/memory usage in Kubernetes?
✅ Possible Causes & Fixes:
🔹 Check pod resource usage (requires metrics-server):
kubectl top pod -n <namespace>
🔹 Check node resource usage:
kubectl top node
🔹 Increase CPU/memory requests and limits:
resources:
  requests:
    cpu: "250m"
    memory: "512Mi"
  limits:
    cpu: "500m"
    memory: "1Gi"
🔹 If using the Horizontal Pod Autoscaler (HPA), scale based on CPU:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
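A roughly equivalent CPU-based autoscaler can also be created imperatively, as a sketch (kubectl autoscale generates the HPA object for you):
kubectl autoscale deployment my-app -n <namespace> --cpu-percent=80 --min=2 --max=10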
🔟 How do you restart all pods in a namespace?
✅ Solution:
kubectl delete pods --all -n <namespace>
or, for a zero-downtime rolling restart of a single workload:
kubectl rollout restart deployment <deployment-name> -n <namespace>
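To rolling-restart every Deployment in the namespace at once rather than one by one, kubectl rollout restart also accepts a bare resource type:
kubectl rollout restart deployment -n <namespace>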
🔥 Summary
✅ Use kubectl describe & kubectl logs for debugging.
✅ Check node, pod, service, and network issues.
✅ Restart pods, nodes, or Ingress controllers if necessary.
✅ Monitor performance using kubectl top and HPA.
🚀 Want More Kubernetes Troubleshooting Tips? Let us know! 🔥
Kubernetes, Troubleshooting, DevOps, CloudComputing, kube-proxy, Containers, Microservices, K8s, Networking, ClusterManagement, Debugging