Kubernetes: Scenario-Based Questions

5. A Kubernetes pod is stuck in a CrashLoopBackOff state. How do you investigate and fix it?

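CrashLoopBackOff means a container in the pod keeps exiting and the kubelet keeps restarting it, waiting longer between attempts each time (an exponential back-off capped at five minutes). It is a symptom, not a cause: the underlying problem is either a failure inside the containerized application or a misconfiguration in the pod definition.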
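
To get a feel for the timing, the sketch below reproduces the restart delay schedule the Kubernetes documentation describes as the default: the kubelet waits 10 seconds before the first restart, doubles the delay after each crash, caps it at five minutes, and resets it once a container has run for 10 minutes without problems. The function name backoff_schedule is illustrative, and the constants are the documented defaults rather than values read from a live cluster (newer releases may allow tuning them).

```python
# A minimal sketch of the default kubelet restart back-off, as documented:
# 10s initial delay, doubling after each crash, capped at 300s (5 minutes).
INITIAL_DELAY_S = 10   # documented default initial back-off
MAX_DELAY_S = 300      # documented default cap (five minutes)


def backoff_schedule(restarts: int) -> list[int]:
    """Return the delay in seconds before each of the next `restarts` restarts."""
    delays = []
    delay = INITIAL_DELAY_S
    for _ in range(restarts):
        delays.append(delay)
        delay = min(delay * 2, MAX_DELAY_S)  # double, but never exceed the cap
    return delays


if __name__ == "__main__":
    delays = backoff_schedule(8)
    print(delays)        # [10, 20, 40, 80, 160, 300, 300, 300]
    print(sum(delays))   # 1210 seconds of waiting across eight restarts
```

This is why a pod in CrashLoopBackOff can look idle for minutes at a time: most of that time is spent waiting out the back-off, not running the container.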

🔍 Initial Investigation Steps

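  • Check Pod Events: Use kubectl describe pod <pod-name> to view each container's Last State (reason and exit code), restart count, and recent events.
  • Inspect Logs: Run kubectl logs <pod-name> or kubectl logs <pod-name> -c <container> for multi-container pods.
  • Validate Liveness/Startup Probes: A failing liveness or startup probe makes the kubelet kill and restart the container. Readiness probe failures do not cause restarts; they only remove the pod from Service endpoints.
  • Check Resource Limits: A memory limit that is too low gets the container OOMKilled (exit code 137) and restarted. CPU limits only throttle the container; they do not kill it.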
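
The fields that kubectl describe pod prints for each container (restart count, current waiting reason, and the last termination's reason and exit code) are also available from kubectl get pod <pod-name> -o json under status.containerStatuses. Below is a small sketch that pulls them out and decodes exit codes above 128 as 128 plus a signal number, so 137 is SIGKILL (typical of an OOM kill) and 143 is SIGTERM. The helper names and the sample pod are hypothetical, and the signal table covers only a few common Linux signals.

```python
import json

# A few common Linux signal numbers; an exit code above 128 means
# "terminated by signal (code - 128)".
SIGNALS = {2: "SIGINT", 6: "SIGABRT", 9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}


def explain_exit_code(code: int) -> str:
    """Turn a container exit code into a short human-readable hint."""
    if code == 0:
        return "exited cleanly"
    if code > 128:
        sig = SIGNALS.get(code - 128, f"signal {code - 128}")
        return f"killed by {sig}"
    return "application error"


def summarize_pod(pod: dict) -> list[dict]:
    """Extract per-container crash details from a Pod object (kubectl get pod -o json)."""
    rows = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        waiting = cs.get("state", {}).get("waiting", {})
        last = cs.get("lastState", {}).get("terminated", {})
        exit_code = last.get("exitCode")
        rows.append({
            "container": cs["name"],
            "restarts": cs.get("restartCount", 0),
            "waiting_reason": waiting.get("reason"),
            "last_reason": last.get("reason"),
            "exit_code": exit_code,
            "hint": explain_exit_code(exit_code) if exit_code is not None else None,
        })
    return rows


if __name__ == "__main__":
    # Hypothetical, trimmed Pod JSON for a container that keeps getting OOM-killed.
    sample = json.loads("""
    {"status": {"containerStatuses": [{
        "name": "api",
        "restartCount": 7,
        "state": {"waiting": {"reason": "CrashLoopBackOff"}},
        "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}
    }]}}
    """)
    for row in summarize_pod(sample):
        print(row)
```

Against a real pod, the input would come from saving kubectl get pod <pod-name> -o json to a file and loading it with json.load.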

🛠 Common Causes

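  • Application Crashes: The entrypoint script or server exits with a non-zero code. With restartPolicy: Always (the only policy a Deployment allows), even a container that exits with code 0 is restarted and can end up in CrashLoopBackOff.
  • Bad Configuration: A missing environment variable, an incorrect database URL, or a missing file path (for example, a ConfigMap or Secret that was never mounted).
  • Probe Misfires: The liveness probe fails because it targets the wrong endpoint or port, or because its timing is too aggressive for the app's startup.
  • Init Container Failures: App containers don't start until every init container succeeds; an init container that keeps failing shows up as Init:CrashLoopBackOff.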
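
Whether probe timing is "too aggressive" is mostly arithmetic. A container that never becomes healthy is restarted after roughly initialDelaySeconds + failureThreshold × periodSeconds seconds; this is an approximation that ignores probe timeouts and jitter. The sketch compares that window with the time the app actually needs to start. The Probe class, the function names, and the 45-second startup time are hypothetical; the default field values are the documented Kubernetes probe defaults (initialDelaySeconds 0, periodSeconds 10, failureThreshold 3).

```python
from dataclasses import dataclass


@dataclass
class Probe:
    # Defaults mirror the documented Kubernetes probe defaults.
    initial_delay_seconds: int = 0
    period_seconds: int = 10
    failure_threshold: int = 3


def restart_window(probe: Probe) -> int:
    """Approximate seconds before a never-healthy container is restarted.

    Uses initialDelaySeconds + failureThreshold * periodSeconds and ignores
    probe timeouts and jitter, so treat the result as a rough estimate.
    """
    return probe.initial_delay_seconds + probe.failure_threshold * probe.period_seconds


def probe_too_aggressive(probe: Probe, startup_seconds: int) -> bool:
    """True if the probe would likely kill the app before it finishes starting."""
    return restart_window(probe) < startup_seconds


if __name__ == "__main__":
    startup = 45  # hypothetical: the app needs about 45s to warm up
    default_liveness = Probe()
    startup_probe = Probe(period_seconds=10, failure_threshold=30)
    print(restart_window(default_liveness), probe_too_aggressive(default_liveness, startup))
    print(restart_window(startup_probe), probe_too_aggressive(startup_probe, startup))
```

With the defaults, an app that needs 45 seconds is killed around the 30-second mark on every attempt, a classic probe-induced crash loop. A startupProbe with failureThreshold 30 and periodSeconds 10 (the example used in the Kubernetes docs) gives it up to 300 seconds before the liveness probe takes over.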

🧪 Diagnostic Tools

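  • kubectl get pods -A: List pods in all namespaces; the STATUS and RESTARTS columns reveal the failing ones.
  • kubectl describe pod <pod-name>: Restart count, last termination reason, exit code, and events.
  • kubectl logs <pod-name> --previous: Logs from the previous (crashed) instance of the container, which usually contain the actual error.
  • Install metrics-server for kubectl top, and use Lens or Prometheus/Grafana dashboards for deeper container-level metrics such as memory usage over time.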
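
kubectl get pods -A shows a STATUS column, but for scripting it is easier to read kubectl get pods -A -o json and check each container's waiting reason directly. Below is a sketch, assuming the JSON has been saved to a file (the file name pods.json, the script name crashloops.py, and the function name are illustrative), that lists every container currently waiting in CrashLoopBackOff, including init containers.

```python
import json
import sys


def crashlooping(pod_list: dict) -> list[tuple[str, str, str, int]]:
    """Return (namespace, pod, container, restarts) for containers in CrashLoopBackOff."""
    hits = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        status = pod.get("status", {})
        # Init containers are checked too; kubectl shows them as Init:CrashLoopBackOff.
        statuses = status.get("initContainerStatuses", []) + status.get("containerStatuses", [])
        for cs in statuses:
            reason = cs.get("state", {}).get("waiting", {}).get("reason")
            if reason == "CrashLoopBackOff":
                hits.append((meta.get("namespace", ""), meta.get("name", ""),
                             cs["name"], cs.get("restartCount", 0)))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: kubectl get pods -A -o json > pods.json && python crashloops.py pods.json
    with open(sys.argv[1]) as f:
        for ns, pod, container, restarts in crashlooping(json.load(f)):
            print(f"{ns}/{pod} container={container} restarts={restarts}")
```

Checking the waiting reason rather than parsing the STATUS column keeps the script independent of kubectl's table formatting.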

✅ Remediation

  • Fix the actual app issue (e.g., null pointer, port binding, DB unreachable).
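  • Tune probes for graceful startup: fix probe paths and ports, raise initialDelaySeconds or failureThreshold, or add a startupProbe for slow-starting apps.
  • Temporarily remove a probe with kubectl patch if the container needs to stay up for deeper debugging.
  • Use kubectl debug for live diagnosis: attach an ephemeral container (enabled by default since Kubernetes 1.23, GA in 1.25), or use --copy-to to create a copy of the pod with a different command when the original container exits too quickly to attach to.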
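
For the temporary probe removal mentioned above, a JSON Patch (RFC 6902) remove operation on the Deployment's pod template can be passed to kubectl patch deployment <name> --type=json -p '<patch>'. JSON Patch addresses containers by list index, not by name, so the sketch looks up the index first. The function name and sample Deployment are hypothetical.

```python
import json


def remove_probe_patch(deployment: dict, container: str, probe: str = "livenessProbe") -> list[dict]:
    """Build a JSON Patch that removes one probe from a named container in a Deployment."""
    containers = deployment["spec"]["template"]["spec"]["containers"]
    for i, c in enumerate(containers):
        if c["name"] == container:
            if probe not in c:
                return []  # nothing to remove for this container
            # JSON Patch paths use the container's index in the list.
            return [{"op": "remove", "path": f"/spec/template/spec/containers/{i}/{probe}"}]
    raise ValueError(f"container {container!r} not found")


if __name__ == "__main__":
    # Hypothetical Deployment, trimmed to the fields the function reads.
    deployment = {"spec": {"template": {"spec": {"containers": [
        {"name": "sidecar"},
        {"name": "api", "livenessProbe": {"httpGet": {"path": "/healthz", "port": 8080}}},
    ]}}}}
    print(json.dumps(remove_probe_patch(deployment, "api")))
```

The input can come from kubectl get deployment <name> -o json. Because the patch changes the pod template, it rolls out new pods; restore the probe from the version-controlled manifest once debugging is done.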

🚫 Mistakes to Avoid

  • Guessing at the cause without reading the logs, including kubectl logs --previous.
  • Force-restarting the pod repeatedly without understanding the root cause.
  • Modifying core deployment YAMLs in production without version control.

📌 Real-World Insight

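In production clusters, CrashLoopBackOff is usually caught by alerting, for example on the kube-state-metrics series kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"} or on rising restart counts. Mature teams correlate logs with metrics (e.g., Prometheus + Loki) and standardize health checks so they can diagnose failures quickly and minimize downtime.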