This document lists troubleshooting documents for common issues that you might encounter when using Google Kubernetes Engine (GKE). Whether you're diagnosing workload errors like ImagePullBackOff and CrashLoopBackOff, debugging cluster autoscaling behavior, resolving PersistentVolume issues, or troubleshooting node registration problems, the documents listed here can help.
This document is for Admins and architects, Security specialists, Networking specialists, or Storage specialists who troubleshoot GKE configurations. To learn more about GKE roles, see Common GKE user roles and tasks.
Troubleshoot the kubectl command-line tool in GKE, including issues with authentication and authorization. This page also includes advice on how to troubleshoot the Konnectivity proxy to check whether it's causing the kubectl logs, attach, exec, or port-forward commands to stop responding.
Troubleshoot GKE Standard node pools, including issues with node pool creation, best-effort provisioning, corrupted instance metadata, and migrating workloads to new node pools.
Troubleshoot issues that occur when adding nodes to your GKE Standard cluster, such as node registration failures and missing prerequisites for successful node registration.
Diagnose and resolve common reasons your cluster isn't removing underutilized nodes. Learn how to check for issues like restrictive PodDisruptionBudgets, Pods with local storage, or specific annotations (for example, "cluster-autoscaler.kubernetes.io/safe-to-evict": "false") that prevent node eviction.
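As a rough illustration of the checks described above, the following Python sketch inspects a Pod manifest (already parsed into a dictionary) for two of the conditions named here: the safe-to-evict annotation set to "false" and volumes backed by local storage. The function name `scale_down_blockers` is hypothetical, and the logic is a simplification for illustration, not the cluster autoscaler's actual decision procedure. It doesn't cover PodDisruptionBudgets.

```python
# Annotation key quoted in the text above.
SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"


def scale_down_blockers(pod: dict) -> list[str]:
    """Return human-readable reasons this Pod might block node removal.

    Hypothetical helper: a simplified check on a parsed Pod manifest,
    not the cluster autoscaler's real logic.
    """
    reasons = []

    annotations = pod.get("metadata", {}).get("annotations") or {}
    if annotations.get(SAFE_TO_EVICT) == "false":
        reasons.append("safe-to-evict annotation is set to false")

    # Treat emptyDir and hostPath volumes as local storage (an assumption
    # made for this sketch).
    for volume in pod.get("spec", {}).get("volumes") or []:
        if "emptyDir" in volume or "hostPath" in volume:
            reasons.append(f"volume '{volume.get('name')}' uses local storage")

    return reasons


if __name__ == "__main__":
    example_pod = {
        "metadata": {"annotations": {SAFE_TO_EVICT: "false"}},
        "spec": {"volumes": [{"name": "scratch", "emptyDir": {}}]},
    }
    for reason in scale_down_blockers(example_pod):
        print(reason)
```

You could feed this function the output of a manifest loaded from YAML or JSON. An empty list means that none of these two conditions were found, not that the node is guaranteed to be removable.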
Learn why the cluster autoscaler isn't adding new nodes to meet demand. Check for unschedulable Pods, verify that you haven't hit cluster or node pool size limits, and identify potential resource quota or regional VM availability issues.
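To make the first check above concrete, the following Python sketch tests whether a Pod object (parsed into a dictionary) reports itself as unschedulable. It relies on the standard Pod status condition of type `PodScheduled` with status `"False"` and reason `Unschedulable`. The function name `is_unschedulable` is hypothetical, and the sketch doesn't check size limits, quota, or VM availability.

```python
def is_unschedulable(pod: dict) -> bool:
    """Return True if the Pod's status says the scheduler could not place it.

    Hypothetical helper operating on a parsed Pod object.
    """
    conditions = pod.get("status", {}).get("conditions") or []
    for condition in conditions:
        if (
            condition.get("type") == "PodScheduled"
            and condition.get("status") == "False"
            and condition.get("reason") == "Unschedulable"
        ):
            return True
    return False


if __name__ == "__main__":
    pending_pod = {
        "status": {
            "phase": "Pending",
            "conditions": [
                {
                    "type": "PodScheduled",
                    "status": "False",
                    "reason": "Unschedulable",
                }
            ],
        }
    }
    print(is_unschedulable(pending_pod))
```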
Troubleshoot problems with the Horizontal Pod Autoscaler not scaling your application's Pod replicas. Resolve common issues, such as misconfigured HorizontalPodAutoscaler objects or problems with the metrics pipeline.
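When you check whether the Horizontal Pod Autoscaler is behaving as expected, it can help to compute the replica count that you'd expect by hand. The following Python sketch applies the basic scaling formula from the Kubernetes documentation: desired replicas equal the current replicas multiplied by the ratio of the current metric value to the target metric value, rounded up. The function name `expected_replicas` is hypothetical, and the sketch deliberately ignores tolerance, stabilization windows, and the minimum and maximum replica bounds.

```python
import math


def expected_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Basic HPA formula: ceil(currentReplicas * currentMetric / targetMetric).

    Hypothetical helper; ignores tolerance, stabilization, and min/max bounds.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    return math.ceil(current_replicas * current_metric / target_metric)


if __name__ == "__main__":
    # 4 replicas averaging 200m CPU against a 100m target.
    print(expected_replicas(4, 200, 100))
```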
If your cluster's root Certificate Authority (CA) is expiring soon, learn how to perform a credential rotation to prevent normal cluster operations from being interrupted.
Troubleshoot image pulls. Learn what causes statuses like ImagePullBackOff and ErrImagePull and how to resolve these statuses by fixing common issues like authentication and network connectivity.
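As a small illustration of where these statuses appear, the following Python sketch scans a Pod object (parsed into a dictionary) and lists containers whose waiting reason is ImagePullBackOff or ErrImagePull. It reads the `status.containerStatuses[].state.waiting.reason` field. The function name `image_pull_failures` is hypothetical, and for brevity the sketch doesn't inspect init containers.

```python
IMAGE_PULL_REASONS = {"ImagePullBackOff", "ErrImagePull"}


def image_pull_failures(pod: dict) -> list[tuple[str, str]]:
    """Return (container name, reason) pairs for containers stuck pulling an image.

    Hypothetical helper operating on a parsed Pod object.
    """
    failures = []
    for status in pod.get("status", {}).get("containerStatuses") or []:
        waiting = (status.get("state") or {}).get("waiting") or {}
        reason = waiting.get("reason")
        if reason in IMAGE_PULL_REASONS:
            failures.append((status.get("name"), reason))
    return failures


if __name__ == "__main__":
    example_pod = {
        "status": {
            "containerStatuses": [
                {"name": "web", "state": {"waiting": {"reason": "ImagePullBackOff"}}},
                {"name": "sidecar", "state": {"running": {}}},
            ]
        }
    }
    print(image_pull_failures(example_pod))
```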
Troubleshoot Kubernetes Out of Memory (OOM) events. Identify causes, distinguish event types, and apply effective solutions for both container- and node-level OOM kills.
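For the container-level case, a terminated container records the reason `OOMKilled` in its status. The following Python sketch lists containers in a Pod object (parsed into a dictionary) whose current or previous termination has that reason. The function name `oom_killed_containers` is hypothetical, and the sketch doesn't detect node-level OOM events, which don't necessarily appear in these fields.

```python
def oom_killed_containers(pod: dict) -> list[str]:
    """Return names of containers whose current or last termination was an OOM kill.

    Hypothetical helper operating on a parsed Pod object.
    """
    names = []
    for status in pod.get("status", {}).get("containerStatuses") or []:
        # Check both the current state and the state before the last restart.
        for key in ("state", "lastState"):
            terminated = (status.get(key) or {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                names.append(status.get("name"))
                break
    return names


if __name__ == "__main__":
    example_pod = {
        "status": {
            "containerStatuses": [
                {
                    "name": "worker",
                    "state": {"running": {}},
                    "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
                }
            ]
        }
    }
    print(oom_killed_containers(example_pod))
```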
Troubleshoot and resolve GKE cluster and node upgrade issues, including long or incomplete upgrades, unexpected auto-upgrades, failures, and post-upgrade problems.
Troubleshoot some of the 400, 401, 403, and 404 errors that you might encounter when using GKE. This page also includes information on how to troubleshoot "missing edit permissions on account" errors.
Last updated 2025-10-30 UTC.