Advances in generative artificial intelligence, and the simpler tooling now built around it, are profoundly transforming the management of Kubernetes clusters through the concept of AIOps (Artificial Intelligence for IT Operations), coined by Gartner: a practice in which artificial intelligence (AI) techniques are used to maintain, for example, an infrastructure. One area where DevOps engineers and novice cluster operators often struggle is identifying, understanding, and resolving problems inside a Kubernetes cluster …
- AIOps - Wikipedia
- What is Artificial Intelligence for IT Operations (AIOps)? - AWS
- Generative artificial intelligence - Wikipedia
In this article, we will explore how to set up and use K8sGPT, an open-source tool based on generative AI, in combination with Ollama and the Falcon3 model, to identify and resolve problems in a Kubernetes cluster.
For this, I'm starting from an Ubuntu 24.04 LTS instance on DigitalOcean:
First, install the Docker Engine locally on it …
(base) root@k8sgpt:~# curl -fsSL https://get.docker.com | sh -
# Executing docker install script, commit: 4c94a56999e10efcf48c5b8e3f6afea464f9108e
+ sh -c apt-get -qq update >/dev/null
+ sh -c DEBIAN_FRONTEND=noninteractive apt-get -y -qq install ca-certificates curl >/dev/null
Scanning processes...
Scanning candidates...
Scanning linux images...
+ sh -c install -m 0755 -d /etc/apt/keyrings
+ sh -c curl -fsSL "https://download.docker.com/linux/ubuntu/gpg" -o /etc/apt/keyrings/docker.asc
+ sh -c chmod a+r /etc/apt/keyrings/docker.asc
+ sh -c echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu noble stable" > /etc/apt/sources.list.d/docker.list
+ sh -c apt-get -qq update >/dev/null
+ sh -c DEBIAN_FRONTEND=noninteractive apt-get -y -qq install docker-ce docker-ce-cli containerd.io docker-compose-plugin docker-ce-rootless-extras docker-buildx-plugin >/dev/null
Scanning processes...
Scanning candidates...
Scanning linux images...
+ sh -c docker version
Client: Docker Engine - Community
 Version:           27.4.1
 API version:       1.47
 Go version:        go1.22.10
 Git commit:        b9d17ea
 Built:             Tue Dec 17 15:45:46 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          27.4.1
  API version:      1.47 (minimum version 1.24)
  Go version:       go1.22.10
  Git commit:       c710b88
  Built:            Tue Dec 17 15:45:46 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.7.24
  GitCommit:        88bf19b2105c8b17560993bee28a01ddc2f97182
 runc:
  Version:          1.2.2
  GitCommit:        v1.2.2-0-g7cb3632
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

================================================================================

To run Docker as a non-privileged user, consider setting up the Docker daemon in rootless mode for your user:

    dockerd-rootless-setuptool.sh install

Visit https://docs.docker.com/go/rootless/ to learn about rootless mode.

To run the Docker daemon as a fully privileged service, but granting non-root users access, refer to https://docs.docker.com/go/daemon-access/

WARNING: Access to the remote API on a privileged Docker daemon is equivalent to root access on the host. Refer to the 'Docker daemon attack surface' documentation for details: https://docs.docker.com/go/attack-surface/

================================================================================

I can then run Ollama directly via its official image to execute large language models (LLMs) locally, using the premium Intel CPUs available on this Ubuntu instance:
Ollama is now available as an official Docker image · Ollama Blog
(base) root@k8sgpt:~# docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:latest
Unable to find image 'ollama/ollama:latest' locally
latest: Pulling from ollama/ollama
6414378b6477: Pull complete
9423a26b200c: Pull complete
629da9618c4f: Pull complete
00b71e3f044c: Pull complete
Digest: sha256:18bfb1d605604fd53dcad20d0556df4c781e560ebebcd923454d627c994a0e37
Status: Downloaded newer image for ollama/ollama:latest
7b09d9fcdacff4319e553c41f741a15266eb5a5ec745959363e7754c53a203ef

(base) root@k8sgpt:~# docker ps -a
CONTAINER ID   IMAGE                  COMMAND               CREATED              STATUS              PORTS                                           NAMES
7b09d9fcdacf   ollama/ollama:latest   "/bin/ollama serve"   About a minute ago   Up About a minute   0.0.0.0:11434->11434/tcp, :::11434->11434/tcp   ollama

(base) root@k8sgpt:~# netstat -tunlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address      Foreign Address   State    PID/Program name
tcp        0      0 0.0.0.0:11434      0.0.0.0:*         LISTEN   46227/docker-proxy
tcp        0      0 127.0.0.54:53      0.0.0.0:*         LISTEN   744/systemd-resolve
tcp        0      0 127.0.0.53:53      0.0.0.0:*         LISTEN   744/systemd-resolve
tcp6       0      0 :::22              :::*              LISTEN   1/init
tcp6       0      0 :::11434           :::*              LISTEN   46235/docker-proxy
udp        0      0 127.0.0.54:53      0.0.0.0:*                  744/systemd-resolve
udp        0      0 127.0.0.53:53      0.0.0.0:*                  744/systemd-resolve

Next, pull the Falcon 3 model, developed by the Technology Innovation Institute (TII) in Abu Dhabi and available in Ollama's model library:
(base) root@k8sgpt:~# docker exec -it ollama ollama pull falcon3
pulling manifest
pulling 3717a52b7aea... 100% ▕████████████████▏ 4.6 GB
pulling 803b5adc3448... 100% ▕████████████████▏  218 B
pulling 58f83c52a4e3... 100% ▕████████████████▏  13 KB
pulling 35e31ed4c388... 100% ▕████████████████▏  101 B
pulling acb75345e14b... 100% ▕████████████████▏  487 B
verifying sha256 digest
writing manifest
success

(base) root@k8sgpt:~# docker exec -it ollama ollama list
NAME             ID             SIZE     MODIFIED
falcon3:latest   472ea1c89f64   4.6 GB   11 seconds ago

At this point I can bring in K8sGPT, a tool that scans your Kubernetes clusters and diagnoses and triages issues in plain English. It has SRE experience codified into its analyzers, and helps pull out the most relevant information and enrich it with generative AI.
K8sGPT works in three steps:
- Extract: retrieve the configuration details of all the workloads deployed in the cluster.
- Filter: a component called an "analyzer" filters out the relevant data (the available analyzers can be listed from the CLI, as sketched below).
- Generate: the filtered data is processed to generate insights and reports in plain English.
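As a quick illustration of that filtering stage, once the K8sGPT CLI is installed (next step), its filters subcommand lets you inspect and tune which analyzers run; a minimal sketch, not tied to any particular cluster:

# List the built-in analyzers and which ones are currently active
k8sgpt filters list

# Narrow the analysis down to (or broaden it with) specific resource kinds
k8sgpt filters add Service
k8sgpt filters remove Service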
Let's install that CLI now:

(base) root@k8sgpt:~# curl -LO https://github.com/k8sgpt-ai/k8sgpt/releases/download/v0.3.48/k8sgpt_amd64.deb
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 34.5M  100 34.5M    0     0  30.5M      0  0:00:01  0:00:01 --:--:-- 30.5M

(base) root@k8sgpt:~# dpkg -i k8sgpt_amd64.deb
(Reading database ... 74447 files and directories currently installed.)
Preparing to unpack k8sgpt_amd64.deb ...
Unpacking k8sgpt (0.3.48) over (0.3.48) ...
Setting up k8sgpt (0.3.48) ...

(base) root@k8sgpt:~# k8sgpt
Kubernetes debugging powered by AI

Usage:
  k8sgpt [command]

Available Commands:
  analyze         This command will find problems within your Kubernetes cluster
  auth            Authenticate with your chosen backend
  cache           For working with the cache the results of an analysis
  completion      Generate the autocompletion script for the specified shell
  custom-analyzer Manage a custom analyzer
  dump            Creates a dumpfile for debugging issues with K8sGPT
  filters         Manage filters for analyzing Kubernetes resources
  generate        Generate Key for your chosen backend (opens browser)
  help            Help about any command
  integration     Integrate another tool into K8sGPT
  serve           Runs k8sgpt as a server
  version         Print the version number of k8sgpt

Flags:
      --config string        Default config file (/root/.config/k8sgpt/k8sgpt.yaml)
  -h, --help                 help for k8sgpt
      --kubeconfig string    Path to a kubeconfig. Only required if out-of-cluster.
      --kubecontext string   Kubernetes context to use. Only required if out-of-cluster.

Use "k8sgpt [command] --help" for more information about a command.

The command line is now available; let's check which AI providers it knows about locally on this instance:
(base) root@k8sgpt:~# k8sgpt auth list
Default:
> openai
Active:
Unused:
> openai
> localai
> ollama
> azureopenai
> cohere
> amazonbedrock
> amazonsagemaker
> google
> noopai
> huggingface
> googlevertexai
> oci
> ibmwatsonxai

Ollama will serve as the AI backend provider for K8sGPT through LocalAI (which acts as a drop-in REST API compatible with the OpenAI API specification for local inference).
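That compatibility is easy to verify before wiring K8sGPT up: Ollama exposes an OpenAI-style REST endpoint under /v1, which is exactly what the localai backend will call. A minimal sanity check, assuming the container started earlier is still listening on port 11434:

# Ask the OpenAI-compatible endpoint for a completion from falcon3
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "falcon3", "messages": [{"role": "user", "content": "Hello!"}]}'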
Here is the command to configure K8sGPT with Ollama and the Falcon3 model:
(base) root@k8sgpt:~# k8sgpt auth add --backend localai --model falcon3 --baseurl http://localhost:11434/v1
localai added to the AI backend provider list

Next, I spin up a managed Kubernetes cluster on DigitalOcean via DigitalOcean Kubernetes (DOKS):
DigitalOcean Managed Kubernetes | Starting at $12/mo.
Then retrieve the Kubeconfig file from this cluster and drop it onto the Ubuntu instance so that the kubectl client can use it.
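A sketch of that step, assuming the doctl CLI is installed and authenticated (the cluster name is a placeholder; alternatively, download the file from the DigitalOcean control panel):

# Option 1: let doctl fetch the kubeconfig and merge it into ~/.kube/config
doctl kubernetes cluster kubeconfig save <cluster-name>

# Option 2: point kubectl at a manually downloaded file
export KUBECONFIG=$HOME/k8sgpt-cluster-kubeconfig.yaml

With the kubeconfig in place, install the kubectl client itself and check the cluster: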
(base) root@k8sgpt:~# curl -LO https://dl.k8s.io/release/v1.32.0/bin/linux/amd64/kubectl && chmod +x ./kubectl && mv kubectl /usr/local/bin/ && kubectl
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   138  100   138    0     0   1000      0 --:--:-- --:--:-- --:--:--  1007
100 54.6M  100 54.6M    0     0   120M      0 --:--:-- --:--:-- --:--:--  120M
kubectl controls the Kubernetes cluster manager.

 Find more information at: https://kubernetes.io/docs/reference/kubectl/

Basic Commands (Beginner):
  create          Create a resource from a file or from stdin
  expose          Take a replication controller, service, deployment or pod and expose it as a new Kubernetes service
  run             Run a particular image on the cluster
  set             Set specific features on objects

Basic Commands (Intermediate):
  explain         Get documentation for a resource
  get             Display one or many resources
  edit            Edit a resource on the server
  delete          Delete resources by file names, stdin, resources and names, or by resources and label selector

Deploy Commands:
  rollout         Manage the rollout of a resource
  scale           Set a new size for a deployment, replica set, or replication controller
  autoscale       Auto-scale a deployment, replica set, stateful set, or replication controller

Cluster Management Commands:
  certificate     Modify certificate resources
  cluster-info    Display cluster information
  top             Display resource (CPU/memory) usage
  cordon          Mark node as unschedulable
  uncordon        Mark node as schedulable
  drain           Drain node in preparation for maintenance
  taint           Update the taints on one or more nodes

Troubleshooting and Debugging Commands:
  describe        Show details of a specific resource or group of resources
  logs            Print the logs for a container in a pod
  attach          Attach to a running container
  exec            Execute a command in a container
  port-forward    Forward one or more local ports to a pod
  proxy           Run a proxy to the Kubernetes API server
  cp              Copy files and directories to and from containers
  auth            Inspect authorization
  debug           Create debugging sessions for troubleshooting workloads and nodes
  events          List events

Advanced Commands:
  diff            Diff the live version against a would-be applied version
  apply           Apply a configuration to a resource by file name or stdin
  patch           Update fields of a resource
  replace         Replace a resource by file name or stdin
  wait            Experimental: Wait for a specific condition on one or many resources
  kustomize       Build a kustomization target from a directory or URL

Settings Commands:
  label           Update the labels on a resource
  annotate        Update the annotations on a resource
  completion      Output shell completion code for the specified shell (bash, zsh, fish, or powershell)

Subcommands provided by plugins:

Other Commands:
  api-resources   Print the supported API resources on the server
  api-versions    Print the supported API versions on the server, in the form of "group/version"
  config          Modify kubeconfig files
  plugin          Provides utilities for interacting with plugins
  version         Print the client and server version information

Usage:
  kubectl [flags] [options]

Use "kubectl <command> --help" for more information about a given command.
Use "kubectl options" for a list of global command-line options (applies to all commands).

(base) root@k8sgpt:~# kubectl cluster-info
Kubernetes control plane is running at https://738af175-32d4-43e9-9e31-b7ae3058be3e.k8s.ondigitalocean.com
CoreDNS is running at https://738af175-32d4-43e9-9e31-b7ae3058be3e.k8s.ondigitalocean.com/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
(base) root@k8sgpt:~# kubectl get nodes -o wide
NAME                   STATUS   ROLES    AGE     VERSION   INTERNAL-IP   EXTERNAL-IP      OS-IMAGE                         KERNEL-VERSION   CONTAINER-RUNTIME
pool-kuaxj3k47-ejvl3   Ready    <none>   6m40s   v1.31.1   10.110.0.3    159.65.200.52    Debian GNU/Linux 12 (bookworm)   6.1.0-27-amd64   containerd://1.6.31
pool-kuaxj3k47-ejvl8   Ready    <none>   6m38s   v1.31.1   10.110.0.2    164.92.147.176   Debian GNU/Linux 12 (bookworm)   6.1.0-27-amd64   containerd://1.6.31

(base) root@k8sgpt:~# kubectl get po,svc -A
NAMESPACE     NAME                              READY   STATUS    RESTARTS      AGE
kube-system   pod/cilium-4gxht                  1/1     Running   0             6m46s
kube-system   pod/cilium-lw8gd                  1/1     Running   0             6m48s
kube-system   pod/coredns-c5c6457c-bnzfc        0/1     Running   0             36s
kube-system   pod/coredns-c5c6457c-nz6gr        0/1     Running   0             36s
kube-system   pod/cpc-bridge-proxy-ebpf-7ncbq   1/1     Running   0             55s
kube-system   pod/cpc-bridge-proxy-ebpf-qth8w   1/1     Running   0             55s
kube-system   pod/hubble-relay-67597fb8-kmlw5   1/1     Running   1 (51s ago)   8m40s
kube-system   pod/hubble-ui-79957d9f7b-4n9kj    2/2     Running   0             74s
kube-system   pod/konnectivity-agent-7ml7p      1/1     Running   0             61s
kube-system   pod/konnectivity-agent-tnf8j      1/1     Running   0             61s
kube-system   pod/kube-proxy-ebpf-4gt2z         1/1     Running   0             6m48s
kube-system   pod/kube-proxy-ebpf-ztjql         1/1     Running   0             6m46s

NAMESPACE     NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
default       service/kubernetes     ClusterIP   10.108.32.1     <none>        443/TCP                  9m54s
kube-system   service/hubble-peer    ClusterIP   10.108.54.62    <none>        443/TCP                  8m40s
kube-system   service/hubble-relay   ClusterIP   10.108.41.20    <none>        80/TCP                   8m40s
kube-system   service/hubble-ui      ClusterIP   10.108.52.164   <none>        80/TCP                   8m40s
kube-system   service/kube-dns       ClusterIP   10.108.32.10    <none>        53/UDP,53/TCP,9153/TCP   36s

Next comes the installation of Headlamp, an easy-to-use and extensible web UI that replaces the traditional Kubernetes dashboard.
- Headlamp
- GitHub - headlamp-k8s/headlamp: A Kubernetes web UI that is fully-featured, user-friendly and extensible
Headlamp was created to combine the traditional functionality of other web UIs and dashboards (namely, listing and viewing resources) with additional features.
(base) root@k8sgpt:~# kubectl apply -f https://raw.githubusercontent.com/kinvolk/headlamp/main/kubernetes-headlamp.yaml
service/headlamp created
deployment.apps/headlamp created
secret/headlamp-admin created

(base) root@k8sgpt:~# kubectl get po,svc -n kube-system
NAME                              READY   STATUS    RESTARTS        AGE
pod/cilium-4gxht                  1/1     Running   0               11m
pod/cilium-lw8gd                  1/1     Running   0               11m
pod/coredns-c5c6457c-bnzfc        1/1     Running   0               5m7s
pod/coredns-c5c6457c-nz6gr        1/1     Running   0               5m7s
pod/cpc-bridge-proxy-ebpf-7ncbq   1/1     Running   0               5m26s
pod/cpc-bridge-proxy-ebpf-qth8w   1/1     Running   0               5m26s
pod/csi-do-node-gxlmb             2/2     Running   0               4m24s
pod/csi-do-node-swqfv             2/2     Running   0               4m24s
pod/do-node-agent-7bgsh           1/1     Running   0               4m11s
pod/do-node-agent-hwt6l           1/1     Running   0               4m11s
pod/headlamp-7dfd97b98b-wmn66     1/1     Running   0               48s
pod/hubble-relay-67597fb8-kmlw5   1/1     Running   1 (5m22s ago)   13m
pod/hubble-ui-79957d9f7b-4n9kj    2/2     Running   0               5m45s
pod/konnectivity-agent-7ml7p      1/1     Running   0               5m32s
pod/konnectivity-agent-tnf8j      1/1     Running   0               5m32s
pod/kube-proxy-ebpf-4gt2z         1/1     Running   0               11m
pod/kube-proxy-ebpf-ztjql         1/1     Running   0               11m

NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
service/headlamp       ClusterIP   10.108.38.247   <none>        80/TCP                   48s
service/hubble-peer    ClusterIP   10.108.54.62    <none>        443/TCP                  13m
service/hubble-relay   ClusterIP   10.108.41.20    <none>        80/TCP                   13m
service/hubble-ui      ClusterIP   10.108.52.164   <none>        80/TCP                   13m
service/kube-dns       ClusterIP   10.108.32.10    <none>        53/UDP,53/TCP,9153/TCP   5m7s

Expose it locally and grab the token needed to access its web interface:
(base) root@k8sgpt:~# nohup kubectl port-forward -n kube-system service/headlamp 8080:80 &
(base) root@k8sgpt:~# cat nohup.out
Forwarding from 127.0.0.1:8080 -> 4466
Forwarding from [::1]:8080 -> 4466

(base) root@k8sgpt:~# kubectl -n kube-system create serviceaccount headlamp-admin
serviceaccount/headlamp-admin created

(base) root@k8sgpt:~# kubectl create clusterrolebinding headlamp-admin --serviceaccount=kube-system:headlamp-admin --clusterrole=cluster-admin
clusterrolebinding.rbac.authorization.k8s.io/headlamp-admin created

(base) root@k8sgpt:~# kubectl create token headlamp-admin -n kube-system
eyJhbGciOiJSUzI1NiIsImtpZCI6Ikd0UHltNHV6c1liSkY0VkhNRWFZMXJKYklqY1R6ckZrMDZQS281dEg3dUUifQ.eyJhdWQiOlsic3lzdGVtOmtvbm5lY3Rpdml0eS1zZXJ2ZXIiXSwiZXhwIjoxNzM1MzkyNDEwLCJpYXQiOjE3MzUzODg4MTAsImlzcyI6Imh0dHBzOi8va3ViZXJuZXRlcy5kZWZhdWx0LnN2Yy5jbHVzdGVyLmxvY2FsIiwianRpIjoiNTVkNjI1ZTItNjA2Yi00MTNhLTk2OTgtODFmYjdjZDU4MWY4Iiwia3ViZXJuZXRlcy5pbyI6eyJuYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsInNlcnZpY2VhY2NvdW50Ijp7Im5hbWUiOiJoZWFkbGFtcC1hZG1pbiIsInVpZCI6IjdmOGQwMTU0LWRiYWQtNGU2MS04NTUzLWU1NWI3ZWU0ZjhlOSJ9fSwibmJmIjoxNzM1Mzg4ODEwLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06aGVhZGxhbXAtYWRtaW4ifQ.jbrtuXS7uMP6HfwR3CbIbnpRTq4CDaacq0okwm_4tvmJNNcExi9-Dti3cGj1J3tteszpxVzurWPhrWgFlL4UkEacY9fD1TRH4GAZDCFldJ_jvyeaclzGeymrjEGAZ9TbBdoyuXtLeIVhApdICF1KNM-s8mfr1oOREDwlR9HzzrhoECozYxVS9uM1WIEZpum4FwMEl6cKPqOyNx1Rn5MtKPcc87JyK0FxuXzg9WC-cPSNOxu_rUFrZYHyrVapCDpl_XLymD3pFUUuB8XPVidVXcVOthH1Djwm8TRE6aAD4XlkHTcyTYchvN_CpOI2JQ6DVY60unSU8nq2pxfqLC6G2Q

Before running an analysis with K8sGPT, let's introduce a problem into the Kubernetes cluster to simulate a real-world situation. You can use the sample broken deployments available in repositories such as Robusta's:
GitHub - robusta-dev/kubernetes-demos: YAMLs for creating Kubernetes errors and other scenarios
Example with this broken Pod deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-processing-worker
spec:
  replicas: 1
  selector:
    matchLabels:
      app: payment-processing-worker
  template:
    metadata:
      labels:
        app: payment-processing-worker
    spec:
      containers:
        - name: payment-processing-container
          image: bash
          command: ["/bin/sh"]
          args: ["-c", "if [[-z \"${DEPLOY_ENV}\"]]; then echo Environment variable DEPLOY_ENV is undefined ; else while true; do echo hello; sleep 10;done; fi"]

(base) root@k8sgpt:~# kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/crashpod/broken.yaml
deployment.apps/payment-processing-worker created

(base) root@k8sgpt:~# kubectl get po
NAME                                         READY   STATUS             RESTARTS      AGE
payment-processing-worker-747ccfb9db-dzjqx   0/1     CrashLoopBackOff   1 (11s ago)   17s

(base) root@k8sgpt:~# kubectl logs po/payment-processing-worker-747ccfb9db-dzjqx
Environment variable DEPLOY_ENV is undefined

And indeed it crashed: since DEPLOY_ENV is undefined, the container prints the message and exits immediately, so Kubernetes keeps restarting it, hence the CrashLoopBackOff …
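For comparison, here is what classic manual triage of that crash looks like (a sketch; the pod name comes from the output above and will differ on your cluster):

# Inspect the container's last state and restart count
kubectl describe pod payment-processing-worker-747ccfb9db-dzjqx

# The BackOff events usually carry the hint
kubectl get events --field-selector involvedObject.name=payment-processing-worker-747ccfb9db-dzjqx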
Now let's run the K8sGPT analysis, first with JSON output:
(base) root@k8sgpt:~# k8sgpt analyze -o json --explain --filter=Pod --backend localai | jq .
{
  "provider": "localai",
  "errors": null,
  "status": "ProblemDetected",
  "problems": 1,
  "results": [
    {
      "kind": "Pod",
      "name": "default/payment-processing-worker-747ccfb9db-dzjqx",
      "error": [
        {
          "Text": "the last termination reason is Completed container=payment-processing-container pod=payment-processing-worker-747ccfb9db-dzjqx",
          "KubernetesDoc": "",
          "Sensitive": []
        }
      ],
      "details": "Error: The pod \"payment-processing-worker-747ccfb9db-dzjqx\" has completed its execution with a \"Completed\" termination reason, indicating the container \"payment-processing-container\" has finished successfully.\n\nSolution: Verify the logs for the container to ensure data integrity, then check related services for expected outcomes; if successful, mark the pod as ready in the cluster.",
      "parentObject": "Deployment/payment-processing-worker"
    }
  ]
}

or in text mode:
(base) root@k8sgpt:~# k8sgpt analyze --explain --backend localai --with-doc
100% |████████████████████████████████████████| (1/1, 7960 it/s)
AI Provider: localai

0: Pod default/payment-processing-worker-747ccfb9db-dzjqx(Deployment/payment-processing-worker)
- Error: the last termination reason is Completed container=payment-processing-container pod=payment-processing-worker-747ccfb9db-dzjqx
Error: The pod "payment-processing-worker-747ccfb9db-dzjqx" has completed its execution with a "Completed" termination reason, indicating the container "payment-processing-container" has finished successfully.
Solution: Verify the logs for the container to ensure data integrity, then check related services for expected outcomes; if successful, mark the pod as ready in the cluster.

As this output shows, K8sGPT identified and flagged the problematic pod, and also offered guidance on potential steps to understand and resolve the issue.
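Following that guidance, one possible remediation is simply to define the missing variable (a sketch; the value "production" is an arbitrary example):

# Set the missing environment variable on the deployment;
# Kubernetes then rolls out a fresh pod that enters the healthy loop
kubectl set env deployment/payment-processing-worker DEPLOY_ENV=production
kubectl rollout status deployment/payment-processing-worker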
Another example, with this problematic Nginx Pod:
apiVersion: v1
kind: Pod
metadata:
  name: inventory-management-api
spec:
  containers:
    - name: nginx
      image: nginx
      ports:
        - containerPort: 80
      command:
        - wge
        - "-O"
        - "/work-dir/index.html"
        - https://home.robusta.dev

(base) root@k8sgpt:~# kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/refs/heads/main/crashloop_backoff/create_crashloop_backoff.yaml
pod/inventory-management-api created

(base) root@k8sgpt:~# kubectl get po
NAME                       READY   STATUS              RESTARTS   AGE
inventory-management-api   0/1     ContainerCreating   0          5s

(base) root@k8sgpt:~# kubectl get po
NAME                       READY   STATUS              RESTARTS     AGE
inventory-management-api   0/1     RunContainerError   1 (1s ago)   10s

(The command is deliberately misspelled, "wge" instead of "wget", so the container fails at startup.) And a new analysis puts its finger on the problematic Pod …
(base) root@k8sgpt:~# k8sgpt analyze --explain --backend localai --with-doc
100% |████████████████████████████████████████| (1/1, 2 it/min)
AI Provider: localai

0: Pod default/inventory-management-api()
- Error: the last termination reason is StartError container=nginx pod=inventory-management-api
Error: The Kubernetes error indicates that there was a StartError issue with the nginx container for the pod named inventory-management-api.
Solution: 1. Check the nginx configuration file for syntax errors. 2. Ensure all required resources and permissions are correctly set. 3. Verify network accessibility within the pod. 4. Confirm proper image pull secrets if using Docker images. 5. Review any recent changes to the deployment or service configurations.

In this other example, however, K8sGPT may not detect any problem at all, with this false-positive simulation via busybox:
apiVersion: batch/v1
kind: Job
metadata:
  name: java-api-checker
spec:
  template:
    spec:
      containers:
        - name: java-beans
          image: busybox
          command: ["/bin/sh", "-c"]
          args: ["echo 'Java Network Exception: \nAll host(s) tried for db query failed (tried: prod-db:3333) - no available connection and the queue has reached its max size 256 \nAll host(s) tried for db query failed (tried: prod-db:3333) - no available connection and the queue has reached its max size 256 \nAll host(s) tried for db query failed (tried: prod-db:3333) - no available connection and the queue has reached its max size 256 \nAll host(s) tried for db query failed (tried: prod-db:3333) - no available connection and the queue has reached its max size 256'; sleep 60; exit 1"]
      restartPolicy: Never
  backoffLimit: 1

(base) root@k8sgpt:~# kubectl delete -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/refs/heads/main/crashloop_backoff/create_crashloop_backoff.yaml
pod "inventory-management-api" deleted

(base) root@k8sgpt:~# kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/refs/heads/main/job_failure/job_crash.yaml
job.batch/java-api-checker created

(base) root@k8sgpt:~# kubectl get po
NAME                     READY   STATUS    RESTARTS   AGE
java-api-checker-5s6dc   1/1     Running   0          7s

(base) root@k8sgpt:~# kubectl logs po/java-api-checker-5s6dc
Java Network Exception:
All host(s) tried for db query failed (tried: prod-db:3333) - no available connection and the queue has reached its max size 256
All host(s) tried for db query failed (tried: prod-db:3333) - no available connection and the queue has reached its max size 256
All host(s) tried for db query failed (tried: prod-db:3333) - no available connection and the queue has reached its max size 256
All host(s) tried for db query failed (tried: prod-db:3333) - no available connection and the queue has reached its max size 256

(base) root@k8sgpt:~# k8sgpt analyze --explain --backend localai --with-doc
AI Provider: localai

No problems detected

At the time of the scan the pod is still Running normally: the failures live only in the application logs, which the Pod analyzer does not inspect, hence the clean bill of health. For a more complete integration, you can install the K8sGPT operator in your Kubernetes cluster. The operator continuously watches for problems in the cluster and generates insights that you can consult by querying the operator's custom resource (CR).
GitHub - k8sgpt-ai/k8sgpt-operator: Automatic SRE Superpowers within your Kubernetes cluster
$ helm repo add k8sgpt https://charts.k8sgpt.ai/
$ helm repo update
$ helm install release k8sgpt/k8sgpt-operator -n k8sgpt-operator-system --create-namespace

# Install the k8sGPT operator custom resource
$ kubectl apply -n k8sgpt-operator-system -f - << EOF
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-ollama
spec:
  ai:
    enabled: true
    model: falcon3
    backend: localai
    baseUrl: http://localhost:11434/v1
  noCache: false
  filters: ["Pod"]
  repository: ghcr.io/k8sgpt-ai/k8sgpt
  version: v0.3.48
EOF

Debugging your Rancher Kubernetes Cluster the GenAI Way w...
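Once the operator has completed a scan, its findings surface as Result custom resources that you can query like any other Kubernetes object (a sketch; result names and field contents will vary):

# List the analysis results produced by the operator
kubectl get results -n k8sgpt-operator-system

# Drill into the details of the findings
kubectl get results -n k8sgpt-operator-system -o json | jq '.items[].spec.details'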
It's also possible to analyze several Kubernetes clusters by specifying the path to the relevant Kubeconfig file:
$ k8sgpt analyze --explain --backend localai --with-doc --kubeconfig <path to the Kubeconfig file>

The operator will look for problems in the cluster and generate analysis results. Depending on the horsepower of your machine (GPU resources are needed to speed up Ollama's response times), it takes the operator a while to call the LLM and produce its insights …
To conclude, K8sGPT combined with Ollama offers a powerful solution for debugging and managing Kubernetes clusters efficiently. This integration uses artificial intelligence to provide clear insights and recommendations for resolving issues, making life easier for cluster operators. By following these steps, you can set up an automated, AI-based diagnostic solution for your Kubernetes environment …
To be continued!