
Commit f8c7e84

addressed feedback
1 parent a60f29d commit f8c7e84

2 files changed: +3 −14 lines changed

ai/vllm-deployment/README.md

Lines changed: 2 additions & 13 deletions

@@ -24,23 +24,12 @@ This example demonstrates how to deploy a server for AI inference using [vLLM](h
 
 ## Prerequisites
 
-- Kubernetes cluster which has access to Nvidia GPU's (tested with GKE autopilot cluster v1.32).
+- A Kubernetes cluster with access to NVIDIA GPUs. This example was tested on GKE, but can be adapted for other cloud providers like EKS and AKS by ensuring you have a GPU-enabled node pool and have deployed the Nvidia device plugin.
 - Hugging Face account token with permissions for model (example model: `google/gemma-3-1b-it`)
 - `kubectl` configured to communicate with cluster and in PATH
 - `curl` binary in PATH
 
-#### Cloud Provider Prerequisites
-
-##### Google Kubernetes Engine (GKE)
-* Uncomment GKE-specific `label`(s) and `nodeSelector`(s) in `vllm-deployment.yaml`
-
-##### Elastic Kubernetes Service (EKS)
-* **GPU-Enabled Nodes**: Your cluster must have a node group with GPU-enabled EC2 instances (e.g., instances from the p or g families, like p3.2xlarge or g4dn.xlarge).
-* **Nvidia Device Plugin**: You must have the Nvidia device plugin for Kubernetes installed in your cluster. This is typically deployed as a DaemonSet and is responsible for exposing the nvidia.com/gpu resource from the nodes to the Kubernetes scheduler. Without this plugin, Kubernetes won't be aware of the GPUs, and your pods will fail to schedule.
-
-##### Azure Kubernetes Service (AKS)
-* **GPU-Enabled Node Pool**: Your cluster must have a node pool that uses GPU-enabled virtual machines. In Azure, these are typically from the NC, ND, or NV series (e.g., Standard_NC6s_v3).
-* **Nvidia Device Plugin**: The Nvidia device plugin for Kubernetes must be installed on your cluster. This plugin is usually deployed as a DaemonSet and is responsible for discovering the GPUs on each node and exposing the nvidia.com/gpu resource to the Kubernetes scheduler. Without this plugin, Kubernetes will not be aware of the GPUs, and any pod requesting them will remain in a Pending state.
+**Note for GKE users:** To target specific GPU types, you can uncomment the GKE-specific `nodeSelector` in `vllm-deployment.yaml`.
 
 ---
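For illustration, a GKE `nodeSelector` of the kind the new note refers to might look like the sketch below. This fragment is an assumption, not taken from this commit: on GKE, GPU node pools carry the `cloud.google.com/gke-accelerator` node label, and `nvidia-l4` is only an example value.

```yaml
# Illustrative sketch only -- the commented-out selector in
# vllm-deployment.yaml may use different labels or values.
# Pin pods to a specific GPU type via the GKE accelerator node label;
# nvidia-l4 is an example, match it to your node pool.
nodeSelector:
  cloud.google.com/gke-accelerator: nvidia-l4
```

Without such a selector, the pod can land on any node that advertises the `nvidia.com/gpu` resource, which is exactly what the Nvidia device plugin mentioned in the prerequisites exposes.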

ai/vllm-deployment/vllm-deployment.yaml

Lines changed: 1 addition & 1 deletion

@@ -18,7 +18,7 @@ spec:
     spec:
      containers:
      - name: inference-server
-       image: vllm/vllm-openai:latest
+       image: vllm/vllm-openai@sha256:05a31dc4185b042e91f4d2183689ac8a87bd845713d5c3f987563c5899878271
        resources:
          requests:
            cpu: "2"
0 commit comments