
Commit f8c7e84

addressed feedback
1 parent a60f29d commit f8c7e84

2 files changed: +3 −14 lines changed

ai/vllm-deployment/README.md

Lines changed: 2 additions & 13 deletions

@@ -24,23 +24,12 @@ This example demonstrates how to deploy a server for AI inference using [vLLM](h
 
 ## Prerequisites
 
-- Kubernetes cluster which has access to Nvidia GPU's (tested with GKE autopilot cluster v1.32).
+- A Kubernetes cluster with access to NVIDIA GPUs. This example was tested on GKE, but can be adapted for other cloud providers like EKS and AKS by ensuring you have a GPU-enabled node pool and have deployed the Nvidia device plugin.
 - Hugging Face account token with permissions for model (example model: `google/gemma-3-1b-it`)
 - `kubectl` configured to communicate with cluster and in PATH
 - `curl` binary in PATH
 
-#### Cloud Provider Prerequisites
-
-##### Google Kubernetes Engine (GKE)
-* Uncomment GKE-specific `label`(s) and `nodeSelector`(s) in `vllm-deployment.yaml`
-
-##### Elastic Kubernetes Service (EKS)
-* **GPU-Enabled Nodes**: Your cluster must have a node group with GPU-enabled EC2 instances (e.g., instances from the p or g families, like p3.2xlarge or g4dn.xlarge).
-* **Nvidia Device Plugin**: You must have the Nvidia device plugin for Kubernetes installed in your cluster. This is typically deployed as a DaemonSet and is responsible for exposing the nvidia.com/gpu resource from the nodes to the Kubernetes scheduler. Without this plugin, Kubernetes won't be aware of the GPUs, and your pods will fail to schedule.
-
-##### Azure Kubernetes Service (AKS)
-* **GPU-Enabled Node Pool**: Your cluster must have a node pool that uses GPU-enabled virtual machines. In Azure, these are typically from the NC, ND, or NV series (e.g., Standard_NC6s_v3).
-* **Nvidia Device Plugin**: The Nvidia device plugin for Kubernetes must be installed on your cluster. This plugin is usually deployed as a DaemonSet and is responsible for discovering the GPUs on each node and exposing the nvidia.com/gpu resource to the Kubernetes scheduler. Without this plugin, Kubernetes will not be aware of the GPUs, and any pod requesting them will remain in a Pending state.
+**Note for GKE users:** To target specific GPU types, you can uncomment the GKE-specific `nodeSelector` in `vllm-deployment.yaml`.
 
 ---
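For illustration, a GKE `nodeSelector` of the kind the new note refers to might look like the sketch below. This fragment is an assumption, not taken from this commit: on GKE, GPU node pools carry the `cloud.google.com/gke-accelerator` node label, and `nvidia-l4` is only an example value.

```yaml
# Illustrative sketch only -- the commented-out selector in
# vllm-deployment.yaml may use different labels or values.
# Pin pods to a specific GPU type via the GKE accelerator node label;
# nvidia-l4 is an example, match it to your node pool.
nodeSelector:
  cloud.google.com/gke-accelerator: nvidia-l4
```

Without such a selector, the pod can land on any node that advertises the `nvidia.com/gpu` resource, which is exactly what the Nvidia device plugin mentioned in the prerequisites exposes.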

ai/vllm-deployment/vllm-deployment.yaml

Lines changed: 1 addition & 1 deletion

@@ -18,7 +18,7 @@ spec:
     spec:
      containers:
      - name: inference-server
-       image: vllm/vllm-openai:latest
+       image: vllm/vllm-openai@sha256:05a31dc4185b042e91f4d2183689ac8a87bd845713d5c3f987563c5899878271
        resources:
          requests:
            cpu: "2"
0 commit comments