vLLM AI inference serving example #566
Conversation
/cc @janetkuo
ai/vllm-deployment/README.md:

```markdown
## 📚 Table of Contents

- [Prerequisites](#prerequisites)
```
nit: this link is broken (the section title below has an emoji); same for some other table of contents links.
Fixed
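For context on the nit: GitHub derives in-page anchors from heading text, so an emoji in a heading changes the generated slug and a plain `#prerequisites` link stops resolving. A minimal sketch of the mismatch and one possible fix (how the PR actually resolved it is not shown here):

```markdown
<!-- Broken: the emoji becomes part of the generated anchor, so this link misses -->
## ⚙️ Prerequisites
- [Prerequisites](#prerequisites)

<!-- One possible fix: drop the emoji so the slug matches the link -->
## Prerequisites
- [Prerequisites](#prerequisites)
```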
ai/vllm-deployment/README.md (outdated):

```markdown
## ⚙️ Prerequisites

- Kubernetes cluster which has access to Nvidia GPU's (tested with GKE autopilot cluster v1.32).
```
This clarifies that GKE was used for testing and that the example can be adapted to other Kubernetes clusters with NVIDIA GPUs. The node selectors being commented out is helpful.
It would be even better to briefly explain what a user on a different platform might need to adjust (e.g., node labels for GPU type).
Given the direction that the community is moving towards, this example could switch to use DRA, but that can be updated in a follow-up PR.
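To make the platform-specific adjustment concrete, a hedged sketch of a GPU-targeting `nodeSelector`: the GKE label is a documented node label, while the commented alternative assumes NVIDIA GPU Feature Discovery is installed on the cluster.

```yaml
# Sketch: selecting nodes by GPU type; labels are provider-dependent.
spec:
  nodeSelector:
    # On GKE, the accelerator type is exposed as a node label:
    cloud.google.com/gke-accelerator: nvidia-l4
    # On other platforms labels differ; e.g. with NVIDIA GPU Feature
    # Discovery installed (an assumption about the setup):
    # nvidia.com/gpu.product: NVIDIA-L4
```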
I've added a section on Cloud Provider Prerequisites (as a sub-section within the Prerequisites section), which specifies what is necessary from the three largest cloud providers. Please let me know what you think.
Thanks for the fix! I find the new section a bit long, so suggest merging into this bullet point so we don't spend too much text talking about cloud configurations.
```suggestion
- A Kubernetes cluster with access to NVIDIA GPUs. This example was tested on GKE, but can be adapted for other cloud providers like EKS and AKS by ensuring you have a GPU-enabled node pool and have deployed the Nvidia device plugin.
```
I think this is fixed. Please have a look.
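Since the suggested bullet above assumes the NVIDIA device plugin is already deployed, a minimal sketch of doing so; the version tag and manifest path are assumptions, so check the NVIDIA/k8s-device-plugin releases for current values:

```shell
# Deploy the NVIDIA device plugin DaemonSet (version tag is an assumption):
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/nvidia-device-plugin.yml

# Verify that nodes now advertise nvidia.com/gpu as an allocatable resource:
kubectl describe nodes | grep -A 3 'nvidia.com/gpu'
```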
```yaml
spec:
  containers:
  - name: inference-server
    image: us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-vllm-serve:20250312_0916_RC01
```
Is there an equivalent official vLLM image available on a more vendor-neutral registry? If using this specific image is necessary (e.g., due to optimizations or pre-configurations crucial for the example), this dependency and the reason for it should be clearly documented in the README.md under "Prerequisites."
Good point. I've updated to the more generic version of this image: vllm/vllm-openai:latest
Force-pushed 162dd69 to a60f29d
janetkuo left a comment
A few nits and lgtm otherwise
ai/vllm-deployment/README.md (outdated):

```markdown
## ⚙️ Prerequisites

- Kubernetes cluster which has access to Nvidia GPU's (tested with GKE autopilot cluster v1.32).
- Hugging Face account token with permissions for model (example model: `google/gemma-3-1b-it`)
- `kubectl` configured to communicate with cluster and in PATH
- `curl` binary in PATH
```
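As a hedged sketch of the Hugging Face token prerequisite in practice, one way to hand the token to the cluster is a Secret; the secret and key names below are assumptions and must match whatever `vllm-deployment.yaml` actually references:

```shell
# Store the Hugging Face token in a Secret (names are assumptions):
export HF_TOKEN=<your-hugging-face-token>
kubectl create secret generic hf-secret \
  --from-literal=hf_api_token="$HF_TOKEN"
```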
```suggestion
**Note for GKE users:** To target specific GPU types, you can uncomment the GKE-specific `nodeSelector` in `vllm-deployment.yaml`.
```
I think this is fixed. Please have a look.
```yaml
spec:
  containers:
  - name: inference-server
    image: vllm/vllm-openai:latest
```
Please use a version tag instead of latest, ref https://github.com/kubernetes/examples/blob/master/guidelines.md#manifests-and-configuration
I think this is fixed. I tested by pulling the image. Please have a look.
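For anyone reproducing the pin, a sketch of one way to resolve a tag to its immutable digest using standard Docker commands (the tag matches the one named in the suggestion further below):

```shell
# Pull the tagged image, then read back the registry digest it resolved to:
docker pull vllm/vllm-openai:v0.10.0
docker inspect --format='{{index .RepoDigests 0}}' vllm/vllm-openai:v0.10.0
# -> vllm/vllm-openai@sha256:05a31dc4...
```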
```yaml
spec:
  containers:
  - name: inference-server
    image: vllm/vllm-openai@sha256:05a31dc4185b042e91f4d2183689ac8a87bd845713d5c3f987563c5899878271
```
SHA isn't readable, so suggest adding a comment here for clarity
```diff
-    image: vllm/vllm-openai@sha256:05a31dc4185b042e91f4d2183689ac8a87bd845713d5c3f987563c5899878271
+    # vllm/vllm-openai:v0.10.0
+    image: vllm/vllm-openai@sha256:05a31dc4185b042e91f4d2183689ac8a87bd845713d5c3f987563c5899878271
```
Good idea--done.
janetkuo left a comment
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: janetkuo, seans3

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.