
Conversation

@seans3 seans3 commented Jul 22, 2025

  • Example of AI inference serving using the vLLM server.
  • The model is retrieved from Hugging Face.
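For context while reading the review below, the example's rough shape is a Deployment running vLLM's OpenAI-compatible server. This is a minimal sketch, not the merged manifest; the resource names and the Secret wiring are assumptions:

```yaml
# Minimal sketch of the example's Deployment; names and defaults are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-gemma            # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-gemma
  template:
    metadata:
      labels:
        app: vllm-gemma
    spec:
      containers:
      - name: inference-server
        image: vllm/vllm-openai:latest
        args: ["--model", "google/gemma-3-1b-it"]  # weights pulled from Hugging Face at startup
        env:
        - name: HUGGING_FACE_HUB_TOKEN             # required for gated models
          valueFrom:
            secretKeyRef:
              name: hf-secret                      # hypothetical Secret name
              key: hf_api_token
        ports:
        - containerPort: 8000                      # vLLM's OpenAI-compatible server default
        resources:
          limits:
            nvidia.com/gpu: "1"
```

Once running (and port-forwarded with `kubectl port-forward deploy/vllm-gemma 8000:8000`), the server answers standard OpenAI-style requests:

```bash
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "google/gemma-3-1b-it", "prompt": "Kubernetes is", "max_tokens": 32}'
```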
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 22, 2025
@k8s-ci-robot k8s-ci-robot requested review from kow3ns and soltysh July 22, 2025 06:29
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 22, 2025

seans3 commented Jul 22, 2025

/cc @janetkuo

@k8s-ci-robot k8s-ci-robot requested a review from janetkuo July 22, 2025 06:29
@seans3 seans3 force-pushed the vllm-ai-example branch from f6e0129 to d39a998 on July 22, 2025 07:11

## 📚 Table of Contents

- [Prerequisites](#prerequisites)
Member

nit: this link is broken (the section title below has an emoji); the same applies to some other table-of-contents entries
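(For the record: GitHub drops emoji when generating heading anchors, so a plain `#prerequisites` link no longer matches a heading like `## ⚙️ Prerequisites`. The sketch below shows the likely working form, but the exact slug is best confirmed by copying the rendered heading's anchor; simply removing the emoji from the heading is the easier fix.)

```markdown
<!-- The generated slug typically keeps a hyphen where the emoji was: -->
- [Prerequisites](#-prerequisites)
```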

Contributor Author

Fixed


## ⚙️ Prerequisites

- Kubernetes cluster which has access to Nvidia GPU's (tested with GKE autopilot cluster v1.32).
Member

This clarifies that GKE was used for testing and that the example can be adapted to other Kubernetes clusters with NVIDIA GPUs. Having the node selectors commented out is helpful.

It would be even better to briefly explain what a user on a different platform might need to adjust (e.g., node labels for GPU type).

Given the direction the community is moving in, this example could switch to DRA (Dynamic Resource Allocation), but that can be updated in a follow-up PR.
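For concreteness, a rough sketch of the DRA direction mentioned above. The API group/version and the device class name published by NVIDIA's DRA driver are assumptions on my part and may differ by cluster version:

```yaml
# Sketch only: assumes a cluster with DRA enabled and NVIDIA's DRA driver installed;
# "gpu.nvidia.com" is the device class that driver is expected to publish.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        deviceClassName: gpu.nvidia.com
```

Pods would then reference the claim via `spec.resourceClaims` instead of setting `nvidia.com/gpu` limits.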

@seans3 seans3 Aug 6, 2025

I've added a section on Cloud Provider Prerequisites (as a sub-section within the Prerequisites section), which specifies what is necessary from the three largest cloud providers. Please let me know what you think.

Member

Thanks for the fix! I find the new section a bit long, so I suggest merging it into this bullet point so we don't spend too much text on cloud configurations.

Suggested change

```diff
-- Kubernetes cluster which has access to Nvidia GPU's (tested with GKE autopilot cluster v1.32).
+- A Kubernetes cluster with access to NVIDIA GPUs. This example was tested on GKE, but can be adapted for other cloud providers like EKS and AKS by ensuring you have a GPU-enabled node pool and have deployed the Nvidia device plugin.
```
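For non-GKE clusters, the device-plugin step this suggestion mentions is usually a single manifest apply. The release tag below is illustrative; check the NVIDIA/k8s-device-plugin repository for the current one:

```bash
# Install the NVIDIA device plugin DaemonSet so GPUs show up as allocatable resources.
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/nvidia-device-plugin.yml

# Verify that nodes now advertise nvidia.com/gpu capacity.
kubectl describe nodes | grep -i nvidia.com/gpu
```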
Contributor Author

I think this is fixed. Please have a look.

```yaml
spec:
  containers:
  - name: inference-server
    image: us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-vllm-serve:20250312_0916_RC01
```
Member

Is there an equivalent official vLLM image available on another more vendor-neutral registry? If using this specific image is necessary (e.g., due to optimizations or pre-configurations crucial for the example), this dependency and the reason for it should be clearly documented in the README.md under "Prerequisites."

Contributor Author

Good point. I've updated to the more generic version of this image: `vllm/vllm-openai:latest`

@seans3 seans3 force-pushed the vllm-ai-example branch 3 times, most recently from 162dd69 to a60f29d on August 6, 2025 19:33
@janetkuo janetkuo left a comment

A few nits and lgtm otherwise


## ⚙️ Prerequisites

- Kubernetes cluster which has access to Nvidia GPU's (tested with GKE autopilot cluster v1.32).
- Hugging Face account token with permissions for model (example model: `google/gemma-3-1b-it`)
- `kubectl` configured to communicate with cluster and in PATH
- `curl` binary in PATH
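The token in the first bullet typically reaches the server via a Kubernetes Secret. A minimal sketch, assuming the Secret and key names the Deployment references are `hf-secret` and `hf_api_token` (names here are assumptions):

```bash
# Store the Hugging Face token in a Secret; gemma models are gated,
# so the token's account must have accepted the model's license.
kubectl create secret generic hf-secret --from-literal=hf_api_token="$HF_TOKEN"
```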

Member

Suggested change

```diff
+**Note for GKE users:** To target specific GPU types, you can uncomment the GKE-specific `nodeSelector` in `vllm-deployment.yaml`.
```
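For reference, the commented-out selector in question typically looks like this (the accelerator value is illustrative and must match a GPU type available in the cluster's node pools):

```yaml
# Uncomment on GKE to schedule onto nodes with a specific GPU type.
nodeSelector:
  cloud.google.com/gke-accelerator: nvidia-l4
```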
Contributor Author

I think this is fixed. Please have a look.

```yaml
spec:
  containers:
  - name: inference-server
    image: vllm/vllm-openai:latest
```
@seans3 seans3 Aug 6, 2025

I think this is fixed. I tested by pulling the image. Please have a look.

```yaml
spec:
  containers:
  - name: inference-server
    image: vllm/vllm-openai@sha256:05a31dc4185b042e91f4d2183689ac8a87bd845713d5c3f987563c5899878271
```
Member

SHA isn't readable, so suggest adding a comment here for clarity

Suggested change

```diff
-    image: vllm/vllm-openai@sha256:05a31dc4185b042e91f4d2183689ac8a87bd845713d5c3f987563c5899878271
+    # vllm/vllm-openai:v0.10.0
+    image: vllm/vllm-openai@sha256:05a31dc4185b042e91f4d2183689ac8a87bd845713d5c3f987563c5899878271
```
Contributor Author

Good idea--done.
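For anyone refreshing the pin later, one way to resolve a tag to its immutable digest before editing the manifest (assuming a local Docker daemon):

```bash
# Pull the tag, then print the repo digest to paste into the manifest.
docker pull vllm/vllm-openai:v0.10.0
docker inspect --format '{{index .RepoDigests 0}}' vllm/vllm-openai:v0.10.0
```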

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 6, 2025
@janetkuo janetkuo left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 6, 2025
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: janetkuo, seans3

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 7021833 into kubernetes:master Aug 6, 2025
2 checks passed

Labels

- `approved`: Indicates a PR has been approved by an approver from all required OWNERS files.
- `cncf-cla: yes`: Indicates the PR's author has signed the CNCF CLA.
- `lgtm`: "Looks good to me", indicates that a PR is ready to be merged.
- `size/L`: Denotes a PR that changes 100-499 lines, ignoring generated files.

3 participants