Skip to content

Conversation

@robertgshaw2-redhat
Copy link
Collaborator

@robertgshaw2-redhat robertgshaw2-redhat commented Nov 8, 2025

Purpose

Test Plan

launch:

VLLM_DISABLE_SHARED_EXPERTS_STREAM=1 VLLM_TORCH_PROFILER_DIR=$(pwd)/profiles VLLM_USE_DEEP_GEMM=0 vllm serve {{MODEL}} --tensor-parallel-size 8 --speculative-config '{"num_speculative_tokens": 1, "method":"deepseek_mtp"}' --no-enable-prefix-caching 

Test Result

result:

(Worker_TP3 pid=1190105) INFO 11-08 16:50:53 [fused_moe.py:876] Using configuration from /home/robertgshaw2-redhat/vllm/vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json for MoE layer. 

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.
fix regression from #23642 Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request correctly updates the device name for H200 detection to NVIDIA_H200, which aligns with the configuration file naming convention. This is a necessary bugfix. I've also identified a potential issue in the detection logic that could misidentify other GPUs (like GH200) as H200, and I've provided a suggestion to make the check more robust.

Comment on lines 822 to 823
if "H200" in device_name:
device_name = "H200"
device_name = "NVIDIA_H200"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current check if "H200" in device_name: is too broad and could incorrectly match other device names that contain "H200", such as "GH200". This could lead to using the wrong configuration for non-H200 devices, potentially causing performance degradation or errors. To make the check more specific to the H200 family, I suggest splitting the device name by underscores and checking for an exact match of "H200".

Suggested change
if "H200" in device_name:
device_name = "H200"
device_name = "NVIDIA_H200"
if "H200" in device_name.split("_"):
device_name = "NVIDIA_H200"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good bot

Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
@Isotr0py Isotr0py enabled auto-merge (squash) November 8, 2025 16:59
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 8, 2025
@simon-mo simon-mo added this to the v0.11.1 milestone Nov 8, 2025
@Isotr0py Isotr0py merged commit 26990d2 into main Nov 8, 2025
55 checks passed
@Isotr0py Isotr0py deleted the robertgshaw2-redhat-patch-1 branch November 8, 2025 19:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready ONLY add when PR is ready to merge/full CI is needed

4 participants