
Conversation

Contributor

@bufferoverflow bufferoverflow commented Aug 26, 2025

Purpose

The main difference between the H200 and the H200 NVL is the interface (SXM5 vs. PCIe), so the configs can be the same. This PR adds a copy of the H200 configs for H200_NVL. See https://www.nvidia.com/en-us/data-center/h200/ and https://resources.nvidia.com/en-us-data-center-overview-mc/en-us-data-center-overview/grace-hopper-superchip-datasheet-partner (GH200).

Test Plan

Run qwen3-coder-30b-a3b-instruct on an H200 NVL and verify that no warning such as the following appears:

WARNING 2025-08-25 01:54:03,624 [fused_moe.py:727] Using default MoE config. Performance might be sub-optimal! Config file not found at ['/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H200_NVL.json'] 
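For context, the missing filename in the warning follows vLLM's pattern of keying fused-MoE configs on the CUDA device name. A minimal sketch of that naming scheme (illustrative only, not vLLM's actual code; the function name is made up):

```python
# Illustrative sketch (not vLLM's implementation) of how the fused-MoE config
# filename in the warning above is formed: the device name reported by the
# driver has its spaces replaced with underscores.
def moe_config_filename(num_experts: int, shard_size: int, device_name: str) -> str:
    normalized = device_name.replace(" ", "_")
    return f"E={num_experts},N={shard_size},device_name={normalized}.json"

print(moe_config_filename(128, 384, "NVIDIA H200 NVL"))
# -> E=128,N=384,device_name=NVIDIA_H200_NVL.json
```

This is why an H200 NVL misses the tuned H200 configs: "NVIDIA H200 NVL" normalizes to a different filename than "NVIDIA H200".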

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing the test command.
  • The test results, such as pasting a before/after comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

⚒️ with ❤️ by https://github.com/siemens

@bufferoverflow bufferoverflow force-pushed the feat/fused_moe-add-H200_NVL branch from 83dd246 to b988ea8 on August 26, 2025 09:32

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds tuned configurations for the H200_NVL GPU by copying the existing configurations for the H200. While this approach enables support for the new hardware, it introduces a significant amount of file duplication, which can become a maintenance burden. I've left a comment on one of the new files with a suggestion for a more scalable approach by implementing a fallback in the configuration loading logic. This would avoid the need to duplicate configuration files for hardware with similar performance characteristics.

@bufferoverflow
Contributor Author

@simon-mo could you have a look at this?

@tomschelsen

In the same vein, what about the GH200? Running Qwen/Qwen3-Coder-30B-A3B-Instruct on an NVIDIA GH200 NVL2 system, I get the following warning:

WARNING 10-10 07:30:42 [fused_moe.py:798] Using default MoE config. Performance might be sub-optimal! Config file not found at ['/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_GH200_144G_HBM3e.json']

So here device_name = NVIDIA_GH200_144G_HBM3e

There exists a config for the same E and N for the H200: vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H200.json on main in vllm-project/vllm.

So maybe this PR could be extended to all device names containing H200, unless there is a reason to differentiate between the H200 and the GH200?

Signed-off-by: Roger Meier <r.meier@siemens.com>
@bufferoverflow bufferoverflow force-pushed the feat/fused_moe-add-H200_NVL branch from b988ea8 to b713937 on October 21, 2025 08:44
@bufferoverflow bufferoverflow requested a review from mgoin as a code owner October 21, 2025 08:44
@bufferoverflow bufferoverflow changed the title [Model] Add tuned fused_moe configs for H200_NVL based on H200 config [Model] Use the same fused_moe configs for all H200 devices Oct 21, 2025
@bufferoverflow
Contributor Author

@tomschelsen I changed it to use the same config for all H200 devices.

Member

@Isotr0py Isotr0py left a comment


Looks reasonable to me!

@Isotr0py Isotr0py enabled auto-merge (squash) October 30, 2025 14:06
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 30, 2025
@Isotr0py Isotr0py merged commit 2918c1b into vllm-project:main Oct 30, 2025
53 checks passed
@simon-mo
Collaborator

Sorry about the late reply; there's the GH200 as well, which is an H100 chip under the hood.

@robertgshaw2-redhat
Collaborator

robertgshaw2-redhat commented Nov 8, 2025

This PR completely breaks the H200 Triton kernel config lookup.

It needs to be device_name=NVIDIA_H200.

robertgshaw2-redhat added a commit that referenced this pull request Nov 8, 2025
fix regression from #23642
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
@robertgshaw2-redhat
Collaborator

fixed: #28349

ZhengHongming888 pushed a commit to ZhengHongming888/vllm that referenced this pull request Nov 8, 2025
@tomschelsen

And so @simon-mo, following my message above, how could I get the right config for the device that I have? Thanks.

rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025

Labels

ready ONLY add when PR is ready to merge/full CI is needed

6 participants