
Conversation

Contributor

@bufferoverflow bufferoverflow commented Aug 26, 2025

Purpose

The main difference between the H200 and the H200 NVL is the interface (SXM5 vs. PCIe), so the configs can be the same. This PR adds a copy of the H200 configs for H200_NVL. See https://www.nvidia.com/en-us/data-center/h200/ and https://resources.nvidia.com/en-us-data-center-overview-mc/en-us-data-center-overview/grace-hopper-superchip-datasheet-partner (GH200).

Test Plan

Run qwen3-coder-30b-a3b-instruct on an H200 NVL and verify that no warning such as the following appears:

WARNING 2025-08-25 01:54:03,624 [fused_moe.py:727] Using default MoE config. Performance might be sub-optimal! Config file not found at ['/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H200_NVL.json'] 
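For context, the missing filename in the warning follows vLLM's pattern of keying fused-MoE configs on the CUDA device name. A minimal sketch of that naming scheme (illustrative only, not vLLM's actual code; the function name is made up):

```python
# Illustrative sketch (not vLLM's implementation) of how the fused-MoE config
# filename in the warning above is formed: the device name reported by the
# driver has its spaces replaced with underscores.
def moe_config_filename(num_experts: int, shard_size: int, device_name: str) -> str:
    normalized = device_name.replace(" ", "_")
    return f"E={num_experts},N={shard_size},device_name={normalized}.json"

print(moe_config_filename(128, 384, "NVIDIA H200 NVL"))
# -> E=128,N=384,device_name=NVIDIA_H200_NVL.json
```

This is why an H200 NVL misses the tuned H200 configs: "NVIDIA H200 NVL" normalizes to a different filename than "NVIDIA H200".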

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing the test command.
  • The test results, such as pasting a before/after comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

⚒️ with ❤️ by https://github.com/siemens

@bufferoverflow bufferoverflow force-pushed the feat/fused_moe-add-H200_NVL branch from 83dd246 to b988ea8 on August 26, 2025 09:32

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds tuned configurations for the H200_NVL GPU by copying the existing configurations for the H200. While this approach enables support for the new hardware, it introduces a significant amount of file duplication, which can become a maintenance burden. I've left a comment on one of the new files with a suggestion for a more scalable approach by implementing a fallback in the configuration loading logic. This would avoid the need to duplicate configuration files for hardware with similar performance characteristics.

@bufferoverflow
Contributor Author

@simon-mo could you have a look at this?

@tomschelsen

In the same vein, what about the GH200? Running Qwen/Qwen3-Coder-30B-A3B-Instruct on an NVIDIA GH200 NVL2 system, I get the following warning:

WARNING 10-10 07:30:42 [fused_moe.py:798] Using default MoE config. Performance might be sub-optimal! Config file not found at ['/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_GH200_144G_HBM3e.json']

So here device_name = NVIDIA_GH200_144G_HBM3e

There exists a config for the same E and N for the H200: vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H200.json on main in vllm-project/vllm.

So maybe this PR could be extended to all device names containing H200, unless there is a reason to differentiate between the H200 and the GH200?

Signed-off-by: Roger Meier <r.meier@siemens.com>
@bufferoverflow bufferoverflow force-pushed the feat/fused_moe-add-H200_NVL branch from b988ea8 to b713937 on October 21, 2025 08:44
@bufferoverflow bufferoverflow requested a review from mgoin as a code owner October 21, 2025 08:44
@bufferoverflow bufferoverflow changed the title [Model] Add tuned fused_moe configs for H200_NVL based on H200 config [Model] Use the same fused_moe configs for all H200 devices Oct 21, 2025
@bufferoverflow
Contributor Author

@tomschelsen I changed it to use the same config for all H200 devices.

Member

@Isotr0py Isotr0py left a comment


Looks reasonable to me!

@Isotr0py Isotr0py enabled auto-merge (squash) October 30, 2025 14:06
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 30, 2025
@Isotr0py Isotr0py merged commit 2918c1b into vllm-project:main Oct 30, 2025
53 checks passed
@simon-mo
Collaborator

Sorry about the late reply; there's the GH200 as well, which is an H100 chip under the hood.

@robertgshaw2-redhat
Collaborator

robertgshaw2-redhat commented Nov 8, 2025

This PR completely breaks the H200 Triton kernel config lookup.

It needs to be device_name=NVIDIA_H200.

robertgshaw2-redhat added a commit that referenced this pull request Nov 8, 2025
fix regression from #23642
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
@robertgshaw2-redhat
Collaborator

fixed: #28349

ZhengHongming888 pushed a commit to ZhengHongming888/vllm that referenced this pull request Nov 8, 2025
@tomschelsen

And so @simon-mo, following my message above, how could I get the right config for the device that I have? Thanks.

rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025

Labels

ready ONLY add when PR is ready to merge/full CI is needed

6 participants