[Model] Use the same fused_moe configs for all H200 devices #23642
83dd246 to b988ea8 Compare
Code Review
This pull request adds tuned configurations for the H200_NVL GPU by copying the existing configurations for the H200. While this approach enables support for the new hardware, it introduces a significant amount of file duplication, which can become a maintenance burden. I've left a comment on one of the new files with a suggestion for a more scalable approach by implementing a fallback in the configuration loading logic. This would avoid the need to duplicate configuration files for hardware with similar performance characteristics.
vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_H200_NVL.json
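The fallback suggested in the review could look roughly like the sketch below. It is not vLLM's actual loader: `config_file_name`, `find_config_file`, and the `fallbacks` parameter are hypothetical names chosen for illustration, and only the file-name pattern is taken from the files in this PR.

```python
from pathlib import Path
from typing import Optional, Sequence


def config_file_name(E: int, N: int, device_name: str) -> str:
    # Follows the naming pattern of the config files touched by this PR.
    return f"E={E},N={N},device_name={device_name}.json"


def find_config_file(
    config_dir: Path,
    E: int,
    N: int,
    device_name: str,
    fallbacks: Sequence[str] = (),
) -> Optional[Path]:
    """Return the first existing config file, trying the exact device first.

    `fallbacks` is a hypothetical list of device names whose tuned configs
    are assumed to be good enough for `device_name`.
    """
    for name in (device_name, *fallbacks):
        candidate = config_dir / config_file_name(E, N, name)
        if candidate.is_file():
            return candidate
    return None
```

With a lookup like this, an H200 NVL device could reuse the H200 file by passing `fallbacks=["NVIDIA_H200"]`, so no duplicated JSON files would be needed.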
| @simon-mo could you have a look at this? |
| In the same vein, what about the GH200? Running So here device_name = NVIDIA_GH200_144G_HBM3e. There exists a config for the same E and N for the H200: vllm/vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H200.json at main · vllm-project/vllm · GitHub. So maybe this PR could be extended to all device names containing |
Signed-off-by: Roger Meier <r.meier@siemens.com>
b988ea8 to b713937 Compare | @tomschelsen I changed to use the same config for all H200 devices |
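One way to read "the same config for all H200 devices" is a normalization step applied to the device name before the config file name is built. The sketch below is an assumption about the general shape, not the code merged in this PR: `normalize_device_name` is a hypothetical helper, and the canonical name `NVIDIA_H200` is taken from the existing config file names quoted in this thread.

```python
def normalize_device_name(raw_name: str) -> str:
    """Map every device whose name contains "H200" to one canonical name.

    Hypothetical helper; assumes the raw name may contain spaces that the
    config file names replace with underscores.
    """
    name = raw_name.replace(" ", "_")
    if "H200" in name:
        # Must match the device_name used in the existing config file names,
        # otherwise the lookup finds nothing.
        return "NVIDIA_H200"
    return name


for raw in ("NVIDIA H200", "NVIDIA H200 NVL", "NVIDIA_GH200_144G_HBM3e"):
    print(raw, "->", normalize_device_name(raw))
```

Note that a plain substring check also matches the GH200 name reported above. Whether that is desirable depends on how similar the tuned kernels behave on that device, which the thread leaves open.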
Isotr0py left a comment
Looks reasonable to me!
| Sorry about the late reply, there's GH200 as well which is an H100 chip under the hood. |
| This PR completely breaks the H200 triton kernel config lookup It needs to be |
fix regression from #23642 Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
| fixed: #28349 |
…ject#23642) Signed-off-by: Roger Meier <r.meier@siemens.com>
| And so @simon-mo, see my message above: how could I get the right config for the device that I have, then? Thanks |
Purpose
The main difference between H200 and H200 NVL is the interface (SXM5 vs. PCIe), so the configs can be the same. This PR adds a copy of the `H200` configs for `H200_NVL`. See https://www.nvidia.com/en-us/data-center/h200/ and https://resources.nvidia.com/en-us-data-center-overview-mc/en-us-data-center-overview/grace-hopper-superchip-datasheet-partner (GH200).
Test Plan
run qwen3-coder-30b-a3b-instruct on H200 NVL, no warning such as
Test Result
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.
⚒️ with ❤️ by https://github.com/siemens