
Conversation

@zejunchen-zejun
Contributor

@zejunchen-zejun zejunchen-zejun commented Oct 24, 2025

View the MoE weights as torch.float4_e2m1fn_x2 for the aiter FP4 fused MoE kernel.
Without this PR, aiter cannot find a suitable kernel for the weights, whose dtype is uint8.

With this PR, DeepSeek FP4 runs successfully.
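
For reference, a minimal standalone sketch of the change described above (the tensor here is a stand-in; the actual code applies the view to layer.w13_weight and layer.w2_weight, as shown in the review below):

import torch

# Stand-in for a packed FP4 MoE weight as loaded from the checkpoint:
# two e2m1 values per byte, stored as uint8.
packed_weight = torch.randint(0, 256, (16, 32), dtype=torch.uint8)

if hasattr(torch, "float4_e2m1fn_x2"):
    # Zero-copy reinterpretation: only the dtype that aiter sees changes,
    # not the underlying bytes, so the FP4 fused MoE kernel can be selected.
    weight_for_kernel = packed_weight.view(torch.float4_e2m1fn_x2)
else:
    # Older PyTorch builds without the packed FP4 dtype: keep the raw uint8 tensor.
    weight_for_kernel = packed_weight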

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly addresses a dtype mismatch issue for the aiter fused MoE kernel on ROCm when using FP4 models. The change involves viewing the uint8 weight tensor as torch.float4_e2m1fn_x2 before passing it to the kernel. My review identifies a performance improvement opportunity by moving the hasattr check out of the apply method's hot path and into the class initializer. This would prevent redundant checks on every forward pass.

Comment on lines 662 to 667
if hasattr(torch, "float4_e2m1fn_x2"):
    w13_weight = layer.w13_weight.view(torch.float4_e2m1fn_x2)
    w2_weight = layer.w2_weight.view(torch.float4_e2m1fn_x2)
else:
    w13_weight = layer.w13_weight
    w2_weight = layer.w2_weight

Severity: high

The hasattr(torch, "float4_e2m1fn_x2") check is performed on every forward pass within the apply method. This is inefficient as the result of this check is constant for a given PyTorch environment. This check should be performed only once during initialization to avoid repeated overhead in the critical path of inference.

I recommend moving this check to the __init__ method of the QuarkOCP_MX_MoEMethod class and caching the result in an instance variable. For example:

In __init__:

self.fp4_dtype = getattr(torch, "float4_e2m1fn_x2", None)

Then, here in apply, you can simplify the logic:

w13_weight = layer.w13_weight
w2_weight = layer.w2_weight
if self.fp4_dtype:
    w13_weight = w13_weight.view(self.fp4_dtype)
    w2_weight = w2_weight.view(self.fp4_dtype)

This would be more performant and also makes the code in apply cleaner by removing the if/else block. Since modifying __init__ is outside the current diff, I am not providing a direct code suggestion, but this change is highly recommended for performance.
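
Pulled together, a self-contained sketch of the caching pattern being suggested (class and attribute names follow the discussion; the rest, including the return at the end, is illustrative rather than the actual vLLM implementation):

import torch


class QuarkOCP_MX_MoEMethod:
    def __init__(self):
        # Resolve the packed FP4 dtype once; None on PyTorch builds that lack it.
        self.fp4_dtype = getattr(torch, "float4_e2m1fn_x2", None)

    def apply(self, layer):
        w13_weight = layer.w13_weight
        w2_weight = layer.w2_weight
        if self.fp4_dtype is not None:
            # Zero-copy view of the packed uint8 weights as FP4.
            w13_weight = w13_weight.view(self.fp4_dtype)
            w2_weight = w2_weight.view(self.fp4_dtype)
        # ... here the weights would be handed to the aiter fused MoE kernel ...
        return w13_weight, w2_weight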


@zejunchen-zejun
Contributor Author

Hi, @maleksan85 @HaiShaw

Could you help review the code changes here?

Thank you.

Contributor

@BowenBao BowenBao left a comment


Looks good with suggestions.

Comment on lines 662 to 667
if hasattr(torch, "float4_e2m1fn_x2"):
    w13_weight = layer.w13_weight.view(torch.float4_e2m1fn_x2)
    w2_weight = layer.w2_weight.view(torch.float4_e2m1fn_x2)
else:
    w13_weight = layer.w13_weight
    w2_weight = layer.w2_weight

Per the AI suggestion, this view can be moved to process_weights_after_loading to avoid being invoked on every forward pass.
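
A minimal sketch of what that could look like (assuming a vLLM-style process_weights_after_loading(layer) hook on the MoE method; the Parameter re-wrapping is illustrative and not necessarily the exact code that was pushed):

import torch


def process_weights_after_loading(layer: torch.nn.Module) -> None:
    # Perform the FP4 view once, right after the checkpoint weights are loaded,
    # so apply() no longer branches on the dtype every forward pass.
    fp4_dtype = getattr(torch, "float4_e2m1fn_x2", None)
    if fp4_dtype is None:
        return  # PyTorch build without the packed FP4 dtype: keep uint8 weights
    layer.w13_weight = torch.nn.Parameter(
        layer.w13_weight.data.view(fp4_dtype), requires_grad=False
    )
    layer.w2_weight = torch.nn.Parameter(
        layer.w2_weight.data.view(fp4_dtype), requires_grad=False
    )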

@zejunchen-zejun
Contributor Author

Hi, @BowenBao
Thank you for the review. Let me modify the code here.

@zejunchen-zejun
Contributor Author

Hi, @HaiShaw

Could you help review this PR? It fixes the FP4 fused MoE functionality issue.

Thank you!

@gshtras gshtras added the ready label (ONLY add when PR is ready to merge/full CI is needed) Oct 30, 2025
@BowenBao
Contributor

Just hold off on merging; I believe @zejunchen-zejun has yet to push the changes.

@zejunchen-zejun zejunchen-zejun force-pushed the zejun/fix_fp4_fused_moe_func_issue_for_rocm branch 3 times, most recently from 1c21289 to 5ed92fd Compare November 3, 2025 06:08
@zejunchen-zejun
Contributor Author

Hi, @BowenBao @gshtras
Thank you for the review. We have updated the code according to the review comments. Could you take another look?
With this PR, the DeepSeek FP4 functionality works correctly.
Thank you

when running aiter fused moe for fp4 model
Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>
@zejunchen-zejun zejunchen-zejun force-pushed the zejun/fix_fp4_fused_moe_func_issue_for_rocm branch from 5ed92fd to 15df282 Compare November 3, 2025 07:47
@BowenBao
Contributor

BowenBao commented Nov 3, 2025

Thanks @zejunchen-zejun , LGTM

@zejunchen-zejun
Contributor Author

Hi, @HaiShaw @SageMoore @BowenBao @gshtras
Could you help merge this PR? Thank you!

@zejunchen-zejun
Contributor Author

zejunchen-zejun commented Nov 10, 2025

Hi, @HaiShaw @SageMoore @BowenBao @gshtras @LucasWilkinson
Could you help merge this PR? Thank you!

@gshtras gshtras merged commit b06b947 into vllm-project:main Nov 10, 2025
52 checks passed

Labels

ready: ONLY add when PR is ready to merge/full CI is needed
rocm: Related to AMD ROCm

4 participants