Skip non-selected experts for mixtral and qwen2_moe #32429
Conversation
| @ArthurZucker, I ran into a conflict between torch.fx and the dynamic for loop in my implementation. I haven't found a concise solution for it yet. Could you help with this, or should we just give it up? |
| Thanks for the info. I am not quite familiar with 'torch.fx'. Seems like there is a trade-off between enabling FX tracing and skipping experts. |
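A minimal sketch of the conflict being discussed (not the transformers code; module and argument names are made up for illustration): `torch.fx` symbolic tracing sees `Proxy` placeholders rather than tensor values, so a loop whose trip count depends on which experts the router actually selected cannot be traced.

```python
import torch
from torch import fx


class SkipExperts(torch.nn.Module):
    """Toy MoE-style module that only runs the experts selected at runtime."""

    def __init__(self, num_experts=4, dim=8):
        super().__init__()
        self.experts = torch.nn.ModuleList(torch.nn.Linear(dim, dim) for _ in range(num_experts))

    def forward(self, hidden_states, selected_experts):
        out = torch.zeros_like(hidden_states)
        # The set of experts to run depends on tensor *values*, which symbolic
        # tracing cannot observe -- it only manipulates Proxy objects.
        for expert_idx in torch.unique(selected_experts).tolist():
            out = out + self.experts[expert_idx](hidden_states)
        return out


try:
    fx.symbolic_trace(SkipExperts())
except Exception as err:
    # Expected: tracing fails because a Proxy cannot be iterated or turned
    # into a Python list -- the trade-off mentioned in the comments above.
    print(type(err).__name__, err)
```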
ArthurZucker left a comment
#31173 is related as well.
Yeah, if fx is failing I'm not 100% sure we want to do this, but we do need benches of some sort regarding the reduced IO, because for qwen, with a lot of experts, it can be significant.
ArthurZucker left a comment
Still down to get this merged! Would you mind just producing small benches of before / after?
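A minimal before/after micro-benchmark sketch along the lines requested (assumptions: a CUDA device is available, `accelerate` is installed for `device_map`, and the `Qwen/Qwen1.5-MoE-A2.7B` checkpoint is used as the qwen2_moe example — swap in whichever Mixtral / Qwen2-MoE checkpoint is actually being measured). Run it once on `main` and once on this branch and compare the timings.

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-MoE-A2.7B"  # illustrative qwen2_moe checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="cuda"
)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)

# Warm up, then time a fixed number of generated tokens.
model.generate(**inputs, max_new_tokens=8)
torch.cuda.synchronize()
start = time.perf_counter()
model.generate(**inputs, max_new_tokens=128)
torch.cuda.synchronize()
print(f"128 new tokens in {time.perf_counter() - start:.2f}s")
```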
| same problem. |
| Thanks for the PR 🤗 |
ArthurZucker left a comment
Actually, I would like to get rid of the tolist() that introduces a device sync.
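A small sketch of the concern raised here (variable names are illustrative): calling `.tolist()` on a CUDA tensor copies values back to the host, which blocks the CPU until all queued kernels have finished — an implicit device synchronization in the middle of the MoE forward pass.

```python
import torch

if torch.cuda.is_available():
    # Pretend this is the per-expert hit mask computed by the router.
    expert_hit = torch.rand(60, device="cuda") > 0.9

    # .tolist() forces a device-to-host copy, so the CPU stalls until every
    # queued CUDA kernel has completed -- the sync the review comment refers to.
    hit_indices = torch.nonzero(expert_hit).squeeze(-1).tolist()

    for expert_idx in hit_indices:
        ...  # dispatch only the experts that were actually selected
```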
| The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
ArthurZucker left a comment
LGTM, not that bad
* Skip non-selected experts for mixtral and qwen2_moe
* Fix: tensor tolist()
* WIP: tokenization test
* fix modular source of truth
* nits

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
- Replace data-dependent .nonzero() operation with static expert loop
- Resolves GuardOnDataDependentSymNode error during torch.export
- Maintains identical functionality while enabling export compatibility
- Fixes issue introduced in PR huggingface#32429
- Add tests for torch.export compatibility
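For context, a hedged sketch of what the "static expert loop" pattern mentioned above can look like (all names are illustrative, not the exact transformers implementation; requires a PyTorch version that ships `torch.export`): the trip count is fixed at `num_experts` and no data-dependent `.nonzero()` shape appears, so the exporter has nothing to guard on. The cost is that every expert runs on every token and unselected contributions are zeroed out by the dense routing mask.

```python
import torch
from torch import nn


class StaticMoE(nn.Module):
    def __init__(self, hidden_dim=16, num_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(hidden_dim, num_experts, bias=False)
        self.experts = nn.ModuleList(nn.Linear(hidden_dim, hidden_dim) for _ in range(num_experts))

    def forward(self, hidden_states):  # (num_tokens, hidden_dim)
        logits = self.gate(hidden_states)
        weights, selected = torch.topk(torch.softmax(logits, dim=-1), self.top_k, dim=-1)
        # Dense (num_tokens, num_experts) routing weights; zero for unselected experts.
        dense_weights = torch.zeros_like(logits).scatter(-1, selected, weights)

        out = torch.zeros_like(hidden_states)
        for expert_idx in range(len(self.experts)):  # static trip count, export-safe
            # Every expert runs on every token; the mask zeroes out non-routed ones.
            out = out + self.experts[expert_idx](hidden_states) * dense_weights[:, expert_idx : expert_idx + 1]
        return out


example = torch.randn(10, 16)
exported = torch.export.export(StaticMoE(), (example,))
print(exported)
```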
Fixes #32283
This PR avoids redundant computation for some MoE models (mixtral and qwen2_moe).
The current implementation loops over all the experts and therefore loads every expert's weights, which adds unnecessary IO cost.
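A hedged sketch of the idea, with illustrative names rather than the exact modeling_mixtral / modeling_qwen2_moe code: derive the set of experts that actually received tokens from the routing mask and visit only those, so the weights of unselected experts are never touched. (The `.tolist()` call is the device-sync trade-off discussed in the review comments above.)

```python
import torch


def dispatch_selected_experts(hidden_states, routing_weights, selected_experts, experts, num_experts):
    # hidden_states:    (num_tokens, hidden_dim)
    # routing_weights:  (num_tokens, top_k) normalized router probabilities
    # selected_experts: (num_tokens, top_k) int64 expert ids from the router
    final = torch.zeros_like(hidden_states)
    expert_mask = torch.nn.functional.one_hot(selected_experts, num_classes=num_experts).permute(2, 1, 0)

    # Indices of experts hit by at least one token; .tolist() incurs a device sync.
    expert_hit = torch.greater(expert_mask.sum(dim=(-1, -2)), 0).nonzero().squeeze(-1).tolist()

    for expert_idx in expert_hit:
        idx, top_x = torch.where(expert_mask[expert_idx])
        current_state = hidden_states[top_x]
        # Apply the expert only to its routed tokens, scale by routing weight,
        # and scatter-add the result back into the output buffer.
        final.index_add_(0, top_x, experts[expert_idx](current_state) * routing_weights[top_x, idx, None])
    return final
```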
@ArthurZucker