
Conversation

@Coco58323
Contributor

@Coco58323 Coco58323 commented Aug 5, 2024

Fixes #32283

This PR avoids redundant computation for some MoE models (mixtral and qwen2_moe).
The current implementation loops over all the experts and therefore loads every expert's weights, which adds extra IO cost.
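For context, a minimal sketch of the idea (illustrative names, not the exact diff): derive from the routing mask which experts actually received tokens and only dispatch to those, instead of looping over every expert index.

```python
import torch
import torch.nn.functional as F
from torch import nn

def moe_forward(hidden_states, router_logits, experts, top_k=2):
    """Sketch of skipping non-selected experts; function and variable names are illustrative."""
    num_experts = len(experts)
    routing_weights = F.softmax(router_logits, dim=-1)
    routing_weights, selected_experts = torch.topk(routing_weights, top_k, dim=-1)
    routing_weights = routing_weights / routing_weights.sum(dim=-1, keepdim=True)

    out = torch.zeros_like(hidden_states)
    # (num_experts, top_k, num_tokens): which tokens were routed to which expert.
    expert_mask = F.one_hot(selected_experts, num_classes=num_experts).permute(2, 1, 0)
    # Indices of experts that received at least one token -- the only weights we need to touch.
    hit = torch.greater(expert_mask.sum(dim=(-1, -2)), 0).nonzero()
    for expert_idx in hit:  # instead of `for expert_idx in range(num_experts)`
        idx, top_x = torch.where(expert_mask[expert_idx].squeeze(0))
        tokens = hidden_states[top_x]
        out.index_add_(0, top_x, experts[expert_idx](tokens) * routing_weights[top_x, idx, None])
    return out

# Toy usage: linear layers stand in for the real expert MLPs.
experts = nn.ModuleList(nn.Linear(16, 16) for _ in range(8))
y = moe_forward(torch.randn(4, 16), torch.randn(4, 8), experts)
```

For small batches (for example single-token decoding with top-k routing), most experts receive no tokens and their weights are never touched, which is where the IO saving this PR targets comes from.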
@ArthurZucker

@Coco58323
Contributor Author

@ArthurZucker, I ran into a conflict between torch.fx and the dynamic for loop in my implementation, and I haven't found a clean solution yet. Could you help with this, or should we just give it up?
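For anyone following along, a minimal sketch (an assumed reproduction, not the PR's code) of why the two clash: torch.fx symbolic tracing replaces tensors with Proxy objects, and a loop whose length depends on runtime values, such as iterating over the nonzero() of the routing mask, cannot be traced.

```python
import torch
from torch import fx, nn

class SkipHitExperts(nn.Module):  # hypothetical module, just to reproduce the tracing error
    def forward(self, expert_mask):
        # How many experts were hit depends on the data, so the loop length is data-dependent.
        hit = expert_mask.sum(dim=(-1, -2)).nonzero()
        total = expert_mask.sum() * 0
        for expert_idx in hit:  # during symbolic tracing `hit` is a Proxy and cannot be iterated
            total = total + expert_idx.sum()
        return total

try:
    fx.symbolic_trace(SkipHitExperts())
except Exception as err:  # typically a torch.fx TraceError about iterating a Proxy
    print(type(err).__name__, err)
```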

@ghost

ghost commented Aug 7, 2024

@Coco58323
Contributor Author

#30209

Thanks for the info. I am not very familiar with torch.fx. It seems there is a trade-off between enabling FX tracing and skipping experts.

Collaborator

@ArthurZucker ArthurZucker left a comment


#31173 is related as well.
Yeah, if fx is failing I'm not 100% sure we want to do this, but we do need benches of some sort regarding the reduced IO, because for Qwen, with a lot of experts, it can be significant.

Collaborator

@ArthurZucker ArthurZucker left a comment


Still down to get this merged! Would you mind producing small before/after benches?

@wwwwwf

wwwwwf commented Dec 27, 2024

Same problem.

@ArthurZucker
Collaborator

Thanks for the PR 🤗

Collaborator

@ArthurZucker ArthurZucker left a comment


Actually, I would like to get rid of the tolist() that introduces a device sync.
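Roughly what that refers to (a small sketch under that assumption): calling tolist() on the hit-expert indices copies them to the host and blocks until the device has produced them, whereas iterating the tensor directly avoids that explicit copy at this point.

```python
import torch

# Toy routing mask of shape (num_experts, top_k, num_tokens); only two experts are hit.
expert_mask = torch.zeros(8, 2, 16, dtype=torch.long)
expert_mask[1, 0, :4] = 1
expert_mask[5, 1, 4:8] = 1

hit = torch.greater(expert_mask.sum(dim=(-1, -2)), 0).nonzero()

# Variant A: .tolist() copies the indices to the host, a synchronization point
# when expert_mask lives on an accelerator.
for expert_idx in hit.tolist():
    print("python list:", expert_idx)  # [1] then [5]

# Variant B: iterate the tensor itself; each element stays a 1-element tensor
# and no explicit host copy happens here.
for expert_idx in hit:
    print("tensor:", expert_idx)
```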

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@ArthurZucker ArthurZucker left a comment


LGTM, not that bad.

@ArthurZucker ArthurZucker merged commit aab0878 into huggingface:main Apr 8, 2025
14 checks passed
cyr0930 pushed a commit to cyr0930/transformers that referenced this pull request Apr 18, 2025
* Skip non-selected experts for mixtral and qwen2_moe
* Fix: tensor tolist()
* WIP: tokenization test
* fix modular source of truth
* nits

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025
* Skip non-selected experts for mixtral and qwen2_moe
* Fix: tensor tolist()
* WIP: tokenization test
* fix modular source of truth
* nits

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
akacmazz added a commit to akacmazz/transformers that referenced this pull request Aug 12, 2025
- Replace data-dependent .nonzero() operation with static expert loop
- Resolves GuardOnDataDependentSymNode error during torch.export
- Maintains identical functionality while enabling export compatibility
- Fixes issue introduced in PR huggingface#32429
akacmazz added a commit to akacmazz/transformers that referenced this pull request Aug 12, 2025
- Replace data-dependent .nonzero() operation with static expert loop
- Resolves GuardOnDataDependentSymNode error during torch.export
- Maintains identical functionality while enabling export compatibility
- Fixes issue introduced in PR huggingface#32429
- Add tests for torch.export compatibility
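For reference, a rough sketch of the export-friendly shape those follow-up commits describe (an assumed form, not the actual commit): keep the loop bound static so torch.export does not have to guard on data-dependent values.

```python
import torch

def moe_forward_static(hidden_states, expert_mask, routing_weights, experts):
    """Illustrative static-loop variant; names mirror the sketch above and are not the real code."""
    out = torch.zeros_like(hidden_states)
    for expert_idx in range(len(experts)):                 # static Python bound, no nonzero()
        idx, top_x = torch.where(expert_mask[expert_idx])  # empty for experts that got no tokens
        # An empty top_x makes the gather and index_add_ no-ops, so no data-dependent
        # `if`/`continue` is needed for correctness.
        tokens = hidden_states[top_x]
        out.index_add_(0, top_x, experts[expert_idx](tokens) * routing_weights[top_x, idx, None])
    return out
```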