[Model] Lfm2Moe #26344
Conversation
Code Review
This pull request introduces support for the Lfm2Moe model, a Mixture-of-Experts variant. The changes are comprehensive, including the model implementation, configuration, and updates to the model registry and documentation. The overall implementation is solid, but I've identified a few critical issues that need to be addressed: a potential AttributeError in the model configuration due to a missing attribute, a faulty assertion in the expert parallelism logic that could lead to crashes, and a minor copy-paste error in an error message. Addressing these points will improve the robustness and correctness of the new model support.
    num_physical_experts: int,
    num_local_physical_experts: int,
) -> None:
    assert self.num_local_physical_experts == num_local_physical_experts
The assertion `self.num_local_physical_experts == num_local_physical_experts` will fail if expert rebalancing is triggered by the `EPLBScheduler`. During rebalancing, the number of physical experts can change, leading to a change in `num_local_physical_experts`. This assertion compares the old value stored in the model with the new value from the scheduler, which will cause a crash. The assertion should be removed to allow for dynamic updates.
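A minimal sketch of the change this comment is asking for, assuming the method belongs to a model class that caches the expert counts (names follow the diff above; this is illustrative, not necessarily the merged vLLM code):

```python
# Hypothetical sketch of the suggested fix: accept the counts pushed by the
# EPLB rebalancer instead of asserting that the cached value never changes.
def update_physical_experts_metadata(
    self,
    num_physical_experts: int,
    num_local_physical_experts: int,
) -> None:
    # Store the possibly-updated counts so later forward passes see the
    # post-rebalancing expert layout.
    self.num_physical_experts = num_physical_experts
    self.num_local_physical_experts = num_local_physical_experts
```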
Does this model support EPLB?
💡 Codex Review
Here are some automated review suggestions for this pull request.
for layer in self.model.layers:
    if isinstance(layer.feed_forward, Lfm2MoeSparseMoeBlock):
        moe = layer.feed_forward
        moe.n_local_physical_experts = num_local_physical_experts
        moe.n_physical_experts = num_physical_experts
        moe.n_redundant_experts = self.num_redundant_experts
update_physical_experts_metadata crashes with PPMissingLayer
In `Lfm2MoeForCausalLM.update_physical_experts_metadata`, the loop assumes every element in `self.model.layers` exposes a `feed_forward` attribute. When pipeline parallelism is enabled, `make_layers` inserts `PPMissingLayer` objects for non-local layers, so this method will raise `AttributeError` on those placeholders. Adding an `isinstance(layer, PPMissingLayer)` or `hasattr(layer, "feed_forward")` guard before accessing `layer.feed_forward` would avoid the crash.
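A sketch of the guard being suggested, reusing the names from the snippet above (the import path for `PPMissingLayer` is my assumption about where vLLM keeps that helper; treat this as an illustration, not the merged code):

```python
# Hypothetical sketch: skip pipeline-parallel placeholder layers before
# touching layer.feed_forward, as Codex suggests above.
from vllm.model_executor.models.utils import PPMissingLayer  # assumed location

def update_physical_experts_metadata(
    self,
    num_physical_experts: int,
    num_local_physical_experts: int,
) -> None:
    for layer in self.model.layers:
        # Placeholder layers owned by other pipeline-parallel ranks have no
        # feed_forward attribute, so skip them.
        if isinstance(layer, PPMissingLayer):
            continue
        if isinstance(layer.feed_forward, Lfm2MoeSparseMoeBlock):
            moe = layer.feed_forward
            moe.n_local_physical_experts = num_local_physical_experts
            moe.n_physical_experts = num_physical_experts
            moe.n_redundant_experts = self.num_redundant_experts
```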
LGTM
Purpose
This PR implements LFM2-8B-A1B, a hybrid Mixture-of-Experts architecture variant of LFM2. The LFM2 family is optimized for on-device inference by combining short‑range, input‑aware gated convolutions with grouped‑query attention (GQA). LFM2‑MoE keeps this fast backbone and introduces sparse MoE feed‑forward networks to add representational capacity without significantly increasing the active compute path.
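For readers unfamiliar with the sparse-MoE side of that trade-off, here is a deliberately simplified sketch in plain PyTorch (not the vLLM kernels and not the actual LFM2-MoE block) of a top-k routed feed-forward layer: total parameters grow with the number of experts, while each token only pays the compute cost of its `top_k` selected expert MLPs.

```python
# Toy top-k sparse-MoE feed-forward block, for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoeFeedForwardSketch(nn.Module):
    def __init__(self, hidden: int, ffn: int, num_experts: int, top_k: int):
        super().__init__()
        self.top_k = top_k
        # Router scores every token against every expert.
        self.router = nn.Linear(hidden, num_experts, bias=False)
        # Capacity scales with num_experts; only top_k run per token.
        self.experts = nn.ModuleList(
            [
                nn.Sequential(nn.Linear(hidden, ffn), nn.SiLU(), nn.Linear(ffn, hidden))
                for _ in range(num_experts)
            ]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, hidden)
        logits = self.router(x)                          # (num_tokens, num_experts)
        top_logits, top_idx = logits.topk(self.top_k, dim=-1)
        top_weights = F.softmax(top_logits, dim=-1)      # normalize over chosen experts
        out = torch.zeros_like(x)
        for expert_id, expert in enumerate(self.experts):
            for k in range(self.top_k):
                mask = top_idx[:, k] == expert_id        # tokens routed here at slot k
                if mask.any():
                    out[mask] += top_weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out


if __name__ == "__main__":
    moe = SparseMoeFeedForwardSketch(hidden=64, ffn=256, num_experts=8, top_k=2)
    tokens = torch.randn(10, 64)
    print(moe(tokens).shape)  # torch.Size([10, 64])
```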
HF model
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for a new model.