
Conversation

@paulpak58 paulpak58 commented Oct 7, 2025

Purpose

This PR implements LFM2-8B-A1B, a hybrid Mixture-of-Experts architecture variant of LFM2. The LFM2 family is optimized for on-device inference by combining short‑range, input‑aware gated convolutions with grouped‑query attention (GQA). LFM2‑MoE keeps this fast backbone and introduces sparse MoE feed‑forward networks to add representational capacity without significantly increasing the active compute path.
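
For context, a minimal, self-contained sketch of a top-k routed sparse MoE feed-forward block of the kind described above (illustrative only; the class and parameter names are assumptions, not the Lfm2Moe implementation in this PR):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyMoeFeedForward(nn.Module):
    """Illustrative top-k routed MoE FFN (hypothetical names, not the actual
    Lfm2Moe code): only the experts selected by the router run for each token,
    so capacity grows without growing the active compute path."""

    def __init__(self, hidden_size: int, intermediate_size: int,
                 num_experts: int, top_k: int) -> None:
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_size, intermediate_size, bias=False),
                nn.SiLU(),
                nn.Linear(intermediate_size, hidden_size, bias=False),
            )
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [num_tokens, hidden_size]
        probs = F.softmax(self.router(x), dim=-1)
        weights, expert_idx = torch.topk(probs, self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        # Dispatch each token only to its selected experts and combine
        # the expert outputs with the normalized router weights.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = expert_idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out
```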

HF model

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.
Signed-off-by: Paul Pak <paulpak58@gmail.com>
@mergify mergify bot added documentation Improvements or additions to documentation new-model Requests to new models labels Oct 7, 2025
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces support for the Lfm2Moe model, a Mixture-of-Experts variant. The changes are comprehensive, including the model implementation, configuration, and updates to the model registry and documentation. The overall implementation is solid, but I've identified a few critical issues that need to be addressed. These include a potential AttributeError in the model configuration due to a missing attribute, a faulty assertion in the expert parallelism logic that could lead to crashes, and a minor copy-paste error in an error message. Addressing these points will improve the robustness and correctness of the new model support.

        num_physical_experts: int,
        num_local_physical_experts: int,
    ) -> None:
        assert self.num_local_physical_experts == num_local_physical_experts

critical

The assertion self.num_local_physical_experts == num_local_physical_experts will fail if expert rebalancing is triggered by the EPLBScheduler. During rebalancing, the number of physical experts can change, leading to a change in num_local_physical_experts. This assertion compares the old value stored in the model with the new value from the scheduler, which will cause a crash. This assertion should be removed to allow for dynamic updates.
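
A minimal sketch of the suggested fix, reusing the method signature visible in the quoted diff; the body is an assumption about how dynamic updates could be accepted, not the merged code:

```python
# Sketch of Lfm2MoeForCausalLM.update_physical_experts_metadata without the
# assertion (illustrative only).
def update_physical_experts_metadata(
    self,
    num_physical_experts: int,
    num_local_physical_experts: int,
) -> None:
    # Accept the scheduler's new values instead of asserting that they match
    # the values stored at construction time, so EPLB rebalancing can change
    # the expert placement at runtime.
    self.num_physical_experts = num_physical_experts
    self.num_local_physical_experts = num_local_physical_experts
```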

Member

Does this model support EPLB?

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines +731 to +736
for layer in self.model.layers:
    if isinstance(layer.feed_forward, Lfm2MoeSparseMoeBlock):
        moe = layer.feed_forward
        moe.n_local_physical_experts = num_local_physical_experts
        moe.n_physical_experts = num_physical_experts
        moe.n_redundant_experts = self.num_redundant_experts


P1: update_physical_experts_metadata crashes with PPMissingLayer

In Lfm2MoeForCausalLM.update_physical_experts_metadata, the loop assumes every element in self.model.layers exposes a feed_forward attribute. When pipeline parallelism is enabled, make_layers inserts PPMissingLayer objects for non‑local layers, so this method will raise AttributeError on those placeholders. Adding an isinstance(layer, PPMissingLayer) or hasattr(layer, "feed_forward") guard before accessing layer.feed_forward would avoid the crash.
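
A hedged sketch of the suggested guard; the PPMissingLayer import path is an assumption and should be checked against the tree:

```python
# Assumed import location for PPMissingLayer; verify against the vLLM tree.
from vllm.model_executor.models.utils import PPMissingLayer

for layer in self.model.layers:
    # Pipeline parallelism replaces non-local layers with placeholders that
    # have no feed_forward attribute, so skip them before touching it.
    if isinstance(layer, PPMissingLayer):
        continue
    if isinstance(layer.feed_forward, Lfm2MoeSparseMoeBlock):
        moe = layer.feed_forward
        moe.n_local_physical_experts = num_local_physical_experts
        moe.n_physical_experts = num_physical_experts
        moe.n_redundant_experts = self.num_redundant_experts
```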



Signed-off-by: Paul Pak <paulpak58@gmail.com>
@tdoublep tdoublep left a comment

LGTM

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) October 7, 2025 13:47
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 7, 2025
@DarkLight1337 DarkLight1337 merged commit 320feae into vllm-project:main Oct 7, 2025
55 checks passed
patrickvonplaten pushed a commit to patrickvonplaten/vllm that referenced this pull request Oct 7, 2025
Signed-off-by: Paul Pak <paulpak58@gmail.com> Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
patrickvonplaten pushed a commit to patrickvonplaten/vllm that referenced this pull request Oct 7, 2025
Signed-off-by: Paul Pak <paulpak58@gmail.com> Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
mrasquinha-g pushed a commit to mrasquinha-g/vllm that referenced this pull request Oct 9, 2025
Signed-off-by: Paul Pak <paulpak58@gmail.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
Signed-off-by: Paul Pak <paulpak58@gmail.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Dhruvilbhatt pushed a commit to Dhruvilbhatt/vllm that referenced this pull request Oct 14, 2025
Signed-off-by: Paul Pak <paulpak58@gmail.com> Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com>

Labels

  • documentation: Improvements or additions to documentation
  • new-model: Requests to new models
  • ready: ONLY add when PR is ready to merge/full CI is needed

3 participants