[Model] Lfm2Moe #26344
Conversation
Code Review
This pull request introduces support for the Lfm2Moe model, a Mixture-of-Experts variant. The changes are comprehensive, including the model implementation, configuration, and updates to the model registry and documentation. The overall implementation is solid, but I've identified a few critical issues that need to be addressed: a potential AttributeError in the model configuration due to a missing attribute, a faulty assertion in the expert parallelism logic that could lead to crashes, and a minor copy-paste error in an error message. Addressing these points will improve the robustness and correctness of the new model support.
    num_physical_experts: int,
    num_local_physical_experts: int,
) -> None:
    assert self.num_local_physical_experts == num_local_physical_experts
The assertion `self.num_local_physical_experts == num_local_physical_experts` will fail if expert rebalancing is triggered by the `EPLBScheduler`. During rebalancing, the number of physical experts can change, leading to a change in `num_local_physical_experts`. This assertion compares the old value stored in the model with the new value from the scheduler, which will cause a crash. The assertion should be removed to allow for dynamic updates.
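A minimal sketch of the change this comment is asking for, assuming the method belongs to a model class that caches the expert counts (names follow the diff above; this is illustrative, not necessarily the merged vLLM code):

```python
# Hypothetical sketch of the suggested fix: accept the counts pushed by the
# EPLB rebalancer instead of asserting that the cached value never changes.
def update_physical_experts_metadata(
    self,
    num_physical_experts: int,
    num_local_physical_experts: int,
) -> None:
    # Store the possibly-updated counts so later forward passes see the
    # post-rebalancing expert layout.
    self.num_physical_experts = num_physical_experts
    self.num_local_physical_experts = num_local_physical_experts
```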
Does this model support EPLB?
💡 Codex Review
Here are some automated review suggestions for this pull request.
for layer in self.model.layers:
    if isinstance(layer.feed_forward, Lfm2MoeSparseMoeBlock):
        moe = layer.feed_forward
        moe.n_local_physical_experts = num_local_physical_experts
        moe.n_physical_experts = num_physical_experts
        moe.n_redundant_experts = self.num_redundant_experts
update_physical_experts_metadata crashes with PPMissingLayer
In `Lfm2MoeForCausalLM.update_physical_experts_metadata`, the loop assumes every element in `self.model.layers` exposes a `feed_forward` attribute. When pipeline parallelism is enabled, `make_layers` inserts `PPMissingLayer` objects for non-local layers, so this method will raise `AttributeError` on those placeholders. Adding an `isinstance(layer, PPMissingLayer)` or `hasattr(layer, "feed_forward")` guard before accessing `layer.feed_forward` would avoid the crash.
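A sketch of the guard being suggested, reusing the names from the snippet above (the import path for `PPMissingLayer` is my assumption about where vLLM keeps that helper; treat this as an illustration, not the merged code):

```python
# Hypothetical sketch: skip pipeline-parallel placeholder layers before
# touching layer.feed_forward, as Codex suggests above.
from vllm.model_executor.models.utils import PPMissingLayer  # assumed location

def update_physical_experts_metadata(
    self,
    num_physical_experts: int,
    num_local_physical_experts: int,
) -> None:
    for layer in self.model.layers:
        # Placeholder layers owned by other pipeline-parallel ranks have no
        # feed_forward attribute, so skip them.
        if isinstance(layer, PPMissingLayer):
            continue
        if isinstance(layer.feed_forward, Lfm2MoeSparseMoeBlock):
            moe = layer.feed_forward
            moe.n_local_physical_experts = num_local_physical_experts
            moe.n_physical_experts = num_physical_experts
            moe.n_redundant_experts = self.num_redundant_experts
```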
LGTM
Purpose
This PR implements LFM2-8B-A1B, a hybrid Mixture-of-Experts architecture variant of LFM2. The LFM2 family is optimized for on-device inference by combining short‑range, input‑aware gated convolutions with grouped‑query attention (GQA). LFM2‑MoE keeps this fast backbone and introduces sparse MoE feed‑forward networks to add representational capacity without significantly increasing the active compute path.
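For readers unfamiliar with the sparse-MoE side of that trade-off, here is a deliberately simplified sketch in plain PyTorch (not the vLLM kernels and not the actual LFM2-MoE block) of a top-k routed feed-forward layer: total parameters grow with the number of experts, while each token only pays the compute cost of its `top_k` selected expert MLPs.

```python
# Toy top-k sparse-MoE feed-forward block, for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoeFeedForwardSketch(nn.Module):
    def __init__(self, hidden: int, ffn: int, num_experts: int, top_k: int):
        super().__init__()
        self.top_k = top_k
        # Router scores every token against every expert.
        self.router = nn.Linear(hidden, num_experts, bias=False)
        # Capacity scales with num_experts; only top_k run per token.
        self.experts = nn.ModuleList(
            [
                nn.Sequential(nn.Linear(hidden, ffn), nn.SiLU(), nn.Linear(ffn, hidden))
                for _ in range(num_experts)
            ]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, hidden)
        logits = self.router(x)                          # (num_tokens, num_experts)
        top_logits, top_idx = logits.topk(self.top_k, dim=-1)
        top_weights = F.softmax(top_logits, dim=-1)      # normalize over chosen experts
        out = torch.zeros_like(x)
        for expert_id, expert in enumerate(self.experts):
            for k in range(self.top_k):
                mask = top_idx[:, k] == expert_id        # tokens routed here at slot k
                if mask.any():
                    out[mask] += top_weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out


if __name__ == "__main__":
    moe = SparseMoeFeedForwardSketch(hidden=64, ffn=256, num_experts=8, top_k=2)
    tokens = torch.randn(10, 64)
    print(moe(tokens).shape)  # torch.Size([10, 64])
```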
HF model
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for a new model.