
Conversation

@princepride (Contributor) commented Aug 14, 2025

Purpose

This PR introduces support for the mBART model. This is a necessary prerequisite for the implementation of the Dolphin model, which is tracked in issue #18850, as Dolphin requires mBART for its decoder.

The main difference from the existing BART model is that mBART uses pre-layer-normalization: LayerNorm is applied before the self-attention and feed-forward sublayers (with an additional final LayerNorm after the last layer), whereas BART normalizes after each sublayer. The encoder_decoder.py example has also been updated to handle both BART and the new mBART model, ensuring backward compatibility.
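For illustration, a minimal sketch of the two LayerNorm placements (the class names below are hypothetical simplifications, not the actual vLLM modules):

import torch.nn as nn

class PostNormEncoderLayer(nn.Module):
    """BART-style layer: normalize after the residual add."""
    def __init__(self, d_model, attn, ffn):
        super().__init__()
        self.attn, self.ffn = attn, ffn
        self.attn_norm = nn.LayerNorm(d_model)
        self.ffn_norm = nn.LayerNorm(d_model)

    def forward(self, x):
        x = self.attn_norm(x + self.attn(x))  # post-norm
        x = self.ffn_norm(x + self.ffn(x))
        return x

class PreNormEncoderLayer(nn.Module):
    """mBART-style layer: normalize before each sublayer."""
    def __init__(self, d_model, attn, ffn):
        super().__init__()
        self.attn, self.ffn = attn, ffn
        self.attn_norm = nn.LayerNorm(d_model)
        self.ffn_norm = nn.LayerNorm(d_model)

    def forward(self, x):
        x = x + self.attn(self.attn_norm(x))  # pre-norm
        x = x + self.ffn(self.ffn_norm(x))
        return x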

Test Plan

The implementation was verified using the offline inference example for both the new mBART model and the existing BART model.

  1. Test mBART support:
    python examples/offline_inference/encoder_decoder.py --model mbart
  2. Test BART backward compatibility:
    python examples/offline_inference/encoder_decoder.py --model bart

Test Result

Both test commands executed successfully and produced the expected outputs, confirming that the mBART implementation is correct and that backward compatibility with BART is maintained.

mBART Test Result:

[Screenshot: mBART offline inference output]

BART Test Result:

[Screenshot: BART offline inference output]
@princepride requested a review from hmellor as a code owner, August 14, 2025 08:04
@mergify bot added the documentation (Improvements or additions to documentation) and new-model (Requests to new models) labels Aug 14, 2025
@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces support for the MBart model, which is a prerequisite for the Dolphin model. The changes include adding the MBart model implementation, updating the model registry, and refactoring the encoder-decoder example script to support both BART and MBart. The implementation correctly captures the architectural differences between BART and MBart, specifically the pre-norm layer normalization.

My review focuses on the new MBartForConditionalGeneration implementation. I found a critical issue in the weight loading logic that could lead to incorrect model loading due to its reliance on the order of weights in the checkpoint. I've provided a suggestion to make it more robust.
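The order-independent pattern being suggested looks roughly like the sketch below (a minimal sketch, not the exact code proposed in the review; it assumes each checkpoint tensor name maps directly to a parameter name, and uses default_weight_loader from vllm.model_executor.model_loader.weight_utils as the fallback):

from vllm.model_executor.model_loader.weight_utils import default_weight_loader

def load_weights_by_name(model, weights):
    """Match each checkpoint tensor to a parameter by name, not by position,
    so reordering the checkpoint cannot silently load the wrong weights."""
    params_dict = dict(model.named_parameters())
    for name, loaded_weight in weights:
        param = params_dict[name]  # a KeyError here surfaces unexpected or renamed tensors early
        weight_loader = getattr(param, "weight_loader", default_weight_loader)
        weight_loader(param, loaded_weight)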

@github-actions (bot) commented

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@princepride changed the title from "[New Model]Mbart model" to "[New Model]mBART model" Aug 14, 2025
@noooop (Collaborator) left a comment

Please add this model to tests/models/registry.py.

@noooop (Collaborator) left a comment

@princepride (Contributor, author) commented

cc @Isotr0py Can you help me review it? Thank you. 😊

@Isotr0py (Member) left a comment

Some initial comments, will take a deeper look later tonight.

@princepride (Contributor, author) commented

@Isotr0py @DarkLight1337 Hi, can you help review it? Thank you.

@princepride (Contributor, author) commented

@DarkLight1337 Can you review it again? Thank you.

@DarkLight1337 (Member) left a comment

Thanks, can you add a test to check the correctness against HF?

@princepride (Contributor, author) commented

@DarkLight1337

Test code:

import torch
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast
from vllm import LLM, SamplingParams
from vllm.inputs import ExplicitEncoderDecoderPrompt, TokensPrompt

# --- 1. General Setup ---
MODEL_ID = "facebook/mbart-large-en-ro"

# List of articles to test for translation
TEST_ARTICLES = [
    "UN Chief Says There Is No Military Solution in Syria",
    "Engineers are building a new bridge across the river.",
    "Data science is a multidisciplinary field that uses scientific methods.",
]

SAMPLING_PARAMS = SamplingParams(temperature=0, max_tokens=64)
results = []  # To store the outputs for final comparison

print("--- Initializing Models and Tokenizers ---")

# --- 2. Initialize Hugging Face Model ---
print("Loading Hugging Face model...")
# Using MBart50TokenizerFast is more specific for this model family
hf_tokenizer = MBart50TokenizerFast.from_pretrained(MODEL_ID, src_lang="en_XX")
hf_model = MBartForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
print("Hugging Face model loaded successfully.")

# --- 3. Initialize vLLM Engine ---
print("Loading vLLM engine...")
# The hf_overrides parameter can help vLLM resolve the correct architecture if the config is ambiguous
llm = LLM(
    model=MODEL_ID,
    dtype="bfloat16",
    hf_overrides={"architectures": ["MBartForConditionalGeneration"]},
)
print("vLLM engine loaded successfully.\n")

# --- 4. Process Each Article and Generate Outputs ---
for i, article in enumerate(TEST_ARTICLES):
    print(f"--- Processing Test Case {i+1}/{len(TEST_ARTICLES)} ---")
    print(f"Input: {article}")

    # --- A. Hugging Face Inference ---
    inputs = hf_tokenizer(article, return_tensors="pt").to(hf_model.device)
    # The key step: specifying the target language for the decoder
    translated_tokens = hf_model.generate(
        **inputs,
        decoder_start_token_id=hf_tokenizer.lang_code_to_id["ro_RO"],
        max_new_tokens=64,
    )
    hf_output = hf_tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
    print(f"HF Output: {hf_output}")

    # --- B. vLLM Inference ---
    encoder_prompt = article
    # The equivalent step in vLLM: providing the target language token as the decoder's prompt
    decoder_prompt_token_ids = [hf_tokenizer.lang_code_to_id["ro_RO"]]
    vllm_prompt = ExplicitEncoderDecoderPrompt(
        encoder_prompt=encoder_prompt,
        decoder_prompt=TokensPrompt(prompt_token_ids=decoder_prompt_token_ids),
    )
    outputs = llm.generate([vllm_prompt], SAMPLING_PARAMS)
    vllm_output = outputs[0].outputs[0].text
    print(f"vLLM Output: {vllm_output}\n")

    # --- C. Store results for the final summary ---
    results.append({
        "original": article,
        "hf_output": hf_output,
        "vllm_output": vllm_output,
    })

# --- 5. Final Comparison Table ---
print("\n" + "=" * 80)
print(" " * 28 + "FINAL COMPARISON SUMMARY")
print("=" * 80 + "\n")

for i, res in enumerate(results):
    print(f"--- Test Case {i+1} ---")
    print(f"Original Text: {res['original']}")
    print(f"HF Output: {res['hf_output']}")
    print(f"vLLM Output: {res['vllm_output']}")
    # Verify if the outputs are identical
    is_match = res['hf_output'] == res['vllm_output']
    print(f"Outputs Match: {is_match}")
    print("-" * 50)

Result:

[Screenshot: side-by-side HF vs vLLM translation outputs]
@noooop (Collaborator) commented Aug 15, 2025

Please refer to https://github.com/vllm-project/vllm/blob/main/tests/models/language/generation/test_bart.py

The purpose of this test is to prevent someone from accidentally breaking mBART.
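Concretely, the kind of check needed looks roughly like the sketch below (illustrative only: the model ID, prompt, and exact-match assertion are simplifications; the actual test follows test_bart.py, which compares top log-probabilities rather than exact output strings):

import pytest
import torch
from transformers import MBart50TokenizerFast, MBartForConditionalGeneration
from vllm import LLM, SamplingParams
from vllm.inputs import ExplicitEncoderDecoderPrompt, TokensPrompt

MODEL_ID = "facebook/mbart-large-en-ro"  # illustrative checkpoint

@pytest.mark.parametrize("text", ["UN Chief Says There Is No Military Solution in Syria"])
def test_mbart_greedy_matches_hf(text):
    tok = MBart50TokenizerFast.from_pretrained(MODEL_ID, src_lang="en_XX")
    ro = tok.lang_code_to_id["ro_RO"]

    # Reference output from Hugging Face Transformers
    hf = MBartForConditionalGeneration.from_pretrained(MODEL_ID, torch_dtype=torch.float32)
    hf_ids = hf.generate(**tok(text, return_tensors="pt"),
                         decoder_start_token_id=ro, max_new_tokens=32)
    hf_out = tok.batch_decode(hf_ids, skip_special_tokens=True)[0]

    # Greedy output from vLLM for the same encoder/decoder prompt
    llm = LLM(model=MODEL_ID, dtype="float32")
    prompt = ExplicitEncoderDecoderPrompt(
        encoder_prompt=text,
        decoder_prompt=TokensPrompt(prompt_token_ids=[ro]))
    vllm_out = llm.generate([prompt], SamplingParams(temperature=0, max_tokens=32))[0].outputs[0].text

    # Exact-match assertion is a simplification; the CI test uses a logprob-based comparison.
    assert vllm_out.strip() == hf_out.strip()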

@DarkLight1337 (Member) commented

Yeah, make sure the test is run in CI

@princepride requested a review from noooop, August 16, 2025 04:47
@princepride (Contributor, author) commented

@DarkLight1337 @noooop I have added a CI test for mBART based on test_bart.py; please review it, thank you.
[Screenshot: mBART test output]

@DarkLight1337 (Member) left a comment

Thanks, LGTM!

@DarkLight1337 enabled auto-merge (squash) August 16, 2025 06:26
@github-actions bot added the ready (ONLY add when PR is ready to merge/full CI is needed) label Aug 16, 2025
@princepride (Contributor, author) commented

@DarkLight1337 The failed checks seem to be unrelated to my code.

@DarkLight1337 merged commit 829bbd7 into vllm-project:main Aug 16, 2025
43 checks passed