Conversation

@Isotr0py (Member) commented Mar 22, 2025

Signed-off-by: Isotr0py <2037008807@qq.com>
@github-actions commented

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of that by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to be added to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

Signed-off-by: Isotr0py <2037008807@qq.com>
@Isotr0py (Member, Author) commented
@gshtras Can you check if this PR can fix the Mllama FP8 regression issue? I can generate reasonable outputs with neuralmagic/Llama-3.2-11B-Vision-Instruct-FP8-dynamic, but not amd/Llama-3.2-11B-Vision-Instruct-FP8-KV.

I'm not sure if it's due to hardware differences or something else, because I also can't generate reasonable outputs from this checkpoint with an earlier commit (ec79b67), before the QKVCrossParallelLinear introduction.
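For anyone trying to reproduce: a minimal text-only sketch along these lines should surface the regression as garbage output. The prompt, flags, and token limit here are illustrative assumptions, not taken from the report.

```python
# Hypothetical repro sketch -- assumes vLLM is installed and the FP8
# checkpoint fits on the local GPU; text-only prompt for simplicity.
from vllm import LLM, SamplingParams

llm = LLM(
    model="neuralmagic/Llama-3.2-11B-Vision-Instruct-FP8-dynamic",
    max_model_len=4096,  # keep the KV cache small for a quick check
)
out = llm.generate(
    "The capital of France is",
    SamplingParams(temperature=0.0, max_tokens=16),
)
# A broken checkpoint load tends to show up as "!!!..." or other garbage here.
print(out[0].outputs[0].text)
```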

Signed-off-by: Isotr0py <2037008807@qq.com>
@Isotr0py marked this pull request as ready for review March 24, 2025 05:57
@NickLucche (Collaborator) commented
Thanks for the fix!

@gshtras (Collaborator) commented Mar 24, 2025

> @gshtras Can you check if this PR can fix the Mllama FP8 regression issue? I can generate reasonable outputs with neuralmagic/Llama-3.2-11B-Vision-Instruct-FP8-dynamic, but not amd/Llama-3.2-11B-Vision-Instruct-FP8-KV.
>
> I'm not sure if it's due to hardware differences or something else, because I also can't generate reasonable outputs from this checkpoint with an earlier commit (ec79b67), before the QKVCrossParallelLinear introduction.

This fixes the weight loading, but there are also runtime issues in attention now, since 77a318b.
Not yet sure if it's related or a new issue.
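For context on why weight loading is tricky here: in cross-attention, Q is projected from decoder states while K/V are projected from encoder states, so QKVCrossParallelLinear is effectively two sub-projections, and quantized weights and FP8 scales have to be routed to the right one at load time. A rough sketch of the shape of the problem, in plain PyTorch with hypothetical names, not vLLM's actual implementation:

```python
import torch
import torch.nn as nn

class CrossQKVSketch(nn.Module):
    """Illustrative stand-in for a cross-attention QKV layer."""

    def __init__(self, hidden: int, n_heads: int, n_kv_heads: int, head_dim: int):
        super().__init__()
        # Two distinct sub-layers: a fused loader must not mix up which
        # quantized weight (and which scale) belongs to which projection.
        self.q_proj = nn.Linear(hidden, n_heads * head_dim, bias=False)
        self.kv_proj = nn.Linear(hidden, 2 * n_kv_heads * head_dim, bias=False)

    def forward(self, decoder_x: torch.Tensor, encoder_x: torch.Tensor):
        q = self.q_proj(decoder_x)  # queries come from decoder states
        k, v = self.kv_proj(encoder_x).chunk(2, dim=-1)  # keys/values from encoder states
        return q, k, v
```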

@Isotr0py (Member, Author) commented

> there are also runtime issues in attention now, since 77a318b

Hmmm, that's weird. I can run this model without any errors, but the problem is that the output hidden states contain NaN, causing the model to output all "!".
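This failure mode is easy to see in isolation: greedy decoding takes an argmax over the logits, argmax over an all-NaN tensor returns index 0, and token id 0 in the Llama 3 tokenizer decodes to "!". A quick PyTorch check, assuming that tokenizer detail:

```python
import torch

# NaN hidden states propagate into NaN logits; greedy sampling then argmaxes.
logits = torch.full((128256,), float("nan"))  # Llama 3 vocab size
print(torch.argmax(logits).item())  # 0 -- an all-NaN input yields index 0 here
# Token id 0 decodes to "!", so every step emits "!", hence "!!!..." outputs.
```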

@gshtras (Collaborator) commented Mar 24, 2025

> > there are also runtime issues in attention now, since 77a318b
>
> Hmmm, that's weird. I can run this model without any errors, but the problem is that the output hidden states contain NaN, causing the model to output all "!".

Possibly hardware related; to clarify, I'm testing with V0 on an MI300X machine.
Applying this fix to the ROCm fork solves the issue, so the remaining problem appears to be an unrelated regression. I think this PR is GTG, thanks!

@gshtras (Collaborator) commented Mar 31, 2025

> > there are also runtime issues in attention now, since 77a318b
>
> Hmmm, that's weird. I can run this model without any errors, but the output hidden states contain NaN, causing the model to output all "!".

So that is due to an issue with graph mode, which shouldn't be used on ROCm and is addressed in #15413.
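Until #15413 lands, a common way to sidestep graph-mode problems is to disable graph capture entirely via vLLM's enforce_eager option; whether that matches the actual fix in #15413 is an assumption here.

```python
from vllm import LLM

# enforce_eager=True skips CUDA/HIP graph capture and runs everything in
# eager mode -- slower, but it avoids graph-mode-specific bugs on ROCm.
llm = LLM(
    model="amd/Llama-3.2-11B-Vision-Instruct-FP8-KV",
    enforce_eager=True,
)
```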

@SageMoore (Contributor) left a comment

I generally think this looks fine, but I'd like someone with more experience in this part of the codebase to take a look. @robertgshaw2-redhat @tlrmchlsmth @mgoin

@mgoin added the labels ready (ONLY add when PR is ready to merge/full CI is needed) and bug (Something isn't working) on Apr 8, 2025
@vllm-bot merged commit 40b4284 into vllm-project:main Apr 8, 2025
43 of 45 checks passed
@Isotr0py deleted the fix-xqkv-quant branch April 8, 2025 17:03
yangw-dev pushed a commit to yangw-dev/vllm that referenced this pull request Apr 21, 2025
…Linear` (vllm-project#15328) Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: Yang Wang <elainewy@meta.com>
jikunshang pushed a commit to jikunshang/vllm that referenced this pull request Apr 29, 2025
lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Apr 29, 2025
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
…Linear` (vllm-project#15328) Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>

Labels

bug (Something isn't working), force-merge, ready (ONLY add when PR is ready to merge/full CI is needed)

6 participants