[Model] Fix Skywork R1V mlp #26673
Conversation
Code Review
This pull request aims to fix the MLP component of the Skywork R1V model, likely to enable correct quantization. The changes correctly propagate the quant_config and prefix to the ReplicatedLinear layers within the mlp1 module. However, there's a critical issue in how the prefix is constructed at the call site for _init_mlp1, which will lead to incorrect layer prefixes and likely cause quantization to fail. I've provided a suggestion to correct this.
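For context, here is a minimal sketch of the pattern the review describes, assuming an InternVL-style projector (LayerNorm → Linear → GELU → Linear). The standalone function name and argument list are illustrative, not the exact vLLM code:

```python
import torch.nn as nn

from vllm.model_executor.layers.linear import ReplicatedLinear


def init_mlp1(vit_hidden_size: int, llm_hidden_size: int,
              downsample_ratio: float, quant_config, prefix: str) -> nn.Sequential:
    # Pixel-shuffle downsampling concatenates patches, inflating the input width.
    in_features = vit_hidden_size * int(1 / downsample_ratio) ** 2
    return nn.Sequential(
        nn.LayerNorm(in_features),
        ReplicatedLinear(
            in_features,
            llm_hidden_size,
            return_bias=False,
            quant_config=quant_config,  # propagated so quantized checkpoints load
            prefix=f"{prefix}.1",       # index 1 inside the Sequential
        ),
        nn.GELU(),
        ReplicatedLinear(
            llm_hidden_size,
            llm_hidden_size,
            return_bias=False,
            quant_config=quant_config,
            prefix=f"{prefix}.3",       # index 3 inside the Sequential
        ),
    )
```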
💡 Codex Review
Here are some automated review suggestions for this pull request.
vit_hidden_size * int(1 / self.downsample_ratio) ** 2,
llm_hidden_size,
return_bias=False,
quant_config=quant_config,
Prefix omits mlp1 when registering quantized projector
The new _init_mlp1 forwards the outer prefix directly into the ReplicatedLinear layers (prefix=f"{prefix}.1"/.3). Because this module is attached as self.mlp1, the actual state‑dict path for the quantized weights is …mlp1.1 and …mlp1.3. AWQ/GPTQ configs rely on the prefix string to find the precomputed qweight/scales or to decide which modules to skip; registering the layers under prefix.1 means the loader will not match the checkpoint entries (mlp1.1.*) and the quantized projector still fails to load. Wrap the prefix with maybe_prefix(prefix, "mlp1") before appending the numeric suffix so that weight lookup works for quantized Skywork R1V checkpoints.
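A hedged illustration of the suggested fix: vLLM's maybe_prefix helper joins a submodule name into a prefix (tolerating an empty outer prefix), so the registered layer names line up with the checkpoint's state-dict paths. The outer_prefix value below is hypothetical, for illustration only:

```python
from vllm.model_executor.models.utils import maybe_prefix

outer_prefix = "model"  # hypothetical outer prefix, for illustration only

# Without "mlp1" in the prefix, the layers register as "model.1" / "model.3",
# while the checkpoint stores the quantized weights under "model.mlp1.1" etc.
fixed = maybe_prefix(outer_prefix, "mlp1")
print(fixed)         # model.mlp1
print(f"{fixed}.1")  # model.mlp1.1 -- matches the state-dict entry
```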
DarkLight1337 left a comment
Thanks for fixing!
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: 1994 <1994@users.noreply.github.com>
Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com>
Signed-off-by: bbartels <benjamin@bartels.dev>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
Purpose
FIX #20818
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for a new model.