Enable headless models for pooling in the Transformers backend #21767
Conversation
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs will not trigger a full CI run by default; only a small and essential subset of tests runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
Code Review
This pull request enables loading headless models for embedding with the Transformers backend, which is a great addition. The changes in the configuration and registry look correct, and the new test case covers the intended scenarios.
However, I've found a critical issue in the WeightsMapper implementation for the new TransformersModel. The current logic for prefixing weights is flawed due to incorrect key ordering in the dictionary, which will cause weight loading to fail for one of the model formats this PR aims to support. I've provided a detailed comment and a code suggestion to fix this.
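For readers unfamiliar with the pitfall the review describes, here is a minimal, self-contained sketch (plain Python, deliberately not vLLM's actual `WeightsMapper` API) of how insertion order in a prefix-mapping dict can break weight renaming when one key is a prefix of another:

```python
# Hypothetical order-sensitive prefix mapper, for illustration only.
def map_weight_name(name: str, orig_to_new_prefix: dict[str, str]) -> str:
    # The first matching prefix wins, so dict insertion order matters.
    for orig, new in orig_to_new_prefix.items():
        if name.startswith(orig):
            return new + name[len(orig):]
    return name

# Broken ordering: the empty-string prefix matches every name, so the
# "model." entry below it is never reached and already-prefixed weights
# get double-prefixed.
broken = {"": "model.", "model.": "model."}
# Fixed ordering: check the more specific prefix first.
fixed = {"model.": "model.", "": "model."}

print(map_weight_name("model.layers.0.self_attn.q_proj.weight", broken))
# model.model.layers.0.self_attn.q_proj.weight  (wrong: double prefix)
print(map_weight_name("model.layers.0.self_attn.q_proj.weight", fixed))
# model.layers.0.self_attn.q_proj.weight  (left unchanged, as intended)
print(map_weight_name("layers.0.self_attn.q_proj.weight", fixed))
# model.layers.0.self_attn.q_proj.weight  (root-level weight gets prefixed)
```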
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Thanks for extending this support!
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
The mapper is having issues, I'll disable auto-merge for now
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Enable headless models for pooling in the Transformers backend (vllm-project#21767) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Enable headless models for pooling in the Transformers backend (vllm-project#21767) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Enable headless models for pooling in the Transformers backend (vllm-project#21767) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Noam Gat <noamgat@gmail.com>
Enable headless models for pooling in the Transformers backend (vllm-project#21767) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Paul Pak <paulpak58@gmail.com>
Enable headless models for pooling in the Transformers backend (vllm-project#21767) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Diego-Castan <diego.castan@ibm.com>
Previously, embedding model checkpoints that had their layers at the root of the checkpoint would not load correctly with the Transformers backend.
This PR enables the loading of Transformers base model classes.
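As a rough sketch of how this might be used (the model name is a placeholder, and the exact `LLM` arguments — `task="embed"`, `model_impl="transformers"`, and the `embed()` method — should be checked against the current vLLM docs):

```python
from vllm import LLM

llm = LLM(
    model="my-org/my-headless-encoder",  # placeholder: any headless checkpoint
    task="embed",                        # pooling/embedding task
    model_impl="transformers",           # force the Transformers backend
)

# Embed a sentence and inspect the resulting vector's dimensionality.
outputs = llm.embed(["vLLM can now pool with headless checkpoints."])
print(len(outputs[0].outputs.embedding))
```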
Now, both of the following checkpoint formats will work for pooling tasks:
- `ModelForCausalLM`: the backbone weights are nested under a prefix (typically `model.`), with a language-modeling head on top
- `Model`: a headless checkpoint, with the backbone weights at the root
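Concretely, with illustrative Llama-style weight names (not taken from the PR itself), the two layouts differ only in the prefix, and the fix discussed above amounts to normalizing root-level names so both layouts resolve to the same parameter names inside the backend:

```python
# Illustrative weight names for the two checkpoint formats.
causal_lm_style = [
    "model.embed_tokens.weight",               # backbone nested under "model."
    "model.layers.0.self_attn.q_proj.weight",
    "lm_head.weight",                          # head that pooling does not need
]
headless_style = [
    "embed_tokens.weight",                     # backbone at the checkpoint root
    "layers.0.self_attn.q_proj.weight",
]

# Prefix root-level names so both layouts load through the same code path.
normalized = [
    name if name.startswith(("model.", "lm_head")) else f"model.{name}"
    for name in headless_style
]
print(normalized)
# ['model.embed_tokens.weight', 'model.layers.0.self_attn.q_proj.weight']
```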