
Conversation

@akakakakakaa commented Dec 17, 2025

When pooling models are used together with LMCache, the system crashes because of a missing guard in the LMCache connector implementation: it does not check whether sampling_params is None. Pooling models do not use sampling parameters, so this leads to a runtime error. (The bug is in the LMCache implementation; vLLM's native implementation appears to handle this case correctly.)
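
For illustration, the kind of guard that avoids this crash looks roughly like the sketch below; the function name and surrounding logic are assumptions made for the example, not LMCache's actual code.

```python
# Hedged sketch only: names and structure are illustrative,
# not LMCache's real implementation.
def build_sampling_metadata(request):
    sampling_params = getattr(request, "sampling_params", None)
    if sampling_params is None:
        # Pooling (e.g. embedding) requests carry no sampling parameters,
        # so return early instead of dereferencing None and crashing.
        return None
    # Text-generation path: read whatever sampling fields are actually needed.
    return {"temperature": sampling_params.temperature}
```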

To work around this, I attempted to use vLLM's native LMCache implementation, which already handles this case correctly. However, a configuration issue prevents enabling the native implementation: the connector setup overrides the user-provided kv_connector_extra_config options.

The fix therefore changes the behavior from overriding kv_connector_extra_config to updating it, ensuring that user-provided options such as use_native are preserved.
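
The difference is easiest to see in a minimal sketch. The class below is a placeholder standing in for vLLM's KVTransferConfig, and the dictionary contents are made-up defaults, not the actual diff:

```python
# Hedged sketch of override vs. update; keys and values are placeholders,
# not the connector's real configuration.
user_options = {"use_native": True}  # e.g. supplied via --kv-transfer-config

class KVTransferConfigSketch:
    def __init__(self, kv_connector_extra_config):
        self.kv_connector_extra_config = dict(kv_connector_extra_config)

cfg = KVTransferConfigSketch(user_options)

# Old behavior: assignment replaces the dict and silently drops "use_native".
# cfg.kv_connector_extra_config = {"lmcache_default": "value"}

# New behavior: update() merges backend defaults while keeping user options.
cfg.kv_connector_extra_config.update({"lmcache_default": "value"})
assert cfg.kv_connector_extra_config["use_native"] is True
```

With the old assignment, use_native never reached the connector, which is why the native implementation could not be enabled.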

@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request correctly addresses a configuration override issue for the LMCache connector. The change modifies the population of kv_connector_extra_config from a direct assignment to an update, which prevents user-provided settings from being discarded. This is a necessary bug fix that enables users to customize the LMCache connector's behavior, such as enabling the native implementation. The implementation is clean, correct, and consistent with how other backends are configured in the same file. I approve this change.

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small, essential subset of CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@mergify bot commented Dec 17, 2025

Hi @akakakakakaa, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

@yewentao256 (Member) left a comment

LGTM, thanks for the work!

@yewentao256 added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Dec 17, 2025

@yewentao256 (Member) left a comment

Please merge from main and fix the DCO issue

@mergify bot added the documentation, ci/build, deepseek, frontend, multi-modality, new-model, performance, qwen, nvidia, and rocm labels on Dec 20, 2025
@mergify bot added the cpu label on Dec 20, 2025
@github-project-automation bot moved this to Ready in NVIDIA on Dec 20, 2025

@mergify bot commented Dec 20, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @akakakakakaa.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@akakakakakaa (Author) commented

Sorry for my mistake. I'll fix it

@mergify bot removed the needs-rebase label on Dec 20, 2025
Signed-off-by: akakakakakaa <akstn3023@naver.com>
@akakakakakaa force-pushed the mansu/fix-lmcache-kv-connector-extra-config branch from 45565ea to 8ecbe3a on December 20, 2025 08:12

Labels

ci/build, cpu, deepseek, documentation, frontend, kv-connector, multi-modality, new-model, nvidia, performance, qwen, ready, rocm, speculative-decoding, structured-output, tool-calling, v1

2 participants