@Jiang-Jia-Jun Jiang-Jia-Jun commented Dec 17, 2025

Motivation

Remove the existing limit on the number of KV cache blocks.

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.
Copilot AI review requested due to automatic review settings December 17, 2025 05:57
CLAassistant commented Dec 17, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ Jiang-Jia-Jun
❌ jiangjiajun

paddle-bot bot commented Dec 17, 2025

Thanks for your contribution!

Copilot AI left a comment

Pull request overview

This PR removes the hardcoded 40,000 block limit for KV cache allocation and replaces it with a configurable environment variable FD_MAX_KVCACHE_BLOCKS. This provides more flexibility for different deployment scenarios and hardware configurations.

Key Changes

  • Introduced FD_MAX_KVCACHE_BLOCKS environment variable with default value of -1 (no limit)
  • Updated block allocation logic in worker_process.py and iluvatar_worker.py to use the configurable limit
  • Removed hardcoded 40,000 block restriction that could unnecessarily limit large memory systems
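The behavior summarized in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the FastDeploy implementation: the helper names `read_max_kvcache_blocks` and `clamp_num_blocks` are hypothetical, and parsing the variable with `int()` from a mapping is an assumption about how `fastdeploy/envs.py` reads it. Only the variable name, the default of -1, and the "cap only when the limit is positive" rule come from the PR.

```python
import os
from typing import Mapping, Optional


def read_max_kvcache_blocks(environ: Optional[Mapping[str, str]] = None) -> int:
    """Hypothetical helper: parse FD_MAX_KVCACHE_BLOCKS, defaulting to -1 (no limit)."""
    # Accept an explicit mapping so the function can be exercised without
    # touching the real process environment.
    environ = os.environ if environ is None else environ
    return int(environ.get("FD_MAX_KVCACHE_BLOCKS", "-1"))


def clamp_num_blocks(num_blocks_local: int, max_blocks: int) -> int:
    """Cap num_blocks_local at max_blocks only when a positive limit is set."""
    if max_blocks > 0 and num_blocks_local > max_blocks:
        return max_blocks
    # A non-positive limit (such as the default -1) leaves the value untouched.
    return num_blocks_local
```

With the variable unset, `clamp_num_blocks(n, read_max_kvcache_blocks({}))` returns `n` unchanged. Setting `FD_MAX_KVCACHE_BLOCKS=40000` would reproduce the previous hardcoded cap.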

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File                                    Description
fastdeploy/envs.py                      Added FD_MAX_KVCACHE_BLOCKS environment variable configuration with default value -1
fastdeploy/worker/worker_process.py     Replaced hardcoded 40,000 limit with configurable FD_MAX_KVCACHE_BLOCKS check
fastdeploy/worker/iluvatar_worker.py    Applied same configurable limit logic as worker_process.py for consistency

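Note regarding PR metadata: According to the custom code review guidelines, the PR description should explain more fully: 1) why the hardcoded 40000 limit is being removed; 2) how to use the new environment variable (the default value of -1 means no limit); 3) which files were modified. In addition, the "[Optim]" tag in the PR title should be changed to "[Optimization]" from the project tag list for consistency.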

- num_blocks_local = min(40000, num_blocks_local)
+ if envs.FD_MAX_KVCACHE_BLOCKS > 0 and num_blocks_local > envs.FD_MAX_KVCACHE_BLOCKS:
+     logger.info(f"------- Reset num_blocks_local {num_blocks_local} to {envs.FD_MAX_KVCACHE_BLOCKS}")
+     num_blocks_local = min(envs.FD_MAX_KVCACHE_BLOCKS, num_blocks_local)
Copilot AI Dec 17, 2025

The min() function on line 542 is redundant since the condition on line 540 already ensures that num_blocks_local > envs.FD_MAX_KVCACHE_BLOCKS. You can simplify line 542 to just assign envs.FD_MAX_KVCACHE_BLOCKS directly to num_blocks_local.

Suggested change
- num_blocks_local = min(envs.FD_MAX_KVCACHE_BLOCKS, num_blocks_local)
+ num_blocks_local = envs.FD_MAX_KVCACHE_BLOCKS
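
The equivalence behind this suggestion can be checked with a small sketch. The function names below are hypothetical and exist only for illustration; the guard condition is the one from the diff. Inside the guarded branch `n > limit` always holds, so `min(limit, n)` can only return `limit`.

```python
def clamp_with_min(n: int, limit: int) -> int:
    """Variant from the original diff: min() inside the guarded branch."""
    if limit > 0 and n > limit:
        n = min(limit, n)
    return n


def clamp_direct(n: int, limit: int) -> int:
    """Suggested variant: assign the limit directly inside the guarded branch."""
    if limit > 0 and n > limit:
        n = limit
    return n


def variants_agree(max_value: int = 50) -> bool:
    """Compare both variants over a small grid, including non-positive limits."""
    return all(
        clamp_with_min(n, limit) == clamp_direct(n, limit)
        for n in range(max_value)
        for limit in range(-2, max_value)
    )
```

Calling `variants_agree()` compares the two variants over a small range of block counts and limits; it is a sanity check, not a proof.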
Jiang-Jia-Jun and others added 2 commits December 17, 2025 14:40
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

Labels

None yet

4 participants