[Optim] Remove limitation of number of kvcache blocks #5612

Jiang-Jia-Jun · 2025-12-17T05:57:35Z

Motivation

Remove the previous limit on the number of KV cache blocks.

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

CLAassistant · 2025-12-17T05:57:41Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ Jiang-Jia-Jun
❌ jiangjiajun
You have signed the CLA already but the status is still pending? Let us recheck it.

paddle-bot · 2025-12-17T05:57:43Z

Thanks for your contribution!

Copilot

Pull request overview

This PR removes the hardcoded 40,000 block limit for KV cache allocation and replaces it with a configurable environment variable FD_MAX_KVCACHE_BLOCKS. This provides more flexibility for different deployment scenarios and hardware configurations.

Key Changes

Introduced FD_MAX_KVCACHE_BLOCKS environment variable with default value of -1 (no limit)
Updated block allocation logic in worker_process.py and iluvatar_worker.py to use the configurable limit
Removed hardcoded 40,000 block restriction that could unnecessarily limit large memory systems
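The behavior described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the helper names `read_max_kvcache_blocks` and `clamp_num_blocks` are hypothetical, and reading the variable straight from `os.environ` is an assumption about how `fastdeploy/envs.py` resolves it. Only the variable name, its default of -1, and the `> 0` guard come from the PR.

```python
import os


def read_max_kvcache_blocks(environ=os.environ) -> int:
    """Hypothetical helper: parse FD_MAX_KVCACHE_BLOCKS, defaulting to -1 (no limit)."""
    return int(environ.get("FD_MAX_KVCACHE_BLOCKS", "-1"))


def clamp_num_blocks(num_blocks_local: int, max_blocks: int) -> int:
    """Hypothetical helper: cap the block count only when the limit is positive."""
    if max_blocks > 0 and num_blocks_local > max_blocks:
        return max_blocks
    return num_blocks_local


if __name__ == "__main__":
    limit = read_max_kvcache_blocks()
    # With the variable unset, the limit is -1 and the count passes through unchanged.
    print(clamp_num_blocks(50000, limit))
```

Because the guard in the diff is `> 0`, a value of 0 behaves like -1 and leaves the block count uncapped. Setting `FD_MAX_KVCACHE_BLOCKS=40000` would reproduce the previous hardcoded cap.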

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
fastdeploy/envs.py	Added `FD_MAX_KVCACHE_BLOCKS` environment variable configuration with default value -1
fastdeploy/worker/worker_process.py	Replaced hardcoded 40,000 limit with configurable `FD_MAX_KVCACHE_BLOCKS` check
fastdeploy/worker/iluvatar_worker.py	Applied same configurable limit logic as worker_process.py for consistency

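Note regarding PR metadata: Per the custom code review guidelines, the PR description should explain more fully: 1) why the hardcoded 40000 limit is being removed; 2) how to use the new environment variable (the default value -1 means no limit); 3) which files were modified. In addition, the "[Optim]" tag in the PR title should be changed to "[Optimization]" from the project's tag list for consistency.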

fastdeploy/envs.py

Copilot · 2025-12-17T06:01:03Z

fastdeploy/worker/worker_process.py

- num_blocks_local = min(40000, num_blocks_local)
+ if envs.FD_MAX_KVCACHE_BLOCKS > 0 and num_blocks_local > envs.FD_MAX_KVCACHE_BLOCKS:
+     logger.info(f"------- Reset num_blocks_local {num_blocks_local} to {envs.FD_MAX_KVCACHE_BLOCKS}")
+     num_blocks_local = min(envs.FD_MAX_KVCACHE_BLOCKS, num_blocks_local)


The min() function on line 542 is redundant since the condition on line 540 already ensures that num_blocks_local > envs.FD_MAX_KVCACHE_BLOCKS. You can simplify line 542 to just assign envs.FD_MAX_KVCACHE_BLOCKS directly to num_blocks_local.

Suggested change

- num_blocks_local = min(envs.FD_MAX_KVCACHE_BLOCKS, num_blocks_local)
+ num_blocks_local = envs.FD_MAX_KVCACHE_BLOCKS
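The review comment's claim that `min()` is redundant under the guard can be checked with a small sketch. The function names below are hypothetical and the limit is passed as a plain integer rather than read from `envs`; the two bodies mirror the original and the suggested lines.

```python
def cap_with_min(n: int, limit: int) -> int:
    """Mirrors the original change: guard, then min()."""
    if limit > 0 and n > limit:
        n = min(limit, n)
    return n


def cap_with_assignment(n: int, limit: int) -> int:
    """Mirrors the suggested change: guard, then direct assignment."""
    if limit > 0 and n > limit:
        n = limit
    return n


def agree_everywhere(values, limits) -> bool:
    """Return True if both variants give the same result for every pair."""
    return all(
        cap_with_min(n, lim) == cap_with_assignment(n, lim)
        for n in values
        for lim in limits
    )


if __name__ == "__main__":
    print(agree_everywhere(range(0, 200), range(-5, 200)))
```

Inside the guarded branch `n > limit` already holds, so `min(limit, n)` can only return `limit`; the direct assignment states that more plainly.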

fastdeploy/worker/iluvatar_worker.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

[Optim] Remove limitation of number of kvcache blocks

adae560

Copilot AI review requested due to automatic review settings December 17, 2025 05:57

Copilot started reviewing on behalf of Jiang-Jia-Jun December 17, 2025 05:58

Copilot AI reviewed Dec 17, 2025

Jiang-Jia-Jun and others added 2 commits December 17, 2025 14:40

Update fastdeploy/envs.py

990335c

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update fastdeploy/worker/iluvatar_worker.py

a978fb2

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Jiang-Jia-Jun requested a review from Copilot December 17, 2025 06:40

Copilot started reviewing on behalf of Jiang-Jia-Jun December 17, 2025 06:41

Copilot AI reviewed Dec 17, 2025
