@DrRyanHuang (Contributor) commented Jun 16, 2025

PR Category

Execute Infrastructure

PR Types

Improvements

Description

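In inference scenarios, lookups usually hit the same cache entry every time, yet the guard check is currently re-run on every call, which is unnecessary. We can therefore enable an unsafe optimization: if the same cache entry is hit consecutively more than a certain number of times (for example, 32), assume the input data has not changed and return that cache entry directly, skipping the guard check to improve inference efficiency.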


This strategy is enabled through the environment variable SOT_UNSAFE_CACHE_FASTPATH.
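To make the idea concrete, here is a minimal sketch of a guarded cache with a consecutive-hit fast path. It is not the code from this PR: `GuardedCache`, `FASTPATH_THRESHOLD`, `unsafe_fastpath_enabled`, and the way the environment variable value is parsed are all assumptions made for illustration. Only the variable name SOT_UNSAFE_CACHE_FASTPATH and the example threshold of 32 come from the description.

```python
from __future__ import annotations

import os
from typing import Any, Callable

# Assumption: the description only mentions 32 as an example threshold.
FASTPATH_THRESHOLD = 32


def unsafe_fastpath_enabled() -> bool:
    # Assumption: how Paddle parses this variable is not shown in the PR text.
    value = os.environ.get("SOT_UNSAFE_CACHE_FASTPATH", "0")
    return value.lower() in ("1", "true")


class GuardedCache:
    """Toy guarded cache for a single code object (hypothetical class)."""

    def __init__(self, unsafe_fastpath: bool = False) -> None:
        self.entries: list[tuple[Callable[[Any], bool], Any]] = []
        self.unsafe_fastpath = unsafe_fastpath
        self.last_index: int | None = None
        self.consecutive_hits = 0
        self.guard_checks = 0  # instrumentation for the example only

    def add(self, guard: Callable[[Any], bool], compiled: Any) -> None:
        self.entries.append((guard, compiled))

    def lookup(self, inputs: Any) -> Any:
        # Fast path: the same entry was hit more than THRESHOLD times in a
        # row, so return it without running any guard. This is unsafe: if
        # the inputs change later, the stale entry is still returned.
        if (
            self.unsafe_fastpath
            and self.last_index is not None
            and self.consecutive_hits > FASTPATH_THRESHOLD
        ):
            return self.entries[self.last_index][1]

        # Slow path: run guards in order and track consecutive hits.
        for index, (guard, compiled) in enumerate(self.entries):
            self.guard_checks += 1
            if guard(inputs):
                if index == self.last_index:
                    self.consecutive_hits += 1
                else:
                    self.last_index = index
                    self.consecutive_hits = 1
                return compiled

        # Cache miss: reset the streak so the fast path cannot trigger.
        self.last_index = None
        self.consecutive_hits = 0
        return None
```

In this sketch a caller would build the cache with `GuardedCache(unsafe_fastpath=unsafe_fastpath_enabled())`. Once the streak exceeds the threshold, every later lookup returns the same entry regardless of its inputs, which is why the option is opt-in and labeled unsafe.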

Benefit: once steady state is reached, the per-token latency of one model dropped from 35.9 ms to 29.8 ms (roughly 17% lower).


cc @SigureMo
PCard-66972

@SigureMo (Member)

Please add a unit test.

Comment on lines 78 to 85
cache: dict[
types.CodeType, tuple[GuardedFunctions, paddle.framework.core.GuardTree]
]
translate_count: int
code_symbolic_inputs: dict[types.CodeType, dict[str, None | dict[int, int]]]
compile_time_stats: dict[types.CodeType, float]
consecutive_cache_hit_count: dict[types.CodeType, int]
last_cache_index: dict[types.CodeType, int | None]
Member

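TODO: There is too much code-related state here. Later, split it out into a separate class that manages this state, and let OpcodeExecutorCache manage only the mapping from code to that state.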
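One possible shape for that refactor is sketched below, purely as an illustration. `CodeCacheState` and `CodeStateRegistry` are hypothetical names, and the real `GuardedFunctions` and `paddle.framework.core.GuardTree` types are replaced with `Any` placeholders so the snippet runs without Paddle installed.

```python
from __future__ import annotations

import types
from dataclasses import dataclass, field
from typing import Any


@dataclass
class CodeCacheState:
    """Hypothetical bundle of the per-code state listed above."""

    guarded_fns: list[Any] = field(default_factory=list)  # placeholder for GuardedFunctions
    guard_tree: Any = None  # placeholder for paddle.framework.core.GuardTree
    symbolic_inputs: dict[str, None | dict[int, int]] = field(default_factory=dict)
    compile_time: float = 0.0
    consecutive_cache_hit_count: int = 0
    last_cache_index: int | None = None


class CodeStateRegistry:
    """Hypothetical stand-in for the code -> state mapping role."""

    def __init__(self) -> None:
        self._states: dict[types.CodeType, CodeCacheState] = {}
        self.translate_count = 0  # stays global, not per code object

    def state_for(self, code: types.CodeType) -> CodeCacheState:
        # Create the state lazily the first time a code object is seen.
        state = self._states.get(code)
        if state is None:
            state = CodeCacheState()
            self._states[code] = state
        return state
```

With this layout, the several parallel dictionaries keyed by `types.CodeType` collapse into one dictionary, and each new per-code field becomes one more dataclass attribute.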

Member

@SigureMo SigureMo left a comment

LGTMeow 🐾

@SigureMo SigureMo changed the title from "[SOT][3.13] Add unsafe cache fast path" to "[SOT] Add unsafe cache fast path" Jun 17, 2025
@SigureMo SigureMo merged commit ffaab1d into PaddlePaddle:develop Jun 18, 2025
53 of 55 checks passed
@DrRyanHuang DrRyanHuang deleted the acc_lookup branch June 18, 2025 02:14
huangjiyi pushed a commit to huangjiyi/Paddle that referenced this pull request Jun 18, 2025
Labels

None yet

2 participants