Anthropic Prompt Caching: Align CONVERSATION_HISTORY with Anthropic's incremental caching pattern #4546
+207 −38
This commit updates the CONVERSATION_HISTORY cache strategy to align with Anthropic's official documentation and cookbook examples (https://github.com/anthropics/claude-cookbooks/blob/main/misc/prompt_caching.ipynb) for incremental conversation caching.
Cache breakpoint placement:
Anthropic's documentation and cookbook demonstrate incremental caching by placing cache_control on the LAST user message:
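For illustration, here is a minimal sketch of that request shape using plain Java maps (the message text is made up, and these are not Spring AI types — just the structure the Anthropic Messages API accepts, with `cache_control` on a content block of the final user message):

```java
import java.util.List;
import java.util.Map;

public class LastMessageBreakpoint {
    public static void main(String[] args) {
        List<Map<String, Object>> messages = List.of(
            Map.of("role", "user", "content", List.of(
                Map.of("type", "text", "text", "What is prompt caching?"))),
            Map.of("role", "assistant", "content", List.of(
                Map.of("type", "text", "text", "Prompt caching reuses a previously processed prefix."))),
            // The LAST user message carries the cache breakpoint, making the
            // entire conversation up to this point a cacheable prefix.
            Map.of("role", "user", "content", List.of(
                Map.of("type", "text", "text", "How do I enable it?",
                       "cache_control", Map.of("type", "ephemeral"))))
        );
        System.out.println(messages);
    }
}
```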
This pattern is also shown in their official docs: https://docs.claude.com/en/docs/build-with-claude/prompt-caching#large-context-caching-example
Anthropic's caching system uses prefix matching to find the longest matching prefix from the cache. By placing cache_control on the last user message, we enable the following incremental caching pattern:
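As a rough sketch of that pattern (plain Java with hypothetical message text; not the actual Spring AI implementation), the strategy amounts to moving a single breakpoint forward each turn:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IncrementalCachingTrace {
    public static void main(String[] args) {
        List<Map<String, Object>> history = new ArrayList<>();
        for (int turn = 1; turn <= 3; turn++) {
            // Drop the breakpoint from the previous last user message ...
            history.forEach(m -> m.remove("cache_control"));
            // ... and attach it to the newest user message, so the cacheable
            // prefix now covers the whole conversation so far.
            Map<String, Object> userMsg = new HashMap<>();
            userMsg.put("role", "user");
            userMsg.put("content", "question for turn " + turn);
            userMsg.put("cache_control", Map.of("type", "ephemeral"));
            history.add(userMsg);

            // Turn 1: the whole prompt is a cache write.
            // Turn 2+: the prior turn's prefix is a cache read; only the
            // new suffix is written.
            System.out.printf("turn %d -> %d messages in cacheable prefix%n",
                    turn, history.size());

            Map<String, Object> assistantMsg = new HashMap<>();
            assistantMsg.put("role", "assistant");
            assistantMsg.put("content", "answer for turn " + turn);
            history.add(assistantMsg);
        }
    }
}
```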
The cache grows incrementally with each turn, building a larger prefix that can be reused. This is the recommended pattern from Anthropic.
Aggregate eligibility:
The new implementation considers all message types (user, assistant, tool) within the 20-block lookback window when checking the minimum content length. This ensures that short messages which individually fall below the threshold still become cacheable once their combined length meets it.
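A hedged sketch of that aggregate check (the method name and the 100-character minimum are illustrative assumptions; only the 20-block lookback comes from the description above):

```java
import java.util.List;

public class AggregateEligibility {
    // Illustrative, not the actual Spring AI code: sum content lengths of
    // the most recent messages (capped at the 20-block lookback) across all
    // roles and compare the total against the configured minimum.
    static boolean meetsMinLength(List<String> messageContents, int minContentLength) {
        int lookback = Math.min(20, messageContents.size());
        int combined = messageContents
                .subList(messageContents.size() - lookback, messageContents.size())
                .stream()
                .mapToInt(String::length)
                .sum();
        return combined >= minContentLength;
    }

    public static void main(String[] args) {
        // Three short messages that individually fall below a 100-char
        // minimum but qualify in aggregate.
        List<String> contents = List.of("u".repeat(40), "a".repeat(40), "t".repeat(40));
        System.out.println(meetsMinLength(contents, 100)); // true
    }
}
```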
Breaking changes:
None. This is an implementation detail of the CONVERSATION_HISTORY strategy. The API surface remains unchanged. Users may observe:
Different cache hit patterns (should be more effective)
Cache metrics may show higher cache read tokens as conversations grow
Test changes:
Updated shouldRespectMinLengthForUserHistoryCaching() to test aggregate eligibility with combined message lengths
Renamed shouldApplyCacheControlToLastUserMessageForConversationHistory() (from shouldRespectAllButLastUserMessageForUserHistoryCaching)
Added shouldDemonstrateIncrementalCachingAcrossMultipleTurns(), an integration test showing the cache growth pattern across 4 conversation turns
Updated mock test assertions to verify the last message has cache_control
Updated anthropic-chat.adoc to clarify:
CONVERSATION_HISTORY strategy description now mentions incremental prefix caching
Code example comments updated to reflect cache breakpoint on last user message
Implementation Details section expanded with explanation of prefix matching and aggregate eligibility checking
Anthropic Prompt Caching Docs: https://docs.claude.com/en/docs/build-with-claude/prompt-caching
Anthropic Cookbook: https://github.com/anthropics/claude-cookbooks/blob/main/misc/prompt_caching.ipynb
Thank you for taking time to contribute this pull request!
You might have already read the contributor guide, but as a reminder, please make sure to:
Sign off your commits (git commit -s) per the DCO
Rebase your changes against the main branch and squash your commits
For more details, please check the contributor guide.
Thank you upfront!