
Conversation

sobychacko
Contributor

… incremental caching pattern

This commit updates the CONVERSATION_HISTORY cache strategy to align with Anthropic's official documentation and cookbook examples (https://github.com/anthropics/claude-cookbooks/blob/main/misc/prompt_caching.ipynb) for incremental conversation caching.

Cache breakpoint placement:

  • Before: Cache breakpoint on the penultimate (second-to-last) user message
  • After: Cache breakpoint on the last user message (sketched below)
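
For illustration, a hypothetical three-message exchange (the texts are placeholders and this is hand-written request content, not the Spring AI source) showing where the breakpoint lands before and after the change:

```python
# Hypothetical conversation; only the breakpoint position changes.
messages = [
    {"role": "user", "content": [{
        "type": "text",
        "text": "First question ...",
        # BEFORE: cache_control was attached here (penultimate user message)
    }]},
    {"role": "assistant", "content": [{"type": "text", "text": "First answer ..."}]},
    {"role": "user", "content": [{
        "type": "text",
        "text": "Follow-up question ...",
        "cache_control": {"type": "ephemeral"},  # AFTER: breakpoint on the last user message
    }]},
]
```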

Aggregate eligibility:

  • Before: Only considered user messages for the minimum content length check
  • After: Considers all message types (user, assistant, tool) within the 20-block lookback window for aggregate eligibility

Anthropic's documentation and cookbook demonstrate incremental caching by placing cache_control on the LAST user message:

result.append({ "role": "user", "content": [{ "type": "text", "text": turn["content"][0]["text"], "cache_control": {"type": "ephemeral"} # On LAST user message }] })

This pattern is also shown in their official docs: https://docs.claude.com/en/docs/build-with-claude/prompt-caching#large-context-caching-example

Anthropic's caching system uses prefix matching to find the longest matching prefix from the cache. By placing cache_control on the last user message, we enable the following incremental caching pattern:

```
Turn 1: Cache [System + User1]
Turn 2: Reuse [System + User1], process [Assistant1 + User2],
        cache [System + User1 + Assistant1 + User2]
Turn 3: Reuse [System + User1 + Assistant1 + User2], process [Assistant2 + User3],
        cache [System + User1 + Assistant1 + User2 + Assistant2 + User3]
```

The cache grows incrementally with each turn, building a larger prefix that can be reused. This is the recommended pattern from Anthropic.
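
A minimal sketch of this pattern against the Anthropic Messages API, using the Anthropic Python SDK directly rather than the Spring AI client; the model id, system prompt, and the `ask()` helper are placeholders:

```python
import anthropic

client = anthropic.Anthropic()
SYSTEM_PROMPT = "..."  # assume a long (1024+ token) system prompt
history = []

def ask(question: str) -> str:
    # The newest user turn carries the cache breakpoint.
    history.append({
        "role": "user",
        "content": [{"type": "text", "text": question,
                     "cache_control": {"type": "ephemeral"}}],
    })
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model id
        max_tokens=1024,
        system=[{"type": "text", "text": SYSTEM_PROMPT,
                 "cache_control": {"type": "ephemeral"}}],
        messages=history,
    )
    # Move the breakpoint forward: clear it from this turn so the next call
    # places it on the newest user message, extending the cached prefix.
    history[-1]["content"][0].pop("cache_control", None)
    answer = response.content[0].text
    history.append({"role": "assistant", "content": answer})
    return answer
```

Turn 1 writes [System + User1] to the cache; every later call reads the previously cached prefix back and writes the extended one, matching the diagram above.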

The new implementation considers all message types (user, assistant, tool) within the 20-block lookback window when checking the minimum content length (a sketch of this check follows the list below). This ensures that:

  • Short user questions don't prevent caching when the conversation has long assistant responses
  • The full conversation context is considered for the 1024+ token minimum
  • Aligns with Anthropic's note: "The automatic prefix checking only looks back approximately 20 content blocks from each explicit breakpoint"
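
A hypothetical sketch of such an aggregate check (the function name, constants, and the ~4 characters-per-token estimate are illustrative, not the actual Spring AI implementation):

```python
MIN_CACHEABLE_TOKENS = 1024   # minimum cacheable prompt length noted above
LOOKBACK_BLOCKS = 20          # approximate lookback window from the breakpoint

def is_cache_eligible(messages: list[dict]) -> bool:
    # Flatten content blocks from ALL roles: user, assistant, and tool results.
    blocks = []
    for message in messages:
        content = message.get("content", [])
        if isinstance(content, str):
            content = [{"type": "text", "text": content}]
        blocks.extend(content)

    # Only the last ~20 blocks before the breakpoint count toward eligibility.
    window = blocks[-LOOKBACK_BLOCKS:]
    total_chars = sum(len(b.get("text", "")) for b in window if b.get("type") == "text")
    return total_chars / 4 >= MIN_CACHEABLE_TOKENS  # rough ~4 chars/token estimate
```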

There are no breaking changes; this is an implementation detail of the CONVERSATION_HISTORY strategy, and the API surface remains unchanged. Users may observe:

  • Different cache hit patterns (should be more effective)

  • Cache metrics may show higher cache read tokens as conversations grow (see the snippet below)
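
For reference, these counters are exposed on the response usage object by the Anthropic Python SDK (the model id and message text below are placeholders):

```python
import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder model id
    max_tokens=512,
    messages=[{"role": "user", "content": [{
        "type": "text",
        "text": "A sufficiently long question ...",
        "cache_control": {"type": "ephemeral"},
    }]}],
)
print("cache writes:", response.usage.cache_creation_input_tokens)
print("cache reads: ", response.usage.cache_read_input_tokens)
print("uncached in: ", response.usage.input_tokens)
```

As the conversation grows, cache_read_input_tokens should climb with the reused prefix while cache_creation_input_tokens stays roughly proportional to the newly added turn.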

Test changes:

  • Updated `shouldRespectMinLengthForUserHistoryCaching()` to test aggregate eligibility with combined message lengths
  • Renamed `shouldApplyCacheControlToLastUserMessageForConversationHistory()` (from `shouldRespectAllButLastUserMessageForUserHistoryCaching`)
  • Added `shouldDemonstrateIncrementalCachingAcrossMultipleTurns()` integration test showing the cache growth pattern across 4 conversation turns
  • Updated mock test assertions to verify the last message has cache_control

Updated anthropic-chat.adoc to clarify:

  • CONVERSATION_HISTORY strategy description now mentions incremental prefix caching
  • Code example comments updated to reflect the cache breakpoint on the last user message
  • Implementation Details section expanded with an explanation of prefix matching and aggregate eligibility checking

References:

  • Anthropic Prompt Caching Docs: https://docs.claude.com/en/docs/build-with-claude/prompt-caching
  • Anthropic Cookbook: https://github.com/anthropics/claude-cookbooks/blob/main/misc/prompt_caching.ipynb

Signed-off-by: Soby Chacko <soby.chacko@broadcom.com>

Thank you for taking time to contribute this pull request!
You might have already read the contributor guide, but as a reminder, please make sure to:

  • Add a Signed-off-by line to each commit (git commit -s) per the DCO
  • Rebase your changes on the latest main branch and squash your commits
  • Add/Update unit tests as needed
  • Run a build and make sure all tests pass prior to submission

For more details, please check the contributor guide.
Thank you upfront!
