
Conversation

sobychacko
Contributor

… incremental caching pattern

This commit updates the CONVERSATION_HISTORY cache strategy to align with Anthropic's official documentation and cookbook examples (https://github.com/anthropics/claude-cookbooks/blob/main/misc/prompt_caching.ipynb) for incremental conversation caching.

Cache breakpoint placement:

  • Before: Cache breakpoint on the penultimate (second-to-last) user message
  • After: Cache breakpoint on the last user message (sketched below)
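
For illustration, a hypothetical three-message exchange (the texts are placeholders and this is hand-written request content, not the Spring AI source) showing where the breakpoint lands before and after the change:

```python
# Hypothetical conversation; only the breakpoint position changes.
messages = [
    {"role": "user", "content": [{
        "type": "text",
        "text": "First question ...",
        # BEFORE: cache_control was attached here (penultimate user message)
    }]},
    {"role": "assistant", "content": [{"type": "text", "text": "First answer ..."}]},
    {"role": "user", "content": [{
        "type": "text",
        "text": "Follow-up question ...",
        "cache_control": {"type": "ephemeral"},  # AFTER: breakpoint on the last user message
    }]},
]
```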

Aggregate eligibility:

  • Before: Only considered user messages for the minimum content length check
  • After: Considers all message types (user, assistant, tool) within the 20-block lookback window for aggregate eligibility

Anthropic's documentation and cookbook demonstrate incremental caching by placing cache_control on the LAST user message:

result.append({ "role": "user", "content": [{ "type": "text", "text": turn["content"][0]["text"], "cache_control": {"type": "ephemeral"} # On LAST user message }] })

This pattern is also shown in their official docs: https://docs.claude.com/en/docs/build-with-claude/prompt-caching#large-context-caching-example

Anthropic's caching system uses prefix matching to find the longest matching prefix from the cache. By placing cache_control on the last user message, we enable the following incremental caching pattern:

```
Turn 1: Cache [System + User1]
Turn 2: Reuse [System + User1], process [Assistant1 + User2],
        cache [System + User1 + Assistant1 + User2]
Turn 3: Reuse [System + User1 + Assistant1 + User2], process [Assistant2 + User3],
        cache [System + User1 + Assistant1 + User2 + Assistant2 + User3]
```

The cache grows incrementally with each turn, building a larger prefix that can be reused. This is the recommended pattern from Anthropic.
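
A minimal sketch of this pattern against the Anthropic Messages API, using the Anthropic Python SDK directly rather than the Spring AI client; the model id, system prompt, and the `ask()` helper are placeholders:

```python
import anthropic

client = anthropic.Anthropic()
SYSTEM_PROMPT = "..."  # assume a long (1024+ token) system prompt
history = []

def ask(question: str) -> str:
    # The newest user turn carries the cache breakpoint.
    history.append({
        "role": "user",
        "content": [{"type": "text", "text": question,
                     "cache_control": {"type": "ephemeral"}}],
    })
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model id
        max_tokens=1024,
        system=[{"type": "text", "text": SYSTEM_PROMPT,
                 "cache_control": {"type": "ephemeral"}}],
        messages=history,
    )
    # Move the breakpoint forward: clear it from this turn so the next call
    # places it on the newest user message, extending the cached prefix.
    history[-1]["content"][0].pop("cache_control", None)
    answer = response.content[0].text
    history.append({"role": "assistant", "content": answer})
    return answer
```

Turn 1 writes [System + User1] to the cache; every later call reads the previously cached prefix back and writes the extended one, matching the diagram above.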

The new implementation considers all message types (user, assistant, tool) within the 20-block lookback window when checking the minimum content length (a sketch of this check follows the list below). This ensures that:

  • Short user questions don't prevent caching when the conversation has long assistant responses
  • The full conversation context is considered for the 1024+ token minimum
  • Aligns with Anthropic's note: "The automatic prefix checking only looks back approximately 20 content blocks from each explicit breakpoint"
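
A hypothetical sketch of such an aggregate check (the function name, constants, and the ~4 characters-per-token estimate are illustrative, not the actual Spring AI implementation):

```python
MIN_CACHEABLE_TOKENS = 1024   # minimum cacheable prompt length noted above
LOOKBACK_BLOCKS = 20          # approximate lookback window from the breakpoint

def is_cache_eligible(messages: list[dict]) -> bool:
    # Flatten content blocks from ALL roles: user, assistant, and tool results.
    blocks = []
    for message in messages:
        content = message.get("content", [])
        if isinstance(content, str):
            content = [{"type": "text", "text": content}]
        blocks.extend(content)

    # Only the last ~20 blocks before the breakpoint count toward eligibility.
    window = blocks[-LOOKBACK_BLOCKS:]
    total_chars = sum(len(b.get("text", "")) for b in window if b.get("type") == "text")
    return total_chars / 4 >= MIN_CACHEABLE_TOKENS  # rough ~4 chars/token estimate
```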

There are no breaking changes; this is an implementation detail of the CONVERSATION_HISTORY strategy, and the API surface remains unchanged. Users may observe:

  • Different cache hit patterns (should be more effective)

  • Cache metrics may show higher cache read tokens as conversations grow (see the snippet below)
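
For reference, these counters are exposed on the response usage object by the Anthropic Python SDK (the model id and message text below are placeholders):

```python
import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder model id
    max_tokens=512,
    messages=[{"role": "user", "content": [{
        "type": "text",
        "text": "A sufficiently long question ...",
        "cache_control": {"type": "ephemeral"},
    }]}],
)
print("cache writes:", response.usage.cache_creation_input_tokens)
print("cache reads: ", response.usage.cache_read_input_tokens)
print("uncached in: ", response.usage.input_tokens)
```

As the conversation grows, cache_read_input_tokens should climb with the reused prefix while cache_creation_input_tokens stays roughly proportional to the newly added turn.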

Test changes:

  • Updated `shouldRespectMinLengthForUserHistoryCaching()` to test aggregate eligibility with combined message lengths
  • Renamed `shouldApplyCacheControlToLastUserMessageForConversationHistory()` (from `shouldRespectAllButLastUserMessageForUserHistoryCaching`)
  • Added `shouldDemonstrateIncrementalCachingAcrossMultipleTurns()` integration test showing the cache growth pattern across 4 conversation turns
  • Updated mock test assertions to verify the last message has cache_control

Updated anthropic-chat.adoc to clarify:

  • CONVERSATION_HISTORY strategy description now mentions incremental prefix caching
  • Code example comments updated to reflect the cache breakpoint on the last user message
  • Implementation Details section expanded with an explanation of prefix matching and aggregate eligibility checking

References:

  • Anthropic Prompt Caching Docs: https://docs.claude.com/en/docs/build-with-claude/prompt-caching
  • Anthropic Cookbook: https://github.com/anthropics/claude-cookbooks/blob/main/misc/prompt_caching.ipynb

Signed-off-by: Soby Chacko <soby.chacko@broadcom.com>

Thank you for taking time to contribute this pull request!
You might have already read the contributor guide, but as a reminder, please make sure to:

  • Add a Signed-off-by line to each commit (git commit -s) per the DCO
  • Rebase your changes on the latest main branch and squash your commits
  • Add/Update unit tests as needed
  • Run a build and make sure all tests pass prior to submission

For more details, please check the contributor guide.
Thank you upfront!
