I’m building a product using the Gemini API, and I’m really hoping to leverage implicit caching to reduce the (very) high API costs. However, there’s not much detailed documentation about how it actually works, so I wanted to ask here in case anyone knows.
Specifically, does the system instruction (the fixed part at the beginning of the prompt) count as part of what’s cached implicitly? Or is it treated separately and excluded from implicit caching?
Any clarification would be super appreciated. Thanks!
Hi, thanks for your answer. Does Gemini 2.5 Pro require at least 4,096 tokens or 2,048 tokens for implicit caching to work? I’ve seen some documents mention 2,048 and others mention 4,096. Also, are there any troubleshooting steps I can take if I’ve already met all the requirements but implicit caching still doesn’t seem to activate? That seems to be my case. I might have to use explicit caching if no troubleshooting steps are available. Thanks a lot.
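For anyone debugging the same thing, one way to tell whether implicit caching kicked in is to look at the usage metadata returned with each response. Below is a minimal sketch, assuming the REST response's `usageMetadata` object carries `promptTokenCount` and `cachedContentTokenCount` (the latter may be absent when nothing was cached). The helper name `cache_hit_report` and the `min_tokens` parameter are my own, not part of any SDK, and the threshold is passed in explicitly because the 2,048 vs 4,096 question above is still open.

```python
def cache_hit_report(usage_metadata, min_tokens):
    """Summarize cache usage from a usageMetadata-style dict.

    usage_metadata: dict parsed from the response JSON (field names are an
                    assumption based on the REST response shape).
    min_tokens:     the minimum prompt size you believe is required for
                    implicit caching (hypothetical; supply your own value).
    """
    prompt_tokens = usage_metadata.get("promptTokenCount", 0)
    # Treat a missing field as zero cached tokens.
    cached_tokens = usage_metadata.get("cachedContentTokenCount", 0)
    ratio = cached_tokens / prompt_tokens if prompt_tokens else 0.0
    return {
        "prompt_tokens": prompt_tokens,
        "cached_tokens": cached_tokens,
        "cache_hit": cached_tokens > 0,
        "cached_ratio": ratio,
        "meets_minimum": prompt_tokens >= min_tokens,
    }


# Illustrative numbers only, not real API output.
report = cache_hit_report(
    {"promptTokenCount": 5000, "cachedContentTokenCount": 4000},
    min_tokens=4096,
)
print(report)
```

If `meets_minimum` is true but `cache_hit` stays false across repeated requests with an identical prefix, that at least confirms the problem is not the prompt size under the threshold you assumed.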