System instruction and implicit caching question

Hey everyone,

I’m building a product using the Gemini API, and I’m really hoping to leverage implicit caching to reduce the (very) high API costs. However, there’s not much detailed documentation about how it actually works, so I wanted to ask here in case anyone knows.

Specifically — does the system instruction (the part that’s fixed at the beginning of the prompt) count as part of what’s being cached implicitly? Or is it treated separately and excluded from implicit caching?

Any clarification would be super appreciated. Thanks!

Hi @komin,

Implicit caching is enabled by default for all Gemini 2.5 models. The system instruction counts as part of the cached prefix.
Please refer to: https://ai.google.dev/gemini-api/docs/caching?lang=node#implicit-caching
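To check whether a request actually hit the cache, you can compare the cached token count against the total prompt token count in the response's usage metadata. Here is a minimal sketch, assuming the metadata exposes `prompt_token_count` and `cached_content_token_count` (the field names I'd expect from the Python SDK). The `UsageMetadata` class is only a stand-in so the snippet runs without an API key, and the numbers are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsageMetadata:
    """Stand-in for the usage metadata attached to a response (assumed shape)."""
    prompt_token_count: int
    cached_content_token_count: Optional[int] = None


def cache_hit_ratio(usage) -> float:
    """Return the fraction of prompt tokens served from cache (0.0 if none)."""
    # The cached count may be missing or None when nothing was cached.
    cached = getattr(usage, "cached_content_token_count", None) or 0
    total = getattr(usage, "prompt_token_count", None) or 0
    return cached / total if total else 0.0


# Hypothetical numbers: a first request with no hit, then a repeat with a hit.
first = UsageMetadata(prompt_token_count=5000)
second = UsageMetadata(prompt_token_count=5000, cached_content_token_count=4000)

print(cache_hit_ratio(first))   # 0.0
print(cache_hit_ratio(second))  # 0.8
```

With a real response you would pass its usage metadata object to `cache_hit_ratio` instead of the stand-in. A ratio that stays at 0.0 across repeated requests with the same prefix suggests implicit caching is not kicking in.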

Let me know if you have any further questions.

Hi, thanks for your answer. Does Gemini 2.5 Pro require at least 4,096 tokens or 2,048 tokens for implicit caching to work? I’ve seen some documents mentioning 2,048 and others mentioning 4,096. Also, are there any troubleshooting steps I can take if I’ve already met all the requirements but implicit caching still doesn’t seem to activate? That seems to be my case.
I might have to use explicit caching if no troubleshooting steps are available.
Thanks a lot.
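
One troubleshooting check that needs no API call: since implicit caching matches on a shared prefix, it is worth confirming that consecutive requests really do start with the same content. Below is a rough sketch that measures the shared prefix between two prompts in characters. The prompts, `SYSTEM`, and `DOCS` are hypothetical, joining them into one string is only a model of the request order, and character counts are just a proxy: the real minimum is in tokens, so count tokens through the API before comparing against whichever threshold applies to your model.

```python
import os


def shared_prefix_chars(prompt_a: str, prompt_b: str) -> int:
    """Length, in characters, of the common prefix of two prompts."""
    # os.path.commonprefix compares strings character by character.
    return len(os.path.commonprefix([prompt_a, prompt_b]))


# Hypothetical stable content that every request repeats.
SYSTEM = "You are a support assistant for ExampleCo. " * 50
DOCS = "Reference manual text. " * 200

# Stable content first, variable content last: long shared prefix.
good_a = SYSTEM + DOCS + "Question: how do I reset my password?"
good_b = SYSTEM + DOCS + "Question: how do I close my account?"

# Variable content first (e.g. a request id): the prefix diverges almost at once.
bad_a = "Request id 1001. " + SYSTEM + DOCS
bad_b = "Request id 1002. " + SYSTEM + DOCS

print(shared_prefix_chars(good_a, good_b) >= len(SYSTEM + DOCS))  # True
print(shared_prefix_chars(bad_a, bad_b))  # 14
```

If the shared prefix turns out to be short, moving timestamps, request ids, or per-user data to the end of the prompt is the first thing to try. Sending the similar requests close together in time should also help.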