Usage
The TensorZero Gateway supports the following cache modes:- write_only(default): Only write to cache but don’t serve cached responses
- read_only: Only read from cache but don’t write new entries
- on: Both read from and write to cache
- off: Disable caching completely
Example
Technical Notes
- The cache applies to individual model requests, not inference requests. This means that the following will be cached separately: multiple variants of the same function; multiple calls to the same function with different parameters; individual model requests for inference-time optimizations; and so on.
- The max_age_sparameter applies to the retrieval of cached responses. The cache does not automatically delete old entries (i.e. not a TTL).
- When the gateway serves a cached response, the usage fields are set to zero.
- The cache data is stored in ClickHouse.
- For batch inference, the gateway only writes to the cache but does not serve cached responses.