feat: Add support for Agent Identity bound tokens #1821
This change introduces support for requesting certificate-bound access tokens for Agent Identities on GKE and Cloud Run. The design doc: go/sdk-agent-identity
Note that the unit tests are a work in progress and are not yet comprehensive.
I've opened the PR now so folks can review it while I'm OOO (I'll be back on Oct 15).
Implementation Details & Open Discussion Points
The current implementation includes specific logic for resiliency and failure handling. Both are initial implementations and open to discussion; the final design may change.
Resiliency and Backoff: The Cloud Run flow retries certificate loading with exponential backoff (5 attempts over ~15 seconds). The GKE flow currently has no retries and fails immediately. We welcome discussion on whether to add retries to the GKE flow and whether the current backoff configuration for Cloud Run is appropriate.
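As a rough illustration of the retry shape described above, here is a minimal Python sketch. The helper name `load_with_backoff`, its parameters, and the 1-second initial delay with doubling are assumptions, chosen so that five attempts wait about 15 seconds in total (1 + 2 + 4 + 8); the delays used in this PR may differ.

```python
import time
from typing import Callable, Optional


def load_with_backoff(
    load: Callable[[], Optional[bytes]],
    attempts: int = 5,           # assumed: matches "5 attempts" in the description
    initial_delay: float = 1.0,  # assumed: not taken from the PR
    multiplier: float = 2.0,     # assumed: plain doubling
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Call load() up to `attempts` times, backing off between failed tries.

    Returns the certificate bytes, or None if every attempt failed.
    """
    delay = initial_delay
    for attempt in range(attempts):
        try:
            cert = load()
            if cert is not None:
                return cert
        except OSError:
            # Treat I/O errors (for example, the file not existing yet) as retryable.
            pass
        # Sleep only between attempts, never after the last one.
        if attempt < attempts - 1:
            sleep(delay)
            delay *= multiplier
    return None
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting. The same helper could wrap the GKE certificate load if retries are added there.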
Failure Policy: In the Cloud Run flow, a failure to load the certificate results in a fallback to a standard, unbound token. We are seeking feedback on whether this "soft fail" is the correct approach, or whether a "hard fail" would be more suitable for security-conscious applications.
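To make the two options concrete, here is a small Python sketch of how the choice might be expressed. `resolve_binding`, the `hard_fail` flag, and `CertificateUnavailableError` are hypothetical names for illustration only and are not part of this PR's API.

```python
from typing import Optional


class CertificateUnavailableError(Exception):
    """Hypothetical error for the hard-fail policy when no certificate loads."""


def resolve_binding(cert: Optional[bytes], hard_fail: bool = False) -> str:
    """Decide which kind of token to request given the certificate load result.

    Soft fail (the default, mirroring the current Cloud Run behavior): fall
    back to a standard, unbound token. Hard fail: raise instead of downgrading.
    """
    if cert is not None:
        return "bound"
    if hard_fail:
        raise CertificateUnavailableError("certificate could not be loaded")
    return "unbound"
```

With a flag like this, soft fail could stay the default while security-conscious callers opt in to the stricter behavior; whether that is the right default is the open question above.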