@rbtr
Collaborator

@rbtr rbtr commented Dec 6, 2024

Reason for Change:
As of AKS 1.30, service account tokens refresh every ~1 hour when OIDC is enabled. They were previously valid for a year.
The setkubeconfigpath.ps1 script was at some point necessary on Windows to create a valid kubeconfig for CNS (and NPM and others), but based on my testing we don't need it today. At startup, it copies the token from the token file into a custom kubeconfig built from a template, and we then pass that file to CNS via --kubeconfig.

The script runs at startup and never re-runs, so the token that exists at Pod start is the token CNS will try to use forever. Yearly token lifespans were long enough that no CNS Pod was ever up long enough to hit token expiration.

This becomes an issue with hourly token lifespans. An hour after Pod start, the token becomes invalid and CNS can no longer auth to the API server. For PodSubnet clusters, this permanently prevents CNS from being able to scale the IPAM pool and provide more Pod IPs.

Issue Fixed:

Azure/AKS#4679

Requirements:

Notes:

@rbtr rbtr requested review from a team as code owners December 6, 2024 19:13
@rbtr rbtr self-assigned this Dec 6, 2024
@rbtr rbtr added cns Related to CNS. fix Fixes something. labels Dec 6, 2024
Contributor

Copilot AI left a comment

Copilot reviewed 2 out of 3 changed files in this pull request and generated no suggestions.

Files not reviewed (1)
  • cns/Dockerfile: Language not supported
Contributor

@thatmattlong thatmattlong left a comment

no powershell.exe needed I think

@rbtr
Collaborator Author

rbtr commented Dec 9, 2024

/azp run Azure Container Networking PR

@rbtr rbtr requested a review from thatmattlong December 9, 2024 23:01
@azure-pipelines
Azure Pipelines successfully started running 1 pipeline(s).
jpayne3506 previously approved these changes Dec 10, 2024
github-merge-queue bot pushed a commit to microsoft/retina that referenced this pull request Dec 12, 2024
# Description

As of [AKS 1.30](https://github.com/Azure/AKS/releases/tag/2024-06-09), service account tokens refresh every ~1 hour when OIDC is enabled. They were previously valid for a year.

[This setkubeconfigpath.ps1 script](https://github.com/Azure/azure-container-networking/blob/47b243c42fd16119a96ab6d06eb602ac2ce40e7d/npm/examples/windows/setkubeconfigpath.ps1) was at some point necessary on Windows to create a valid kubeconfig for Retina Windows. It copies the token from the token file to create a custom kubeconfig from a template at startup and then we pass Retina Windows that file via --kubeconfig.

The script runs at startup and never re-runs, so the token that exists at Pod start is the token CNS will try to use forever. Yearly token lifespans were long enough that no Retina Windows Pod was ever up long enough to hit token expiration.

This becomes an issue with hourly token lifespans. An hour after Pod start, the token becomes invalid and Retina Windows can no longer auth to the API server. For PodSubnet clusters, this permanently prevents Retina Windows from being able to scale the IPAM pool and provide more Pod IPs.

Fix for CNS which referenced: Azure/azure-container-networking#3248

## Related Issue

If this pull request is related to any issue, please mention it here. Additionally, make sure that the issue is assigned to you before submitting this pull request.

## Checklist

- [x] I have read the [contributing documentation](https://retina.sh/docs/contributing).
- [x] I signed and signed-off the commits (`git commit -S -s ...`). See [this documentation](https://docs.github.com/en/authentication/managing-commit-signature-verification/about-commit-signature-verification) on signing commits.
- [x] I have correctly attributed the author(s) of the code.
- [x] I have tested the changes locally.
- [x] I have followed the project's style guidelines.
- [ ] I have updated the documentation, if necessary.
- [ ] I have added tests, if applicable.

## Screenshots (if applicable) or Testing Completed

Please add any relevant screenshots or GIFs to showcase the changes made.

## Additional Notes

Add any additional notes or context about the pull request here.

---

Please refer to the [CONTRIBUTING.md](../CONTRIBUTING.md) file for more information on how to contribute to this project.
@rbtr rbtr force-pushed the fix/windows-tokens branch from 765c550 to c5b0288 Compare December 16, 2024 22:24
@rbtr
Collaborator Author

rbtr commented Dec 16, 2024

/azp run Azure Container Networking PR

@azure-pipelines
Azure Pipelines successfully started running 1 pipeline(s).
@github-actions
This pull request is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days

@github-actions github-actions bot added the stale Stale due to inactivity. label Dec 31, 2024
@rbtr rbtr force-pushed the fix/windows-tokens branch from c5b0288 to 318ab1c Compare January 3, 2025 23:57
@rbtr rbtr enabled auto-merge January 3, 2025 23:59
@rbtr rbtr restored the fix/windows-tokens branch May 29, 2025 22:55
@rbtr rbtr reopened this May 29, 2025
@github-actions
This pull request is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days

@github-actions github-actions bot added the stale Stale due to inactivity. label Jun 13, 2025
@rbtr rbtr removed the stale Stale due to inactivity. label Jun 13, 2025
paulyufan2 previously approved these changes Jun 13, 2025
Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
@rbtr rbtr dismissed stale reviews from paulyufan2 and jpayne3506 via 6856581 June 13, 2025 17:41
@rbtr rbtr force-pushed the fix/windows-tokens branch from 7b6c057 to 6856581 Compare June 13, 2025 17:41
@rbtr rbtr enabled auto-merge June 13, 2025 17:42
paulyufan2 previously approved these changes Jun 13, 2025
@paulyufan2
Contributor

/azp run Azure Container Networking PR

@azure-pipelines
Azure Pipelines successfully started running 1 pipeline(s).
Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
@paulyufan2
Contributor

/azp run Azure Container Networking PR

@azure-pipelines
Azure Pipelines successfully started running 1 pipeline(s).
@rbtr rbtr added this pull request to the merge queue Jun 13, 2025
Merged via the queue into master with commit 2b3dbd1 Jun 13, 2025
16 checks passed
@rbtr rbtr deleted the fix/windows-tokens branch June 13, 2025 21:04
@jpayne3506 jpayne3506 added release/latest Change affects latest release train needs-backport Change needs to be backported to previous release trains release/1.6 Change affects 1.6 release train labels Jul 28, 2025
sivakami-projects pushed a commit that referenced this pull request Oct 23, 2025
* fix: let Windows CNS use the InClusterConfig

  Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>

* remove pwsh from cmd

  Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>

---------

Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>

Labels

  • cns: Related to CNS.
  • exempt-stale: Keep this fresh
  • fix: Fixes something.
  • needs-backport: Change needs to be backported to previous release trains
  • release/latest: Change affects latest release train
  • release/1.6: Change affects 1.6 release train

4 participants