- Notifications
You must be signed in to change notification settings - Fork 259
feat: CNS checks apiserver in healthz #3269
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: CNS checks apiserver in healthz #3269
Conversation
b78005a to 20ed505 Compare 1dffb35 to e7602f7 Compare There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change should be behind a flag since it is otherwise a breaking change for non-k8s scenarios.
not every instance of CNS will need (or can) check NNCs. The `CRD` channel mode is used by AKS to indicate that CNS will be reading/watching NNCs. `AzureHost` is a newer mode that's used in nodesubnet where NNCs aren't used and therefore CNS has no reason to have its health depend on NNC access.
5f5812c to bcc574f Compare | This pull request is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days |
| This pull request is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days |
| This pull request is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days |
| Pull request closed due to inactivity. |
| This pull request is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stalebot reminded me of this 😟 looks good today, i'm happy to take it
| /azp run Azure Container Networking PR |
| Azure Pipelines successfully started running 1 pipeline(s). |
ramiro is out of office, but has already resolved their comments
| This pull request is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days |
* feat: CNS checks apiserver in healthz * chore: only check NNCs if `ChannelMode` is `CRD` not every instance of CNS will need (or can) check NNCs. The `CRD` channel mode is used by AKS to indicate that CNS will be reading/watching NNCs. `AzureHost` is a newer mode that's used in nodesubnet where NNCs aren't used and therefore CNS has no reason to have its health depend on NNC access. * test: add unit tests * refactor: return error from NewHealthzHandlerWithChecks instead of panicking * chore: address lint errors * refactor: only get kubeConfig when in CRD mode * chore: fix lint errors
Reason for Change:
CNS should regularly check if it is able to reach the apiserver to mitigate the risk of losing access
and then failing silently. The issue linked below is a scenario where CNS was not reloading the
refreshed SA token from disk resulting in controller-runtime's failure to update its cache and watch
pods and NNCs. This unauthorized failure went unnoticed until workloads were scaled up on the node,
at which point CNS was unable to update the NNC to request for more IPs so the pods were stuck in
Pending.
For now, this will just add a checker when the
ChannelModeisCRD. Any other modes will get thedefault
Pingchecker built-in to controller-runtime.Issue Fixed:
Follow up for Azure/AKS#4679
Requirements:
Notes: