Description
I'm using builder.OnlyMetadata in a controller that watches Secrets, in order to avoid caching all the Secret data in memory.
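For reference, the watch is set up roughly like this (a minimal sketch; the reconciler type and manager wiring are illustrative, not the exact code from my project):

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/builder"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// SecretReconciler reacts to Secret events but only ever reads the
// object's metadata, never the secret data itself.
type SecretReconciler struct {
	client.Client
}

func (r *SecretReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Fetch only the metadata of the Secret from the metadata-only cache.
	secret := &metav1.PartialObjectMetadata{}
	secret.SetGroupVersionKind(corev1.SchemeGroupVersion.WithKind("Secret"))
	if err := r.Get(ctx, req.NamespacedName, secret); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	return ctrl.Result{}, nil
}

func (r *SecretReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		// OnlyMetadata makes the cache store *v1.PartialObjectMetadata
		// instead of full *v1.Secret objects.
		For(&corev1.Secret{}, builder.OnlyMetadata).
		Complete(r)
}
```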
The controller works initially, and my Reconcile function is invoked every time I add, update, or delete a Secret in my cluster.
But after a few minutes (~8-9 minutes) the controller stops responding to changes in Secrets.
I've increased the logging verbosity so that I can see client-go messages and I notice that:
For watches without the OnlyMetadata option
There are periodic log messages such as:

```
2022-01-31T11:51:27.415Z LEVEL(-4) pkg/mod/k8s.io/client-go@v0.23.2/tools/cache/reflector.go:167: Watch close - *v1.Issuer total 8 items received
2022-01-31T11:51:27.420Z INFO GET https://10.159.240.1:443/apis/cert-manager.io/v1/issuers?allowWatchBookmarks=true&resourceVersion=2718530&timeoutSeconds=382&watch=true 200 OK in 2 milliseconds
```

suggesting that the watch is regularly being re-established.
For watches with the OnlyMetadata option
There are no regular "Watch close" messages.
And at about the time the other resource watches are closed and re-opened,
the controller stops responding to changes to the resources watched with the OnlyMetadata (PartialObjectMetadata) setting.
And it may be unrelated, but I notice a slight difference in the initial log messages when I use OnlyMetadata:
```
2022-02-01T16:00:06.107Z INFO controller.secret Starting EventSource {"reconciler group": "", "reconciler kind": "Secret", "source": "kind source: *v1.PartialObjectMetadata"}
2022-02-01T16:00:06.107Z INFO controller.secret Starting Controller {"reconciler group": "", "reconciler kind": "Secret"}
2022-02-01T16:00:06.108Z LEVEL(-3) Starting reflector *v1.PartialObjectMetadata (10h17m20.880108561s) from pkg/mod/k8s.io/client-go@v0.23.0/tools/cache/reflector.go:167
2022-02-01T16:00:06.108Z LEVEL(-3) Listing and watching *v1.PartialObjectMetadata from pkg/mod/k8s.io/client-go@v0.23.0/tools/cache/reflector.go:167
2022-02-01T16:00:06.110Z INFO GET https://127.0.0.1:45521/api/v1/namespaces/default/secrets?limit=500&resourceVersion=0 200 OK in 1 milliseconds
2022-02-01T16:00:06.120Z INFO GET https://127.0.0.1:45521/api/v1/namespaces/default/secrets?allowWatchBookmarks=true&resourceVersion=5682&timeout=8m33s&timeoutSeconds=513&watch=true 200 OK in 9 milliseconds
```

That last GET request includes both a timeout and a timeoutSeconds query-string parameter,
unlike the non-OnlyMetadata GET requests, which have only the timeoutSeconds query-string parameter.
Recreating the problem
I've created a repo with a very simple kubebuilder project and controller which watches and logs changes to Secrets:
There's an envtest based test which recreates the failure within 10 minutes.
/bug