Skip to content

Reconciler stops responding after some minutes when using the builder.OnlyMetadata watch option #1789

@wallrj

Description

@wallrj

I'm using builder.OnlyMetadata in a controller that watches Secrets, so as to avoid caching all the secret data in-memory.

The controller works initially and my Reconcile function is invoked everytime I add / delete / update Secrets in my cluster.

But after some minutes (~8-9 minutes) the controller stops responding to changes in Secrets.

I've increased the logging verbosity so that I can see client-go messages and I notice that:

For Watches without the OnlyMetadata option

There are periodic log message such as:

2022-01-31T11:51:27.415Z LEVEL(-4) pkg/mod/k8s.io/client-go@v0.23.2/tools/cache/reflector.go:167: Watch close - *v1.Issuer total 8 items received 2022-01-31T11:51:27.420Z INFO GET https://10.159.240.1:443/apis/cert-manager.io/v1/issuers?allowWatchBookmarks=true&resourceVersion=2718530&timeoutSeconds=382&watch=true 200 OK in 2 milliseconds 

suggesting that the watch is regularly bein re-established.

For watches with OnlyMetadata option

There are no regular "Watch close" messages.
And at about the time the other resource watches have been closed and re-opened,
the controller stops responding to changes to those resources with the OnlyMetadata (PartialObjectMetaData) setting.

And it may be unrelated, but I notice a slight difference in the initial log messages when I use OnlyMetada

2022-02-01T16:00:06.107Z	INFO	controller.secret	Starting EventSource	{"reconciler group": "", "reconciler kind": "Secret", "source": "kind source: *v1.PartialObjectMetadata"} 2022-02-01T16:00:06.107Z	INFO	controller.secret	Starting Controller	{"reconciler group": "", "reconciler kind": "Secret"} 2022-02-01T16:00:06.108Z	LEVEL(-3)	Starting reflector *v1.PartialObjectMetadata (10h17m20.880108561s) from pkg/mod/k8s.io/client-go@v0.23.0/tools/cache/reflector.go:167 2022-02-01T16:00:06.108Z	LEVEL(-3)	Listing and watching *v1.PartialObjectMetadata from pkg/mod/k8s.io/client-go@v0.23.0/tools/cache/reflector.go:167 2022-02-01T16:00:06.110Z	INFO	GET https://127.0.0.1:45521/api/v1/namespaces/default/secrets?limit=500&resourceVersion=0 200 OK in 1 milliseconds 2022-02-01T16:00:06.120Z	INFO	GET https://127.0.0.1:45521/api/v1/namespaces/default/secrets?allowWatchBookmarks=true&resourceVersion=5682&timeout=8m33s&timeoutSeconds=513&watch=true 200 OK in 9 milliseconds 

That last GET request includes both a timeout and a timeoutSeconds querystring parameter,
unlike non-only-metadata GET requests which only have the timeoutSeconds querystring parameter.

Recreating the problem

I've created a repo with a very simple kubebuilder project and controller which watches and logs changes to Secrets:

There's an envtest based test which recreates the failure within 10 minutes.

/bug

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions