Skip to content

Conversation

@wallrj
Copy link
Contributor

@wallrj wallrj commented Feb 3, 2022

Fixes: #1789

I copied some tests from client-go and observed that the original implementation of gvkFixupWatcher fails those tests:

go test ./pkg/cache/internal/... -timeout 2s panic: test timed out after 2s goroutine 34 [running]: testing.(*M).startAlarm.func1()	/home/richard/sdk/go1.17/src/testing/testing.go:1788 +0x8e created by time.goFunc	/home/richard/sdk/go1.17/src/time/sleep.go:180 +0x31 goroutine 1 [chan receive]: testing.(*T).Run(0xc0003f49c0, {0x128a598, 0x46c533}, 0x1327bb0)	/home/richard/sdk/go1.17/src/testing/testing.go:1307 +0x375 testing.runTests.func1(0xc0003f49c0)	/home/richard/sdk/go1.17/src/testing/testing.go:1598 +0x6e testing.tRunner(0xc0003f49c0, 0xc000187d18)	/home/richard/sdk/go1.17/src/testing/testing.go:1259 +0x102 testing.runTests(0xc0003a9980, {0x1d4e810, 0x1, 0x1}, {0x48986d, 0x1285e22, 0x1d6a3c0})	/home/richard/sdk/go1.17/src/testing/testing.go:1596 +0x43f testing.(*M).Run(0xc0003a9980)	/home/richard/sdk/go1.17/src/testing/testing.go:1504 +0x51d main.main()	_testmain.go:43 +0x14b goroutine 19 [chan receive]: k8s.io/klog/v2.(*loggingT).flushDaemon(0x0)	/home/richard/go/pkg/mod/k8s.io/klog/v2@v2.30.0/klog.go:1181 +0x6a created by k8s.io/klog/v2.init.0	/home/richard/go/pkg/mod/k8s.io/klog/v2@v2.30.0/klog.go:420 +0xfb goroutine 20 [sleep]: time.Sleep(0x6fc23ac00)	/home/richard/sdk/go1.17/src/runtime/time.go:193 +0x12e sigs.k8s.io/controller-runtime/pkg/log.init.0.func1()	/home/richard/projects/kubernetes-sigs/controller-runtime/pkg/log/log.go:63 +0x38 created by sigs.k8s.io/controller-runtime/pkg/log.init.0	/home/richard/projects/kubernetes-sigs/controller-runtime/pkg/log/log.go:62 +0x25 goroutine 21 [chan receive]: sigs.k8s.io/controller-runtime/pkg/cache/internal.TestGVKFixupWatcher.func1({0x141ab60, 0xc00045ad80})	/home/richard/projects/kubernetes-sigs/controller-runtime/pkg/cache/internal/informers_map_test.go:105 +0x85 sigs.k8s.io/controller-runtime/pkg/cache/internal.TestGVKFixupWatcher(0xc0003f4b60)	/home/richard/projects/kubernetes-sigs/controller-runtime/pkg/cache/internal/informers_map_test.go:121 +0x52b testing.tRunner(0xc0003f4b60, 0x1327bb0)	/home/richard/sdk/go1.17/src/testing/testing.go:1259 +0x102 created by testing.(*T).Run	/home/richard/sdk/go1.17/src/testing/testing.go:1306 +0x35a FAIL	sigs.k8s.io/controller-runtime/pkg/cache/internal	2.026s FAIL 

I've now re-implemented the gvkFixupWatcher as a watch.FilterFunc along with watch.Filter which seems to have the desired behaviour.

go test ./pkg/cache/internal/... -timeout 2s ok	sigs.k8s.io/controller-runtime/pkg/cache/internal	0.023s 

This change allows the reflector to know when the ResultChannel is closed at which point it breaks from its watch loop and establishes a new watch:

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 3, 2022
@k8s-ci-robot
Copy link
Contributor

Hi @wallrj. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Feb 3, 2022
w.ch <- e
}
w.wg.Done()
close(w.ch)
Copy link
Member

@joelanford joelanford Feb 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be moved above w.wg.Done() so that there's a guarantee that w.ch is closed by the time w.Stop() returns?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found watch.Filter which has the correct channel closing behaviour so I've re-implemented. See what you think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! TIL about watch.Filter.

@wallrj wallrj force-pushed the 1789-close-metadata-watch-result-channel branch from 0359498 to cab8dd1 Compare February 3, 2022 21:17
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Feb 3, 2022
@wallrj wallrj changed the title 🐛 WIP: Close the metadata watch result channel earlier 🐛 Close the metadata watch result channel earlier Feb 3, 2022
watcher,
func(in watch.Event) (out watch.Event, keep bool) {
in.DeepCopyInto(&out)
out.Object.GetObjectKind().SetGroupVersionKind(gvk)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original implementation mutated the watch event.
I've used DeepCopy here, but I'm not sure if it's necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think DeepCopy is necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The in watch.Event is passed in by value, but it has a pointer to a runtime.Object (e.g. &metav1.PartialObjectMetadata ) which might be mutated elsewhere, so I thought it was safest to DeepCopy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, no strong objections to DeepCopy from me!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think it is necessary, but no strong objections from me too :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to remove deepcopy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Done.

f.Modify(newTestType("bar"))
f.Delete(newTestType("bar"))
f.Error(newTestType("error: blah"))
f.Stop()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: The test sends events to the wrapped watcher and then checks the results that come from the wrapper.

@wallrj wallrj requested a review from joelanford February 3, 2022 22:05
@joelanford
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 4, 2022
// TestGVKFixupWatcher tests that gvkFixupWatcher behaves like the watch that it
// wraps and that it overrides the GVK.
// Adapted from https://github.com/kubernetes/kubernetes/blob/adbda068c1808fcc8a64a94269e0766b5c46ec41/staging/src/k8s.io/apimachinery/pkg/watch/watch_test.go#L33-L78
func TestGVKFixupWatcher(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, the rest of controller-runtime's tests use Ginkgo/Gomega. Would you mind refactoring this test to be consistent with the rest of the repo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've re-implemented the tests using Gomega Expect , but tried to retain the spirit of the original tests so that they can be easily compared to the original.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also replaced the convoluted fake type with actual metav1.PartialObjectMetadata

Signed-off-by: Richard Wall <richard.wall@jetstack.io>
Signed-off-by: Richard Wall <richard.wall@jetstack.io>
@wallrj wallrj force-pushed the 1789-close-metadata-watch-result-channel branch from cab8dd1 to 5f4904e Compare February 4, 2022 09:54
@wallrj wallrj requested a review from joelanford February 4, 2022 10:03
@wallrj wallrj changed the title 🐛 Close the metadata watch result channel earlier 🐛 Fixed a bug in newGVKFixupWatcher which caused the metadata informer to hang Feb 4, 2022
@wallrj
Copy link
Contributor Author

wallrj commented Feb 4, 2022

Some evidence that the problem has been fixed:

Test fails with the old implementation
 richard   0fbdc4b  ~  projects  kubernetes-sigs  controller-runtime  1  go test ./pkg/cache/internal/... -v --ginkgo.v === RUN TestSource Running Suite: Cache Internal Suite =================================== Random Seed: 1643975186 Will run 1 of 1 specs gvkFixupWatcher behaves like watch.FakeWatcher /home/richard/projects/kubernetes-sigs/controller-runtime/pkg/cache/internal/informers_map_test.go:35 STEP: Fixing up watch.EventType: ADDED and passing it on STEP: Fixing up watch.EventType: MODIFIED and passing it on STEP: Fixing up watch.EventType: MODIFIED and passing it on STEP: Fixing up watch.EventType: DELETED and passing it on STEP: Fixing up watch.EventType: ERROR and passing it on • Failure [1.002 seconds] gvkFixupWatcher /home/richard/projects/kubernetes-sigs/controller-runtime/pkg/cache/internal/informers_map_test.go:34 behaves like watch.FakeWatcher [It] /home/richard/projects/kubernetes-sigs/controller-runtime/pkg/cache/internal/informers_map_test.go:35 Timed out after 1.001s. Expected <<-chan watch.Event | len:0, cap:0>: 0xc0000be540  to be closed   /home/richard/projects/kubernetes-sigs/controller-runtime/pkg/cache/internal/informers_map_test.go:79 ------------------------------    Summarizing 1 Failure:  [Fail] gvkFixupWatcher [It] behaves like watch.FakeWatcher  /home/richard/projects/kubernetes-sigs/controller-runtime/pkg/cache/internal/informers_map_test.go:79  Ran 1 of 1 Specs in 1.002 seconds FAIL! -- 0 Passed | 1 Failed | 0 Pending | 0 Skipped --- FAIL: TestSource (1.00s) FAIL FAIL	sigs.k8s.io/controller-runtime/pkg/cache/internal	1.039s FAIL 
Test succeeds with the new implementation
richard   1789-close-metadata-watch-result-channel  ~  projects  kubernetes-sigs  controller-runtime  1  go test ./pkg/cache/internal/... -v --ginkgo.v === RUN TestSource Running Suite: Cache Internal Suite =================================== Random Seed: 1643975277 Will run 1 of 1 specs gvkFixupWatcher behaves like watch.FakeWatcher /home/richard/projects/kubernetes-sigs/controller-runtime/pkg/cache/internal/informers_map_test.go:35 STEP: Fixing up watch.EventType: ADDED and passing it on STEP: Fixing up watch.EventType: MODIFIED and passing it on STEP: Fixing up watch.EventType: MODIFIED and passing it on STEP: Fixing up watch.EventType: DELETED and passing it on STEP: Fixing up watch.EventType: ERROR and passing it on • Ran 1 of 1 Specs in 0.010 seconds SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 0 Skipped --- PASS: TestSource (0.01s) PASS ok	sigs.k8s.io/controller-runtime/pkg/cache/internal	0.027s 
A sample informer now closes and re-establishes watches regularly

https://github.com/wallrj/tmp-controller-runtime-partial-object-experiment/blob/44e2c5d088d891252edf36056d9297bd66a17d7d/cmd/ctrl-metadata-informer/main.go

diff --git a/go.mod b/go.mod index fb929dc..99e989c 100644 --- a/go.mod +++ b/go.mod @@ -14,6 +14,8 @@ require ( sigs.k8s.io/controller-runtime v0.11.0 ) +replace sigs.k8s.io/controller-runtime => ../../kubernetes-sigs/controller-runtime + require ( cloud.google.com/go v0.81.0 // indirect github.com/Azure/go-autorest v14.2.0+incompatible // indirect 
richard   master  ~  projects  wallrj  partial-object-watch  1  go run ./cmd/ctrl-metadata-informer --zap-log-level 8 --zap-time-encoding iso8601 --namespace test1 2022-02-04T10:21:54.413Z LEVEL(-6) Config loaded from file: /home/richard/.kube/config ... 2022-02-04T11:38:59.523Z INFO add {"name": "test1/foo-9673"} 2022-02-04T11:39:23.998Z LEVEL(-4) pkg/mod/k8s.io/client-go@v0.23.0/tools/cache/reflector.go:167: Watch close - *v1.PartialObjectMetadata total 15 items received 2022-02-04T11:39:23.998Z INFO GET https://127.0.0.1:46255/api/v1/namespaces/test1/secrets?allowWatchBookmarks=true&resourceVersion=78217&timeout=8m3s&timeoutSeconds=483&watch=true 2022-02-04T11:39:23.998Z INFO Request Headers: 2022-02-04T11:39:23.998Z INFO Accept: application/vnd.kubernetes.protobuf;as=PartialObjectMetadata;g=meta.k8s.io;v=v1,application/json;as=PartialObjectMetadata;g=meta.k8s.io;v=v1,application/json 2022-02-04T11:39:23.998Z INFO User-Agent: ctrl-metadata-informer/v0.0.0 (linux/amd64) kubernetes/$Format 2022-02-04T11:39:24.000Z INFO Response Status: 200 OK in 1 milliseconds 2022-02-04T11:39:24.000Z INFO Response Headers: 2022-02-04T11:39:24.000Z INFO Content-Type: application/vnd.kubernetes.protobuf;stream=watch 2022-02-04T11:39:24.000Z INFO Date: Fri, 04 Feb 2022 11:39:24 GMT 2022-02-04T11:39:24.000Z INFO Cache-Control: no-cache, private 2022-02-04T11:39:59.621Z INFO add {"name": "test1/foo-25053"} 2022-02-04T11:40:59.744Z INFO add {"name": "test1/foo-10768"} 2022-02-04T11:41:59.849Z INFO add {"name": "test1/foo-7385"} 2022-02-04T11:42:59.958Z INFO add {"name": "test1/foo-24315"} 2022-02-04T11:44:00.062Z INFO add {"name": "test1/foo-26661"} 2022-02-04T11:45:00.151Z INFO add {"name": "test1/foo-21396"} 2022-02-04T11:46:00.299Z INFO add {"name": "test1/foo-23017"} 2022-02-04T11:47:00.451Z INFO add {"name": "test1/foo-21648"} 2022-02-04T11:47:27.000Z LEVEL(-4) pkg/mod/k8s.io/client-go@v0.23.0/tools/cache/reflector.go:167: Watch close - *v1.PartialObjectMetadata total 17 items received 2022-02-04T11:47:27.001Z INFO GET https://127.0.0.1:46255/api/v1/namespaces/test1/secrets?allowWatchBookmarks=true&resourceVersion=79033&timeout=8m2s&timeoutSeconds=482&watch=true 2022-02-04T11:47:27.001Z INFO Request Headers: 2022-02-04T11:47:27.001Z INFO Accept: application/vnd.kubernetes.protobuf;as=PartialObjectMetadata;g=meta.k8s.io;v=v1,application/json;as=PartialObjectMetadata;g=meta.k8s.io;v=v1,application/json 2022-02-04T11:47:27.001Z INFO User-Agent: ctrl-metadata-informer/v0.0.0 (linux/amd64) kubernetes/$Format 2022-02-04T11:47:27.004Z INFO Response Status: 200 OK in 2 milliseconds 2022-02-04T11:47:27.004Z INFO Response Headers: 2022-02-04T11:47:27.004Z INFO Cache-Control: no-cache, private 2022-02-04T11:47:27.004Z INFO Content-Type: application/vnd.kubernetes.protobuf;stream=watch 2022-02-04T11:47:27.004Z INFO Date: Fri, 04 Feb 2022 11:47:27 GMT 2022-02-04T11:48:00.596Z INFO add {"name": "test1/foo-207"} 2022-02-04T11:49:00.692Z INFO add {"name": "test1/foo-5199"} 
Copy link
Member

@joelanford joelanford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

Thanks for all the help on this!

Comment on lines +412 to +428
// newGVKFixupWatcher adds a wrapper that preserves the GVK information when
// events come in.
//
// This works around a bug where GVK information is not passed into mapping
// functions when using the OnlyMetadata option in the builder.
// This issue is most likely caused by kubernetes/kubernetes#80609.
// See kubernetes-sigs/controller-runtime#1484.
//
// This was originally implemented as a cache.ResourceEventHandler wrapper but
// that contained a data race which was resolved by setting the GVK in a watch
// wrapper, before the objects are written to the cache.
// See kubernetes-sigs/controller-runtime#1650.
//
// The original watch wrapper was found to be incompatible with
// k8s.io/client-go/tools/cache.Reflector so it has been re-implemented as a
// watch.Filter which is compatible.
// See kubernetes-sigs/controller-runtime#1789.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉 Thanks for capturing all of that history!

watcher,
func(in watch.Event) (out watch.Event, keep bool) {
in.DeepCopyInto(&out)
out.Object.GetObjectKind().SetGroupVersionKind(gvk)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, no strong objections to DeepCopy from me!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 4, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: joelanford, wallrj

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@joelanford joelanford removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 4, 2022
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 4, 2022
@wallrj
Copy link
Contributor Author

wallrj commented Feb 5, 2022

/cc @vincepri because I see that you introduced the metadata only watches in ClusterAPI and which may have been affected by this bug since upgrading to controller-runtime v0.10.1 in kubernetes-sigs/cluster-api#5249

Copy link
Contributor

@FillZpp FillZpp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LG. Copyright should be 2022.

@@ -0,0 +1,94 @@
/*
Copyright 2018 The Kubernetes Authors.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/2018/2022/

@@ -0,0 +1,31 @@
/*
Copyright 2018 The Kubernetes Authors.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/2018/2022/

watcher,
func(in watch.Event) (out watch.Event, keep bool) {
in.DeepCopyInto(&out)
out.Object.GetObjectKind().SetGroupVersionKind(gvk)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think it is necessary, but no strong objections from me too :)

Signed-off-by: Richard Wall <richard.wall@jetstack.io>
@wallrj wallrj requested a review from FillZpp February 7, 2022 18:39
Signed-off-by: Richard Wall <richard.wall@jetstack.io>
@wallrj wallrj requested a review from vincepri February 7, 2022 21:03
@alvaroaleman alvaroaleman added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Feb 7, 2022
@FillZpp
Copy link
Contributor

FillZpp commented Feb 8, 2022

/lgtm

Thanks for all the help on this!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 8, 2022
@k8s-ci-robot k8s-ci-robot merged commit 273e608 into kubernetes-sigs:master Feb 8, 2022
@k8s-ci-robot k8s-ci-robot added this to the v0.10.x milestone Feb 8, 2022
@wallrj wallrj deleted the 1789-close-metadata-watch-result-channel branch February 8, 2022 07:38
@joelanford
Copy link
Member

Let's see if this works:

/cherry-pick release-0.11

@k8s-infra-cherrypick-robot

@joelanford: new pull request created: #1801

In response to this:

Let's see if this works:

/cherry-pick release-0.11

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.

7 participants