Try to avoid event handling leaks #1089

DirectXMan12 · 2020-07-31T01:18:14Z

Since we now have the ability to shut down the event broadcaster, we can
write a mostly goroutine-leak-free event handling setup. This changes the
default event handling setup to defer the broadcaster initialization until
the first time it's used, and then to shut it down once the manager shuts
down.

In the case where a broadcaster is manually specified, it's the
specifier's job to shut down the broadcaster instead.
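
A minimal sketch of the ownership rule described above, assuming a lazily created broadcaster guarded by sync.Once. The names fakeBroadcaster, provider, owned, and getBroadcaster are hypothetical stand-ins for illustration, not the actual controller-runtime types:

```go
package main

import (
	"fmt"
	"sync"
)

// fakeBroadcaster is a hypothetical stand-in for an event broadcaster.
// It owns one background goroutine that only exits on Shutdown.
type fakeBroadcaster struct {
	done chan struct{}
	wg   sync.WaitGroup
}

func newFakeBroadcaster() *fakeBroadcaster {
	b := &fakeBroadcaster{done: make(chan struct{})}
	b.wg.Add(1)
	go func() { // this goroutine would leak if Shutdown were never called
		defer b.wg.Done()
		<-b.done
	}()
	return b
}

// Shutdown stops the background goroutine and waits for it to exit.
func (b *fakeBroadcaster) Shutdown() {
	close(b.done)
	b.wg.Wait()
}

// provider defers broadcaster creation until first use.
type provider struct {
	// makeBroadcaster returns the broadcaster and whether the provider
	// owns it (true for the default, false for a caller-supplied one).
	makeBroadcaster func() (*fakeBroadcaster, bool)

	once        sync.Once
	broadcaster *fakeBroadcaster
	owned       bool
}

func (p *provider) getBroadcaster() *fakeBroadcaster {
	p.once.Do(func() {
		p.broadcaster, p.owned = p.makeBroadcaster()
	})
	return p.broadcaster
}

// Stop shuts the broadcaster down only if the provider owns it.
// It goes through getBroadcaster so it cannot race with first use.
// In this sketch Stop must be called at most once.
func (p *provider) Stop() {
	b := p.getBroadcaster()
	if p.owned {
		b.Shutdown()
	}
}

func main() {
	created := 0
	p := &provider{makeBroadcaster: func() (*fakeBroadcaster, bool) {
		created++
		return newFakeBroadcaster(), true
	}}
	fmt.Println("created before first use:", created)
	p.getBroadcaster()
	p.getBroadcaster()
	fmt.Println("created after two uses:", created)
	p.Stop()
}
```

In this sketch, nothing is started until something asks for the broadcaster, and a caller-supplied broadcaster (owned == false) is left running for the caller to shut down.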

We'll probably still want to overhaul the whole event system at some
point in the future though.

This also re-enables the tests for leaks, switching them to an
Eventually assertion to avoid flakes and reducing the threshold to zero.

Closes #637

/kind bug

k8s-ci-robot · 2020-07-31T01:18:24Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: DirectXMan12

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [DirectXMan12]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

DirectXMan12 · 2020-07-31T01:18:54Z

/assign @vincepri

@vincepri IIRC, you've got context on the flakes on CI, so I'd appreciate a look from you on this one.

rohitagarwal003 · 2020-07-31T02:08:12Z

pkg/internal/recorder/recorder.go

+
+	p.broadcasterOnce.Do(func() {
+		broadcaster, stop := p.makeBroadcaster()
+		broadcaster.StartRecordingToSink(&typedcorev1.EventSinkImpl{Interface: p.evtClient})


Does the watch.Interface returned by StartRecordingToSink and StartEventWatcher need to be Stop()ed or is it taken care of by the above broadcaster.Shutdown()?

ah, good catch, let me double-check

I think the broadcaster handles it: that watcher is returned by apimachinery's watch.Broadcaster, which doesn't actually start a goroutine -- it just holds a channel. Shutdown on the broadcaster calls the underlying watch.Broadcaster's Shutdown, which closes Broadcaster.incoming, which breaks the loop in Broadcaster.loop, which calls m.closeAll after the loop is done, which calls close on all those open channels.

As long as the loop at https://github.com/kubernetes/kubernetes/blob/0051d65f9f30db724dfb88256f70c23e37f6b257/staging/src/k8s.io/client-go/tools/record/event.go#L299 exits when broadcaster.Shutdown() is called, I'm fine with this.
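
The shutdown chain discussed in this thread can be modelled roughly as follows. This is a simplified, hypothetical miniBroadcaster written only to illustrate the pattern (close the incoming channel, the loop exits, every watcher channel is closed); it is not the apimachinery watch.Broadcaster code:

```go
package main

import (
	"fmt"
	"sync"
)

// miniBroadcaster is a hypothetical, simplified model of a fan-out broadcaster.
type miniBroadcaster struct {
	incoming chan string
	done     chan struct{}

	mu       sync.Mutex
	watchers []chan string
}

func newMiniBroadcaster() *miniBroadcaster {
	b := &miniBroadcaster{
		incoming: make(chan string),
		done:     make(chan struct{}),
	}
	go b.loop() // the only goroutine the broadcaster starts
	return b
}

// Watch registers a watcher. It only allocates a channel and starts no goroutine.
func (b *miniBroadcaster) Watch() <-chan string {
	ch := make(chan string, 8) // small buffer; a full buffer would block loop in this sketch
	b.mu.Lock()
	b.watchers = append(b.watchers, ch)
	b.mu.Unlock()
	return ch
}

// Send hands one event to the distribution loop.
func (b *miniBroadcaster) Send(ev string) { b.incoming <- ev }

// loop distributes events until incoming is closed, then closes all watchers.
func (b *miniBroadcaster) loop() {
	for ev := range b.incoming {
		b.mu.Lock()
		for _, w := range b.watchers {
			w <- ev
		}
		b.mu.Unlock()
	}
	b.closeAll()
	close(b.done)
}

func (b *miniBroadcaster) closeAll() {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, w := range b.watchers {
		close(w)
	}
	b.watchers = nil
}

// Shutdown closes incoming, which ends loop, which closes every watcher channel.
func (b *miniBroadcaster) Shutdown() {
	close(b.incoming)
	<-b.done
}

func main() {
	b := newMiniBroadcaster()
	w := b.Watch()
	b.Send("a")
	b.Send("b")
	b.Shutdown()

	n := 0
	for range w { // terminates because Shutdown closed the watcher channel
		n++
	}
	fmt.Println("events received before close:", n)
}
```

Under this model a consumer ranging over a watcher channel exits once the broadcaster shuts down, so the watcher needs no separate Stop call to avoid a leak.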

vincepri · 2020-07-31T16:08:58Z

/test pull-controller-runtime-test-master

rohitagarwal003 · 2020-08-01T06:17:34Z

/lgtm

k8s-ci-robot · 2020-08-01T06:17:40Z

@mindprince: changing LGTM is restricted to collaborators

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

vincepri

/lgtm

vincepri · 2020-08-03T16:49:45Z

/milestone v0.6.x

vincepri · 2020-08-03T17:20:47Z

@DirectXMan12 Seems we're still getting some flakes

DirectXMan12 · 2020-08-04T00:25:05Z

Ack, I'll add some reporting in, see if we can figure out why, and then turn that test back off if it's not obvious.

/hold

DirectXMan12 · 2020-08-04T00:49:25Z

/retest

DirectXMan12 · 2020-08-04T00:49:54Z

/test pull-controller-runtime-test-master

DirectXMan12 · 2020-08-04T22:24:59Z

/retest

DirectXMan12 · 2020-08-12T05:00:31Z

🤞 this should fix that flake, or at the very least give us better info. I think the flake may have been due to keep-alive in the http package, which the goleak library showed a bit more clearly.

DirectXMan12 · 2020-08-12T17:52:03Z

gonna run this a couple more times to check for flakes

/test pull-controller-runtime-test-master

DirectXMan12 · 2020-08-12T20:31:28Z

/test pull-controller-runtime-test-master

I think this looks good. Can't repro the flake locally any more, can't repro here, assuming this last run is fine

rohitagarwal003 · 2020-08-16T01:32:17Z

Can we merge this?

DirectXMan12 · 2020-08-17T21:47:04Z

@vincepri can you re-lgtm this?

vincepri

/lgtm
/hold cancel

Thanks for this @DirectXMan12 !

vincepri · 2020-08-18T16:37:20Z

Seems we have a conflict, need to rebase

This switches to using the goleak package to check for leaks, which should give us a more complete picture of the particular goroutine that's leaking, and should avoid issues where we leak a goroutine but also stop an old one. This also force-closes keep-alive connections in the leak tests, since those look like leaks but will actually time out after 30s (outside the time scope of the test).

DirectXMan12 · 2020-08-20T18:14:03Z

rebased

vincepri · 2020-08-20T18:33:50Z

/lgtm

k8s-ci-robot added the kind/bug label Jul 31, 2020

k8s-ci-robot requested review from gerred and joelanford July 31, 2020 01:18

k8s-ci-robot added the approved, cncf-cla: yes, and size/L labels Jul 31, 2020

k8s-ci-robot assigned vincepri Jul 31, 2020

rohitagarwal003 reviewed Jul 31, 2020

rohitagarwal003 mentioned this pull request Jul 31, 2020

Clean up event handling code goroutine leaks #637

Closed

vincepri approved these changes Aug 3, 2020

k8s-ci-robot added the lgtm label Aug 3, 2020

k8s-ci-robot added this to the v0.6.x milestone Aug 3, 2020

k8s-ci-robot removed the lgtm label Aug 4, 2020

k8s-ci-robot added the do-not-merge/hold label Aug 4, 2020

DirectXMan12 force-pushed the feature/leakless-events branch from f167ca4 to 626657e Compare August 4, 2020 00:29

k8s-ci-robot added the needs-rebase label Aug 4, 2020

DirectXMan12 force-pushed the feature/leakless-events branch from 626657e to 02fa6d0 Compare August 4, 2020 22:26

k8s-ci-robot removed the needs-rebase label Aug 4, 2020

DirectXMan12 force-pushed the feature/leakless-events branch from 02fa6d0 to 010fba4 Compare August 12, 2020 04:59

vincepri approved these changes Aug 18, 2020

k8s-ci-robot added the lgtm label and removed the do-not-merge/hold label Aug 18, 2020

k8s-ci-robot added the needs-rebase label Aug 18, 2020

DirectXMan12 added 2 commits August 20, 2020 11:07

DirectXMan12 force-pushed the feature/leakless-events branch from 010fba4 to b269400 Compare August 20, 2020 18:13

k8s-ci-robot removed the lgtm and needs-rebase labels Aug 20, 2020

k8s-ci-robot added the lgtm label Aug 20, 2020

k8s-ci-robot merged commit 011cd8a into kubernetes-sigs:master Aug 20, 2020

DirectXMan12 deleted the feature/leakless-events branch August 20, 2020 18:55

alvaroaleman mentioned this pull request Dec 7, 2020

controller-runtime: starting/stopping manager.Manager leaks goroutines #1280

Closed

nathantournant mentioned this pull request Apr 25, 2023

fix: remove deprecated field of manager option DataDog/chaos-controller#684

Merged
