Try to avoid event handling leaks #1089
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: DirectXMan12. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
```go
p.broadcasterOnce.Do(func() {
	broadcaster, stop := p.makeBroadcaster()
	broadcaster.StartRecordingToSink(&typedcorev1.EventSinkImpl{Interface: p.evtClient})
```
Does the `watch.Interface` returned by `StartRecordingToSink` and `StartEventWatcher` need to be `Stop()`ed, or is it taken care of by the above `broadcaster.Shutdown()`?
ah, good catch, let me double-check
I think the broadcaster handles it: that watcher is returned by apimachinery's `watch.Broadcaster`, which doesn't actually start a goroutine -- it just holds a channel. `Shutdown` on the broadcaster calls the underlying `watch.Broadcaster`'s `Shutdown`, which closes `Broadcaster.incoming`, which breaks the loop in `Broadcaster.loop`, which calls `m.closeAll` after the loop is done, which calls `close` on all those open channels.
As long as https://github.com/kubernetes/kubernetes/blob/0051d65f9f30db724dfb88256f70c23e37f6b257/staging/src/k8s.io/client-go/tools/record/event.go#L299 exits when `broadcaster.Shutdown()` is called, I am fine.
/test pull-controller-runtime-test-master
/lgtm
@mindprince: changing LGTM is restricted to collaborators. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/lgtm
/milestone v0.6.x
@DirectXMan12 Seems we're still getting some flakes.
Ack, I'll add some reporting in, see if we can figure out why, and then turn that test back off if it's not obvious.
/hold
force-pushed from f167ca4 to 626657e
/retest
/test pull-controller-runtime-test-master
/retest
force-pushed from 626657e to 02fa6d0
force-pushed from 02fa6d0 to 010fba4
🤞 this should fix that flake, or at the very least give us better info. I think the flake may have been due to keep-alive in the http package, which the leak tests now force-close.
gonna run this a couple more times to check for flakes
/test pull-controller-runtime-test-master
/test pull-controller-runtime-test-master
I think this looks good. Can't repro the flake locally any more, can't repro here, assuming this last run is fine.
Can we merge this?
@vincepri can you re-lgtm this?
/lgtm
/hold cancel
Thanks for this, @DirectXMan12!
Seems we have a conflict, need to rebase.
Since we now have the ability to shut down the event broadcaster, we can write a mostly goroutine-leak-free event handling setup. This changes the default event handling setup to defer broadcaster initialization until the first time it's used, and then to shut it down once the manager shuts down. In the case where a broadcaster is manually specified, it's the specifier's job to shut down the broadcaster instead. We'll probably still want to overhaul the whole event system at some point in the future, though. This also re-enables the tests for leaks, switching them to an Eventually to avoid flakes and reducing the threshold to zero.

This switches to using the goleak package to check for leaks, which should give us a more complete picture of the particular goroutine that's leaking, and should avoid issues where we leak a goroutine but also stop an old one. This also force-closes keep-alive connections in the leak tests, since those look like leaks, but would actually time out after 30s (outside the timescale of the test).
force-pushed from 010fba4 to b269400
rebased
/lgtm
Closes #637
/kind bug