Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to avoid event handling leaks #1089

Merged

Conversation

DirectXMan12
Copy link
Contributor

@DirectXMan12 DirectXMan12 commented Jul 31, 2020

Since we now have the ability shut down the event broadcaster, we can
write mostly goroutine-leak-free event handling setup. This changes the
default event handling setup to defer the broadcaster initialization the
first time it's used, and then to shut it down once the manager shuts
down.

In the case where a broadcaster is manually specified, it's the
specifier's job to shut down the broadcaster instead.

We'll probably still want to overhaul the whole event system at some
point in the future though.

This also re-enables the tests for leaks, switching them to an
eventually to avoid flakes & reducing the threshold to zero.

Closes #637

/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jul 31, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: DirectXMan12

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 31, 2020
@DirectXMan12
Copy link
Contributor Author

/assign @vincepri

@vincepri IIRC, you've got context on the flakes on CI, so I'd appreciate a look from you on this one.


p.broadcasterOnce.Do(func() {
broadcaster, stop := p.makeBroadcaster()
broadcaster.StartRecordingToSink(&typedcorev1.EventSinkImpl{Interface: p.evtClient})

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the watch.Interface returned by StartRecordingToSink and StartEventWatcher need to be Stop()ed or is it taken care of by the above broadcaster.Shutdown()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, good catch, let me double-check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the broadcaster handles it: that watcher is returned by apimachinery's watch.Broadcaster, which doesn't actually start a goroutine -- it just holds a channel. Shutdown on the broadcaster calls the underlying watch.Broadcaster's Shutdown, which closes Broadcaster.incoming, which breaks the loop in Broadcaster.loop, which calls m.closeAll after the loop is done, which calls close on all those open channels.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vincepri
Copy link
Member

/test pull-controller-runtime-test-master

@rohitagarwal003
Copy link

/lgtm

@k8s-ci-robot
Copy link
Contributor

@mindprince: changing LGTM is restricted to collaborators

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link
Member

@vincepri vincepri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 3, 2020
@vincepri
Copy link
Member

vincepri commented Aug 3, 2020

/milestone v0.6.x

@k8s-ci-robot k8s-ci-robot added this to the v0.6.x milestone Aug 3, 2020
@vincepri
Copy link
Member

vincepri commented Aug 3, 2020

@DirectXMan12 Seems we're still getting some flakes

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 4, 2020
@DirectXMan12
Copy link
Contributor Author

Ack, I'll add some reporting in, see if we can figure out why, and then turn that test back off if it's not obvious.

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 4, 2020
@DirectXMan12
Copy link
Contributor Author

/retest

@DirectXMan12
Copy link
Contributor Author

/test pull-controller-runtime-test-master

@DirectXMan12
Copy link
Contributor Author

/retest

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 4, 2020
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 4, 2020
@DirectXMan12
Copy link
Contributor Author

🤞 this should fix that flake, or at the very least give us better info. I think the flake may have been due to keep-alive in the http package, which the goleak library showed a bit more clearly.

@DirectXMan12
Copy link
Contributor Author

gonna run this a couple more times to check for flakes

/test pull-controller-runtime-test-master

@DirectXMan12
Copy link
Contributor Author

/test pull-controller-runtime-test-master

I think this looks good. Can't repro the flake locally any more, can't repro here, assuming this last run is fine

@rohitagarwal003
Copy link

Can we merge this?

@DirectXMan12
Copy link
Contributor Author

@vincepri can you re-lgtm this?

Copy link
Member

@vincepri vincepri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/hold cancel

Thanks for this @DirectXMan12 !

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Aug 18, 2020
@vincepri
Copy link
Member

Seems we have a conflict, need to rebase

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 18, 2020
Since we now have the ability shut down the event broadcaster, we can
write mostly goroutine-leak-free event handling setup.  This changes the
default event handling setup to defer the broadcaster initialization the
first time it's used, and then to shut it down once the manager shuts
down.

In the case where a broadcaster is manually specified, it's the
specifier's job to shut down the broadcaster instead.

We'll probably still want to overhaul the whole event system at some
point in the future though.

This also re-enables the tests for leaks, switching them to an
eventually to avoid flakes & reducing the threshold to zero.
This switches to using the goleaks package to check for leaks, which
should give us a more complete picture of the particular goroutine
that's leaking, and should avoid issues where we leak a goroutine, but
also stop an old one.

This also force-closes keep-alive connections in the leak tests, since
those look like leaks, but will actually time out after 30s (outside the
timescope of the test).
@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 20, 2020
@DirectXMan12
Copy link
Contributor Author

rebased

@vincepri
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 20, 2020
@k8s-ci-robot k8s-ci-robot merged commit 011cd8a into kubernetes-sigs:master Aug 20, 2020
@DirectXMan12 DirectXMan12 deleted the feature/leakless-events branch August 20, 2020 18:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Clean up event handling code goroutine leaks
4 participants