🐛 Controller: Wait for all reconciliations before shutting down #1427

Merged
merged 1 commit into from Mar 14, 2021

Conversation

alvaroaleman
Member

Currently, the controller shuts down and returns immediately when its
context is cancelled, abandoning any active reconciliations. This change
makes it wait for them to finish before completing shutdown.
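The intended behaviour can be sketched with the standard library alone. This is only an illustration under stated assumptions, not the controller's actual code: runWorkers, demoGracefulShutdown and the plain channel used as a queue are hypothetical names, and the real controller drains a workqueue rather than a channel.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runWorkers is a hypothetical stand-in for a controller's Start method.
// It starts n workers and returns only after ctx is cancelled AND every
// worker has finished the item it was processing.
func runWorkers(ctx context.Context, n int, items <-chan int, reconcile func(int)) {
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			for {
				select {
				case <-ctx.Done():
					return
				case item, ok := <-items:
					if !ok {
						return
					}
					reconcile(item)
				}
			}
		}()
	}
	<-ctx.Done()
	wg.Wait() // without this line, in-flight reconciliations are abandoned
}

// demoGracefulShutdown cancels the context while two reconciliations are
// still running and reports how many were started and how many finished
// by the time runWorkers returned.
func demoGracefulShutdown() (started, finished int32) {
	ctx, cancel := context.WithCancel(context.Background())
	items := make(chan int)
	begun := make(chan struct{}, 2)
	var nStarted, nFinished int32

	reconcile := func(int) {
		atomic.AddInt32(&nStarted, 1)
		begun <- struct{}{}
		time.Sleep(50 * time.Millisecond) // simulated work
		atomic.AddInt32(&nFinished, 1)
	}

	done := make(chan struct{})
	go func() {
		runWorkers(ctx, 2, items, reconcile)
		close(done)
	}()

	items <- 1
	items <- 2
	<-begun
	<-begun
	cancel() // both reconciliations are still in flight here
	<-done
	return atomic.LoadInt32(&nStarted), atomic.LoadInt32(&nFinished)
}

func main() {
	started, finished := demoGracefulShutdown()
	fmt.Printf("started=%d finished=%d\n", started, finished)
}
```

In this sketch, removing the wg.Wait() call lets runWorkers return while both simulated reconciliations are still sleeping, which is the behaviour described in the linked issue.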

Fixes #1424

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 14, 2021
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 14, 2021
@alvaroaleman alvaroaleman changed the title :bug Controller: Wait for all reconciliations before shutting down 🐛 Controller: Wait for all reconciliations before shutting down Mar 14, 2021
for i := 0; i < c.MaxConcurrentReconciles; i++ {
go wait.UntilWithContext(ctx, func(ctx context.Context) {
Member Author

From what I can see, wait.UntilWithContext was harmless but not useful. It re-runs the closure every c.JitterPeriod for as long as the ctx is not cancelled, but the closure stays in the for c.processNextWorkItem(ctx) loop until the workqueue is shut down, and the workqueue is only shut down when the ctx is cancelled. So by the time the closure returns, there is nothing left to re-run.

I think we still have this because before b4a0212 it was possible for the closure to end.
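A rough way to see why the periodic wrapper adds nothing: count how often the closure is actually invoked. The sketch below makes several assumptions. runEvery is a simplified, hypothetical stand-in for wait.UntilWithContext (it only models "check ctx, run the closure, wait a period, repeat"), and a channel that is closed on cancellation stands in for the workqueue and its shutdown.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

// runEvery is a simplified stand-in for wait.UntilWithContext (assumption):
// it re-runs f every period until ctx is cancelled.
func runEvery(ctx context.Context, f func(context.Context), period time.Duration) {
	for {
		select {
		case <-ctx.Done():
			return
		default:
		}
		f(ctx)
		select {
		case <-ctx.Done():
			return
		case <-time.After(period):
		}
	}
}

// processNextWorkItem blocks until an item arrives and returns false once
// the queue has been shut down (modelled here as a closed channel).
func processNextWorkItem(queue <-chan string) bool {
	_, ok := <-queue
	return ok
}

// closureInvocations reports how many times the wrapper started the worker
// closure over the lifetime of the context.
func closureInvocations() int32 {
	ctx, cancel := context.WithCancel(context.Background())
	queue := make(chan string)

	// The queue is shut down only when the context is cancelled.
	go func() {
		<-ctx.Done()
		close(queue)
	}()

	var calls int32
	finished := make(chan struct{})
	go func() {
		defer close(finished)
		runEvery(ctx, func(ctx context.Context) {
			atomic.AddInt32(&calls, 1)
			for processNextWorkItem(queue) {
			}
		}, 10*time.Millisecond)
	}()

	queue <- "a"
	queue <- "b"
	time.Sleep(100 * time.Millisecond) // many periods elapse, no re-run happens
	cancel()
	<-finished
	return atomic.LoadInt32(&calls)
}

func main() {
	fmt.Println("closure invocations:", closureInvocations())
}
```

Under these assumptions the closure is started exactly once, even though ten periods pass before cancellation, because it only returns after the context that would allow a re-run is already cancelled.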

With the changes in this PR, using wait.UntilWithContext may lead to a deadlock:

  • We add the workers to the WaitGroup
  • Context gets cancelled before the closure is started
  • wait.UntilWithContext sees the cancelled context and never starts the closure, so nothing ever calls wg.Done()
  • wg.Wait blocks forever
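The sequence above can be reproduced in a small, self-contained sketch. The helper untilWithContext is a hypothetical, simplified stand-in for wait.UntilWithContext that keeps only the property the deadlock depends on (the context is checked before the closure is invoked); waitReturns, wrappedWorkersFinish and plainWorkersFinish are illustrative names, and a timeout is used to detect the hang instead of actually blocking forever.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// untilWithContext is a simplified stand-in for wait.UntilWithContext
// (assumption): it checks ctx BEFORE each invocation of f.
func untilWithContext(ctx context.Context, f func(context.Context), period time.Duration) {
	for {
		select {
		case <-ctx.Done():
			return
		default:
		}
		f(ctx)
		select {
		case <-ctx.Done():
			return
		case <-time.After(period):
		}
	}
}

// waitReturns reports whether wg.Wait returns within the timeout.
func waitReturns(wg *sync.WaitGroup, timeout time.Duration) bool {
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(timeout):
		return false
	}
}

// wrappedWorkersFinish starts workers through the wrapper with a context
// that is already cancelled. The closure never runs, so wg.Done is never
// called and the wait times out.
func wrappedWorkersFinish() bool {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	var wg sync.WaitGroup
	wg.Add(2)
	for i := 0; i < 2; i++ {
		go untilWithContext(ctx, func(ctx context.Context) {
			defer wg.Done()
			<-ctx.Done()
		}, time.Second)
	}
	return waitReturns(&wg, 200*time.Millisecond)
}

// plainWorkersFinish starts the same closure directly in a goroutine, so it
// always runs and always calls wg.Done, even if ctx is already cancelled.
func plainWorkersFinish() bool {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	var wg sync.WaitGroup
	wg.Add(2)
	for i := 0; i < 2; i++ {
		go func() {
			defer wg.Done()
			<-ctx.Done()
		}()
	}
	return waitReturns(&wg, 200*time.Millisecond)
}

func main() {
	fmt.Println("with wrapper, wg.Wait returns:", wrappedWorkersFinish())
	fmt.Println("plain goroutine, wg.Wait returns:", plainWorkersFinish())
}
```

In the wrapped case the wait times out, which corresponds to the deadlock; starting the closure in a plain goroutine guarantees that every wg.Add is eventually matched by a wg.Done.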

Member

This makes a lot of sense, and it would explain why I was still seeing errors in tests that did not properly clean up or finish reconciling before things were shut down (including the event recorder, which panics).

@alvaroaleman
Member Author

/assign @vincepri @estroz

Member

@vincepri vincepri left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 14, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alvaroaleman, vincepri

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [alvaroaleman,vincepri]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@vincepri
Member

/milestone v0.9.x

@k8s-ci-robot k8s-ci-robot added this to the v0.9.x milestone Mar 14, 2021
@vincepri
Member

/retest

@k8s-ci-robot k8s-ci-robot merged commit df2c43d into kubernetes-sigs:master Mar 14, 2021
Development

Successfully merging this pull request may close these issues.

Graceful shutdown doesn't wait for reconcile workers to complete
4 participants