When a non-leader shuts down, leader election runnables are started #2719

Closed
pleshakov opened this issue Mar 20, 2024 · 3 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now.

@pleshakov

Hi folks,

Noticed the following behavior.

When a non-leader manager shuts down, it starts leader election runnables.

In our project, we see the following in the logs (this manager is a non-leader):
I0320 19:27:59.274077       7 leaderelection.go:250] attempting to acquire leader lease nginx-gateway/ngf-nginx-gateway-fabric-leader-election...


{"level":"info","ts":"2024-03-20T19:28:58Z","msg":"Stopping and waiting for leader election runnables"}

(Next 5 lines are from the jobs started by leader election runnables)

{"level":"info","ts":"2024-03-20T19:28:58Z","logger":"statusUpdater","msg":"Writing last statuses"}
{"level":"info","ts":"2024-03-20T19:28:58Z","logger":"statusUpdater","msg":"Updating Gateway API statuses"}
{"level":"info","ts":"2024-03-20T19:28:58Z","logger":"statusUpdater","msg":"Updating Nginx Gateway status"}
{"level":"info","ts":"2024-03-20T19:28:58Z","logger":"telemetryJob","msg":"Starting cronjob"}
{"level":"info","ts":"2024-03-20T19:28:58Z","logger":"telemetryJob","msg":"Stopping cronjob"}

{"level":"info","ts":"2024-03-20T19:28:58Z","msg":"Stopping and waiting for caches"}
{"level":"info","ts":"2024-03-20T19:28:58Z","msg":"Stopping and waiting for webhooks"}
{"level":"info","ts":"2024-03-20T19:28:58Z","msg":"Stopping and waiting for HTTP servers"}
{"level":"info","ts":"2024-03-20T19:28:58Z","logger":"controller-runtime.metrics","msg":"Shutting down metrics server with timeout of 1 minute"}
{"level":"info","ts":"2024-03-20T19:28:58Z","msg":"shutting down server","kind":"health probe","addr":"[::]:8081"}
{"level":"info","ts":"2024-03-20T19:28:58Z","msg":"Wait completed, proceeding to shutdown the manager"}

Note that the context passed to the leader election runnables is canceled shortly after they start, but that still leaves enough time for those jobs to run, for example to send an HTTP request.

Looking at the controller-runtime code, this behavior occurs because calling StopAndWait on a runnable group always calls r.Start.

My understanding of the code is as follows:

  1. cm.runnables.LeaderElection.StopAndWait(cm.shutdownCtx)
    - shuts down the leader election runnable group
  2. StopAndWait always starts the runnable group if it has not been started before.

I wonder if this behavior is considered a bug.

Used version:

sigs.k8s.io/controller-runtime v0.17.2

@alvaroaleman
Member

/cc @vincepri
Definitely not intended

@vincepri
Member

I was recently chatting with @inteon about the same issue; we agreed it's not intended behavior. I'll need to take a look next week.

@alvaroaleman alvaroaleman added the kind/bug Categorizes issue or PR as related to a bug. label Mar 21, 2024
@sbueringer sbueringer added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Mar 23, 2024
@sbueringer sbueringer added this to the v0.18.x milestone Mar 23, 2024
@pleshakov
Author

https://github.com/kubernetes-sigs/controller-runtime/releases/tag/v0.17.3 and #2724 resolved the issue. We tested it in our implementation. Thanks!
