Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: remove deprecated field of manager option #684

Merged
merged 5 commits into from
Sep 1, 2023
Merged

Conversation

aymericDD
Copy link
Contributor

What does this PR do?

  • Adds new functionality
  • Alters existing functionality
  • Fixes a bug
  • Improves documentation or testing

Please briefly describe your changes as well as the motivation behind them:

  • The EventBroadcaster field of ctrl.Options{} has indeed been deprecated since controller-runtime v0.8.0, and it has been replaced by the Recorder interface. It will be removed in the future release. To avoid issues I prefered to removed it to don't be impacted.

Code Quality Checklist

  • The documentation is up to date.
  • My code is sufficiently commented and passes continuous integration checks.
  • I have signed my commit (see Contributing Docs).

Testing

  • I leveraged continuous integration testing
    • by depending on existing unit tests or end-to-end tests.
    • by adding new unit tests or end-to-end tests.
  • I manually tested the following steps:
    • locally.
    • as a canary deployment to a cluster.

@aymericDD aymericDD marked this pull request as ready for review April 4, 2023 15:48
@aymericDD aymericDD requested a review from a team as a code owner April 4, 2023 15:48
ptnapoleon
ptnapoleon previously approved these changes Apr 4, 2023
Copy link
Contributor

@nathantournant nathantournant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the EventBroadcaster field is deprecated because of potential leak/lock issue, the broadcaster is still currently used to create all the chaos-controller events (because it actually hosts the Recorder).

This diagram in the Slack Notifier PR is still valid in that regard and might help visualise how the interfaces are setup.
The event Recorder object we use to create all our events is extracted from the manager. The manager (behind the scenes) recorder object is highly correlated to its broadcaster.

When we r.record() an event, the event is actually sent the recorder of the broadcaster of the manager -- and the broadcaster in question has to be the one we customise with our notifier system and pass as an argument at manager creation. Otherwise I assume the NewManager function creates a default one.

If we remove this line as it is we will simply remove the notifier (notifierSinks + filtering) system. I'll read the updated docs and investigate for the new correct way to implement this !

@nathantournant
Copy link
Contributor

  • What is the broadcaster and why do we set it up there ?

The broadcaster is an interface of the Go kubernetes client that allows you to create custom events from your controller, and send them to the kubernetes cluster. It also allows for custom event filtering, anti-spam filtering, and sink broadcasting (ie. send the event to any interface before/while sending it to the api server).

When we looked for solutions to notify users of specific controller behaviours (eg. NoTargetFound warning event), leveraging the manager EventBroadcaster interface came up as the best, most effective and easy-to-implement solution.

Since then, we have added 4-5 different sinks including slack and datadog, and heavily rely on setting up our own Broadcaster interface to create our events and send them to various sinks. In order to set up our custom EventBroadcaster, we have to create it and give it to the manager at setup, as shown in the code above.

  • Why is the field marked as deprecated ?

The EventBroadcaster is used to send custom events but is also used by the controller for LeaderElection. This is a process where multiple instances of the controller on one cluster could coordinate through events and share load.
This complicates the lifecycle of the EventBroadcaster as it has to be started very early in the manager lifecycle, and killed very late to be able to keep interacting with the LeaderElection process.

In 2020, open source maintainers reacted to an issue describing goroutine leaks related to the EventBroadcaster lifecycle. To ensure correct deletion of the EventBroadcaster with the controller deletion it needs to be instantiated after the manager. A patch PR strongly reduced the possibility for leaks by lazy loading the broadcaster and marked the field as deprecated. The broadcaster is only instantiated when an event is actually emitted – and that’s expected to be done by the LeaderElection process. We do not emit events before calling manager.Start() and should ensure things stay this way.

  • What do we do now ? Is there a better way to set up this option ?

This is a stable issue that has not caused us issues as far as we know.

Reading the patch PR description, as we declare our own custom broadcaster, we should call a defer broadcaster.Stop() in our main function. This will assert the broadcaster is properly stopped by the end of the controller lifecycle.

I did not find a better way to implement our event and notification system within the provided stack, but stay open to possibilities. While we will monitor for goroutine leaks in the future, this works reliably and seems under control.

@aymericDD
Copy link
Contributor Author

Thank @nathantournant for this clear explanation! 👍 For now, we can keep this field and add the defer broadcaster.Stop() function in our main. We should be really careful during dependencies upgrade in case the field is removed.

@ptnapoleon
Copy link
Contributor

ptnapoleon commented Apr 25, 2023

Since then, we have added 4-5 different sinks including slack and datadog, and heavily rely on setting up our own Broadcaster interface to create our events and send them to various sinks. In order to set up our custom EventBroadcaster, we have to create it and give it to the manager at setup, as shown in the code above.

I might be totally misunderstanding, but is the fact that we pass this to the manager even relevant? We pass it separately to the DisruptionReconciler, which is where we actually sent the notifications, and we register the notifier sinks with the broadcaster in main.go.

At least superficially, I would expect everything to keep working if we removed EventBroadcaster: broadcaster from the ctrl.NewManager call, and just modified what we passed to the Reconciler at Recorder: mgr.GetEventRecorderFor(chaosv1beta1.SourceDisruptionComponent),

@nathantournant
Copy link
Contributor

Since then, we have added 4-5 different sinks including slack and datadog, and heavily rely on setting up our own Broadcaster interface to create our events and send them to various sinks. In order to set up our custom EventBroadcaster, we have to create it and give it to the manager at setup, as shown in the code above.

I might be totally misunderstanding, but is the fact that we pass this to the manager even relevant? We pass it separately to the DisruptionReconciler, which is where we actually sent the notifications, and we register the notifier sinks with the broadcaster in main.go.

At least superficially, I would expect everything to keep working if we removed EventBroadcaster: broadcaster from the ctrl.NewManager call, and just modified what we passed to the Reconciler at Recorder: mgr.GetEventRecorderFor(chaosv1beta1.SourceDisruptionComponent),

This is a valid take; the EventBroadcaster does have the ability to generate an EventRecorder, that we can then use in the DisruptionReconciler. It will then be standalone. Concerns are;

  • by removing our EventBroadcaster from the manager, the manager will still create its own default version it needs (for the leader election among else), creating a less-optimal setup (2 broadcasters objects instead of 1)
  • if we want our custom EventBroadcaster to be able to export events to the the cluster (the manager set that up for us) we need to have an additional sink to provide that. That will make us recreate another duplicate resource.
    It's likely those are not massively affecting performance, and we intend to test it out and compare the different performance profiles with @aymericDD. Also how much do we rely on the event reaching kubernetes ? TBD, maybe we can remove that.

Latest commit includes a proposal/PoC that splits the two broadcasters, and makes the custom broadcaster send events to kubernetes. If the resource consumption is not absurd (which is likely), we can clean it up and use that instead.

@ptnapoleon
Copy link
Contributor

by removing our EventBroadcaster from the manager, the manager will still create its own default version it needs (for the leader election among else

good to know. We almost always run w/ leader election off, but i wouldn't want to change how that behavior works without consideratioh

if we want our custom EventBroadcaster to be able to export events to the the cluster (the manager set that up for us) we need to have an additional sink to provide that.

Ahh, I didn't realize that. Great point, thanks!

@@ -42,6 +43,10 @@ func RegisterNotifierSinks(mgr ctrl.Manager, broadcaster record.EventBroadcaster
broadcaster.StartRecordingToSink(&NotifierSink{client: client, notifier: notifier, logger: logger})
}

corev1Client, _ := corev1client.NewForConfig(mgr.GetConfig())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part of code can be removed if we don't need to send event to kubernetes.

@datadog-datadog-prod-us1
Copy link

datadog-datadog-prod-us1 bot commented Jun 8, 2023

Datadog Report

Branch report: fix/deprecated
Commit report: 15a8f56

chaos-controller: 0 Failed, 0 New Flaky, 477 Passed, 0 Skipped, 4m 38.67s Wall Time

@aymericDD
Copy link
Contributor Author

What should we do with this pull request? Should I close it and continue using the deprecated method, or should we switch to the new method? @nathantournant @ptnapoleon

@ptnapoleon
Copy link
Contributor

What should we do with this pull request? Should I close it and continue using the deprecated method, or should we switch to the new method? @nathantournant @ptnapoleon

I think it's up to you? You'd be the one doing the change, it doesn't seem urgent, but it is always healthy to move away from deprecated components

@datadog-datadog-prod-us1
Copy link

datadog-datadog-prod-us1 bot commented Jul 21, 2023

Datadog Report

Branch report: fix/deprecated
Commit report: 5ec250c

chaos-controller: 0 Failed, 0 New Flaky, 487 Passed, 0 Skipped, 6m 46.1s Wall Time

@datadog-datadog-prod-us1
Copy link

datadog-datadog-prod-us1 bot commented Aug 17, 2023

Datadog Report

Branch report: fix/deprecated
Commit report: 9e9eb8e

chaos-controller: 0 Failed, 0 New Flaky, 548 Passed, 0 Skipped, 6m 39.61s Wall Time

The EventBroadcaster field of ctrl.Options{} has indeed been deprecated since controller-runtime v0.8.0, and it has been replaced by the Recorder interface.
@datadog-datadog-prod-us1
Copy link

datadog-datadog-prod-us1 bot commented Aug 29, 2023

Datadog Report

Branch report: fix/deprecated
Commit report: 96366d6

chaos-controller: 0 Failed, 0 New Flaky, 556 Passed, 0 Skipped, 6m 44.19s Wall Time

@aymericDD aymericDD merged commit 6de2775 into main Sep 1, 2023
18 checks passed
@aymericDD aymericDD deleted the fix/deprecated branch September 1, 2023 14:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants