Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

operator: always report events #9427

Merged
merged 1 commit into from Dec 20, 2021

Conversation

BlaineEXE
Copy link
Member

The original PR which added event reporting unnecessarily "optimized" to
prevent spamming the API controller1.

It is sometimes important to get events as they happen and not hide new
events behind preexisting older events. For example, in integration
tests, we may often want to wait for a controller to finish processing
an update, and the best way to do that is to wait for the
"ReconcileSucceeded" event. In order for this to be useful, the events
must be reported each time.

If we begin having problems with events being reported too often, then
we should fix the underlying issue of reconciles happening too often
instead of relying on a time-based "optimization" that hides recent
event reports that may be useful.

Signed-off-by: Blaine Gardner blaine.gardner@redhat.com

Description of your changes:

Which issue is resolved by this Pull Request:
Resolves #

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Skip Tests for Docs: Add the flag for skipping the build if this is only a documentation change. See here for the flag.
  • Skip Unrelated Tests: Add a flag to run tests for a specific storage provider. See test options.
  • Reviewed the developer guide on Submitting a Pull Request
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.
  • Pending release notes updated with breaking and/or notable changes, if necessary.
  • Upgrade from previous release is tested and upgrade user guide is updated, if necessary.
  • Code generation (make codegen) has been run to update object specifications, if necessary.

@@ -51,7 +51,7 @@ func ReportReconcileResult(logger *capnslog.PackageLogger, recorder *k8sutil.Eve
logger.Errorf("failed to reconcile %s %q. %v", kind, nsName, err)

// 2. event
recorder.ReportIfNotPresent(obj, corev1.EventTypeWarning, string(cephv1.ReconcileFailed), err.Error())
recorder.Event(obj, corev1.EventTypeWarning, string(cephv1.ReconcileFailed), err.Error())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is happening on every retry, maybe we should keep the ReportIfNotPresent() call?

Copy link
Member Author

@BlaineEXE BlaineEXE Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are multiple events, they will be collated together like so, with something similar to (x4 over 8m46s) indicating that they have happened multiple times and without adding loads of clutter:

  Warning  ReconcileFailed     8m11s               rook-ceph-object-controller  failed to create object store deployments: fail
ed to start rgw health checker for CephObjectStore "rook-ceph/my-store", will re-reconcile: failed to create bucket checker for
 CephObjectStore "rook-ceph/my-store": failed to create or retrieve rgw admin ops user: failed to create object user "rgw-admin
-ops-user". error code 1 for object store "my-store": failed to create s3 user. . : signal: interrupt
  Normal   ReconcileSucceeded  9s (x4 over 8m46s)  rook-ceph-object-controller  successfully configured CephObjectStore "rook-c
eph/my-store"

IMO, that is an okay trade-off for getting more insight into when things have happened most recently.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok .so this is happening regardless of whether we use ReportIfNotPresent or Event?
I guess we can just rely on this mechanism instead of adding another one.

The original PR which added event reporting unnecessarily "optimized" to
prevent spamming the API controller[1].

It is sometimes important to get events as they happen and not hide new
events behind preexisting older events. For example, in integration
tests, we may often want to wait for a controller to finish processing
an update, and the best way to do that is to wait for the
"ReconcileSucceeded" event. In order for this to be useful, the events
must be reported each time.

If we begin having problems with events being reported too often, then
we should fix the underlying issue of reconciles happening too often
instead of relying on a time-based "optimization" that hides recent
event reports that may be useful.

[1]: rook#7222

Signed-off-by: Blaine Gardner <blaine.gardner@redhat.com>
@BlaineEXE BlaineEXE force-pushed the always-report-reconcile-events branch from 59259d3 to da61ac1 Compare December 15, 2021 22:24
@BlaineEXE BlaineEXE marked this pull request as ready for review December 16, 2021 22:11
@BlaineEXE BlaineEXE merged commit 9dc41f5 into rook:master Dec 20, 2021
@BlaineEXE BlaineEXE deleted the always-report-reconcile-events branch December 20, 2021 15:42
mergify bot added a commit that referenced this pull request Dec 20, 2021
operator: always report events (backport #9427)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants