New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rgw: add events to CephBucketNotification CR #9400
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall the codes LGTM. But I don't much internal about events, so another pair of eyes will be helpful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I think this is good. I have some clarifying questions/suggestions:
- Do you also think it's useful to create events about when notifications are removed? If so, maybe we should we until after object: fix handling for notifications for OBC #9365 merges to add those at the same time
- We have a function
ReportIfNotPresent
for events, which can be used to prevent spamming events to the API server. IMO, it is better to use the rawEventf()
function as you have here (I don't think theReportIfNotPresent
code is necessary and really only fixes a symptom and not a problem but I digress.). All this to say, if in your testing you are seeing events get repeatedly added, we might consider using that to spare the K8s apiserver. - I see that this focuses only on the "notification" controller and not OBC label controller. I think that's great, and I assume that there will be a follow-up PR for OBC label controller after?
@@ -159,6 +164,7 @@ func (r *ReconcileNotifications) reconcile(request reconcile.Request) (reconcile | |||
} | |||
if len(obcList.Items) == 0 { | |||
logger.Debugf("no ObjectbucketClaim associated with CephBucketNotification %q", bnName) | |||
r.recorder.Eventf(notification, "Normal", "Reconcile", "Finished reconciling %q", bnName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that using recorder.Eventf()
is easy to read at a glance, but in the interest of creating more of a "contract" about the events that can be used in integration tests, I think it might be useful to use constants where possible. For integration tests, the things I think will be most important are verifying that a reconcile started as a result of a change (to know Rook has started work after an update) and verifying that a reconcile finished (to know when Rook has finished applying changes).
For example, "Normal" is a k8s core v1 api v1.EventTypeNormal
.
In other places, we have used constants here: https://github.com/BlaineEXE/rook/blob/3f766406cb4a0e780a3a02e0e03a5ebb277ed08a/pkg/apis/ceph.rook.io/v1/types.go#L413-L416
IMO, it is worth considering a new event type "ReconcileStarted"
given that we already have the pattern of ReconcileSucceeded
/ReconcileFailed
. It isn't strictly an API, but I think being consistent is good for users who might script around such things similarly to how we might in our integration tests.
Ideally, I'd like to reuse ReportReconcileResult
. This will automatically add events ReconcileSucceeded
or ReconcileFailed
after a reconcile. I don't think we need to add another "Finished reconciling" event unless you see a particular need. Example: https://github.com/BlaineEXE/rook/blob/1af302fe683faaf7a1ed3faac32e28ee2921e858/pkg/operator/ceph/object/controller.go#L176-L177
However, I just realized that ReportReconcileResult
uses the ReportIfNotPresent
function, which means it won't always update when a reconcile finishes twice within 60 minutes. I think I will work on a PR to avoid using ReportIfNotPresent
so we can rely on pre-built behavior here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#9427 <-- always report events PR
@@ -197,8 +203,10 @@ func (r *ReconcileNotifications) reconcile(request reconcile.Request) (reconcile | |||
if err != nil { | |||
return reconcile.Result{}, errors.Wrapf(err, "failed to provision CephBucketNotification %q for ObjectBucketClaims %q", bnName, bucketName) | |||
} | |||
r.recorder.Eventf(notification, "Normal", "Reconcile", "Notification %q provisioned on bucket %q", bnName, bucketName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we might consider a "Provisioned"
event type here instead of reusing "Reconcile"
. Also should probably make this part of the API here: https://github.com/BlaineEXE/rook/blob/3f766406cb4a0e780a3a02e0e03a5ebb277ed08a/pkg/apis/ceph.rook.io/v1/types.go#L401-L426
Here are some events for comparison...
Pod:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 24s default-scheduler Successfully assigned rook-ceph/rook-ceph-rgw-my-store-a-85cf66b968-44hqh to minikube
Normal Pulled 24s kubelet Container image "quay.io/ceph/ceph:v16.2.6" already present on machine
Normal Created 24s kubelet Created container chown-container-data-dir
Normal Started 24s kubelet Started container chown-container-data-dir
Normal Pulled 23s kubelet Container image "quay.io/ceph/ceph:v16.2.6" already present on machine
Normal Created 23s kubelet Created container rgw
Normal Started 23s kubelet Started container rgw
replication controller :
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 11m replicaset-controller Created pod: rook-ceph-rgw-my-store-a-85cf66b968-44hqh
Job:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 22m job-controller Created pod: rook-ceph-osd-prepare-minikube--1-s57vq
Normal Completed 22m job-controller Job completed
yes
agree
currently, events don't report errors/failures, so we probably not going to spam the system
i send events from the OBC label controller as well. see: https://github.com/rook/rook/pull/9400/files#diff-9dd4db96c329ea9c209cd43523aa04e1edc3df69304322939b34dba13598df51R166-R187 |
3ed472b
to
570f09e
Compare
I was hoping we might be able to have a chat in the Rook huddle this morning to go over this with a little more bandwidth. @travisn and I were discussing using events vs status.conditions. We both believe conditions may be worth investigating long term, but those also come with a need to design and maintain the API for the conditions. I think we finally agreed that it makes sense to use events for now since it can get us to a working state more quickly, and we can revisit the topic of status.conditions later. We also want to be careful not to spam the user/resource with events that make it hard to examine the state of the resource at a glance, and we think it's worth planning our strategy beforehand so we don't have too much churn in this PR, dragging it out and adding frustration. For now, let's make sure we can converge on a plan for the events before requesting changes of @yuvalif that might have us going back and forth. My suggestion is to focus first on the events that will help us do the integration tests and consider other events later. For integration tests, I think we only need to be able to know that (1) a reconcile started after changes to a Notification/OBC resource, and (2) that the reconcile then finishes. My suggestion is to use the For integration tests, this would enable the following specific behavior:
|
While the events would give the tests deeper insight into what the operator is doing, why does this test need it when we haven't used it in the past? Sure, it's a more predictable pattern and would help avoid the need for timeouts, let's just be clear why the events are really necessary for these tests. The pattern in the CI is generally: 1) Set the desired state, 2) Wait for the desired state, 3) If the desired state doesn't get applied after a timeout, fail the test. The events are really for user awareness of what is happening. If we are adding events just for the testing, we will likely add some events that are either verbose or uninteresting to the user. |
There are tests for notifications that are a bit different from what we are currently doing. For our current tests, you are right that we currently apply a desired state and wait for the state to change to what we want. We are verifying a positive change. However, for notifications, we also have tests that verify that things do not change when they shouldn't. Like when a label is added to an OBC but the label should not lead to a new notification being created. In this test, we verify a negative, which requires new methods for determining when Rook has finished reconciling where we don't have a positive change to look for. We need a way to verify that Rook is done with changes recently made in order to verify that the notification was unchanged.
Currently, we already have a pattern of sending |
yes, the events added here are, IMO, useful for user awareness
i didn't add the
i did not add a |
Let's discuss in more detail in Monday's huddle. |
added reconcile failure events here: f5acb87 |
This pull request has merge conflicts that must be resolved before it can be merged. @yuvalif please rebase it. https://rook.io/docs/rook/latest/development-flow.html#updating-your-fork |
} | ||
|
||
r.recorder.Eventf(notification, kapiv1.EventTypeNormal, string(cephv1.ReconcileSucceeded), "Successfully reconciled %q", bnName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ReconcileSucceeded
and ReconcileFailed
events are added automatically by the reusable ReportReconcileResult
. See here: https://github.com/BlaineEXE/rook/blob/cab3ce34fd96dd196c7ab1ff4de0c8182b97b1ce/pkg/operator/ceph/object/controller.go#L177
Please use this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about the case that I return a retry indication without an error? (for the case that the topic is not provisioned yet)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should be an error. Similarly to how a Pod gets a failure event if a Volume source doesn't exist. It helps the user understand what is wrong and correct it if it is their mistake (not creating the Topic).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was specifically requested not to be an error in the original PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, what about the case that i failed to fetch the notification object itself?
how can i report an event on it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
last thing. if we use the EventReporter
i cannot use the FakeRecorder
in the unit test
should I add a FakeReporter
to k8sutil?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was specifically requested not to be an error in the original PR
I thought I had asked for it to be an error, but I may have not been clear at the time.
last thing. if we use the
EventReporter
i cannot use theFakeRecorder
in the unit testshould I add a
FakeReporter
to k8sutil?
Have you rebased on the latest master? I removed the deduplicating stuff from ReportReconcileResult
so it should work with the recorder.EventRecorder
now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
works fine after rebase!
return reconcile.Result{}, errors.Wrapf(err, "failed to provision CephBucketNotification %q for ObjectBucketClaims %q", bnName, bucketName) | ||
} | ||
r.recorder.Eventf(notification, kapiv1.EventTypeNormal, string(cephv1.Provisioned), "Notification %q provisioned on bucket %q", bnName, bucketName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we had agreed that adding "Provisioned" events was going to be too much and that notifications should be later put into the status.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. will remove
@@ -128,56 +134,69 @@ func (r *ReconcileNotifications) reconcile(request reconcile.Request) (reconcile | |||
return reconcile.Result{}, nil | |||
} | |||
|
|||
namespace := notification.Namespace | |||
bnName := types.NamespacedName{Namespace: namespace, Name: notification.Name} | |||
r.recorder.Eventf(notification, kapiv1.EventTypeNormal, string(cephv1.ReconcileStarted), "Start reconciling %q", bnName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r.recorder.Eventf(notification, kapiv1.EventTypeNormal, string(cephv1.ReconcileStarted), "Start reconciling %q", bnName) | |
r.recorder.Eventf(notification, kapiv1.EventTypeNormal, string(cephv1.ReconcileStarted), "Started reconciling CephObjectBucketNotification %q", bnName) |
logger.Infof("CephBucketTopic %q not provisioned yet", topicName) | ||
failureMessage := fmt.Sprintf("Reconcile failed for %s, topic %q not provisioned yet", bnName, topicName) | ||
logger.Info(failureMessage) | ||
r.recorder.Event(notification, kapiv1.EventTypeNormal, string(cephv1.ReconcileFailed), failureMessage) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Failure events are handled automatically by reading the error message when ReportReconcileResult
is used.
@@ -161,24 +165,32 @@ func (r *ReconcileOBCLabels) reconcile(request reconcile.Request) (reconcile.Res | |||
return reconcile.Result{}, errors.Wrapf(err, "failed to retrieve CephBucketNotification %q", bnName) | |||
} | |||
|
|||
r.recorder.Eventf(notification, kapiv1.EventTypeNormal, string(cephv1.ReconcileStarted), "Start reconciling %q", bnName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For integration tests, it will be important to know when reconcile begins due to OBC changes and when due to Notification changes.
r.recorder.Eventf(notification, kapiv1.EventTypeNormal, string(cephv1.ReconcileStarted), "Start reconciling %q", bnName) | |
r.recorder.Eventf(notification, kapiv1.EventTypeNormal, string(cephv1.ReconcileStarted), "Started reconciling CephObjectBucketNotification %q for ObjectBucketClaim %q", bnName, obcNamespacedName) |
f5acb87
to
2b93a51
Compare
2b93a51
to
4c6f7e4
Compare
the events are used to mark the start/success/failure of the reconciliation and also to log the bucket names on which the CephbucketNotification is created Signed-off-by: Yuval Lifshitz <ylifshit@redhat.com>
4c6f7e4
to
11727b1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good to me. If there are any nits I missed, they'll be easy to patch later.
rgw: add events to CephBucketNotification CR (backport #9400)
the events are used to mark the start/finish of the reconciliation
and also to log the bucket names on which the CephbucketNotification is
created
Signed-off-by: Yuval Lifshitz ylifshit@redhat.com
Checklist:
make codegen
) has been run to update object specifications, if necessary.