New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.
Already on GitHub? Sign in to your account
馃尡 Add back error logging for Reconcile implementation #1245
Conversation
I don't understand the rationale behind https://github.com/kubernetes-sigs/controller-runtime/pull/1096/files#diff-3f86588ef108e53ef346d3223ca106ccaa8efe72cb1055489665d7675f3da7b4L242-L244. If the library is going to silently eat all errors, why does the |
The main purpose of the returned error is to cause a retry after an exponential backoff: controller-runtime/pkg/internal/controller/controller.go Lines 263 to 264 in af24f3b
|
From a library consumer's point of view, I would suggest it's completely surprising that if I return an error in this context it will be used for flow control and not actually recorded anywhere. For example, because I (reasonably?) assume the error is recorded by the library, I wrap and propagate the error without recording it myself. Generally, using errors for flow control seems suspicious, especially when the semantics of the concrete error are actually unknown to the caller (i.e. controller-runtime). The library has no idea what an opaque error means. The user already has an explicit way to drive flow control with regards to retries through the Some options:
|
The docs do already mention the retry semantics in the face of an error, but don't mention discarding the error:
|
That is how all Kubernetes controllers work, it would be the same if you built your controller directly with client-go and generated clients+informers. Controller-Runtime is just an abstraction that aims at making building them simpler, it doesn't aim at changing anything about the concepts.
Yes, that is if you know you have to wait a given interval or re-attempt your reconciliation. It also doesn't increase the error counter: controller-runtime/pkg/internal/controller/controller.go Lines 265 to 266 in af24f3b
Only the second suggestion is an option, everything else would mean we derive from how controllers in Kubernetes work.
Using an error to trigger a delayed retry and record it in a metric is not what I would consider "discarding" it. |
Fair enough.
If the actual error and even its message is lost, I would say the most important details have in fact been discarded. If my controller is in a failure loop due to my code propagating some error, the fact I have an opaque error counter doesn't really seem useful. Knowing errors are happening but not what the errors are doesn't seem valuable. All things considered, recording the actual error (or possibly better, giving the user the means to trap and handle the error semantically) seems significant. But it does seem like we agree that logging (at least, in some form) is desirable? |
馃憢 I originally advocated for removing the logging of the reconciler error because it does not include the source (file and line number) where the error was created. It is frustrating, at least for me, to see a log message saying something generic ("error doing xyx") where it is impossible to determine the provenance. Having said that, I can definitely see how having something logged is better than nothing, so I don't have any issue restoring this. I also did some work in Velero a while back to try to extract the file and line information from an error's stack (if you're using something like github.com/pkg/errors) and place that information as values in the log message. It's implementation-specific (Logrus), but it would be nice to investigate if something like this would be possible in the future in a way that worked with our logging interface. See https://github.com/vmware-tanzu/velero/blob/68a4b23722fa66c9fd4147898028d3e16c86a881/pkg/util/logging/error_location_hook.go for the code. |
馃憢 @ncdc Hey! Long time no see! 馃榾 Go has provided error wrapping since 1.13. Does that help address the provenance issue assuming users are encouraged to implement stdlib error wrapping? It doesn't get you a stack trace, but it's certainly a big step forward. |
@ironcladlou good to see you! Hope all is well. Sadly I don't think the error wrapping in 1.13 helps, given the lack of the file and line information. Unless I'm missing something? |
Yeah, the stdlib wrapping does nothing to address file/line info like you'd want from a typical stack trace. However, it does maintain the chain of The implementation specific hacks sound useful to me, but I haven't really spent enough time thinking about it. Not sure if it's in the scope of what's being asked in this PR. As it stands the burden falls to the library consumer to handle logging any error before propagating it out of |
/milestone v0.7.x |
I am happy to re-introduce this and it doesn't sound like there is any objection. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alvaroaleman, enxebre The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@@ -264,6 +264,7 @@ func (c *Controller) reconcileHandler(ctx context.Context, obj interface{}) { | |||
c.Queue.AddRateLimited(req) | |||
ctrlmetrics.ReconcileErrors.WithLabelValues(c.Name).Inc() | |||
ctrlmetrics.ReconcileTotal.WithLabelValues(c.Name, "error").Inc() | |||
c.Log.Error(err, "Reconciler error") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c.Log.Error(err, "Reconciler error") | |
log.Error(err, "Reconciler error") |
So we get name and namespace information as well
This https://github.com/kubernetes-sigs/controller-runtime/pull/1054/files#diff-3f86588ef108e53ef346d3223ca106ccaa8efe72cb1055489665d7675f3da7b4R242 degaded the log level for the Reconcile implementation. Then this completely remove the error logging https://github.com/kubernetes-sigs/controller-runtime/pull/1096/files#diff-3f86588ef108e53ef346d3223ca106ccaa8efe72cb1055489665d7675f3da7b4L242 When developing new controllers feels counterintuitive that controller runtime eats the error returned by the Reconcile implementation. This proposes to log the error and let developers to create an additional log on the implementation if they choose to.
/lgtm |
This https://github.com/kubernetes-sigs/controller-runtime/pull/1054/files#diff-3f86588ef108e53ef346d3223ca106ccaa8efe72cb1055489665d7675f3da7b4R242 degaded the log level for the Reconcile implementation.
Then this completely remove the error logging https://github.com/kubernetes-sigs/controller-runtime/pull/1096/files#diff-3f86588ef108e53ef346d3223ca106ccaa8efe72cb1055489665d7675f3da7b4L242
When developing new controllers feels counterintuitive that controller runtime eats the error returned by the Reconcile implementation.
This proposes to log the error and let developers to create an additional log in the implementation if they choose to.