-
Notifications
You must be signed in to change notification settings - Fork 140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.
Already on GitHub? Sign in to your account
LOG-5426: Fix stale metrics in telemetry #2443
LOG-5426: Fix stale metrics in telemetry #2443
Conversation
@xperimental: This pull request references LOG-5426 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@xperimental: This pull request references LOG-5426 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@xperimental: This pull request references LOG-5426 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@xperimental: This pull request references LOG-5426 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@xperimental: This pull request references LOG-5426 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Took another stab at some of the questions today and updated the PR code and description to reflect that. |
/retest-required |
1 similar comment
/retest-required |
Had a short chat about this topic with @cahartma . There are still some open points which I need to address, so this PR will change again on Monday. |
/hold |
@xperimental: This pull request references LOG-5426 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@xperimental: This pull request references LOG-5426 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@xperimental: This pull request references LOG-5426 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
cf0e427
to
22fd218
Compare
@xperimental: This pull request references LOG-5426 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Ready for review again, now already squashed as discussed 馃檪 |
It looks to me as if builds fail because of the recent change to Go 1.21 . There's already a PR to fix that issue, so I'll probably need a rebase after that one merges. 馃 |
/approve |
"github.com/openshift/cluster-logging-operator/internal/constants" | ||
) | ||
|
||
func UpdateInfofromCLF(forwarder logging.ClusterLogForwarder) { | ||
func (t *telemetryCollector) updateDefaultInfo(forwarder *loggingv1.ClusterLogForwarder) { | ||
if forwarder.Namespace != constants.OpenshiftNS || forwarder.Name != constants.SingletonName { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe its a step after this but seems like we would be interested in telemetry for all CLFs not just the legacy instance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code provides metrics for all the CLF instances.
This function is a separate codepath for the "default/legacy instance", because it might not even exist (in the case when a user has ClusterLogging without any customization in ClusterLogForwarder). So this function collects the final state of the "legacy instance" to expose it in metrics if the ClusterLogging instance exists while for all other CLF instances the information is directly gathered from the resource (code here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this again, I might have forgotten to describe this in more detail in a comment. I can add that to this function, if you like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be reasonable to clarify though I'm the only one asking so maybe limited value. This is not a blocking issue for me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an explanation comment to the function. 鉁旓笍
22fd218
to
c4669b3
Compare
c4669b3
to
e93eef8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alanconway, jcantrill, xperimental The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/unhold |
@xperimental: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
0eccb99
into
openshift:release-5.9
Description
This PR aims to fix the issue of stale telemetry metrics present in the operator. It does this by creating a custom collector instead of relying on the standard instrumentation primitives (like
GaugeVec
), which keep the previous state of the metrics.No changes are done to the metrics themselves (except for the help text) to avoid incompatibilities with previous telemetry queries.
/cc @cahartma
/assign @alanconway
Known issues / ToDo
log_forwarder_output_info
metric.List
call is essentially free (the list is cached locally). This would also solve the "stale metrics after delete" issue. The issue I have with implementing this approach in CLO is that I don't know how to identify the "healthy" status, for example. Input appreciated.Links