LOG-5042: Refactor Collector Alerts/Metrics according to changes in 6.0 version #2484

Open
wants to merge 1 commit into
base: master
from

Conversation

vparfonov
Contributor

@vparfonov vparfonov commented May 10, 2024

Description

In this PR:

  • removed the alert related to the Fluentd collector: FluentdQueueLengthIncreasing
  • updated the metrics configuration to focus on the Vector collector and removed dependencies on Fluentd
  • all metrics now use the app_kubernetes_io_instance label, which should help identify the CLF instance accurately
  • updated the documentation to reflect the current metrics and alerts
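
To illustrate the label change in the list above, here is a minimal Python sketch (not part of this PR) that mimics PromQL's `sum by (...)` over a few made-up samples. The instance names `collector` and `audit-forwarder`, the namespace, and the byte counts are hypothetical. The sketch only assumes that app_kubernetes_io_name holds the same collector type for every forwarder, while app_kubernetes_io_instance differs per CLF instance.

```python
from collections import defaultdict


def sum_by(samples, labels):
    """Sum sample values grouped by the given label names, like PromQL `sum by (...)`."""
    totals = defaultdict(float)
    for sample_labels, value in samples:
        # A missing label becomes an empty string, as in Prometheus grouping.
        key = tuple(sample_labels.get(name, "") for name in labels)
        totals[key] += value
    return dict(totals)


# Hypothetical samples: two forwarders in one namespace, both running Vector.
samples = [
    ({"namespace": "openshift-logging", "app_kubernetes_io_name": "vector",
      "app_kubernetes_io_instance": "collector"}, 100.0),
    ({"namespace": "openshift-logging", "app_kubernetes_io_name": "vector",
      "app_kubernetes_io_instance": "audit-forwarder"}, 40.0),
    ({"namespace": "openshift-logging", "app_kubernetes_io_name": "vector",
      "app_kubernetes_io_instance": "collector"}, 25.0),
]

# Grouping by the collector type merges both forwarders into one series.
by_name = sum_by(samples, ["namespace", "app_kubernetes_io_name"])

# Grouping by the instance label keeps each forwarder separate.
by_instance = sum_by(samples, ["namespace", "app_kubernetes_io_instance"])
```

Under these assumptions, grouping by app_kubernetes_io_name yields a single total for the namespace, while grouping by app_kubernetes_io_instance yields one total per forwarder.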

/cc @Clee2691 @cahartma
/assign @jcantrill

/cherry-pick

Links

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 10, 2024
@openshift-ci-robot

openshift-ci-robot commented May 10, 2024

@vparfonov: This pull request references LOG-5042 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.8.0" version, but no target version was set.

In response to this:

Description

/cc
/assign

/cherry-pick

Links

  • Depending on PR(s):
  • Bugzilla:
  • Github issue:
  • JIRA:
  • Enhancement proposal:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@vparfonov
Contributor Author

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 10, 2024
@openshift-merge-robot openshift-merge-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels May 10, 2024
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 10, 2024
@vparfonov vparfonov force-pushed the log5042 branch 2 times, most recently from 15d121e to f7c8269 Compare May 10, 2024 13:58
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 10, 2024
Contributor

@jcantrill jcantrill left a comment

/approve

@@ -117,15 +105,12 @@ spec:
severity: Warning
- name: logging_clusterlogging_telemetry.rules
rules:
- expr: |
Contributor

Is this only a recording rule derived from the Fluentd metrics? I thought it was in place to normalize some of our queries so that they were collector-agnostic.

Contributor Author

Yes, this rule is used only for Fluentd. For Vector we have vector_component_received_bytes_total (over 24h).

for: 1h
labels:
service: collector
severity: Warning
- alert: ElasticsearchDeprecation
annotations:
message: "The OpenShift Elasticsearch Operator is deprecated and is planned to be removed in a future release. Red Hat provides bug fixes and support for this feature during the current release lifecycle, but this feature no longer receives enhancements. As an alternative to using the OpenShift Elasticsearch Operator to manage the default log storage, you can use the Loki Operator."
Contributor

We should update this statement since THIS is the release where we are removing it.

Contributor Author

Yes, I wanted to ask about that.

@@ -98,15 +88,16 @@ spec:
severity: Warning
- name: logging_clusterlogging_telemetry.rules
rules:
- expr: |
Contributor

Same comment as the previous one: is this metric only available from the Fluentd metrics?

Contributor Author

yes

@@ -64,24 +64,14 @@
 },
 "targets": [
 {
-"expr": "sum by(job, namespace, app_kubernetes_io_name) (increase(vector_component_received_bytes_total{component_kind=\"source\", component_type!=\"internal_metrics\"}[24h]))",
-"legendFormat": "{{namespace}}/{{job}}/{{app_kubernetes_io_name}}",
+"expr": "sum by(namespace, app_kubernetes_io_instance) (increase(vector_component_received_bytes_total{component_kind=\"source\", component_type!=\"internal_metrics\"}[24h]))",
Contributor

Remind me, what was the value we added to "..._io_name"?

Contributor Author

collector type
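
For comparison, the old and new dashboard expressions in the hunk under review differ only in the `sum by` label list. Below is a small Python sketch that rebuilds both strings. The helper name `received_bytes_expr` is hypothetical and does not come from the repository. The metric name, selector, and window are copied from the hunk.

```python
def received_bytes_expr(group_by, window="24h"):
    """Build the PromQL expression for bytes received by Vector source components."""
    labels = ", ".join(group_by)
    # Selector copied from the dashboard hunk: sources only, excluding internal metrics.
    selector = 'component_kind="source", component_type!="internal_metrics"'
    return (
        f"sum by({labels}) "
        f"(increase(vector_component_received_bytes_total{{{selector}}}[{window}]))"
    )


# Old grouping: job, namespace, and the collector-type label.
old_expr = received_bytes_expr(["job", "namespace", "app_kubernetes_io_name"])

# New grouping: namespace and the per-instance label.
new_expr = received_bytes_expr(["namespace", "app_kubernetes_io_instance"])
```

When the expression is embedded in the dashboard JSON, the double quotes inside the selector are escaped as shown in the hunk.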

Contributor

openshift-ci bot commented May 10, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jcantrill, vparfonov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 10, 2024
@vparfonov vparfonov force-pushed the log5042 branch 4 times, most recently from 8e3d02a to 68bebf7 Compare May 13, 2024 16:35
@vparfonov
Contributor Author

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 13, 2024
@jcantrill
Contributor

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 16, 2024
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 16, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 20, 2024
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 20, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 20, 2024
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 20, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 20, 2024
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 28, 2024
@vparfonov vparfonov requested a review from jcantrill May 28, 2024 15:46
….0 version

Signed-off-by: Vitalii Parfonov <vparfono@redhat.com>
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 28, 2024
@vparfonov
Contributor Author

/cc @libander

@openshift-ci openshift-ci bot requested a review from libander May 28, 2024 15:56
Contributor

openshift-ci bot commented May 28, 2024

@vparfonov: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/unit 2685ebb link true /test unit

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
4 participants