Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metrics: fix missing k8s rest client metrics #26412

Merged
merged 1 commit into from
Jul 5, 2023

Conversation

ysksuzuki
Copy link
Member

@ysksuzuki ysksuzuki commented Jun 22, 2023

K8s rest client metrics, cilium_k8s_client_api_latency_time_seconds and cilium_k8s_client_api_calls_total, are missing because of a conflict with controller-runtime on the metrics registration.
(This issue of missing metrics occurs only in 1.13. See #25555 (comment))

controller-runtime(sigs.k8s.io/controller-runtime/pkg/metrics) also registers the client-go rest client metrics on its registry using metrics.Register method in its init function. The metrics.Register can be called only once, and subsequent calls will be ignored.

cilium-agent doesn't use controller-runtime, but there's an indirect dependency from github.com/cilium/cilium/daemon to github.com/cilium/cilium/operator/metrics package. (operator uses controller-runtime in its Gateway API implementation) For example,
github.com/cilium/cilium/daemon/cmd
-> github.com/cilium/cilium/pkg/ipam
-> github.com/cilium/cilium/pkg/ipam/metrics
-> github.com/cilium/cilium/operator/metrics

The init function of sigs.k8s.io/controller-runtime/pkg/metrics will be called because of this indirect dependency.

This commit fixes this conflict by registering the metrics directly as well as calling metrics.Register.

Fixes: #25610

Please ensure your pull request adheres to the following guidelines:

  • For first time contributors, read Submitting a pull request
  • All code is covered by unit and/or runtime tests where feasible.
  • All commits contain a well written commit description including a title,
    description and a Fixes: #XXX line if the commit addresses a particular
    GitHub issue.
  • If your commit description contains a Fixes: <commit-id> tag, then
    please add the commit author[s] as reviewer[s] to this issue.
  • All commits are signed off. See the section Developer’s Certificate of Origin
  • Provide a title or release-note blurb suitable for the release notes.
  • Are you a user of Cilium? Please add yourself to the Users doc
  • Thanks for contributing!

K8s rest client metrics, cilium_k8s_client_api_latency_time_seconds
and cilium_k8s_client_api_calls_total, are missing because of a conflict
with controller-runtime on the metrics registration.

controller-runtime(sigs.k8s.io/controller-runtime/pkg/metrics) also
registers the client-go rest client metrics on its registry using
metrics.Register method in its init function. The metrics.Register can be
called only once, and subsequent calls will be ignored.

cilium-agent doesn't use controller-runtime, but there's an indirect
dependency from github.com/cilium/cilium/daemon to
github.com/cilium/cilium/operator/metrics package.
(operator uses controller-runtime in its Gateway API implementation)
For example,
github.com/cilium/cilium/daemon/cmd
-> github.com/cilium/cilium/pkg/ipam
-> github.com/cilium/cilium/pkg/ipam/metrics
-> github.com/cilium/cilium/operator/metrics

The init function of sigs.k8s.io/controller-runtime/pkg/metrics will be
called because of this indirect dependency.

This commit fixes this conflict by registering the metrics directly
as well as calling metrics.Register.

Fixes: cilium#25610

Signed-off-by: Yusuke Suzuki <yusuke-suzuki@cybozu.co.jp>
@ysksuzuki ysksuzuki requested a review from a team as a code owner June 22, 2023 09:29
@maintainer-s-little-helper maintainer-s-little-helper bot added backport/1.13 This PR represents a backport for Cilium 1.13.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. labels Jun 22, 2023
@ti-mo
Copy link
Contributor

ti-mo commented Jun 23, 2023

Thanks! Since we have no codeowners on release branches, I'll tag some reviewers manually.

Adding @aanm as the original author and @dylandreimerink for recent context around metrics. Please correct as necessary.

@ti-mo ti-mo added area/metrics Impacts statistics / metrics gathering, eg via Prometheus. sig/agent Cilium agent related. labels Jun 23, 2023
Copy link
Member

@dylandreimerink dylandreimerink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this is a v1.13 backport it doesn't conflict with the recent metrics changes on main. LGTM!

Copy link
Member

@aanm aanm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR. @ysksuzuki I understand that this issue doesn't happen in main, but it can happen if the order of imports changes again. Maybe would be worth forward port this change to the main branch?

@aanm
Copy link
Member

aanm commented Jul 4, 2023

/test-backport-1.13

Job 'Cilium-PR-K8s-1.24-kernel-4.19' failed:

Click to show.

Test Name

K8sAgentPolicyTest Multi-node policy test with L7 policy using connectivity-check to check datapath

Failure Output

FAIL: cannot install connectivity-check

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.24-kernel-4.19/62/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.24-kernel-4.19 so I can create one.

Then please upload the Jenkins artifacts to that issue.

Job 'Cilium-PR-K8s-1.22-kernel-4.19' failed:

Click to show.

Test Name

K8sDatapathServicesTest Checks E/W loadbalancing (ClusterIP, NodePort from inside cluster, etc) with L7 policy Tests NodePort with L7 Policy

Failure Output

FAIL: Request from k8s1 to service tftp://[fd04::12]:30390/hello failed

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.22-kernel-4.19/99/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.22-kernel-4.19 so I can create one.

Then please upload the Jenkins artifacts to that issue.

Job 'Cilium-PR-K8s-1.21-kernel-4.19' failed:

Click to show.

Test Name

K8sDatapathServicesTest Checks E/W loadbalancing (ClusterIP, NodePort from inside cluster, etc) Tests NodePort inside cluster (kube-proxy) with IPSec and externalTrafficPolicy=Local

Failure Output

FAIL: Request from k8s1 to service http://[fd04::11]:30508 failed

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.21-kernel-4.19/60/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.21-kernel-4.19 so I can create one.

Then please upload the Jenkins artifacts to that issue.

@ysksuzuki
Copy link
Member Author

ysksuzuki commented Jul 4, 2023

it can happen if the order of imports changes again. Maybe would be worth forward port this change to the main branch?

Right. I will raise a PR to forward port it.

Also, I want to try cleaning up the dependency from ipam to operator package in #26122. There's no way to override the workqueue metrics if controller-runtime wins.

@ysksuzuki
Copy link
Member Author

/test-1.21-4.19

@ysksuzuki
Copy link
Member Author

/test-1.22-4.19

@ysksuzuki
Copy link
Member Author

/test-1.24-4.19

@squeed
Copy link
Contributor

squeed commented Jul 5, 2023

CI green except for, inexplicably, waiting for some main-branch tests. Merging

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/metrics Impacts statistics / metrics gathering, eg via Prometheus. backport/1.13 This PR represents a backport for Cilium 1.13.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. sig/agent Cilium agent related.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants