Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: fix app label of metrics svc for ServiceMonitor discovery #229

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

leotomas837
Copy link

@leotomas837 leotomas837 commented Apr 26, 2023

The ServiceMonitor targets both the web hook service and the metrics service. Yet, only the metrics service must be scraped (the probe could be scraped too via the blackbox-exporter, but that is a subject for another PR). The webhook service must not be discovered by Prometheus.

Prerequisites

Having a Kubernetes cluster running with the cert-manager/approver-policy chart installed and a prometheus instance ready to scrape metrics.

How to reproduce

  1. Make your Prometheus instance is scrape your approver-policy metrics. For this purpose, set the following values:
nameOverride: cert-manager-approver-policy

app:
  metrics:
    port: 9402
    service:
      enabled: true
      servicemonitor:
        enabled: true
  1. Open the Prometheus UI, you should see something similar:

Screenshot_2023-04-28_at_06_11_51-3-2

I have 2 replicas, hence the (4/4 up) in the target, but the point is that for a given replica, there is currently 2 services scraped: the webhook Service and the metrics Service. The metrics service should obviously be scraped, but not the webhook service (not even listening on port 9402 but on port 443) as it is not serving any metrics.

That is happening because the ServiceMonitor here is scraping on the following app label:

spec:
  selector:
    matchLabels:
      app: {{ include "cert-manager-approver-policy.name" . }}

Yet, both the metrics Service and the webhook Service have this label, check here and here.

How to test

By changing the app label of the metrics Service and setting the ServiceMonitor to select the new same app label, only the metrics Service gets scraped and you should see the following in your Prometheus UI (2/2 up) for 2 replicas:

Screenshot_2023-04-28_at_06_48_13

Simply apply the following JSON Patches to your chart installed in namespace cert-manager (I am using Helmfile so this is Helmfile syntax, but any tool applying the following JSON Patches obviously works):

jsonPatches:
  # The ServiceMonitor currently targets both the metrics endpoint and the webhook endpoint
  - target:
      group: ""
      version: v1
      kind: Service
      name: cert-manager-approver-policy-metrics
      namespace: cert-manager
    patch:
      - op: replace
        path: "/metadata/labels/app"
        value: cert-manager-approver-policy-metrics
  - target:
      group: monitoring.coreos.com
      version: v1
      kind: ServiceMonitor
      name: cert-manager-approver-policy
      namespace: cert-manager
    patch:
      - op: replace
        path: "/spec/selector/matchLabels/app"
        value: cert-manager-approver-policy-metrics

Signed-off-by: Leo Tomas <tomasl@rcvs.org.uk>
@jetstack-bot jetstack-bot added the dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. label Apr 26, 2023
@jetstack-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: leotomas837
Once this PR has been reviewed and has the lgtm label, please assign sgtcodfish for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jetstack-bot jetstack-bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 26, 2023
@jetstack-bot
Copy link
Contributor

Hi @leotomas837. Thanks for your PR.

I'm waiting for a cert-manager member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jetstack-bot jetstack-bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Apr 26, 2023
Copy link
Member

@wallrj wallrj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @leotomas837

This looks ok to me, but could you add some instructions to the PR description explaining how to recreate the problem and how to test this improvement.

/ok-to-test

@jetstack-bot jetstack-bot added ok-to-test and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 26, 2023
@leotomas837
Copy link
Author

@wallrj

Instructions added

@leotomas837
Copy link
Author

@wallrj

Did you get the chance to have a look on this ? I should be straight forward. Let me know if you need anything else from me.

@wallrj
Copy link
Member

wallrj commented May 10, 2023

@leotomas837 Sorry for the delay. After a few failed attempts to get the Prometheus operator installed I finally got it working this morning and was able to recreate the problem you described.

As you can probably tell, I don't know much about Prometheus, but it seemed strange to me that the ServiceMonitor should show the duplicate endpoints, despite us specifying targetPort: 9402. Surely, the prometheus operator should be able to see that only one of the matching services targets that port?

So I tried changing targetPort: 9402 to port: metrics, and that seems to result in only the metrics target appearing in Prometheus

image

$ git diff
diff --git a/deploy/charts/approver-policy/templates/metrics-servicemonitor.yaml b/deploy/charts/approver-policy/templates/metrics-servicemonitor.yaml
index 43f1751..2e72caa 100644
--- a/deploy/charts/approver-policy/templates/metrics-servicemonitor.yaml
+++ b/deploy/charts/approver-policy/templates/metrics-servicemonitor.yaml
@@ -20,7 +20,7 @@ spec:
     matchNames:
       - {{ .Release.Namespace }}
   endpoints:
-  - targetPort: {{ .Values.app.metrics.port }}
+  - port: metrics
     path: "/metrics"
     interval: {{ .Values.app.metrics.service.servicemonitor.interval }}
     scrapeTimeout: {{ .Values.app.metrics.service.servicemonitor.scrapeTimeout }}

What do you think? Are there any downsides to that approach?

@wallrj
Copy link
Member

wallrj commented May 10, 2023

More discussion here about confusing behaviour of targetPort :

Copy link
Member

@wallrj wallrj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we change targetPort to port: metrics, which has the same effect.
It seems to allow prometheus operator servicemonitor controller to select only the services that refer to ports with the name metrics.

@@ -5,7 +5,7 @@ metadata:
name: {{ include "cert-manager-approver-policy.name" . }}-metrics
namespace: {{ .Release.Namespace | quote }}
labels:
app: {{ include "cert-manager-approver-policy.name" . }}
app: {{ include "cert-manager-approver-policy.name" . }}-metrics
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The disadvantage of this, is that I can no longer easily discover all the resources related to this app. E.g.

$ kubectl get all --namespace cert-manager --selector app=cer
t-manager-approver-policy
NAME                                               READY   STATUS    RESTARTS   AGE
pod/cert-manager-approver-policy-c68dc44c9-5785f   1/1     Running   0          107m

NAME                                           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/cert-manager-approver-policy           ClusterIP   10.96.10.146    <none>        443/TCP    107m
service/cert-manager-approver-policy-metrics   ClusterIP   10.96.133.139   <none>        9402/TCP   107m

NAME                                                     DESIRED   CURRENT   READY   AGE
replicaset.apps/cert-manager-approver-policy-c68dc44c9   1         1         1       107m

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wallrj

I was looking at the same GitHub issue, and indeed you are right: targetPort setup the Prometheus's scrape config to scrape the container directly (for each service discovered, hence the duplicated target).
So 2 solutions: tightening the service discovery (my first approach), or targeting the service (by name) directly and not the container (your approach).

Happy to implement your approach, I like to think of a ServiceMonitor to actually be setup with service attributes and not container ones...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. Then please modify the PR to use the the port: metrics approach and ping me for another review.

@jetstack-bot jetstack-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 27, 2024
@jetstack-bot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. ok-to-test size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants