Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more labels to metrics-server metrics #2176

Closed
hamishforbes opened this issue Jan 17, 2023 · 11 comments · Fixed by #2225
Closed

Add more labels to metrics-server metrics #2176

hamishforbes opened this issue Jan 17, 2023 · 11 comments · Fixed by #2225
Labels
enhancement New feature or request needs triage Requires review from the maintainers

Comments

@hamishforbes
Copy link
Contributor

hamishforbes commented Jan 17, 2023

What would you like added?

The metrics emitted from the metrics-server currently only include job and runs_on labels.

I think repository is required at a minimum for this to be useful, and probably organisation too?

[edit]
Actually workflow_name is probably required as well
Although workflow name has only just been added to the go-github library!
[/edit]

Why is this needed?

If I have 2 Github repos (A and B) and they both have a workflow jobs called test and build (for example) the metrics aren't differentiated, I have no idea how many test jobs ran for repo A vs repo B, only how many ran for all of them.

We would get metrics like:

github_workflow_jobs_completed_total{job_name="test",runs_on="ubuntu-latest"} 123
github_workflow_jobs_completed_total{job_name="build",runs_on="ubuntu-latest"} 123

To be useful we'd want:

github_workflow_jobs_completed_total{organisation="foo",repository="A",job_name="test",runs_on="ubuntu-latest"} 23
github_workflow_jobs_completed_total{organisation="bar,repository="B",job_name="test",runs_on="ubuntu-latest"} 100
github_workflow_jobs_completed_total{organisation="foo",repository="A",job_name="build",runs_on="ubuntu-latest"} 23
github_workflow_jobs_completed_total{organisation="bar",repository="B",job_name="build",runs_on="ubuntu-latest"} 100
@hamishforbes hamishforbes added enhancement New feature or request needs triage Requires review from the maintainers labels Jan 17, 2023
@dongho-jung
Copy link
Contributor

dongho-jung commented Jan 18, 2023

+1 fyi I'm workarounding this by using hook as follows but it would be more awesome if it's supported in metrics-server natively

NOTE: BEFORE USE BELOW SNIPPET, READ THIS COMMENT!

#!/bin/bash -x
kubectl -n actions-runner-system label pod $RUNNER_NAME \
  actor=$(echo $GITHUB_ACTOR | sed 's/[^a-zA-Z0-9_.-]//g') \
  repo=$(echo $GITHUB_REPOSITORY | cut -d/ -f2 | sed 's/[^a-zA-Z0-9_.-]//g') \
  job=$(echo $GITHUB_JOB | sed 's/[^a-zA-Z0-9_.-]//g') \
  runner=$(echo $RUNNER_NAME | sed 's/[^a-zA-Z0-9_.-]//g') \
  sha=$GITHUB_SHA \
  attempt=$GITHUB_RUN_ATTEMPT \
  run_id=$GITHUB_RUN_ID

NOTE: BEFORE USE ABOVE SNIPPET, READ THIS COMMENT!

@timansky
Copy link

it will be great to include branch name also

@mumoshu
Copy link
Collaborator

mumoshu commented Mar 6, 2023

@hamishforbes @0xF4D3C0D3 @timansky Hey! organization and repository have been already added via #2218. We might be adding owner for non-org repos via #2225. That said, can we close this once we add branch and workflow_name?

@mumoshu
Copy link
Collaborator

mumoshu commented Mar 6, 2023

I'm a bit concerned about cardinality explosion #2218 (comment) but perhaps it won't be a huge problem as long as you won't have hundreds of workflows running against hundreds of branches... (which would result in dozens of thousands more time-series generated and stored.

@mumoshu
Copy link
Collaborator

mumoshu commented Mar 6, 2023

@hamishforbes @0xF4D3C0D3 @timansky It would be great if you could also check #2359 by @kkaresz-tw and add 👍, comments, and/or feedback!

@mumoshu
Copy link
Collaborator

mumoshu commented Mar 6, 2023

@0xF4D3C0D3 Hey! Thanks for sharing your workaround. It does look great and helps many folks not yet using a custom build of ARC (as we didn't cut even a new release that contains #2218 yet...)

After carefully reviewing your hook script, I realized you included runner, sha, and run_id of the workflow job. I'm a bit afraid of those resulting in "3 new time series" instead of "3 new metrics in the existing time series". Didn't it cause regressions on your Prometheus or managed metrics store performance/cost/etc?

You may already know, but here's an excerpt from the official Prometheus documentation for your convenience:

CAUTION: Remember that every unique combination of key-value label pairs represents a new time series, which can dramatically increase the amount of data stored. Do not use labels to store dimensions with high cardinality (many different label values), such as user IDs, email addresses, or other unbounded sets of values.
https://prometheus.io/docs/practices/naming/

@dongho-jung
Copy link
Contributor

@mumoshu Thanks! for letting me know that. I didn't know that risk D: fortunately, I'm in not that big team so I haven't experience any regressions yet, it would be problem later indeed. To be honest, I was using prometheus as a black box or a cargo cult 😳

(I could check the cardinality by go tool pprof -symbolize=remote -inuse_space http://{prometheus_endpoint}/debug/pprof/heap)

@hamishforbes
Copy link
Contributor Author

hamishforbes commented Mar 7, 2023

I'm a bit concerned about cardinality explosion #2218 (comment) but perhaps it won't be a huge problem as long as you won't have hundreds of workflows running against hundreds of branches... (which would result in dozens of thousands more time-series generated and stored.

Sorry haven't had a chance to look at any of this stuff again yet.
But, I don't think there's likely to be much of a cardinality issue with just repo/owner, its not really unbounded as you can only have as many combinations as your have repos.
Branch is more likely to be problematic, which is why I didn't include it in my PR, it is technically unbounded.

Ideally I think these labels would be configurable (kube-state-metrics is a high-profile example of this with --metric-labels-allowlist)
If you are running ARC across thousands of repos and the cardinality of having repo/owner in the labels is a problem then it would be nice to simply turn them off.
Likewise if you are running on a couple of repos and only have a few branches active at once then having branch as a label is probably very helpful

I have branch enable in my custom build too, its definitely good for being able to catch when changes in a branch are causing long build/test runs!

@mumoshu
Copy link
Collaborator

mumoshu commented Mar 13, 2023

Thank you for your comments everyone! I've just merged #2225. It includes workflow_name suggested in #2176 (comment).

It doesn't include branch yet as it turned out go-github does not expose it. I'll try to submit a tiny pull request to add that- once that's merged we are ready to work on adding branch 😄

@kkaresz-tw
Copy link

@mumoshu regarding the missing branch field in WorkflowJob in go-github, I opened a PR: google/go-github#2762 in case you wanted to follow its release.

@th-le
Copy link
Contributor

th-le commented Apr 28, 2023

Hi @mumoshu I open a PR #2549 to use the branch_name in the metrics according to the changes from @kkaresz-tw above

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request needs triage Requires review from the maintainers
Projects
None yet
6 participants