-
Notifications
You must be signed in to change notification settings - Fork 508
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Bump Prometheus-operator and node_exporter to latest versions #1661
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: lilic The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@lilic: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
The PR is marked as WIP, let me know when it's ready for review. |
@mm4tt Thanks for reply! Just wanted to first get initial feedback if this is something you all want? If yes, can you maybe addressed the comments in the description. After that I can address them and unmark it as WIP. Hope that makes sense? :) Thank you! |
- --kubelet-service=kube-system/kubelet | ||
{{end}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like some templating was removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added back in, indeed, the variable is declared at the top 👍
I took a cursory glance. A few notes:
It'd be great if you can update the prometheus operator and dependencies to the newest version. It's something that we'll have to do eventually anyway. But, could you keep the change as minimalistic as possible? I see that you changed/added a lot of things, e.g. a lot of new grafana dashboards, that aren't really needed. Also, please make sure to honor our custom overrides and settings (e.g. the thing Pawel caught). Thanks! |
verbs: | ||
- get | ||
- list | ||
- watch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure we can remove this?
If I am not mistaken when we are using service/pod monitors prometheus is doing list and watch for these two objects, right?
{{if $PROMETHEUS_SCRAPE_WINDOWS_NODES}} | ||
additionalScrapeConfigs: | ||
name: windows-scrape-configs | ||
key: windows-scrape-configs.yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove whitespaces
{{end}} | ||
image: gcr.io/k8s-testimages/quay.io/coreos/prometheus-operator:v0.31.0 | ||
- --prometheus-config-reloader=gcr.io/k8s-testimages/quay.io/prometheus-operator/prometheus-config-reloader:v0.45.0 | ||
image: gcr.io/k8s-testimages/quay.io/prometheus-operator/prometheus-operator:v0.45.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FTR these images do not exist right now, we will need to push them manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, i see. Can you advise how we can get this image updated?
- --secure-listen-address=:8443 | ||
- --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305 | ||
- --upstream=http://127.0.0.1:8080/ | ||
image: quay.io/brancz/kube-rbac-proxy:v0.8.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what this container is doing, but either way if we want to use it then we will need to mirror it to gcr (in the past we had issues, where our tests were failing, because quay.io was down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a note: There are also talks in kubebuilder to make automatic cloning of kube-rbac-proxy. Plus we (kube-rbac-proxy maintainers) want to donate that to k8s in the future when we find time to actually do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems 0.8.0 is available as per kubernetes-sigs/kubebuilder#1955, updating this PR accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marseel: I am new to this ecosystem, but can gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0
be referenced her as well or can you advise how I can help in having a gcr.io/k8s-testimages/quay.io/brancz/kube-rbac-proxy:v0.8.0
?
tldr: kube-rbac-proxy is protects endpoints by executing a token and subject access review against k8s api to validate if service account token in flight has permissions to query metrics.
@lilic: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Thank you all!! Just heads up I did not forget about this :) |
side note: to spread up the work load, I am taking this over from @lilic :-) |
@mm4tt @marseel let me actually submit separate PRs for the suggested changes so it is easier to track 👍 side note: i am also prometheus-operator and kube-prometheus maintainer, hence it might look like big changes here but they were more or less taken 1:1 from kube-prometheus upstream which evolved gradually. re: grafana dashboards: do you use explicit custom grafana dashboards? Otherwise we could indeed vendor them 1:1 to catch up with upstream changes from kube-prometheus. |
Yes, we have our own grafana dashboards defined in https://github.com/kubernetes/perf-tests/tree/97fedf5c484e6438c2c13854bb901241493377ba/clusterloader2/pkg/prometheus/manifests/dashboards |
Thanks Serg 🎉 |
What type of PR is this?
/kind feature
What this PR does / why we need it:
While writing kube-state-metrics perf tests, I noticed that the version of Prometheus operator and node_exporter are out of date, so I decided to contribute. Let me know if I overstepped the line, happy to close the PR!
I tried to keep the overriding changes as close to as they were, let me know if I missed something important?
I also added node_exporter DaemonSet manifest instead of the pod one, but happy to revert the changes if you had a reason for one pod!
And switched replicas of Prometheus to 2, but since I do not know the full purpose of this Prometheus instance, maybe having 1 replica is enough in your case? (e.g. the trafeoffs of consuming less memory in a scale and perf test vs availability of Prometheus monitoring might be indeed a normal thing)
Special notes for your reviewer:
(Note I am the maintainer of both Prometheus operator and kube-prometheus which the original manifests are based on)
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: