-
Notifications
You must be signed in to change notification settings - Fork 455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GEP-19] Migrate cache Prometheus deployment and configuration #9128
[GEP-19] Migrate cache Prometheus deployment and configuration #9128
Conversation
/assign |
01c88e5
to
220f9a6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for this greatly structured pull request. I have a few minor questions/comments.
Do you also want to consider adapting
prometheus: |
prometheus-operator
?
I cannot do it yet because the |
220f9a6
to
2d6bf7f
Compare
/lgtm |
LGTM label has been added. Git tree hash: c3238c36df4903574013b01c18e86440f94714f9
|
/assign @oliver-goetz |
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice PR, thanks 🚀
I have some small remarks only.
// A pod selector to select the node-exporter pods in the kube-system namespace does not work here | ||
// because the node-exporter uses the host network. Network policies are currently not supported with | ||
// pods in the host network. | ||
To: []networkingv1.NetworkPolicyPeer{}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it make sense to restrict the traffic to the nodes network
IP range of the seed then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, but I'd like to keep such improvements for a dedicated PR - for now, I only focus on the "translation"/refactoring to the custom resources. I've added a TODO statement to potentially reconsider this.
… component It is shared amongst all Prometheis, so let's manage it centrally.
`prometheus-operator` does not manage the `ServiceAccount` for the created `StatefulSet`
This results in the `StatefulSet` after reconciliation of `prometheus-operator`.
We decided to no longer deploy an `Hvpa` resource since - the HVPA controller is getting removed eventually - it's considered no longer necessary to restrict the downscaling
These rules cannot be directly associated with a component package in `pkg/component`, hence they are considered "general rules" for the cache prometheus.
Some scrape configs might not be translatable into the CRDs (or its beyond the scope of this PR), see next commits, so we need to keep transmitting them in "raw" format. This is possible via the "additional scrape config" configuration option in the `Prometheus` CRD.
This way, we don't need to copy the data from one disk to another but can simply take over the existing PV (claimed by a new PVC). I consciously decided to not invest into unit tests for the new migration.go file since - they wouldn't yield much benefit to increase confidence in a working migration path (if we wanted to have this, we would need to invest into proper e2e tests which is even more costly and complex) - this is temporary code that is about to get deleted soon again - even if there's a bug that we wouldn't have caught during development/testing phase, loosing the Prometheus data is not too bad
2d6bf7f
to
57b2e45
Compare
/lgtm /retest |
LGTM label has been added. Git tree hash: 9bab393b945ebb5bc69376341569f07cb1d32c26
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: oliver-goetz The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
How to categorize this PR?
/area dev-productivity monitoring
/kind enhancement
What this PR does / why we need it:
With this PR,
prometheus-operator
(introduced in #9067) takes over management of the cache Prometheus deployment and its configuration. In a nutshell:Prometheus
custom resource is created (which will result in aStatefulSet
)ServiceMonitor
custom resourcesSecret
Which issue(s) this PR fixes:
Part of #9065
Special notes for your reviewer:
/cc @oliver-goetz @ScheererJ
FYI @istvanballok @rickardsjp
Release note:
It is now possible to provide configuration for the cache Prometheus running in seed clusters' `garden` namespaces. Read all about it [here](https://github.com/gardener/gardener/tree/master/docs/extensions/logging-and-monitoring.md#cache-prometheus).