[GEP-19] Migrate cache Prometheus deployment and configuration #9128

rfranzke · 2024-02-06T13:19:46Z

How to categorize this PR?

/area dev-productivity monitoring
/kind enhancement

What this PR does / why we need it:
With this PR, prometheus-operator (introduced in #9067) takes over management of the cache Prometheus deployment and its configuration. In a nutshell:

- Prometheus custom resource is created (which will result in a StatefulSet)
- standard scrape config for services in the seed cluster is provided via ServiceMonitor custom resources
- special scrape config (for kubelet/cadvisor) is still provided in "raw format" via the additional scrape config Secret
- the PV of an existing cache Prometheus instance will be reused

Which issue(s) this PR fixes:
Part of #9065

Special notes for your reviewer:
/cc @oliver-goetz @ScheererJ
FYI @istvanballok @rickardsjp

Release note:

It is now possible to provide configuration for the cache Prometheus running in seed clusters' `garden` namespaces. Read all about it [here](https://github.com/gardener/gardener/tree/master/docs/extensions/logging-and-monitoring.md#cache-prometheus).

ScheererJ · 2024-02-07T09:37:51Z

/assign

ScheererJ

Thanks a lot for this well-structured pull request. I have a few minor questions/comments.

Do you also want to consider adapting

gardener/pkg/component/monitoring/charts/bootstrap/values.yaml

Line 1 in ed4931a

prometheus:

to the new reality, or do you prefer to remove the file altogether once all Prometheus instances are migrated to the prometheus-operator?

pkg/component/monitoring/prometheus/prometheus.go

pkg/component/monitoring/prometheus/vpa.go

pkg/component/monitoring/prometheus/component.go

pkg/component/monitoring/prometheus/cache/servicemonitors.go

pkg/gardenlet/controller/seed/seed/reconciler_delete.go

pkg/component/monitoring/prometheus/prometheus.go

rfranzke · 2024-02-07T16:06:22Z

> Do you also want to consider adapting
>
> gardener/pkg/component/monitoring/charts/bootstrap/values.yaml
>
> Line 1 in ed4931a
>
> prometheus:
>
> to the new reality, or do you prefer to remove the file altogether once all Prometheus instances are migrated to the prometheus-operator?

I cannot do it yet because the seed-prometheus deployment also refers to these values. I'll do it as soon as this instance has been migrated as well :)

ScheererJ · 2024-02-07T16:18:52Z

/lgtm

gardener-prow · 2024-02-07T16:18:56Z

LGTM label has been added.

Git tree hash: c3238c36df4903574013b01c18e86440f94714f9

rfranzke · 2024-02-08T14:01:07Z

/assign @oliver-goetz

oliver-goetz · 2024-02-08T15:04:30Z

/retest

oliver-goetz

Nice PR, thanks 🚀
I have some small remarks only.

oliver-goetz · 2024-02-08T16:12:13Z

pkg/component/monitoring/prometheus/cache/networkpolicy.go

+				// A pod selector to select the node-exporter pods in the kube-system namespace does not work here
+				// because the node-exporter uses the host network. Network policies are currently not supported with
+				// pods in the host network.
+				To:    []networkingv1.NetworkPolicyPeer{},


Wouldn't it make sense to restrict the traffic to the nodes network IP range of the seed then?

Maybe, but I'd like to keep such improvements for a dedicated PR - for now, I only focus on the "translation"/refactoring to the custom resources. I've added a TODO statement to potentially reconsider this.
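
For reference, the restriction suggested above would amount to an egress rule with an `ipBlock` peer instead of an empty peer list. The following is only a sketch of that idea using the standard library, not something this PR implements; the CIDR and the port `9100` (the upstream node-exporter default) are placeholder assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/netip"
)

// rule is a loosely typed fragment of a NetworkPolicy spec.
type rule map[string]any

// egressToNodes builds one NetworkPolicy egress rule that only allows TCP
// traffic to the given port inside the node network. The CIDR is validated
// and normalised to its network address before it is used.
func egressToNodes(nodeCIDR string, port int) (rule, error) {
	prefix, err := netip.ParsePrefix(nodeCIDR)
	if err != nil {
		return nil, fmt.Errorf("invalid node CIDR %q: %w", nodeCIDR, err)
	}
	return rule{
		"to":    []rule{{"ipBlock": rule{"cidr": prefix.Masked().String()}}},
		"ports": []rule{{"protocol": "TCP", "port": port}},
	}, nil
}

func main() {
	// Placeholder values; a real seed would supply its own node CIDR.
	r, err := egressToNodes("10.250.0.0/16", 9100)
	if err != nil {
		panic(err)
	}
	out, err := json.MarshalIndent(r, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```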

pkg/gardenlet/controller/seed/seed/reconciler_reconcile.go

pkg/component/monitoring/prometheus/migration.go

Some scrape configs might not be translatable into the CRDs (or it's beyond the scope of this PR), see next commits, so we need to keep transmitting them in "raw" format. This is possible via the "additional scrape config" configuration option in the `Prometheus` CRD.

This way, we don't need to copy the data from one disk to another but can simply take over the existing PV (claimed by a new PVC). I consciously decided not to invest in unit tests for the new migration.go file since
- they wouldn't yield much benefit in increasing confidence in a working migration path (for that, we would need to invest in proper e2e tests, which is even more costly and complex)
- this is temporary code that is about to be deleted again soon
- even if there's a bug that we didn't catch during the development/testing phase, losing the Prometheus data is not too bad

oliver-goetz · 2024-02-09T09:55:29Z

/lgtm
/approve

/retest

gardener-prow · 2024-02-09T09:55:33Z

LGTM label has been added.

Git tree hash: 9bab393b945ebb5bc69376341569f07cb1d32c26

gardener-prow · 2024-02-09T09:55:35Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: oliver-goetz

Needs approval from an approver in each of these files:

~~OWNERS~~ [oliver-goetz]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

gardener-prow bot requested review from oliver-goetz and ScheererJ February 6, 2024 13:19

rfranzke mentioned this pull request Feb 6, 2024

[GEP-19] Migrate monitoring stack to prometheus-operator #9065

Closed

46 tasks

gardener-prow bot assigned ScheererJ Feb 7, 2024

rfranzke force-pushed the gep19/seed-cache-prom branch from 01c88e5 to 220f9a6 Compare February 7, 2024 12:39

ScheererJ reviewed Feb 7, 2024

rfranzke force-pushed the gep19/seed-cache-prom branch from 220f9a6 to 2d6bf7f Compare February 7, 2024 16:14

rfranzke requested a review from ScheererJ February 7, 2024 16:14

gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label Feb 7, 2024

gardener-prow bot assigned oliver-goetz Feb 8, 2024

oliver-goetz reviewed Feb 8, 2024

rfranzke added 8 commits February 9, 2024 08:54

Manage ClusterRole for prometheus instances in prometheusoperator component

c88dbb7

It is shared amongst all Prometheis, so let's manage it centrally.

Introduce prometheus component boilerplate

3a4ee48

ServiceAccount

792c9f5

`prometheus-operator` does not manage the `ServiceAccount` for the created `StatefulSet`

Service

720ecde

`prometheus-operator` does not manage the `Service` for the created `StatefulSet`

ClusterRoleBinding

2bfadf6

Prometheus custom resource

e1a2ea6

This results in the `StatefulSet` after reconciliation of `prometheus-operator`.

VerticalPodAutoscaler

bb9980e

We decided to no longer deploy an `Hvpa` resource since
- the HVPA controller is getting removed eventually
- it's considered no longer necessary to restrict the downscaling

Replace rules with PrometheusRule resources

cfa8b6b

These rules cannot be directly associated with a component package in `pkg/component`, hence they are considered "general rules" for the cache prometheus.

rfranzke added 13 commits February 9, 2024 08:54

Migrate etcd-druid scrape config into ServiceMonitor

4df3bf0

Migrate hvpa-controller scrape config into ServiceMonitor

d07465a

Migrate kube-state-metrics scrape config into ServiceMonitor

49acb25

Move kubelet scrape config to "additional scrape configs"

270ec62

Move cadvisor scrape config to "additional scrape configs"

3be78ed

Cleanup no longer needed code

8afcf05

Integrate cache Prometheus deployment into seed controller

8b24d1f

Documentation

4690086

Update Prometheis configs for connection to cache Prometheus

b0cac0d

Address PR review feedback

1810d6d

Address PR review feedback

57b2e45

rfranzke force-pushed the gep19/seed-cache-prom branch from 2d6bf7f to 57b2e45 Compare February 9, 2024 08:05

gardener-prow bot removed the lgtm Indicates that a PR is ready to be merged. label Feb 9, 2024

gardener-prow bot requested a review from oliver-goetz February 9, 2024 08:05

gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label Feb 9, 2024

gardener-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 9, 2024

gardener-prow bot merged commit 079171f into gardener:master Feb 9, 2024
16 checks passed

rfranzke deleted the gep19/seed-cache-prom branch February 9, 2024 12:47

This was referenced Feb 9, 2024

[GEP-19] Migrate seed Alertmanager deployment and configuration #9159

Merged

Only wait for PersistentVolume to be Available when .spec.claimRef was removed #9160

Merged
