
Remote Write enablement #6470

Open
07Rajat opened this issue Apr 4, 2024 · 6 comments

Comments

@07Rajat

07Rajat commented Apr 4, 2024

What happened?

Description

We are looking for a solution on remote write feature enablement.

In our case, We have multiple openshift clusters and we are trying to centralize these under one grafana dashboard.

[Image: architecture diagram of the two clusters]

In the above image there are two clusters, cluster 1 and cluster 2, each with Prometheus installed in a different namespace: a customized Prometheus Operator installation in one namespace, and the default Prometheus that ships with OpenShift itself, which lives in the openshift-monitoring namespace.

Here, we are trying to remote_write the data from the default Prometheus in openshift-monitoring to the customized Prometheus server.

In the customized setup, Prometheus is installed via the Prometheus Operator as a separate Prometheus object, and the Prometheus service is exposed:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  podMonitorSelector: {}
  resources:
    requests:
      memory: 400Mi
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  type: NodePort
  ports:
  - name: web
    nodePort: 30900
    port: 9090
    protocol: TCP
    targetPort: web
  selector:
    prometheus: prometheus

https://blog.container-solutions.com/prometheus-operator-beginners-guide

https://grafana.com/blog/2023/01/19/how-to-monitor-kubernetes-clusters-with-the-prometheus-operator/

Here, we tried to customize the Prometheus YAML configuration, but the operator does not allow us to change or modify anything in the StatefulSet that it generates after the Prometheus deployment.

We are looking for an option to add the remote-write configuration as a ConfigMap and mount it as a volume into the customized Prometheus configuration.

ConfigMap for reference:

kind: ConfigMap
apiVersion: v1
metadata:
  name: cluster-monitoring-config
  namespace: test
  labels:
    hive.openshift.io/managed: 'true'
data:
  config.yaml: |
    enableUserWorkload: true
    prometheusK8s:
      remoteWrite:
      - url: https://thanos-querier.openshift-monitoring.svc.cluster.local:9091/api/v1/write
        oauth2:
          clientId:
            secret:
              key: client-id
              name: observatorium-credentials
          clientSecret:
            key: client-secret
            name: observatorium-credentials
          tokenUrl: https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token
        remoteTimeout: 30s
        writeRelabelConfigs:
        - sourceLabels:
          - __name__
          action: keep
          regex: (addon_operator_addons_count|addon_operator_reconcile_error|addon_operator_addon_health_info|addon_operator_ocm_api_requests_durations|addon_operator_ocm_api_requests_durations_sum|addon_operator_ocm_api_requests_durations_count|addon_operator_paused|cluster_admin_enabled|limited_support_enabled|identity_provider|cpms_enabled|ingress_canary_route_reachable|ocm_agent_service_log_sent_total|sre:slo:probe_success_api|sre:slo:probe_success_console|sre:slo:upgradeoperator_upgrade_result|sre:slo:imageregistry_http_requests_total|sre:slo:oauth_server_requests_total|sre:sla:outage_5_minutes|sre:slo:apiserver_28d_slo|sre:slo:console_28d_slo|sre:error_budget_burn:apiserver_28d_slo|sre:error_budget_burn:console_28d_slo|sre:operators:succeeded)
        queueConfig:
          capacity: 2500
          maxShards: 1000
          minShards: 1
          maxSamplesPerSend: 2000
          batchSendDeadline: 60s
          minBackoff: 30ms
          maxBackoff: 1m
      nodeSelector:
        node-role.kubernetes.io/infra: ''
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        operator: Exists
      retention: 11d
      retentionSize: 90GB
      volumeClaimTemplate:
        metadata:
          name: prometheus-data
        spec:
          resources:
            requests:
              storage: 100Gi
    alertmanagerMain:
      nodeSelector:
        node-role.kubernetes.io/infra: ''
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        operator: Exists
      volumeClaimTemplate:
        metadata:
          name: alertmanager-data
        spec:
          resources:
            requests:
              storage: 10Gi
    telemeterClient:
      nodeSelector:
        node-role.kubernetes.io/infra: ''
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        operator: Exists
      telemeterServerURL: https://infogw.api.openshift.com
    prometheusOperator:
      nodeSelector:
        node-role.kubernetes.io/infra: ''
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        operator: Exists
    grafana:
      nodeSelector:
        node-role.kubernetes.io/infra: ''
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        operator: Exists
    k8sPrometheusAdapter:
      nodeSelector:
        node-role.kubernetes.io/infra: ''
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        operator: Exists
    kubeStateMetrics:
      nodeSelector:
        node-role.kubernetes.io/infra: ''
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        operator: Exists
    openshiftStateMetrics:
      nodeSelector:
        node-role.kubernetes.io/infra: ''
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        operator: Exists
    thanosQuerier:
      nodeSelector:
        node-role.kubernetes.io/infra: ''
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        operator: Exists
    monitoringPlugin:
      nodeSelector:
        node-role.kubernetes.io/infra: ''
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        operator: Exists

We would really appreciate your suggestions and support.

Prometheus Operator Version

openshiftVersion: 4.13.29
kustomizeVersion: v4.5.4

Kubernetes Version

openshiftVersion: 4.13.29
kustomizeVersion: v4.5.4

Kubernetes Cluster Type

OpenShift

How did you deploy Prometheus-Operator?

Other (please comment)

Manifests

No response

prometheus-operator log output

Prometheus Operator 0.56.3 provided by Craig Trought

Anything else?

No response

@07Rajat 07Rajat added kind/support needs-triage Issues that haven't been triaged yet labels Apr 4, 2024
@mviswanathsai
Contributor

mviswanathsai commented Apr 4, 2024

If you are running Prometheus Operator, you can specify the remote-write config in the corresponding Prometheus CR itself. See here. Unless you have a very specific reason to use the ConfigMap, maybe this will help?
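A minimal sketch of what that could look like in the Prometheus CR (the receiver URL below is a placeholder, not a recommendation, and the service account name is assumed from the manifests earlier in this thread):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector: {}
  remoteWrite:
    # Placeholder endpoint; replace with your actual remote-write receiver.
    - url: https://example-receiver.example.svc:9090/api/v1/write
      remoteTimeout: 30s
```

The operator templates this into the generated Prometheus configuration, so there is no need to touch the StatefulSet directly.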

@mviswanathsai
Contributor

In case you do want to use the ConfigMap, something like additionalScrapeConfigs lets you write your own config (in case some fields are not yet supported) in a Secret and reference it in the Prometheus CR.
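A rough sketch of that pattern (the Secret name, key, and job are all illustrative; the Secret must live in the same namespace as the Prometheus object):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: additional-scrape-configs
stringData:
  prometheus-additional.yaml: |
    # Raw Prometheus scrape_config entries, appended verbatim
    - job_name: example
      static_configs:
        - targets: ["example.svc:9090"]
---
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  additionalScrapeConfigs:
    name: additional-scrape-configs
    key: prometheus-additional.yaml
```

Note that the operator does not validate these raw entries, so a syntax error there can break the whole Prometheus configuration.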

@simonpasquier
Contributor

I'm not sure I understand your issue. The right way to configure the OCP Prometheus is via the CMO ConfigMap, though I'm not sure why you have https://thanos-querier.openshift-monitoring.svc.cluster.local:9091/api/v1/write as the remote-write endpoint.

@07Rajat
Author

07Rajat commented Apr 8, 2024

Hi @mviswanathsai, @simonpasquier
For the Prometheus deployment:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus-operator
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  podMonitorSelector: {}
  resources:
    requests:
      memory: 400Mi

And on top of this, to add the remote_write URL:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus-operator
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  podMonitorSelector: {}
  resources:
    requests:
      memory: 400Mi
  remoteWrite:
    - url: https://prometheus-k8s.openshift-monitoring.svc.cluster.local:9091/api/v1/write
      oauth2:
        clientId:
          secret:
            key: client-id
            name: observatorium-credentials
        clientSecret:
          key: client-secret
          name: observatorium-credentials
        tokenUrl: https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token
      remoteTimeout: 30s
      writeRelabelConfigs:
      - sourceLabels:
        - __name__
        action: keep
        regex: (addon_operator_addons_count|addon_operator_reconcile_error|addon_operator_addon_health_info|addon_operator_ocm_api_requests_durations|addon_operator_ocm_api_requests_durations_sum|addon_operator_ocm_api_requests_durations_count|addon_operator_paused|cluster_admin_enabled|limited_support_enabled|identity_provider|cpms_enabled|ingress_canary_route_reachable|ocm_agent_service_log_sent_total|sre:slo:probe_success_api|sre:slo:probe_success_console|sre:slo:upgradeoperator_upgrade_result|sre:slo:imageregistry_http_requests_total|sre:slo:oauth_server_requests_total|sre:sla:outage_5_minutes|sre:slo:apiserver_28d_slo|sre:slo:console_28d_slo|sre:error_budget_burn:apiserver_28d_slo|sre:error_budget_burn:console_28d_slo|sre:operators:succeeded)

It seems the Prometheus object is refusing to create the StatefulSet once the remote_write option is added. Could you please suggest a fix?

@simonpasquier
Contributor

seems prometheus object is restricting to create the statefulset with remote_write option, could you please suggest

There's no such restriction. I would check the status field of the Prometheus object and the prometheus-operator logs.
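For example, something like the following (resource names assumed from the manifests earlier in this thread; add `-n <namespace>` flags as appropriate for your setup):

```shell
# Inspect the status conditions reported on the Prometheus object
kubectl get prometheus prometheus -o jsonpath='{.status.conditions}'

# Tail the prometheus-operator logs for reconciliation errors
kubectl logs deployment/prometheus-operator --tail=100
```

A reconciliation failure (for example, a referenced Secret that does not exist) typically shows up in both places.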

Prometheus Operator 0.56.3

This is a very old version. I'd advise upgrading.

@07Rajat
Author

07Rajat commented Apr 10, 2024

Prometheus Operator 0.56.3

This is a very old version. I'd advise to upgrade.

Thanks for the advice @simonpasquier, but I believe this is the latest Prometheus Operator version available in the Red Hat Marketplace:
[Image: Red Hat Marketplace listing showing Prometheus Operator 0.56.3]

@simonpasquier simonpasquier removed the needs-triage Issues that haven't been triaged yet label Apr 18, 2024