Kyma module status as telemetry input to enable dashboarding and alerting on it #728

Open
2 of 11 tasks
a-thaler opened this issue Jan 18, 2024 · 4 comments
Assignees
Labels
area/metrics MetricPipeline kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@a-thaler
Collaborator

a-thaler commented Jan 18, 2024

Problem
Every Kyma module must report a status that users can inspect. A module can already expose custom metrics on its components and mark them as scrapable with the prometheus.io/scrape annotation, so that users can get insights. With that approach, modules can expose advanced metrics, but users need to know these metrics and be able to define thresholds in order to define alerts. For the less advanced scenario, it would be helpful to have metrics that are harmonized across all modules and have a very simple threshold like "error" or "no error". Such a simple metric should be available even if a module does not yet care about metric exposure. Users need a way to collect these metrics so that they can define a unified dashboard and alert rules in their backend.

Criteria

  • Users have a simple, consistent way to collect metrics about the Kyma status and the status of every module
  • Recommendations exist for how to visualize and alert on these metrics
  • A module provider should have no additional work beyond implementing the module status

Idea
Every module must currently reflect its state in the module CR status by providing a "state". It is recommended to additionally provide more advanced "conditions" with reasons in the status, for example in telemetry:

  status:
    conditions:
    - lastTransitionTime: "2024-01-18T09:45:25Z"
      message: Fluent Bit DaemonSet is ready
      reason: FluentBitDaemonSetReady
      status: "True"
      type: LogComponentsHealthy
    - lastTransitionTime: "2024-01-17T21:09:22Z"
      message: Trace gateway Deployment is ready
      reason: TraceGatewayDeploymentReady
      status: "True"
      type: TraceComponentsHealthy
    - lastTransitionTime: "2024-01-16T14:44:54Z"
      message: One or more referenced Secrets are missing
      reason: MetricPipelineReferencedSecretMissing
      status: "False"
      type: MetricComponentsHealthy
    state: Warning

The module state is also reflected in the Kyma CR itself, alongside the overall Kyma state, as shown in this shortened example:

  status:
    activeChannel: fast
    conditions:
    - lastTransitionTime: "2024-01-18T12:22:14Z"
      message: not all modules are in ready state
      reason: Ready
      status: "False"
      type: Modules
    modules:
    - channel: experimental
      fqdn: kyma-project.io/module/telemetry
      name: telemetry
      state: Warning
      version: 1.7.0-dev
      resource:
        apiVersion: operator.kyma-project.io/v1alpha1
        kind: Telemetry
        metadata:
          name: default
          namespace: kyma-system
    state: Warning

Reflecting that status information via custom module metrics would require additional effort and a harmonized approach (metric syntax and semantics) across all modules, which will be very hard to achieve.

Instead, we could offer a dedicated MetricPipeline input that provides metrics for the Kyma state itself and the state of all modules, based on the Kyma CR, plus metrics representing the individual module conditions. The metrics will be gauges with simple values of 0 or 1 for easy alerting. The relation to the module CRs in use is already available via the Kyma status.

An example pipeline can look like this:

apiVersion: telemetry.kyma-project.io/v1alpha1
kind: MetricPipeline
metadata:
  name: icke
spec:
  input:
    kyma-system:
      enabled: true

Example metrics can look like this:

kyma_status_state{version="v1beta2", state="running"|"warning"|"error"} = 1
kyma_status_modules_state{version="v1beta2", state="running"|"warning"|"error"} = 1
kyma_telemetry_status_conditions{version="v1alpha1", type="LogComponentsHealthy", reason="Running"} = 1
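
As an illustration only, the following sketch shows how the Kyma CR status from the shortened example could be turned into such 0/1 samples. It is not the module's implementation: the function name `kyma_status_samples`, the `STATES` list, and the tuple layout of a sample are hypothetical. The state names follow the capitalized values seen in the CR status rather than the lowercase values in the metric sketch.

```python
from typing import Dict, List, Tuple

# A sample is (metric name, labels, value); this layout is an assumption for illustration.
Sample = Tuple[str, Dict[str, str], int]

# Assumed set of possible states; the CR examples in this issue only show "Warning".
STATES = ["Error", "Processing", "Ready", "Deleting", "Warning"]


def kyma_status_samples(kyma_status: dict) -> List[Sample]:
    """Emit one 0/1 sample per known state for the Kyma CR and for each listed module."""
    samples: List[Sample] = []
    current = kyma_status.get("state")
    for state in STATES:
        samples.append(("kyma_status_state", {"state": state}, int(state == current)))
    for module in kyma_status.get("modules", []):
        for state in STATES:
            labels = {"module": module["name"], "state": state}
            samples.append(
                ("kyma_status_modules_state", labels, int(state == module.get("state")))
            )
    return samples


# Shortened Kyma CR status, reduced to the fields the sketch needs.
status = {
    "state": "Warning",
    "modules": [{"name": "telemetry", "state": "Warning"}],
}
samples = kyma_status_samples(status)

# Only the series for the current state carry the value 1.
active = [(name, labels) for name, labels, value in samples if value == 1]
print(active)
```

With this shape, an alert only needs to check whether the series for an unwanted state has the value 1.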

Items:

@a-thaler a-thaler added kind/feature Categorizes issue or PR as related to a new feature. area/metrics MetricPipeline labels Jan 18, 2024
@a-thaler
Collaborator Author

a-thaler commented Jan 18, 2024

A simple test using kube-state-metrics proved that you can emit metrics in a consistent way across all modules.
For that the following kube-state-metrics configuration was used:

customResourceState:
  enabled: true
  config:
    kind: CustomResourceStateMetrics
    spec:
      resources:
      - groupVersionKind:
          group: "operator.kyma-project.io"
          kind: "Kyma"
          version: "v1beta2"
        labelsFromPath:
          name: [metadata, name]
          namespace: [metadata, namespace]
        metrics:
        - name: kyma_status_state
          help: "current state of kyma"
          each:
            type: StateSet
            stateSet:
              labelName: state
              path: [status,state]
              list: [Error, Processing, Ready, Deleting, Warning]
        - name: kyma_status_modules_state
          help: "current module states"
          each:
            type: StateSet
            stateSet:
              labelName: state
              valueFrom: [state]
              path: [status, modules]
              labelsFromPath:
                module: [name]
              list: [Error, Processing, Ready, Deleting, Warning]
      - groupVersionKind:
          group: "operator.kyma-project.io"
          kind: "*"
          version: "*"
        labelsFromPath:
          name: [metadata, name]
          namespace: [metadata, namespace]
        metrics:
        - name: module_status_conditions
          help: "conditions of Module CR"
          each:
            type: Gauge
            gauge:
              path: [status, conditions]
              labelsFromPath:
                type: [type]
                reason: [reason]
              valueFrom: [status]

Running KSM with that config exposed the following metrics:

# HELP kube_customresource_module_status_conditions conditions of Module CR
# TYPE kube_customresource_module_status_conditions gauge
kube_customresource_module_status_conditions{customresource_group="operator.kyma-project.io",customresource_kind="ApplicationConnector",customresource_version="v1alpha1",name="applicationconnector-sample",namespace="kyma-system",reason="Verified",type="Installed"} 1
kube_customresource_module_status_conditions{customresource_group="operator.kyma-project.io",customresource_kind="BtpOperator",customresource_version="v1alpha1",name="btpoperator",namespace="kyma-system",reason="ReconcileSucceeded",type="Ready"} 1
kube_customresource_module_status_conditions{customresource_group="operator.kyma-project.io",customresource_kind="Eventing",customresource_version="v1alpha1",name="eventing",namespace="kyma-system",reason="Available",type="NATSAvailable"} 1
kube_customresource_module_status_conditions{customresource_group="operator.kyma-project.io",customresource_kind="Eventing",customresource_version="v1alpha1",name="eventing",namespace="kyma-system",reason="Deployed",type="PublisherProxyReady"} 1
kube_customresource_module_status_conditions{customresource_group="operator.kyma-project.io",customresource_kind="Eventing",customresource_version="v1alpha1",name="eventing",namespace="kyma-system",reason="Ready",type="WebhookReady"} 1
kube_customresource_module_status_conditions{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",reason="Verified",type="Installed"} 1
kube_customresource_module_status_conditions{customresource_group="operator.kyma-project.io",customresource_kind="NATS",customresource_version="v1alpha1",name="eventing-nats",namespace="kyma-system",reason="Available",type="StatefulSet"} 1
kube_customresource_module_status_conditions{customresource_group="operator.kyma-project.io",customresource_kind="NATS",customresource_version="v1alpha1",name="eventing-nats",namespace="kyma-system",reason="Deployed",type="Available"} 1
kube_customresource_module_status_conditions{customresource_group="operator.kyma-project.io",customresource_kind="Serverless",customresource_version="v1alpha1",name="default",namespace="kyma-system",reason="Configured",type="Configured"} 1
kube_customresource_module_status_conditions{customresource_group="operator.kyma-project.io",customresource_kind="Serverless",customresource_version="v1alpha1",name="default",namespace="kyma-system",reason="Installed",type="Installed"} 1
kube_customresource_module_status_conditions{customresource_group="operator.kyma-project.io",customresource_kind="Telemetry",customresource_version="v1alpha1",name="default",namespace="kyma-system",reason="FluentBitDaemonSetReady",type="LogComponentsHealthy"} 1
kube_customresource_module_status_conditions{customresource_group="operator.kyma-project.io",customresource_kind="Telemetry",customresource_version="v1alpha1",name="default",namespace="kyma-system",reason="MetricPipelineReferencedSecretMissing",type="MetricComponentsHealthy"} 0
kube_customresource_module_status_conditions{customresource_group="operator.kyma-project.io",customresource_kind="Telemetry",customresource_version="v1alpha1",name="default",namespace="kyma-system",reason="TraceGatewayDeploymentReady",type="TraceComponentsHealthy"} 1
kube_customresource_module_status_conditions{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta1",name="default",namespace="kyma-system",reason="Ready",type="ModuleCatalog"} 1
kube_customresource_module_status_conditions{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta1",name="default",namespace="kyma-system",reason="Ready",type="Modules"} 0
kube_customresource_module_status_conditions{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta1",name="default",namespace="kyma-system",reason="Ready",type="SKRWebhook"} 1
# HELP kube_customresource_kyma_status_state current state of kyma
# TYPE kube_customresource_kyma_status_state stateset
kube_customresource_kyma_status_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",name="default",namespace="kyma-system",state="Deleting"} 0
kube_customresource_kyma_status_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",name="default",namespace="kyma-system",state="Error"} 0
kube_customresource_kyma_status_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",name="default",namespace="kyma-system",state="Processing"} 0
kube_customresource_kyma_status_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",name="default",namespace="kyma-system",state="Ready"} 0
kube_customresource_kyma_status_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",name="default",namespace="kyma-system",state="Warning"} 1
# HELP kube_customresource_kyma_status_modules_state current module states
# TYPE kube_customresource_kyma_status_modules_state stateset
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="api-gateway",name="default",namespace="kyma-system",state="Deleting"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="api-gateway",name="default",namespace="kyma-system",state="Error"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="api-gateway",name="default",namespace="kyma-system",state="Processing"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="api-gateway",name="default",namespace="kyma-system",state="Ready"} 1
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="api-gateway",name="default",namespace="kyma-system",state="Warning"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="application-connector",name="default",namespace="kyma-system",state="Deleting"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="application-connector",name="default",namespace="kyma-system",state="Error"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="application-connector",name="default",namespace="kyma-system",state="Processing"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="application-connector",name="default",namespace="kyma-system",state="Ready"} 1
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="application-connector",name="default",namespace="kyma-system",state="Warning"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="btp-operator",name="default",namespace="kyma-system",state="Deleting"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="btp-operator",name="default",namespace="kyma-system",state="Error"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="btp-operator",name="default",namespace="kyma-system",state="Processing"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="btp-operator",name="default",namespace="kyma-system",state="Ready"} 1
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="btp-operator",name="default",namespace="kyma-system",state="Warning"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="eventing",name="default",namespace="kyma-system",state="Deleting"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="eventing",name="default",namespace="kyma-system",state="Error"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="eventing",name="default",namespace="kyma-system",state="Processing"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="eventing",name="default",namespace="kyma-system",state="Ready"} 1
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="eventing",name="default",namespace="kyma-system",state="Warning"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="istio",name="default",namespace="kyma-system",state="Deleting"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="istio",name="default",namespace="kyma-system",state="Error"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="istio",name="default",namespace="kyma-system",state="Processing"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="istio",name="default",namespace="kyma-system",state="Ready"} 1
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="istio",name="default",namespace="kyma-system",state="Warning"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="keda",name="default",namespace="kyma-system",state="Deleting"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="keda",name="default",namespace="kyma-system",state="Error"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="keda",name="default",namespace="kyma-system",state="Processing"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="keda",name="default",namespace="kyma-system",state="Ready"} 1
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="keda",name="default",namespace="kyma-system",state="Warning"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="nats",name="default",namespace="kyma-system",state="Deleting"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="nats",name="default",namespace="kyma-system",state="Error"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="nats",name="default",namespace="kyma-system",state="Processing"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="nats",name="default",namespace="kyma-system",state="Ready"} 1
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="nats",name="default",namespace="kyma-system",state="Warning"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="serverless",name="default",namespace="kyma-system",state="Deleting"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="serverless",name="default",namespace="kyma-system",state="Error"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="serverless",name="default",namespace="kyma-system",state="Processing"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="serverless",name="default",namespace="kyma-system",state="Ready"} 1
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="serverless",name="default",namespace="kyma-system",state="Warning"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="telemetry",name="default",namespace="kyma-system",state="Deleting"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="telemetry",name="default",namespace="kyma-system",state="Error"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="telemetry",name="default",namespace="kyma-system",state="Processing"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="telemetry",name="default",namespace="kyma-system",state="Ready"} 0
kube_customresource_kyma_status_modules_state{customresource_group="operator.kyma-project.io",customresource_kind="Kyma",customresource_version="v1beta2",module="telemetry",name="default",namespace="kyma-system",state="Warning"} 1

Alternatively, we could use a gauge instead of a StateSet, so that the individual states are not differentiated and there is just an aggregated "error" or "no error" value.
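
To make the aggregation idea concrete, here is a minimal sketch that collapses the StateSet series from the KSM output into one 0/1 value per module. It is hedged in several ways: the function name `module_error_gauge` is hypothetical, the choice of `Error` and `Warning` as the "error" states is an assumption, and the sample lines are abbreviated versions of the real output with most labels dropped.

```python
import re
from typing import Dict, Iterable

# Matches one Prometheus text-format sample line of the form: name{labels} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

# Assumption: these states count as "error" in the aggregated gauge.
UNHEALTHY_STATES = {"Error", "Warning"}


def module_error_gauge(lines: Iterable[str]) -> Dict[str, int]:
    """Collapse the module StateSet series into one value per module (1 = error)."""
    result: Dict[str, int] = {}
    for line in lines:
        match = SAMPLE_RE.match(line.strip())
        # Skip comment lines (# HELP, # TYPE) and unrelated metrics.
        if not match or match["name"] != "kube_customresource_kyma_status_modules_state":
            continue
        labels = dict(LABEL_RE.findall(match["labels"]))
        module = labels["module"]
        result.setdefault(module, 0)
        if labels["state"] in UNHEALTHY_STATES and float(match["value"]) == 1:
            result[module] = 1
    return result


# Abbreviated sample lines; the real output carries more labels per series.
lines = [
    "# TYPE kube_customresource_kyma_status_modules_state stateset",
    'kube_customresource_kyma_status_modules_state{module="istio",name="default",state="Ready"} 1',
    'kube_customresource_kyma_status_modules_state{module="istio",name="default",state="Warning"} 0',
    'kube_customresource_kyma_status_modules_state{module="telemetry",name="default",state="Ready"} 0',
    'kube_customresource_kyma_status_modules_state{module="telemetry",name="default",state="Warning"} 1',
]
gauges = module_error_gauge(lines)
print(gauges)
```

The trade-off is that a single aggregated value makes alert rules trivial, but a dashboard can no longer distinguish, for example, `Processing` from `Ready`.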

A very simple dashboard in Cloud Logging based on this data:

Screenshot 2024-01-19 at 13 21 46

@a-thaler
Collaborator Author

In the otel-collector community, the receiver analogous to KSM is the k8sclusterreceiver, which already has good metric coverage. However, there is no general solution yet for scraping CRD-specific metrics comparable to KSM.
If we go with the outlined idea, we need to decide whether to deploy KSM just for that use case or to implement a custom receiver for now. We could start writing a generic receiver and try to contribute it as well.


This issue has been automatically marked as stale due to the lack of recent activity. It will soon be closed if no further activity occurs.
Thank you for your contributions.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 20, 2024
@a-thaler a-thaler added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 20, 2024
@a-thaler a-thaler self-assigned this Mar 22, 2024
@a-thaler a-thaler changed the title Kyma status as metricpipeline input Observable kyma module status Apr 23, 2024
@a-thaler a-thaler changed the title Observable kyma module status Kyma module status as observability input to enable dashboarding and alerting on it Apr 23, 2024
@a-thaler a-thaler changed the title Kyma module status as observability input to enable dashboarding and alerting on it Kyma module status as telemetry input to enable dashboarding and alerting on it Apr 23, 2024
@chrkl
Member

chrkl commented May 8, 2024

The following extension for the MetricPipeline input section was proposed in the developed concept:

apiVersion: telemetry.kyma-project.io/v1alpha1
kind: MetricPipeline
metadata:
  name: sample
spec:
  input:
    kyma:
      enabled: true
      modules:
        - telemetry

Enabling the input should produce the following metrics:

kyma.module.status.state with the attributes state and name, which has the value 1 if the module state is Ready.
kyma.module.status.condition with the attributes reason, status, name, type, which has the value 1 if the status of the corresponding condition is True.
The name attribute for both of the metrics indicates the module name.
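
The semantics described above could be sketched roughly as follows. This is an assumption-laden illustration, not the concept's implementation: the function `module_samples` and the sample tuple layout are hypothetical, and treating the `modules` list of the input as a filter on module names is an interpretation of the proposal.

```python
from typing import Dict, List, Optional, Tuple

# A sample is (metric name, attributes, value); this layout is an assumption for illustration.
Sample = Tuple[str, Dict[str, str], int]


def module_samples(
    name: str,
    module_status: dict,
    enabled_modules: Optional[List[str]] = None,
) -> List[Sample]:
    """Build the two proposed metrics from one module CR status."""
    # Assumption: the `modules` list of the input acts as a filter on module names.
    if enabled_modules is not None and name not in enabled_modules:
        return []
    state = module_status.get("state", "")
    samples: List[Sample] = [
        # Value is 1 only if the module state is Ready.
        ("kyma.module.status.state", {"name": name, "state": state}, int(state == "Ready")),
    ]
    for cond in module_status.get("conditions", []):
        attrs = {
            "name": name,
            "type": cond["type"],
            "reason": cond["reason"],
            "status": cond["status"],
        }
        # Value is 1 only if the condition status is "True".
        samples.append(("kyma.module.status.condition", attrs, int(cond["status"] == "True")))
    return samples


# Telemetry module status, reduced from the example earlier in this issue.
telemetry_status = {
    "state": "Warning",
    "conditions": [
        {"type": "LogComponentsHealthy", "reason": "FluentBitDaemonSetReady", "status": "True"},
        {
            "type": "MetricComponentsHealthy",
            "reason": "MetricPipelineReferencedSecretMissing",
            "status": "False",
        },
    ],
}
samples = module_samples("telemetry", telemetry_status, enabled_modules=["telemetry"])
values = [value for _, _, value in samples]
print(values)
```

For the telemetry status used here, the state metric is 0 because the module is in `Warning`, while each condition metric reflects its own status independently.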
