kafka-lag-monitor

Monitors Kafka consumer lag and publishes the metrics to different metrics backends.

Metrics

The supported metrics backends are Prometheus and InfluxDB

Sample metrics

Prometheus:

The metrics in Prometheus format can be accessed at the /prometheus endpoint.

# HELP kafka_consumer_lag_max  
# TYPE kafka_consumer_lag_max gauge
kafka_consumer_lag_max{cluster_name="test-cluster",group="test-consumer",partition="1",topic="test-topic",} 2.0
kafka_consumer_lag_max{cluster_name="test-cluster",group="test-consumer",partition="0",topic="test-topic",} 2.0
# HELP kafka_consumer_lag  
# TYPE kafka_consumer_lag summary
kafka_consumer_lag_count{cluster_name="test-cluster",group="test-consumer",partition="1",topic="test-topic",} 1.0
kafka_consumer_lag_sum{cluster_name="test-cluster",group="test-consumer",partition="1",topic="test-topic",} 2.0
kafka_consumer_lag_count{cluster_name="test-cluster",group="test-consumer",partition="0",topic="test-topic",} 1.0
kafka_consumer_lag_sum{cluster_name="test-cluster",group="test-consumer",partition="0",topic="test-topic",} 2.0
# HELP kafka_consumer_offset  
# TYPE kafka_consumer_offset summary
kafka_consumer_offset_count{cluster_name="test-cluster",group="test-consumer",partition="1",topic="test-topic",} 1.0
kafka_consumer_offset_sum{cluster_name="test-cluster",group="test-consumer",partition="1",topic="test-topic",} 16.0
kafka_consumer_offset_count{cluster_name="test-cluster",group="test-consumer",partition="0",topic="test-topic",} 1.0
kafka_consumer_offset_sum{cluster_name="test-cluster",group="test-consumer",partition="0",topic="test-topic",} 13.0
# HELP kafka_consumer_offset_max  
# TYPE kafka_consumer_offset_max gauge
kafka_consumer_offset_max{cluster_name="test-cluster",group="test-consumer",partition="1",topic="test-topic",} 16.0
kafka_consumer_offset_max{cluster_name="test-cluster",group="test-consumer",partition="0",topic="test-topic",} 13.0
# HELP kafka_partition_offset  
# TYPE kafka_partition_offset summary
kafka_partition_offset_count{cluster_name="test-cluster",partition="1",topic="test-topic",} 1.0
kafka_partition_offset_sum{cluster_name="test-cluster",partition="1",topic="test-topic",} 18.0
kafka_partition_offset_count{cluster_name="test-cluster",partition="0",topic="test-topic",} 1.0
kafka_partition_offset_sum{cluster_name="test-cluster",partition="0",topic="test-topic",} 15.0
# HELP kafka_partition_offset_max  
# TYPE kafka_partition_offset_max gauge
kafka_partition_offset_max{cluster_name="test-cluster",partition="1",topic="test-topic",} 18.0
kafka_partition_offset_max{cluster_name="test-cluster",partition="0",topic="test-topic",} 15.0
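
The sample output above can also be read programmatically. The following is a minimal sketch, not part of this project, that parses the gauge lines with a hand-rolled regular expression and checks that, in the sample, the reported lag equals the partition offset minus the consumer offset (for example 18 - 16 = 2 for partition 1). The names `parse_samples` and `value_by_partition` are hypothetical, and the parser only handles the simple label syntax shown in the sample.

```python
import re

# Matches lines such as: metric_name{key="value",other="value",} 2.0
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, float_value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything this simple parser does not understand
        labels = dict(LABEL_RE.findall(match.group("labels")))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def value_by_partition(samples, metric_name):
    """Map the partition label to the value of one metric."""
    return {
        labels["partition"]: value
        for name, labels, value in samples
        if name == metric_name
    }


# Gauge lines copied from the sample output above.
SAMPLE = """\
# TYPE kafka_consumer_lag_max gauge
kafka_consumer_lag_max{cluster_name="test-cluster",group="test-consumer",partition="1",topic="test-topic",} 2.0
kafka_consumer_lag_max{cluster_name="test-cluster",group="test-consumer",partition="0",topic="test-topic",} 2.0
# TYPE kafka_consumer_offset_max gauge
kafka_consumer_offset_max{cluster_name="test-cluster",group="test-consumer",partition="1",topic="test-topic",} 16.0
kafka_consumer_offset_max{cluster_name="test-cluster",group="test-consumer",partition="0",topic="test-topic",} 13.0
# TYPE kafka_partition_offset_max gauge
kafka_partition_offset_max{cluster_name="test-cluster",partition="1",topic="test-topic",} 18.0
kafka_partition_offset_max{cluster_name="test-cluster",partition="0",topic="test-topic",} 15.0
"""

samples = parse_samples(SAMPLE)
lag = value_by_partition(samples, "kafka_consumer_lag_max")
consumer_offset = value_by_partition(samples, "kafka_consumer_offset_max")
partition_offset = value_by_partition(samples, "kafka_partition_offset_max")

for partition in sorted(lag):
    derived = partition_offset[partition] - consumer_offset[partition]
    print(f"partition {partition}: reported lag {lag[partition]}, derived lag {derived}")
```

A real scraper would normally rely on a Prometheus server or client library rather than a regular expression; the sketch only illustrates how the three gauges relate to each other.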

InfluxDB:

By default, metrics in InfluxDB's line protocol format are reported every minute to the http://localhost:8086/write endpoint.

kafka_consumer_lag,cluster_name=test-cluster,group=test-consumer,partition=0,topic=test-topic,metric_type=histogram sum=2,count=1,mean=2,upper=2 1612125711313
kafka_consumer_lag,cluster_name=test-cluster,group=test-consumer,partition=1,topic=test-topic,metric_type=histogram sum=2,count=1,mean=2,upper=2 1612125711311
kafka_consumer_offset,cluster_name=test-cluster,group=test-consumer,partition=0,topic=test-topic,metric_type=histogram sum=13,count=1,mean=13,upper=13 1612125711307
kafka_consumer_offset,cluster_name=test-cluster,group=test-consumer,partition=1,topic=test-topic,metric_type=histogram sum=16,count=1,mean=16,upper=16 1612125711308
kafka_partition_offset,cluster_name=test-cluster,partition=0,topic=test-topic,metric_type=histogram sum=15,count=1,mean=15,upper=15 1612125711311
kafka_partition_offset,cluster_name=test-cluster,partition=1,topic=test-topic,metric_type=histogram sum=18,count=1,mean=18,upper=18 1612125711313
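
Each line above has three space-separated parts: the measurement name followed by its tags, the fields, and a timestamp. As an illustration only, the sketch below splits one of the sample lines into those parts. The function name `parse_line` is hypothetical, and the code assumes numeric field values and no escaped spaces or commas, which holds for the sample but not for line protocol in general.

```python
def parse_line(line):
    """Split one line-protocol record into (measurement, tags, fields, timestamp).

    Assumption: no escaped spaces or commas and only numeric field values,
    as in the sample lines above.
    """
    key_part, field_part, timestamp = line.strip().split(" ")
    measurement, *tag_pairs = key_part.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = {}
    for pair in field_part.split(","):
        name, value = pair.split("=", 1)
        fields[name] = float(value)
    return measurement, tags, fields, int(timestamp)


# First sample line from above.
LINE = (
    "kafka_consumer_lag,cluster_name=test-cluster,group=test-consumer,"
    "partition=0,topic=test-topic,metric_type=histogram "
    "sum=2,count=1,mean=2,upper=2 1612125711313"
)

measurement, tags, fields, timestamp = parse_line(LINE)
print(measurement, tags["partition"], fields["upper"], timestamp)
```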

Usage

docker run --rm \
        -p 8080:8080  \
        -v /path/to/config:/config \
        -e MICRONAUT_CONFIG_FILES=/config/application.yml \
        -e MICRONAUT_METRICS_EXPORT_INFLUX_ENABLED=false \
        devatherock/kafka-lag-monitor:latest

Configurable properties

application.yml variables

kafka:
  clusters: # Required. A list of kafka cluster definitions
    - name: test-cluster # Required. Name of the cluster. The same name will be needed in `kafka.lag-monitor.clusters[*].name` config. 
      servers: test-cluster.test.com:9092 # Required. The server(s)/broker(s) that belong to this cluster
  lag-monitor:
    clusters:
      - name: test-cluster # Required. Name of the cluster to monitor. Should be one of the defined `kafka.clusters[*].name`
        consumer-groups: # Optional. List of consumer group names to monitor. Names will be matched exactly. Use `group-allowlist` for regex match
          - test-consumer
        group-allowlist: # Optional. List of regular expressions to match against consumer group names to monitor. Will be ignored if `consumer-groups` is specified
          - deva.*
        group-denylist: # Optional. List of regular expressions to match against consumer group names to exclude. Will be ignored if `consumer-groups` or `group-allowlist` is specified
          - temp.*
    threadpool-size: 5 # Optional. Size of the thread pool used by the lag monitor. Defaults to 5
    timeout-seconds: 5 # Optional. Timeout for the requests to Kafka, in seconds. Defaults to 5
    initial-delay-seconds: 60 # Optional. Initial delay before metric collection begins, in seconds. Defaults to 60
    interval-seconds: 60 # Optional. Metric collection interval, in seconds. Defaults to 60
micronaut:
  server:
    port: 8080 # Optional. Port on which the app listens. Defaults to 8080
  metrics:
    export:
      influx: # Config for publishing metrics to InfluxDB
        enabled: false # Optional. Indicates if metrics reporting to InfluxDB is enabled. Defaults to true
        uri: https://some.influx.host # Optional. The HTTP endpoint exposed by InfluxDB to which metrics are reported. Defaults to http://localhost:8086
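
The comments on `consumer-groups`, `group-allowlist` and `group-denylist` describe a precedence: exact names win over the allowlist, which wins over the denylist. The sketch below is one way to express that precedence and is not the project's actual code. The function `select_groups` is hypothetical, and two details are assumptions that the comments above do not settle: patterns are matched against the whole group name, and all groups are monitored when none of the three options is set.

```python
import re


def select_groups(all_groups, consumer_groups=None, allowlist=None, denylist=None):
    """Pick the consumer groups to monitor, following the documented precedence."""
    if consumer_groups:
        # Exact names take priority; the allowlist and denylist are ignored.
        wanted = set(consumer_groups)
        return [group for group in all_groups if group in wanted]
    if allowlist:
        # Assumption: a pattern must match the whole group name.
        return [
            group for group in all_groups
            if any(re.fullmatch(pattern, group) for pattern in allowlist)
        ]
    if denylist:
        return [
            group for group in all_groups
            if not any(re.fullmatch(pattern, group) for pattern in denylist)
        ]
    # Assumption: with no filters configured, every group is monitored.
    return list(all_groups)


# Hypothetical group names, chosen to exercise the patterns from the config above.
groups = ["test-consumer", "deva-orders", "temp-debug"]

exact = select_groups(groups, consumer_groups=["test-consumer"], allowlist=["deva.*"])
allowed = select_groups(groups, allowlist=["deva.*"], denylist=["temp.*"])
not_denied = select_groups(groups, denylist=["temp.*"])
print(exact, allowed, not_denied)
```

Under this reading, the sample configuration above monitors only `test-consumer`, because `consumer-groups` is set and the two regex lists are therefore ignored.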

Environment variables

| Environment Variable Name | Required | Default | Description |
|---|---|---|---|
| KAFKA_LAG_MONITOR_THREADPOOL_SIZE | false | 5 | Size of the thread pool used by the lag monitor |
| KAFKA_LAG_MONITOR_TIMEOUT_SECONDS | false | 5 | Timeout for the requests to Kafka, in seconds |
| LOGGER_LEVELS_ROOT | false | INFO | SLF4J log level for all code (framework and custom) |
| LOGGER_LEVELS_IO_GITHUB_DEVATHEROCK | false | INFO | SLF4J log level for custom code |
| MICRONAUT_SERVER_PORT | false | 8080 | Port on which the app listens |
| MICRONAUT_CONFIG_FILES | true | (None) | Path to YAML config files. The YAML files can be used to specify complex object and array properties |
| MICRONAUT_METRICS_EXPORT_INFLUX_ENABLED | false | true | Indicates if metrics reporting to InfluxDB is enabled |
| MICRONAUT_METRICS_EXPORT_INFLUX_URI | false | http://localhost:8086 | The HTTP endpoint exposed by InfluxDB to which metrics are reported |
| LOGBACK_CONFIGURATION_FILE | false | (None) | Path to logback configuration file |
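
To show how the defaults and the single required variable in this table combine, here is a small sketch that computes the values in effect for a given environment. It only mirrors the table; it is not how the application itself resolves configuration, and the name `effective_settings` is hypothetical.

```python
# Defaults copied from the table above; None means no default is documented.
DEFAULTS = {
    "KAFKA_LAG_MONITOR_THREADPOOL_SIZE": "5",
    "KAFKA_LAG_MONITOR_TIMEOUT_SECONDS": "5",
    "LOGGER_LEVELS_ROOT": "INFO",
    "LOGGER_LEVELS_IO_GITHUB_DEVATHEROCK": "INFO",
    "MICRONAUT_SERVER_PORT": "8080",
    "MICRONAUT_METRICS_EXPORT_INFLUX_ENABLED": "true",
    "MICRONAUT_METRICS_EXPORT_INFLUX_URI": "http://localhost:8086",
    "LOGBACK_CONFIGURATION_FILE": None,
}
REQUIRED = ("MICRONAUT_CONFIG_FILES",)


def effective_settings(env):
    """Merge the documented defaults with the variables present in env."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))
    settings = dict(DEFAULTS)
    for name, value in env.items():
        if name in DEFAULTS or name in REQUIRED:
            settings[name] = value  # an explicit value overrides the default
    return settings


# The two variables set by the docker command in the Usage section.
settings = effective_settings({
    "MICRONAUT_CONFIG_FILES": "/config/application.yml",
    "MICRONAUT_METRICS_EXPORT_INFLUX_ENABLED": "false",
})
print(settings["MICRONAUT_SERVER_PORT"], settings["MICRONAUT_METRICS_EXPORT_INFLUX_ENABLED"])
```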

Troubleshooting

Enabling debug logs

  • Set the environment variable LOGGER_LEVELS_ROOT to DEBUG to enable all debug logs - custom and framework
  • Set the environment variable LOGGER_LEVELS_IO_GITHUB_DEVATHEROCK to DEBUG to enable debug logs only in custom code
  • For fine-grained logging control, supply a custom logback.xml file and set the environment variable LOGBACK_CONFIGURATION_FILE to /path/to/custom/logback.xml

JSON logs

To output logs as JSON, set the environment variable LOGBACK_CONFIGURATION_FILE to logback-json.xml. Refer to the logstash-logback-encoder documentation to customize the field names and formats in the log.
