cicerops/monitoring-check-grafana

monitoring-check-grafana

About

A monitoring sensor that checks a Grafana datasource for stale data. It lets you detect data loss or other dropout conditions in the feeds into your datasources.

Goals

Tired of operations or engineering messing with the datasources or sensors behind the dashboards you are watching in Grafana?

This plugin is an attempt at a basic end-to-end monitoring probe covering the whole path of data flowing from arbitrary sensors into a (timeseries) database and on to its display in Grafana. The probe checks for success in:

  • Acquisition: Data is received by the DAQ system.
  • Storage: Measurements are stored into the database.
  • Retrieval: Measurements are retrieved from the database.
  • Display: Data is displayed in Grafana (almost).

This is nearly to-the-glass monitoring, as it probes the very same Grafana API endpoints the frontend uses to fetch metric data just before rendering it to the display.
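The retrieval step can be pictured as a single query sent through Grafana's datasource proxy. What follows is a minimal sketch, not the plugin's actual code: it assumes an InfluxDB 1.x datasource behind the proxy, and the helper names build_freshness_query, has_series and probe_fresh are hypothetical.

```shell
# Sketch only: query shape and helper names are assumptions for illustration.

# Build an InfluxQL query returning at most one row newer than the given age.
build_freshness_query() {
    local table="$1" age="$2"
    printf 'SELECT * FROM "%s" WHERE time > now() - %s LIMIT 1' "$table" "$age"
}

# Succeed when an InfluxDB 1.x JSON response on stdin holds at least one series.
# A response without matching rows carries no "series" key at all.
has_series() {
    jq -e '(.results[0].series // []) | length > 0' > /dev/null
}

# Ask the datasource proxy whether the table has data newer than the given age.
# HTTPie's name==value syntax appends URL query parameters.
probe_fresh() {
    local uri="$1" database="$2" table="$3" age="$4"
    http --ignore-stdin --check-status GET "$uri" \
        db=="$database" q=="$(build_freshness_query "$table" "$age")" \
        | has_series
}
```

Calling probe_fresh with a proxy URI, a database, a table and an age such as 12h would succeed only if at least one measurement newer than that age comes back. An empty or failed response counts as "not fresh".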

References

We currently use this plugin to monitor the freshness of data flows from different sources into InfluxDB.

Kudos to all the people working behind the scenes for providing these great open data resources to the community!

Usage

$ ./check-grafana-datasource-stale.sh --help

Options:
-u, --uri           Grafana API datasource proxy URI
-d, --database      Database name

-t, --table         Table name
-w, --warning       Maximum age threshold of data to result in warning status
-c, --critical      Maximum age threshold of data to result in critical status

-h, --help          Print detailed help
-V, --version       Print version information
-v, --verbose       Turn on verbose output
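The warning and critical thresholds are given as durations such as 12h or 3d. As a hedged illustration of how such values might be sanity-checked before running the probe, the sketch below converts a duration to seconds and verifies that the warning threshold is shorter than the critical one. The function names and the accepted unit set (s, m, h, d, w) are assumptions and not part of the plugin.

```shell
# Sketch only: helper names and accepted units are assumptions for illustration.

# Convert a duration such as "90s", "15m", "12h", "3d" or "2w" to seconds.
duration_to_seconds() {
    local value="$1" number unit factor
    number="${value%?}"           # everything except the last character
    unit="${value#"$number"}"     # the last character
    case "$number" in
        # Reject empty, non-numeric and zero-padded numbers; the latter
        # would otherwise be read as octal by shell arithmetic.
        ''|*[!0-9]*|0?*) echo "invalid duration: $value" >&2; return 1 ;;
    esac
    case "$unit" in
        s) factor=1 ;;
        m) factor=60 ;;
        h) factor=3600 ;;
        d) factor=86400 ;;
        w) factor=604800 ;;
        *) echo "invalid duration: $value" >&2; return 1 ;;
    esac
    echo $(( number * factor ))
}

# Succeed only if the warning threshold is shorter than the critical one.
thresholds_ordered() {
    local warning critical
    warning="$(duration_to_seconds "$1")" || return 1
    critical="$(duration_to_seconds "$2")" || return 1
    [ "$warning" -lt "$critical" ]
}
```

For instance, thresholds_ordered 12h 3d succeeds, while swapping the two arguments fails.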

Example

Sensor invocation:

./check-grafana-datasource-stale.sh \
    --uri https://datahub.example.org/grafana/api/datasources/proxy/42/query \
    --database testdrive \
    --table temperature \
    --warning 12h \
    --critical 3d \
    --verbose

Sensor output:

INFO:  Checking testdrive:temperature for data not older than 3d
INFO:  Checking testdrive:temperature for data not older than 12h
WARNING - Data in testdrive:temperature is stale for 12h or longer
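The output above suggests that the critical threshold is probed first and the warning threshold second. A minimal sketch of how two such probe results could be mapped to the conventional monitoring plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL) might look like this. The function name report_staleness is hypothetical, and the wording of the OK and CRITICAL messages is an assumption; only the WARNING line is taken from the sample output.

```shell
# Sketch only: function name and OK/CRITICAL message wording are assumptions.

# Args: label, warning age, critical age, then two "yes"/"no" flags telling
# whether data newer than the critical age and the warning age was found.
report_staleness() {
    local label="$1" warning="$2" critical="$3"
    local fresh_critical="$4" fresh_warning="$5"
    if [ "$fresh_critical" != "yes" ]; then
        echo "CRITICAL - Data in $label is stale for $critical or longer"
        return 2
    elif [ "$fresh_warning" != "yes" ]; then
        echo "WARNING - Data in $label is stale for $warning or longer"
        return 1
    fi
    echo "OK - Data in $label is fresh"
    return 0
}
```

Icinga and other Nagios-compatible schedulers read the state from the exit code and show the first line of output as the status text.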

Screenshot

Data acquisition from luftdaten.info triggered a data loss warning when the people operating the platform had to perform some maintenance work on the database:

If someone is wondering: The API is down for maintenance. Today we received value no. ‘2^31+1’ . But the database was defined with a maximum of 2^31 values. We are currently changing this to 2^63. But this may need some time.

— OK Lab Stuttgart (@codeforS) March 31, 2018

Install prerequisites

This sensor uses the fine programs HTTPie and jq. Please install them on your system.

Debian

apt install httpie jq

# Optionally
pip install httpie

macOS

brew install httpie jq
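To confirm that both prerequisites are available after installation, a small check along the following lines could be used. This is only a sketch; the function name missing_programs is hypothetical. HTTPie installs its command line client as http.

```shell
# Sketch only: the helper name is an assumption for illustration.

# Print the name of every given program that is not found on PATH.
missing_programs() {
    local program
    for program in "$@"; do
        command -v "$program" > /dev/null 2>&1 || echo "$program"
    done
}

# Example use:
#   missing="$(missing_programs http jq)"
#   [ -z "$missing" ] || echo "Please install: $missing"
```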

Setup Icinga plugin

Plugin environment

mkdir -p /opt/monitoring/plugins

Edit /etc/icinga2/constants.conf:

const CustomPluginDir = "/opt/monitoring/plugins"

Installation

git clone https://github.com/daq-tools/monitoring-check-grafana /opt/monitoring-check-grafana
ln -s /opt/monitoring-check-grafana/check-grafana-datasource-stale.sh /opt/monitoring/plugins/check-grafana-datasource-stale
ln -s /opt/monitoring-check-grafana/icinga-command-check-grafana.conf /etc/icinga2/conf.d/command-check-grafana.conf

Configuration

A blueprint for a typical service configuration object:

object Service "Grafana datasource freshness for testdrive:temperature" {
  import "generic-service"
  check_command         = "check-grafana-datasource-stale"

  host_name             = "datahub.example.org"
  vars.sla              = "24x7"

  vars.grafana_uri      = "https://datahub.example.org/grafana/api/datasources/proxy/42/query"
  vars.grafana_database = "testdrive"
  vars.grafana_table    = "temperature"
  vars.grafana_warning  = "1h"
  vars.grafana_critical = "12h"

  # Optionally notify only these recipients about this service
  #vars.notification.mail.users  = [ "bruce-lee", "chuck-norris" ]
  #vars.notification.mail.groups = [ "null" ]
}

See also icinga-service-check-grafana.example.conf.

Project information

About

The "monitoring-check-grafana" sensor program is released under the GNU AGPL license. Its source code lives on GitHub.

If you'd like to contribute you're most welcome! Spend some time taking a look around, locate a bug, design issue or spelling mistake and then send us a pull request or create an issue.

Thanks in advance for your efforts, we really appreciate any help or feedback.

License

Licensed under the GNU AGPL license. See LICENSE file for details.