Current behaviour
Currently, when we run the nats server check command, the client expects a set of threshold flags as input and can only report whether the server is healthy according to those thresholds.
Feature request
It would be useful to get the current metric values, instead of only knowing whether a threshold was exceeded.
This would let us build a Prometheus exporter component that exposes metrics for the whole nats server check set of commands. We would use these metrics in alerts and define the needed thresholds on the alerts themselves.
Desired behaviour
Optional threshold flags, because those thresholds could be set on the alerts.
New metrics in Prometheus format that show the current health state.
Example
Currently
nats server check stream --server nats://nats:4222 \
--stream TEST \
--peer-expect 1 \
--lag-critical 100 \
--msgs-warn 4000 \
--msgs-critical 3000 \
--min-sources 33 \
--max-sources 34 \
--peer-lag-critical 100 \
--peer-seen-critical 5m \
--format prometheus
Example output:
# HELP nats_server_check_stream_peer_lagged RAFT peers that are lagged more than configured threshold
# TYPE nats_server_check_stream_peer_lagged gauge
nats_server_check_stream_peer_lagged{item="TEST"} 0
...
Proposed
nats server check stream --server nats://nats:4222 \
--stream TEST \
--format prometheus
With the threshold flag --peer-lag-critical 100, the current command exports a metric that only says whether a given peer is lagged beyond that threshold.
Without threshold flags, the proposed command would just export the peer lag itself for each peer of each stream.
This strategy could be applied to every other type of metric currently available in the tool.
Example output:
# HELP nats_server_check_stream_peer_lag RAFT peer lag
# TYPE nats_server_check_stream_peer_lag gauge
nats_server_check_stream_peer_lag{item="TEST", peer="nats-2"} 200
...
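On the consuming side, the threshold would then live outside the tool. The sketch below assumes output shaped like the example above and applies a threshold after parsing it. The parser is deliberately simplified (no timestamps, no full escape handling), and parse_samples and lagged_peers are hypothetical names. In practice this comparison would more likely be a Prometheus alert rule expression.

```python
import re

# Simplified matchers for "name{labels} value" sample lines.
SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return (name, labels, value) tuples, skipping comments and junk lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE.match(line)
        if match is None:
            continue  # e.g. an elided "..." line
        labels = dict(LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def lagged_peers(text, threshold):
    """Names of peers whose raw lag exceeds a caller-chosen threshold."""
    return [
        labels.get("peer")
        for name, labels, value in parse_samples(text)
        if name == "nats_server_check_stream_peer_lag" and value > threshold
    ]
```

For the example output above, calling lagged_peers with a threshold of 100 would report nats-2, since its lag is 200.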
Thanks 🙏