storcli.py: Add cachevault status #202

jcpunk · 2024-01-29T17:49:23Z

Adds metrics for the cachevault status.

Hardware tested: LSI MegaRAID SAS-3 3108 [Invader] (rev 02)

Signed-off-by: Pat Riehecky <riehecky@fnal.gov>

dswarbrick · 2024-02-13T18:38:11Z

@SuperQ I need a second opinion here. What does the Big Book of Prometheus Best Practices say about this sort of thing? Should we go with three different metric names, or a single cv_state metric, with the state contained within a label?

jcpunk · 2024-02-15T20:50:38Z

FWIW: https://github.com/prometheus-community/systemd_exporter/tree/main provides:

systemd_unit_state{name="sysinit.target",state="activating",type="target"} 0
systemd_unit_state{name="sysinit.target",state="active",type="target"} 1
systemd_unit_state{name="sysinit.target",state="deactivating",type="target"} 0
systemd_unit_state{name="sysinit.target",state="failed",type="target"} 0
systemd_unit_state{name="sysinit.target",state="inactive",type="target"} 0

dswarbrick · 2024-02-15T21:18:12Z

There are essentially three ways we can go about this. For example, if a CacheVault is degraded, we could expose:

cv_optimal{controller="0",cvidx="1"} 0
cv_degraded{controller="0",cvidx="1"} 1
cv_failed{controller="0",cvidx="1"} 0

or

cv_state{controller="0",cvidx="1",state="optimal"} 0
cv_state{controller="0",cvidx="1",state="degraded"} 1
cv_state{controller="0",cvidx="1",state="failed"} 0

or merely

cv_state{controller="0",cvidx="1",state="degraded"} 1

The first two methods are largely the same, although I would argue that the second method is slightly more user-friendly, as it would allow the contents of the state label to be used verbatim in Grafana dashboards with a very simple query.

The third method will result in stale metrics for 5 minutes whenever the state changes, due to Prometheus' default look-behind window and the fact that a series effectively disappears when the state label changes.
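For illustration, the second method could be sketched roughly as follows. This is a minimal sketch, not the code in this PR: the `CV_STATES` tuple and the `render_cv_state` helper are hypothetical names, the list of states is an assumption based on the examples above, and the output is rendered as plain strings rather than through whatever client library the collector actually uses.

```python
from typing import List

# Assumed set of states, taken from the examples in this thread;
# the real controller may report other values.
CV_STATES = ("optimal", "degraded", "failed")


def render_cv_state(controller: str, cvidx: str, current: str) -> List[str]:
    """Return one exposition line per known state (hypothetical helper).

    Every state is always emitted, so no series disappears when the
    state changes; only the value flips between 0 and 1.
    """
    lines = []
    for state in CV_STATES:
        # Compare case-insensitively, since the tool may report "Degraded".
        value = 1 if state == current.lower() else 0
        lines.append(
            f'cv_state{{controller="{controller}",cvidx="{cvidx}",state="{state}"}} {value}'
        )
    return lines


if __name__ == "__main__":
    print("\n".join(render_cv_state("0", "1", "Degraded")))
```

Because all three series exist on every scrape, a state change only changes sample values, which avoids the disappearing-series problem described for the third method.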

jcpunk · 2024-02-16T14:26:15Z

Updated to try and use example output 2

storcli.py: Add cachevault status

43018b6

Signed-off-by: Pat Riehecky <riehecky@fnal.gov>

dswarbrick self-assigned this Feb 13, 2024

dswarbrick requested a review from SuperQ February 13, 2024 18:38

Shift to more label based output

799b73e

Signed-off-by: Pat Riehecky <riehecky@fnal.gov>

jcpunk force-pushed the storcli-cachevault branch from f0bf28e to 799b73e Compare February 16, 2024 14:26

dswarbrick mentioned this pull request May 8, 2024

Cachevault support for storcli.py collector #120

Closed
