

storcli.py: Add cachevault status #202

Open · wants to merge 2 commits into base: master


jcpunk
Contributor

@jcpunk jcpunk commented Jan 29, 2024

Adds metrics for the cachevault status.

Hardware tested: LSI MegaRAID SAS-3 3108 [Invader] (rev 02)

Signed-off-by: Pat Riehecky <riehecky@fnal.gov>
@dswarbrick
Member

@SuperQ I need a second opinion here. What does the Big Book of Prometheus Best Practices say about this sort of thing? Should we go with three different metric names, or a single cv_state metric, with the state contained within a label?

@dswarbrick dswarbrick self-assigned this Feb 13, 2024
@jcpunk
Contributor Author

jcpunk commented Feb 15, 2024

FWIW: https://github.com/prometheus-community/systemd_exporter/tree/main provides:

systemd_unit_state{name="sysinit.target",state="activating",type="target"} 0
systemd_unit_state{name="sysinit.target",state="active",type="target"} 1
systemd_unit_state{name="sysinit.target",state="deactivating",type="target"} 0
systemd_unit_state{name="sysinit.target",state="failed",type="target"} 0
systemd_unit_state{name="sysinit.target",state="inactive",type="target"} 0

@dswarbrick
Member

There are essentially three ways we can go about this. For example, if a CacheVault is degraded, we could expose:

cv_optimal{controller="0",cvidx="1"} 0
cv_degraded{controller="0",cvidx="1"} 1
cv_failed{controller="0",cvidx="1"} 0

or

cv_state{controller="0",cvidx="1",state="optimal"} 0
cv_state{controller="0",cvidx="1",state="degraded"} 1
cv_state{controller="0",cvidx="1",state="failed"} 0

or merely

cv_state{controller="0",cvidx="1",state="degraded"} 1

The first two methods are largely the same, although I would argue that the second method is slightly more user-friendly, as it would allow the contents of the state label to be used verbatim in Grafana dashboards with a very simple query.
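A minimal sketch of the second method, in Python since storcli.py is a Python script. It writes Prometheus text exposition lines by hand rather than through any particular client library. The function name `cv_state_lines`, the `CV_STATES` tuple (taken from the three states in the examples above), and the lowercasing of storcli's state string (e.g. "Degraded") are assumptions for illustration, not code from this PR.

```python
from typing import List

# Hypothetical set of known CacheVault states, taken from the examples in this thread.
CV_STATES = ("optimal", "degraded", "failed")


def cv_state_lines(controller: str, cvidx: str, current: str) -> List[str]:
    """Render one cv_state sample per known state (the second method above).

    Exactly one sample is 1 (the current state) and the rest are 0, so every
    series keeps the same label set from one scrape to the next.
    """
    # Assumption: storcli reports the state with arbitrary case, e.g. "Optimal".
    current = current.strip().lower()
    lines = []
    for state in CV_STATES:
        value = 1 if state == current else 0
        lines.append(
            f'cv_state{{controller="{controller}",cvidx="{cvidx}",state="{state}"}} {value}'
        )
    return lines


if __name__ == "__main__":
    print("\n".join(cv_state_lines("0", "1", "Degraded")))
```

Run as a script, this prints the same three lines as the second example above. With every state always present, a Grafana panel could filter with a query such as `cv_state == 1` and display the `state` label directly. One question the sketch leaves open: a state string outside `CV_STATES` yields all zeros, which might deserve its own handling.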

The third method will result in stale metrics for 5 minutes whenever the state changes, due to Prometheus' default lookback window and the fact that a series effectively disappears when the state label changes.
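To make the series-identity point concrete: Prometheus identifies a series by its metric name plus its full label set, so under the third method a state change replaces one series with another, while under the second method the same three series persist and only their values change. The sketch below models two consecutive scrapes as plain Python data; the helper names and sample values are hypothetical, and it only shows which series disappear between scrapes, not how query-time lookback treats them.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# (metric name, labels, value) as it would appear in one scrape.
Sample = Tuple[str, Dict[str, str], int]
# A series is identified by its metric name plus its full label set.
SeriesKey = Tuple[str, FrozenSet[Tuple[str, str]]]


def series_keys(scrape: List[Sample]) -> Set[SeriesKey]:
    return {(name, frozenset(labels.items())) for name, labels, _ in scrape}


def vanished_series(before: List[Sample], after: List[Sample]) -> Set[SeriesKey]:
    """Series present in the earlier scrape but missing from the later one."""
    return series_keys(before) - series_keys(after)


def cv(state: str) -> Dict[str, str]:
    # Hypothetical label set for CacheVault 1 on controller 0.
    return {"controller": "0", "cvidx": "1", "state": state}


STATES = ("optimal", "degraded", "failed")

# Third method: only the current state is exported, so a state change swaps series.
opt3_before = [("cv_state", cv("optimal"), 1)]
opt3_after = [("cv_state", cv("degraded"), 1)]

# Second method: every state is always exported; only the values change.
opt2_before = [("cv_state", cv(s), int(s == "optimal")) for s in STATES]
opt2_after = [("cv_state", cv(s), int(s == "degraded")) for s in STATES]

if __name__ == "__main__":
    print(len(vanished_series(opt3_before, opt3_after)))  # 1: the "optimal" series is gone
    print(len(vanished_series(opt2_before, opt2_after)))  # 0: same series, new values
```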

@jcpunk
Contributor Author

jcpunk commented Feb 16, 2024

Updated to use the second example output (a single cv_state metric with the state in a label).

Labels: None yet
Projects: None yet

2 participants