Failed to collect raid controller device S.M.A.R.T data #89
Yes, this is the real reason why you need such a service in the first place: to monitor devices that are not easily visible inside the operating system.
If the device type could be retrieved and passed into the function readSMARTctl, it could then be used with the --device flag, which would be a safer way of scanning all device types. E.g. as below:
@marpears I can read the device info with smartctl including the device option, but NOT with smartctl_exporter...
but this doesn't work:
@josefzahner The
How does one configure cciss,1? I need to do it on some of my nodes and have not found a way yet.
This is a gating factor for me too. I've added comments to the above issue and the linked PR.
This is also an issue for me. I guess a proper solution would involve adding a separate flag to provide extra flags for
The tool should discover such HBAs and do so automagically at per-device granularity, since there can and will be a mixed population of direct-attach, passthrough, and hidden-by-VD drives across systems and especially within a given system.
@jakubgs It's more than just extra flags, it's discovery too.
Any way to do this yet?
I’d do it myself if I had the coding skills. It really is a fatal flaw. Mind you, HBA RAID is itself a fatal flaw, but Dell’s BOSS-N1 is too useful, though one has to invoke ‘mvcli’ to get status.
I did a bit of research into this and found out that these devices can be found with
But there might be an even better way to identify those devices, and that is
As we can see the
And on a different host without an HBA:
I don't know what the maintainers would think about using a tool other than
I'm going to read the code a bit to see how difficult this would be.
Main issue as far as I can tell is that even if you discover the devices, often you won't get much info from them:

```json
{
  "json_format_version": [1, 0],
  "smartctl": {
    "version": [7, 2],
    "svn_revision": "5155",
    "platform_info": "x86_64-linux-5.15.0-79-generic",
    "build_info": "(local build)",
    "argv": ["smartctl", "-A", "--device", "cciss,1", "/dev/sdb", "--json"],
    "exit_status": 0
  },
  "device": {
    "name": "/dev/sdb",
    "info_name": "/dev/sdb [cciss_disk_01] [SCSI]",
    "type": "cciss",
    "protocol": "SCSI"
  },
  "temperature": {
    "current": 21,
    "drive_trip": 70
  },
  "power_on_time": {
    "hours": 47138,
    "minutes": 5
  },
  "scsi_grown_defect_list": 0
}
```

Temperature and power-on time... not great.
Better than nothing, but yeah. I haven't had an HP HBA to work with for years, but re the
I'm increasingly leaning toward having a protégé write a SMART harvester from scratch in Python, which would make it easier to normalize the vagaries of data that
Personally I'd rather fix what we have working than start from scratch. I'm busy enough dealing with what I already have working to find the time to reinvent wheels. Even if I get just temperature and power-on hours, that's better than the deployed SMART exporter failing at startup and Prometheus firing alerts for the downed service.
But your point about SATA/SAS is well made. I will have to check how that is done on my servers. |
I tried to collect data from a server with a RAID controller through smartctl_exporter.
However, an error occurred, as below.
How can I collect S.M.A.R.T. data on RAID controller devices?