md_info.sh / md_info_detail.sh: /dev/md/ directory does not exist on newer kernels #92

Open
candlerb opened this issue Mar 31, 2021 · 3 comments · May be fixed by #109

Comments

@candlerb
Contributor

candlerb commented Mar 31, 2021

On some newer machines, /dev/mdXXX exists but the /dev/md directory does not. This includes Ubuntu 20.04 and Debian 10, both with a 5.4.0 kernel.

# ls -l /dev/md*
brw-rw---- 1 root disk 9, 127 Mar 31 08:46 /dev/md127
# 

Unfortunately the textfile collector scripts are hard-wired to look for /dev/md/* and so they don't pick up any arrays.

The array is visible under /dev/disk/by-id/md-name-<HOSTNAME>:<ARRAYNAME>

# find /dev -lname '*md*'
/dev/log
/dev/disk/by-uuid/f36588df-7e7f-446a-9be4-c0c6d092dcf4
/dev/disk/by-id/md-name-CORE-ELASTIC-VM1:127
/dev/disk/by-id/md-uuid-f6641f8b:65b425e9:298e0ac2:b6c3c83e
/dev/block/9:127
# find /dev -lname '*md*' | xargs ls -l
lrwxrwxrwx 1 root root  8 Mar 31 08:46 /dev/block/9:127 -> ../md127
lrwxrwxrwx 1 root root 11 Mar 31 08:46 /dev/disk/by-id/md-name-CORE-ELASTIC-VM1:127 -> ../../md127
lrwxrwxrwx 1 root root 11 Mar 31 08:46 /dev/disk/by-id/md-uuid-f6641f8b:65b425e9:298e0ac2:b6c3c83e -> ../../md127
lrwxrwxrwx 1 root root 11 Mar 31 08:46 /dev/disk/by-uuid/f36588df-7e7f-446a-9be4-c0c6d092dcf4 -> ../../md127
lrwxrwxrwx 1 root root 28 Mar 31 08:46 /dev/log -> /run/systemd/journal/dev-log

This path also exists for Ubuntu 18.04 (4.15.0). Checking the oldest machine I have, which is CentOS 6 (2.6.32):

# find /dev -lname '*md*'
/dev/md/scratch1_0
/dev/disk/by-uuid/a6444e5e-6ee7-49bd-8973-970756366b30
/dev/disk/by-id/md-uuid-962cbdcc:b9482b4a:c9971b8d:e2b43c68
/dev/disk/by-id/md-name-scratch1
/dev/block/9:127
/dev/.udev/watch/61
/dev/.udev/links/disk\x2fby-uuid\x2fa6444e5e-6ee7-49bd-8973-970756366b30/b9:127
/dev/.udev/links/md\x2fscratch1_0/b9:127
/dev/.udev/links/disk\x2fby-id\x2fmd-uuid-962cbdcc:b9482b4a:c9971b8d:e2b43c68/b9:127
/dev/.udev/links/disk\x2fby-id\x2fmd-name-scratch1/b9:127

On this host, mdadm -E shows the name as just

           Name : scratch1

although it's currently picked up as md_name="scratch1_0", and so changing to /dev/disk/by-id/md-name-* would change the label set.

However, I think this is probably the right long-term fix. To demonstrate:

--- a/roles/prometheus_node_exporter/files/node-exporter-textfile-collector-scripts/md_info_detail.sh
+++ b/roles/prometheus_node_exporter/files/node-exporter-textfile-collector-scripts/md_info_detail.sh
@@ -6,7 +6,7 @@

 set -eu

-for MD_DEVICE in /dev/md/*; do
+for MD_DEVICE in /dev/disk/by-id/md-name-*; do
   if [ -b "$MD_DEVICE" ]; then
   # Subshell to avoid eval'd variables from leaking between iterations
   (
@@ -15,7 +15,7 @@ for MD_DEVICE in /dev/md/*; do

     # Remove /dev/ prefix
     MD_DEVICE_NUM=${MD_DEVICE_NUM#/dev/}
-    MD_DEVICE=${MD_DEVICE#/dev/md/}
+    MD_DEVICE=${MD_DEVICE#/dev/disk/by-id/md-name-}

     # Query sysfs for info about md device
     SYSFS_BASE="/sys/devices/virtual/block/${MD_DEVICE_NUM}/md"
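
With the prefix stripped as in the diff above, md_name on the newer hosts would come out as e.g. CORE-ELASTIC-VM1:127, because the by-id link includes the hostname. If only the array part is wanted, something along these lines could drop the host prefix. This is a rough sketch: the helper name md_array_name is made up for illustration, and it assumes the hostname part never contains a colon.

```shell
# Hypothetical helper: reduce a /dev/disk/by-id/md-name-* path to the array name.
md_array_name() {
  local base="${1##*/}"        # e.g. md-name-CORE-ELASTIC-VM1:127
  base="${base#md-name-}"      # e.g. CORE-ELASTIC-VM1:127
  # Drop everything up to the first ':' (the host part).
  # A name without ':' (as on the CentOS 6 host) is left unchanged.
  printf '%s\n' "${base#*:}"
}
```

For the two hosts shown above this would give 127 and scratch1 respectively, so the CentOS 6 label would still change from scratch1_0.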
@dswarbrick
Member

The /dev/disk/by-{id,partlabel,partuuid,path,uuid} directories are created by udev, and thus probably also somewhat variable from one distro to the next (and also not guaranteed to be present at all).

The only entries that are always going to be there are /dev/md[0-9]* and /sys/block/md[0-9]*, as these are created by the md kernel modules, and not dependent on any userspace daemon.
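
A minimal sketch of discovering arrays from sysfs instead of /dev/md/*, assuming each array has an md/ subdirectory under /sys/block/mdN (the existing script already reads /sys/devices/virtual/block/<dev>/md). The function name list_md_devices and its optional directory argument are hypothetical, added only so the loop can be pointed at a test tree.

```shell
# Hypothetical helper: print the kernel names (md0, md127, ...) of all md arrays.
# $1 optionally overrides the sysfs block directory (default: /sys/block).
list_md_devices() {
  local sysfs_block="${1:-/sys/block}"
  local entry
  for entry in "$sysfs_block"/md*; do
    # If the glob matches nothing it stays literal; the -d test skips it.
    # Requiring an md/ subdirectory also skips anything that is not an array.
    [ -d "$entry/md" ] || continue
    printf '%s\n' "${entry##*/}"
  done
}

# Prints nothing on a host without md arrays.
list_md_devices
```

Each name printed can be used directly as MD_DEVICE_NUM in the existing script, with /dev/ prepended where a device path is needed.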

@candlerb
Contributor Author

In that case, is it better not to attempt to get the array names at all?

--- md_info_detail.sh.orig	2021-03-31 10:16:25.593118296 +0100
+++ md_info_detail.sh.new	2021-03-31 10:41:01.406242215 +0100
@@ -6,16 +6,12 @@

 set -eu

-for MD_DEVICE in /dev/md/*; do
-  if [ -b "$MD_DEVICE" ]; then
+for MD_DEVICE_NUM in /dev/md?*; do
+  if [ -b "$MD_DEVICE_NUM" ]; then
   # Subshell to avoid eval'd variables from leaking between iterations
   (
-    # Resolve symlink to discover device, e.g. /dev/md127
-    MD_DEVICE_NUM=$(readlink -f "${MD_DEVICE}")
-
     # Remove /dev/ prefix
     MD_DEVICE_NUM=${MD_DEVICE_NUM#/dev/}
-    MD_DEVICE=${MD_DEVICE#/dev/md/}

     # Query sysfs for info about md device
     SYSFS_BASE="/sys/devices/virtual/block/${MD_DEVICE_NUM}/md"
@@ -64,13 +60,13 @@
       if echo "$line" | grep -E -q "Devices :|Array Size :| Used Dev Size :|Events :"; then
         MDADM_DETAIL_KEY=$(echo "$line" | cut -d ":" -f 1 | tr -cd '[a-zA-Z0-9]._-')
         MDADM_DETAIL_VALUE=$(echo "$line" | cut -d ":" -f 2 | cut -d " " -f 2 | sed 's:^ ::')
-        echo "node_md_info_${MDADM_DETAIL_KEY}{md_device=\"${MD_DEVICE_NUM}\", md_name=\"${MD_DEVICE}\", raid_level=\"${MD_LEVEL}\", md_num_raid_disks=\"${MD_NUM_RAID_DISKS}\", md_metadata_version=\"${MD_METADATA_VERSION}\"} ${MDADM_DETAIL_VALUE}"
+        echo "node_md_info_${MDADM_DETAIL_KEY}{md_device=\"${MD_DEVICE_NUM}\", raid_level=\"${MD_LEVEL}\", md_num_raid_disks=\"${MD_NUM_RAID_DISKS}\", md_metadata_version=\"${MD_METADATA_VERSION}\"} ${MDADM_DETAIL_VALUE}"
       fi
     done  <<< "$MDADM_DETAIL_OUTPUT"

     # Output RAID detail metrics info from the output of "mdadm --detail"
     # NOTE: Sending this info as labels rather than separate metrics, because some of them can be strings.
-    echo -n "node_md_info{md_device=\"${MD_DEVICE_NUM}\", md_name=\"${MD_DEVICE}\", raid_level=\"${MD_LEVEL}\", md_num_raid_disks=\"${MD_NUM_RAID_DISKS}\", md_metadata_version=\"${MD_METADATA_VERSION}\""
+    echo -n "node_md_info{md_device=\"${MD_DEVICE_NUM}\", raid_level=\"${MD_LEVEL}\", md_num_raid_disks=\"${MD_NUM_RAID_DISKS}\", md_metadata_version=\"${MD_METADATA_VERSION}\""
     while IFS= read -r line ; do
       # Filter for lines with a ":", to use for Key/Value pairs in labels
       if echo "$line" | grep -E -q ":" ; then

Alternatively, the script could support both cases: one where /dev/md exists and one where it doesn't.

@dswarbrick
Member

dswarbrick commented Mar 31, 2021

I think that attempting to obtain the array name is ok, so long as it fails gracefully. But obviously the glob pattern at the start of the for-loop should use a more robust / reliable value than /dev/md/*.
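
One way the "fail gracefully" part could look, as a sketch only: look for a symlink in /dev/md/ that resolves to the array's device node, and fall back to an empty name if the directory or the link is missing. The function name md_name_for and the second argument (an override for /dev) are hypothetical and exist only for illustration and testing.

```shell
# Hypothetical helper: print the /dev/md/ symlink name for a kernel device
# name such as md127, or an empty line if none can be found.
# $2 optionally overrides the device directory (default: /dev).
md_name_for() {
  local dev="$1" devdir="${2:-/dev}"
  local link target
  target=$(readlink -f "$devdir/$dev")
  for link in "$devdir"/md/*; do
    # Covers both a missing /dev/md directory (unmatched glob) and non-symlinks.
    [ -L "$link" ] || continue
    if [ "$(readlink -f "$link")" = "$target" ]; then
      printf '%s\n' "${link##*/}"
      return 0
    fi
  done
  # No name found: emit an empty value instead of aborting the script.
  printf '\n'
}
```

The caller could then emit md_name="" for arrays without a symlink, or leave the label out, so the metrics for such arrays are still produced.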

Also I don't think that the lack of a /dev/md directory has got anything to do with kernel versions, as that was always a userspace function of the mdadm tool. The man page describes the various scenarios under which it will create the symlinks in the /dev/md directory. In some ways, this is duplicating the functionality of udev, but IIRC, mdadm predates udev.

Kriechi added a commit to Kriechi/node-exporter-textfile-collector-scripts that referenced this issue Dec 27, 2021
closes prometheus-community#24
closes prometheus-community#48
closes prometheus-community#92
For remaining problems, please open a new issue at https://github.com/prometheus/node_exporter

Signed-off-by: Thomas Kriechbaumer <thomas@kriechbaumer.name>
Kriechi added a commit to Kriechi/node-exporter-textfile-collector-scripts that referenced this issue Dec 27, 2021
closes prometheus-community#24
closes prometheus-community#25
closes prometheus-community#48
closes prometheus-community#92
For remaining problems, please open a new issue at https://github.com/prometheus/node_exporter

Signed-off-by: Thomas Kriechbaumer <thomas@kriechbaumer.name>