Scrape timeout with many disks #197

Open
pokab opened this issue Jan 20, 2024 · 2 comments

Comments

@pokab

pokab commented Jan 20, 2024

I manage a Linux box with around 20-25 HDDs. Some of these disks respond to smartctl quickly, while others are slow and take around 1-2 seconds each.
Setting the scrape interval to 60 seconds and the scrape timeout to 40 seconds does not prevent regular scrape timeouts.
A previous solution of mine (not specific to Prometheus) spawned the smartctl subprocesses for all the HDDs in parallel, and it works perfectly. Would a solution like this be appropriate for this software? Maybe with an option to enable or disable it?

@SuperQ
Contributor

SuperQ commented Jan 22, 2024

This should be possible here with a goroutine worker pool. We can parallelize the data collection.
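
For illustration, here is a minimal sketch of what a goroutine worker pool for this could look like. It is not the exporter's actual code: `collectAll`, `result`, and `fakeCollect` are hypothetical names, and `fakeCollect` only sleeps in place of a real smartctl invocation. The motivation follows from the numbers in the report: with roughly 25 disks at up to 2 seconds each, sequential collection could take up to about 50 seconds, which is longer than the 40 second scrape timeout.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// result holds the outcome of collecting one device (hypothetical type).
type result struct {
	device string
	output string
	err    error
}

// collectAll calls collect once per device, using at most `workers`
// goroutines at a time. Results are returned in the same order as devices.
func collectAll(devices []string, workers int, collect func(string) (string, error)) []result {
	if workers < 1 {
		workers = 1 // avoid blocking forever when no worker is started
	}
	jobs := make(chan int)
	results := make([]result, len(devices))

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each index is handed to exactly one worker, so writes to
			// results[i] never overlap.
			for i := range jobs {
				out, err := collect(devices[i])
				results[i] = result{device: devices[i], output: out, err: err}
			}
		}()
	}

	for i := range devices {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

// fakeCollect stands in for a slow smartctl call (assumption for the demo).
func fakeCollect(device string) (string, error) {
	time.Sleep(50 * time.Millisecond)
	return "ok:" + device, nil
}

func main() {
	devices := []string{"/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd"}
	start := time.Now()
	for _, r := range collectAll(devices, 4, fakeCollect) {
		fmt.Println(r.device, r.output, r.err)
	}
	fmt.Println("elapsed:", time.Since(start))
}
```

Capping the number of workers, rather than starting one goroutine per disk, keeps the number of concurrent smartctl processes bounded. A worker count of 1 would reproduce sequential behaviour, which is one way an enable/disable option could be expressed.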

It doesn't look like the current collector records any timing information, even in debug mode. Something we can improve as well.

SuperQ added a commit that referenced this issue Jan 22, 2024
Update the smartctl command reading and parsing of json logging to make
for easier debugging of slow devices by adding a duration to the debug
logging.

For #197

Signed-off-by: SuperQ <superq@gmail.com>
SuperQ added a commit that referenced this issue Jan 24, 2024
SuperQ added a commit that referenced this issue Jan 25, 2024
pokab added a commit to idatahu/smartctl_exporter that referenced this issue Feb 27, 2024
zxzharmlesszxz pushed a commit to zxzharmlesszxz/smartctl_exporter that referenced this issue Mar 5, 2024
Update the smartctl command reading and parsing of json logging to make
for easier debugging of slow devices by adding a duration to the debug
logging.

For prometheus-community#197

Signed-off-by: SuperQ <superq@gmail.com>
Signed-off-by: mort <mort@syneforge.com>
zxzharmlesszxz pushed a commit to zxzharmlesszxz/smartctl_exporter that referenced this issue Mar 5, 2024
Update the smartctl command reading and parsing of json logging to make
for easier debugging of slow devices by adding a duration to the debug
logging.

For prometheus-community#197

Signed-off-by: SuperQ <superq@gmail.com>
Signed-off-by: Denys <zxzharmlesszxz@gmail.com>
@pokab
Author

pokab commented Mar 19, 2024

I've been using my forked version (see #204) for three weeks without any problem. It successfully solved the scrape issue. Could you please give some feedback?

pokab added a commit to idatahu/smartctl_exporter that referenced this issue May 9, 2024