Version 0.25.1 occasional startup issues #1398

Open
JDA88 opened this issue Jan 29, 2024 · 5 comments
@JDA88
Contributor

JDA88 commented Jan 29, 2024

Sometimes when windows_exporter starts, we get this error (from the event log):
ts=2024-01-28T03:43:29.147Z caller=stdlib.go:105 level=error caller=http.go:144 msg="error gathering metrics: error collecting metric Desc{fqName: \"windows_exporter_collector_success\", help: \"windows_exporter: Whether the collector was successful.\", constLabels: {}, variableLabels: {collector}}: failed to prepare scrape: EOF"

The issue is that once the error has happened at startup, the service stays up but never recovers. The service should either:

  1. Crash and exit (and then can be restarted)
  2. Be able to retry and recover
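As an illustration of the second option only, here is a minimal sketch of a bounded retry around a failing call. The names `retry` and `flakyQuery` are hypothetical and do not exist in windows_exporter; `flakyQuery` just simulates a query that returns `io.EOF` a fixed number of times before succeeding. Whether retrying actually clears the condition reported here is not known.

```go
package main

import (
	"fmt"
	"io"
	"time"
)

// retry is a hypothetical helper: it calls fn up to attempts times and
// sleeps for delay between failed tries. It returns nil on the first
// success, or the last error wrapped with the attempt count.
func retry(attempts int, delay time.Duration, fn func() error) error {
	if attempts < 1 {
		attempts = 1 // always try at least once
	}
	var err error
	for i := 1; i <= attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i < attempts {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

// flakyQuery simulates a query that fails with io.EOF for its first
// `failures` calls and succeeds afterwards (an assumption for the demo).
func flakyQuery(failures int) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= failures {
			return io.EOF
		}
		return nil
	}
}

func main() {
	// Recovers: two failures, three attempts allowed.
	fmt.Println(retry(3, 10*time.Millisecond, flakyQuery(2)))
	// Does not recover: the caller could exit here so the service manager restarts it.
	fmt.Println(retry(2, 10*time.Millisecond, flakyQuery(5)))
}
```

If the retries are exhausted, the caller still has the first option available: exit with a non-zero status and let the service manager restart the process.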

Additional information

  • Restarting the service fixes the issue
  • It can happen with various collector combinations, even simple ones (CS, CPU)
  • Very hard to reproduce, as it only happens randomly (on our deployment of 700+ servers we see 5-10 occurrences a week).
  • Hard to tell when it started occurring, but it was not present in v0.22
@safster123

Just wanted to note that I'm also seeing something similar.

We also have a large number of servers and for the most part it's fine but a handful of servers will show this error.

Restarting fixes it.

@DiniFarb
Contributor

DiniFarb commented Apr 4, 2024

When I follow the error line to see where the EOF could have happened, I end up in the perflib query func:

func QueryPerformanceData(query string) ([]*PerfObject, error) {

This function calls the following binary reader more than once:

func (p *perfObjectType) BinaryReadFrom(r io.Reader) error {
	return binary.Read(r, bo, p)
}

This reader can return EOF if the buffer it is given is empty. So one thing that could happen is that, for example, queryRawData returns an empty buffer here:

	buffer, err := queryRawData(query)
	if err != nil {
		return nil, err
	}
	r := bytes.NewReader(buffer)
	// Read global header
	header := new(perfDataBlock)
	err = header.BinaryReadFrom(r)
	if err != nil {
		return nil, err
	}

which would then return the EOF on line 283 and end up as the error mentioned above: "failed to prepare scrape: EOF"
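The behaviour described above can be checked in isolation. The sketch below is not perflib code: `header` is a hypothetical 12-byte stand-in for `perfDataBlock`, and `classifyRead` is an illustrative helper. It shows that `binary.Read` returns `io.EOF` only when no bytes at all are available, and `io.ErrUnexpectedEOF` when the buffer is merely too short, so a bare EOF is consistent with a completely empty buffer.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// header is a hypothetical stand-in for perflib's perfDataBlock; only its
// fixed encoded size (12 bytes) matters for this demonstration.
type header struct {
	Signature [4]uint16
	Version   uint32
}

// classifyRead decodes one header from buf and reports which kind of
// result binary.Read produced.
func classifyRead(buf []byte) string {
	var h header
	err := binary.Read(bytes.NewReader(buf), binary.LittleEndian, &h)
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, io.EOF):
		return "EOF"
	case errors.Is(err, io.ErrUnexpectedEOF):
		return "unexpected EOF"
	default:
		return err.Error()
	}
}

func main() {
	fmt.Println(classifyRead(nil))              // empty buffer
	fmt.Println(classifyRead(make([]byte, 3)))  // truncated buffer
	fmt.Println(classifyRead(make([]byte, 12))) // exactly one header
}
```

Since the log shows plain "EOF" rather than "unexpected EOF", the failing read most likely started at the very end of the data, whether at the global header or at a later structure.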

Since it is not really reproducible, and it is very hard to guess which perflib call triggers it, my suggestion would be to add the query string to the error message. That would give more visibility and maybe a starting point for further debugging.

Maybe something like this:

if err != nil {
  return nil, fmt.Errorf("failed to read performance data block for %s with: %v", query, err)
}
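A possible variation on the snippet above, offered as a sketch rather than a proposed patch: using `%w` instead of `%v` keeps the original error in the chain, so callers can still match it with `errors.Is`. The helper names `wrapQueryError` and `stillEOF` and the query value "238" are made up for the example.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// wrapQueryError is a hypothetical helper: it adds the query string to err
// and uses %w so the original error stays reachable through errors.Is.
func wrapQueryError(query string, err error) error {
	return fmt.Errorf("failed to read performance data block for %s with: %w", query, err)
}

// stillEOF reports whether an io.EOF wrapped for the given query can still
// be recognised as io.EOF by the caller.
func stillEOF(query string) bool {
	return errors.Is(wrapQueryError(query, io.EOF), io.EOF)
}

func main() {
	// "238" is only a placeholder query value for this demo.
	fmt.Println(wrapQueryError("238", io.EOF))
	fmt.Println(stillEOF("238"))
}
```

With `%v` the message text is the same, but `errors.Is(err, io.EOF)` would return false, which matters if a caller ever wants to retry or exit specifically on EOF.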

@breed808 or @jkroepke do you think that would be worth adding? I am happy to open a PR :)

@billtzim

Just to mention that I also ran into this issue. Occurrence ratio is ~3% (1 out of 33 hosts). In my case, however, restarting the service did not fix it.

@jkroepke jkroepke added the bug label Apr 25, 2024
@jkroepke
Member

Are you able to test your fix?

@DiniFarb
Contributor

DiniFarb commented May 2, 2024

@jkroepke were you addressing me? I don't have a solution or fix for this; my approach would only add more visibility, which could help find the cause of the problem. But I just saw PR #1459, and I think the effort should go in that direction instead, because if I understand it correctly, once that PR lands most of the perflib calls would go away anyway.
