[receiver/hostmetrics] Scraping process metrics gives parent PID read error on Linux #14681

Closed
BinaryFissionGames opened this issue Oct 3, 2022 · 9 comments

Comments

@BinaryFissionGames
Contributor

What happened?

Description

When scraping process metrics on Linux, an error is logged indicating that an invalid PID (0) is being read.
This fills the logs with an error on every scrape, which may drown out other important log messages.

Steps to Reproduce

Scrape process metrics as root

Expected Result

No error is logged

Actual Result

An error is logged about reading the parent PID of systemd (PID 1); the parent PID that is read is 0, which is rejected as invalid

Collector version

v0.61.0

Environment information

OS: Debian 11 (bullseye)

OpenTelemetry Collector configuration

receivers:
  hostmetrics/host__source0:
    collection_interval: 60s
    scrapers:
      filesystem:
      load:
      memory:
      network:
      paging:
      process:
        mute_process_name_error: true

exporters:
  logging:

service:
  pipelines:
    metrics:
      receivers:
        - hostmetrics/host__source0
      exporters:
        - logging

Log output

{"level":"error","ts":"2022-10-03T17:44:51.239Z","caller":"scraperhelper/scrapercontroller.go:197","msg":"Error scraping metrics","kind":"receiver","name":"hostmetrics/host__source0","pipeline":"metrics","error":"error reading parent pid for process \"systemd\" (pid 1): invalid pid 0","scraper":"process","stacktrace":"go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector@v0.61.0/receiver/scraperhelper/scrapercontroller.go:197\ngo.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector@v0.61.0/receiver/scraperhelper/scrapercontroller.go:172"}
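
For readers unfamiliar with why PID 0 shows up here: on Linux, field 4 of `/proc/<pid>/stat` is the parent PID, and for PID 1 that value is 0 because the process has no userspace parent. The following is a minimal sketch, not the receiver's actual code, of treating a parent PID of 0 as "no parent" instead of as an error. The helper names `parsePPID` and `parentOf` are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePPID extracts the parent PID (field 4) from the contents of
// /proc/<pid>/stat. The command name (field 2) is wrapped in parentheses
// and may itself contain spaces or parentheses, so parsing starts after
// the last ')'.
func parsePPID(stat string) (int, error) {
	end := strings.LastIndex(stat, ")")
	if end < 0 {
		return 0, fmt.Errorf("malformed stat line: %q", stat)
	}
	fields := strings.Fields(stat[end+1:])
	// fields[0] is the process state, fields[1] is the parent PID.
	if len(fields) < 2 {
		return 0, fmt.Errorf("stat line too short: %q", stat)
	}
	return strconv.Atoi(fields[1])
}

// parentOf is a hypothetical helper. It reports ok=false with a nil error
// when the process has no parent (PPID 0), so the caller can skip the
// parent lookup instead of failing the whole scrape.
func parentOf(stat string) (ppid int, ok bool, err error) {
	ppid, err = parsePPID(stat)
	if err != nil {
		return 0, false, err
	}
	if ppid == 0 {
		return 0, false, nil
	}
	return ppid, true, nil
}
```

With a stat line such as `1 (systemd) S 0 1 1 ...`, `parentOf` reports that there is no parent rather than returning an error, which would avoid the repeated log entry described above.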

Additional context

No response

@BinaryFissionGames BinaryFissionGames added the "bug" (Something isn't working) and "needs triage" (New item requiring triage) labels Oct 3, 2022
@BinaryFissionGames BinaryFissionGames changed the title from "[receiver/hostmetrics] Linux" to "[receiver/hostmetrics] Scraping process metrics gives parent PID read error" Oct 3, 2022
@BinaryFissionGames BinaryFissionGames changed the title from "[receiver/hostmetrics] Scraping process metrics gives parent PID read error" to "[receiver/hostmetrics] Scraping process metrics gives parent PID read error on Linux" Oct 3, 2022
@evan-bradley evan-bradley added the "priority:p2" (Medium) and "receiver/hostmetrics" labels and removed the "needs triage" (New item requiring triage) label Oct 5, 2022
@github-actions
Contributor

github-actions bot commented Oct 5, 2022

Pinging code owners: @dmitryax. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions
Contributor

github-actions bot commented Dec 5, 2022

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Dec 5, 2022
@alexchowle
Contributor

This is happening in 0.60.0, too. There is nothing special about the config, and the collector is just running as a native process. I get lots of "error reading process name for pid #: readlink /proc/#/exe: no such file or directory" errors (where # is a given low-numbered PID).

@alexchowle
Contributor

Now, having added "mute_process_name_error: true" to the config, I get "... unknown userid xxxxxx" messages instead, where xxxxxx is a number.

@alexchowle
Contributor

Heh. I've traced this all the way through the dependency tree. The underlying "os/user" package fails the UID lookup, returning an error when the user does not exist in the "/etc/passwd" file.

Should all process scrapes fail because a UID can't be resolved?

@alexchowle
Contributor

Raised as separate issue #17187

@github-actions github-actions bot removed the Stale label May 26, 2023
@prashant-shahi
Contributor

This issue seems to exist in v0.79.x as well.

@dmitryax I would really appreciate any updates on this issue.

@github-actions github-actions bot added the Stale label Sep 18, 2023

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Nov 17, 2023
4 participants