[receiver/hostmetrics/scrapers/process] - "error reading username for process x... unknown userid y" when user not found in /etc/passwd #17187

Closed
alexchowle opened this issue Dec 21, 2022 · 11 comments

alexchowle commented Dec 21, 2022

Component(s)

receiver/hostmetrics

What happened?

Description

Collecting per-process metrics fails because no username can be resolved for the UID that owns a given process.

Steps to Reproduce

Enable per-process metrics collection with the following config and run the Collector as root:

receivers:
  hostmetrics:
    scrapers:
      process:
        mute_process_name_error: true

Expected Result

Production of per-process metrics

Actual Result

No per-process metrics are created.

Collector version

v0.60.0

Environment information

Environment

OS: CentOS Linux release 7.9.2009 (Core)

OpenTelemetry Collector configuration

extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  memory_ballast:
    size_mib: 683
  zpages:
    endpoint: ":55679"

receivers:
  hostmetrics:
    collection_interval: 5s
    scrapers:
      cpu:
      disk:
      filesystem:
      memory:
      network:
      load:
      paging:
      processes:
      process:
        mute_process_name_error: true
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  
processors:
  batch:
  memory_limiter:
    check_interval: 2s
    limit_mib: 1024

  resourcedetection:
    detectors: [gce, ecs, ec2, azure, system]
    override: true
              
  resource/add_environment:
    attributes:
      - action: insert
        value: LOCAL
        key: deployment.environment

exporters:
  # Traces
  otlp:
    endpoint: <SOME_OTHER_SERVER>:4317
    tls:
      insecure: true
  logging:
    loglevel: info

service:
  extensions: [memory_ballast, health_check, zpages]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch, resourcedetection, resource/add_environment]
      exporters: [otlp, logging]
    metrics:
      receivers: [hostmetrics, otlp]
      processors: [memory_limiter, batch, resourcedetection, resource/add_environment]
      exporters: [otlp, logging]

Log output

2022-12-20T15:23:28.574Z        info    service/service.go:129  Everything is ready. Begin running and processing data.                                                                                            
2022-12-20T15:23:32.894Z        error   scraperhelper/scrapercontroller.go:197  Error scraping metrics  {"kind": "receiver", "name": "hostmetrics", "pipeline": "metrics", "error": "error reading parent pid for process \"systemd\" (pid 1): invalid pid 0; error reading username for process \"nemo\" (pid 963): user: unknown userid 247235611; error reading username for process \"gnome-keyring-daemon\" (pid 4108): user: unknown userid 247235611; error reading username for process \"cinnamon-session\" (pid 4179): user: unknown userid 247235611; error reading username for process \"dbus-launch\" (pid 4190): user: unknown userid 247235611;
...
...

Additional context

The host uses an Active Directory-based authentication system via sssd, which means that a local user account may not exist for a logged-in user. The Collector runs as root, i.e. not as the UID whose lookup fails.

@alexchowle alexchowle added bug Something isn't working needs triage New item requiring triage labels Dec 21, 2022
@github-actions
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@alexchowle
Contributor Author

alexchowle commented Dec 21, 2022

After looking through the code, I suspect the error comes from the Go stdlib os/user package, which relies on users existing in /etc/passwd: https://cs.opensource.google/go/go/+/refs/tags/go1.19.4:src/os/user/lookup_unix.go;l=229

Should a process scrape fail because we can't get a valid Username from a UID?

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions
Contributor

Pinging code owners for receiver/hostmetrics: @dmitryax. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot removed the Stale label Mar 5, 2023
@atoulme atoulme removed the needs triage New item requiring triage label Mar 10, 2023
@PatrikSteuer

We face the same issue. We would prefer to get the metrics even if the username can't be resolved.

@atoulme
Contributor

atoulme commented Mar 18, 2023

What would you like instead? Just the UID? An empty string?

@kbielefe

Just the UID would be better than nothing. Ideally, the username gets looked up in sssd.

@alexchowle
Contributor Author

Yes - that works for me

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label May 19, 2023
@github-actions
Contributor

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jul 18, 2023
@KokoTa

KokoTa commented Aug 7, 2023

So how do I resolve this? I just want to get metrics from a different machine:

2023-08-07T00:31:28.239-0700	error	scraperhelper/scrapercontroller.go:214	Error scraping metrics	{"kind": "receiver", "name": "hostmetrics", "pipeline": "metrics", "error": "error reading parent pid for process \"systemd\" (pid 1): invalid pid 0", "scraper": "process"}
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
	go.opentelemetry.io/collector@v0.66.0/receiver/scraperhelper/scrapercontroller.go:214

dmitryax pushed a commit that referenced this issue Oct 30, 2023
…mute `error reading username for process` (#28661)

**Description:**

add configuration option `mute_process_user_error` to mute `error
reading username for process`

**Link to tracking Issue:**

* #14311
* #17187

Signed-off-by: Dominik Rosiek <drosiek@sumologic.com>
jmsnll pushed a commit to jmsnll/opentelemetry-collector-contrib that referenced this issue Nov 12, 2023
…mute `error reading username for process` (open-telemetry#28661)

RoryCrispin pushed a commit to ClickHouse/opentelemetry-collector-contrib that referenced this issue Nov 24, 2023
…mute `error reading username for process` (open-telemetry#28661)
