Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce cpu consumption of hostmetric receiver (process & processes) #8789

Closed
cforce opened this issue Nov 1, 2023 · 1 comment
Closed

reduce cpu consumption of hostmetric receiver (process & processes) #8789

cforce opened this issue Nov 1, 2023 · 1 comment

Comments

@cforce
Copy link
Contributor

cforce commented Nov 1, 2023

Is your feature request related to a problem? Please describe.

Hostmetrics process(es) metrics are retrieved using the go lib gopstutil. There are several known issues with CPU utilization peaks mainly connected to getting the "boot time each time" per process. See

A bit off-topic but relevant for hostmetrics "proc/net" network connection metrics is this one. The system.network.connections metric being disabled and not collecting the information from the host does waste less CPU cycles, especially through the flame graph query. It is found that gopsutil is used. This will scan files under the proc directory. The more links, the more CPU. The more resources are occupied, see if it can be adjusted. However, this needs otel collector >= v0.85.0 at least, as before there was a bug that disabling did still execute the scraping. See receiver/hostmetricsreceiver/internal/scraper/networkscraper/network_scraper.go (line 160, extra "if !s.config.Metrics.SystemNetworkConnections.Enabled {"). If you try around with load/perf on a Docker container env, be aware that you need to mount the host filesystem to simulate a proper /proc access.

Describe the solution you'd like
I would like to get improvements and fixes related to the high CPU utilization issues caused by retrieving process metrics using gopstutil. Specifically, addressing the known issues mentioned in gopstutil Issue #1283, gopstutil Issue #842, and gopstutil Issue #1070.

Additionally, I suggest addressing the matter of the system.network.connections metric and its impact on CPU cycles, as described in gopstutil Issue #695.

Describe alternatives you've considered
Alternative solutions could involve optimizing the way process metrics are collected and exploring methods to reduce CPU utilization by less /not at all getting "boot time each time per process " or for metrics like system.network.connections. These alternatives may require changes to the gopstutil library or adjustments in the Otell Collector code.

Additional context
I have performed an analysis on CPU spikes for the OTEL collector by modifying the configuration and running a cputop script for approximately 5 minutes. The host metrics scrapers list used in OTEL collector includes CPU, memory, network, disk, load, paging, processes, and process metrics.

Based on the data collected, it is evident that certain metrics significantly impact CPU spikes while others do not. For example:

  • Process and Processes metrics create an impact on CPU spike.
  • CPU, memory, and paging metrics do not seem to have a significant effect on CPU spike.

This information should guide our approach to optimizing the collector's performance and reducing CPU utilization in specific areas.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant