Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hostmetricsreceiver: use gopsutil cached boot time #32126

Merged
merged 5 commits into from
Apr 11, 2024

Conversation

braydonk
Copy link
Contributor

@braydonk braydonk commented Apr 2, 2024

Description:
This PR enables gopsutil's cached boot time feature. The primary purpose is to reduce the CPU usage of the process and processes scrapers, which read the boot time vastly more times than necessary.

Also added the hostmetrics.process.bootTimeCache featuregate which is enabled by default. Disabling it will return the process scraper to a similar functionality of reading the boot time at the start of every scrape (but still won't read it as much as it used to).

Link to tracking Issue:
#28849

Testing:
No tests were added. Ran the collector to ensure that everything still functioned as expected with the new functionality.

Documentation:
TODO

This PR enables `gopsutil`'s cached boot time feature. The primary
purpose is to reduce the CPU usage of the `process` and `processes`
scrapers, which read the boot time vastly more times than necessary.

Also added the `hostmetrics.process.bootTimeCache` featuregate which is
enabled by default. Disabling it will return the `process` scraper to a
similar functionality of reading the boot time at the start of every
scrape (but still won't read it as much as it used to).
@braydonk
Copy link
Contributor Author

braydonk commented Apr 2, 2024

Working on changelog and documentation, wanted to get it started running tests early.

Comment on lines +113 to +117
host.EnableBootTimeCache(false)
_, err := host.BootTimeWithContext(ctx)
if err != nil {
errs.AddPartial(1, fmt.Errorf(`retrieving boot time failed with error "%w", using cached boot time`, err))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm till not sure why we need it. I'd prefer to keep the Alpha state as it is right now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean keeping the alpha state of the featuregate as the old behaviour of not caching anything?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I don't want to do that is because I want to make it so that with the featuregate on or off, the performance is still down to a more normal level. The code here ends up with what is essentially the same setup as before, but reading the boot time once as opposed to (number of processes) * 4. This has basically the same performance benefit as caching the boot time for the whole run of the process, but maintains functionally the same old behaviour. I mentioned in the comment on the issue that I was going to have the caching turned on by default, and when the featuregate was disabled it would do this. Should I invert it so the featuregate is alpha and we use this behaviour instead by default?

Copy link
Member

@dmitryax dmitryax Apr 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like that one feature gate controls two new behaviors even if one of them doesn't affect the user experience. But I don't think it's worth debating too much. So we can merge it in this state

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please leave a note here and in docs what is needed as a user by cfg or as a developer by "ocb build" to achieve the minimum cpu load by max caching, because i already feel a bit lost how will be able to make use of it based on this PR

Copy link
Contributor Author

@braydonk braydonk Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cforce Nothing will be needed in config or build to get the maximum caching. Caching the boot time for the entire lifetime of the collector will be the new default.

Copy link

@cforce cforce Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to build prerelase of 0.99 with ocb , which XXXX version/tag/branch name shall i use?

  • gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver XXXX

is it ?

  • gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver@main
    version: latest

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I need to get a commit-specific version, I go to the component I care about and get the latest commit hash, then run go list -m <module>@<commit> which gives me the module pseudo-version.

$ go list -m github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver@fcf1bee
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver v0.98.1-0.20240416201607-fcf1bee48590

So based on that in ocb you'd do:

- gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver v0.98.1-0.20240416201607-fcf1bee48590

I've never tried mixing versions between components in an ocb config; you might need to use that version for everything if I'm not mistaken.

@dmitryax dmitryax merged commit c268b28 into open-telemetry:main Apr 11, 2024
132 of 142 checks passed
@github-actions github-actions bot added this to the next release milestone Apr 11, 2024
@cforce
Copy link

cforce commented Apr 26, 2024

now released in otecol 0.99.0 👍 - excellent

rimitchell pushed a commit to rimitchell/opentelemetry-collector-contrib that referenced this pull request May 8, 2024
)

**Description:**
This PR enables `gopsutil`'s cached boot time feature. The primary
purpose is to reduce the CPU usage of the `process` and `processes`
scrapers, which read the boot time vastly more times than necessary.

Also added the `hostmetrics.process.bootTimeCache` featuregate which is
enabled by default. Disabling it will return the `process` scraper to a
similar functionality of reading the boot time at the start of every
scrape (but still won't read it as much as it used to).

**Link to tracking Issue:** open-telemetry#28849

**Testing:** 
No tests were added. Ran the collector to ensure that everything still
functioned as expected with the new functionality.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants