cgroups v2 container.id discovery #523

Open
graphaelli opened this issue Oct 13, 2021 · 7 comments

@graphaelli
Member

Is your feature request related to a problem? Please describe.
cgroups v2 is seeing increasing adoption now that various distributions have made it the default for containers. As noted in elastic/beats#16958, Fedora 31 (late 2019) enables it by default, and Ubuntu 21.10 does as well.

The current spec covers only cgroups v1; this issue is a feature request for v2 support.

Describe the solution you'd like
When applications run on systems with cgroups v2 enabled (for example, under Docker), container.id should be populated for events produced by APM agents.

Additional context
The current metrics spec touches on collecting cgroups v2 metrics but gives no specific guidance on how to identify the cgroup itself; that should be updated as well. The Java and Python agents may provide insight into the required updates, such as consulting /proc/self/mountinfo instead of /proc/self/cgroup when cgroups v2 is detected.
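As a rough sketch of the detection step: assuming the common heuristic that a process under cgroups v2 (unified mode) sees only a single "0::<path>" entry in /proc/self/cgroup, an agent could choose which file to consult as below. The function names are hypothetical, and hybrid v1/v2 setups are simply treated as v1 here.

```python
from typing import Optional


def is_cgroup_v2_only(proc_self_cgroup: str) -> bool:
    """True if the /proc/self/cgroup text lists only the unified (v2) hierarchy.

    Under cgroups v2 the file holds a single "0::<path>" entry; cgroups v1 and
    hybrid setups add "<id>:<controllers>:<path>" entries for each v1 hierarchy.
    """
    entries = [line for line in proc_self_cgroup.splitlines() if line.strip()]
    return bool(entries) and all(line.startswith("0::") for line in entries)


def container_id_source(path: str = "/proc/self/cgroup") -> Optional[str]:
    """Hypothetical helper: pick the file to parse for container.id, or None."""
    try:
        with open(path, encoding="utf-8") as f:
            content = f.read()
    except OSError:
        return None  # not Linux, or /proc is unavailable
    if is_cgroup_v2_only(content):
        return "/proc/self/mountinfo"
    return "/proc/self/cgroup"
```

Returning None when the file cannot be read keeps the check safe on non-Linux hosts, where container.id would simply be left unset.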

@trentm
Member

trentm commented Oct 14, 2021

https://stackoverflow.com/questions/68816329/how-to-get-docker-container-id-from-within-the-container-with-cgroup-v2 discusses using upperdir=(.+?) from an entry in /proc/self/mountinfo. That may be limited: my vague, perhaps obsolete, recollection from earlier Docker days is that OverlayFS wasn't always the storage driver. It also yields an ID that is different from Docker's container ID.
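To illustrate the upperdir approach and its caveat, here is a rough sketch that pulls the upperdir option from the overlay mount at "/" in /proc/self/mountinfo. It assumes Docker's overlay2 storage driver; the directory it yields names a storage layer, which is why the result does not match Docker's container ID. The function names are hypothetical.

```python
import os
import re
from typing import Optional

# Captures the upperdir option of an overlay mount, e.g.
# upperdir=/var/lib/docker/overlay2/<layer>/diff
_UPPERDIR_RE = re.compile(r"upperdir=([^,\s]+)")


def overlay_upperdir(mountinfo: str) -> Optional[str]:
    """Return the upperdir of the overlay filesystem mounted at "/", if any."""
    for line in mountinfo.splitlines():
        fields = line.split()
        # Field 5 (index 4) is the mount point; the fstype follows the "-" separator.
        if len(fields) < 5 or fields[4] != "/" or "-" not in fields:
            continue
        sep = fields.index("-")
        if sep + 1 < len(fields) and fields[sep + 1] == "overlay":
            match = _UPPERDIR_RE.search(line)
            if match:
                return match.group(1)
    return None


def overlay_layer_id(mountinfo: str) -> Optional[str]:
    """Directory name above .../diff: an overlay2 layer ID, NOT the container ID."""
    upper = overlay_upperdir(mountinfo)
    return os.path.basename(os.path.dirname(upper)) if upper else None
```

If the storage driver is not overlay, no entry matches and both functions return None, which is the limitation described above.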

iovisor/bcc#1119 discusses how the kernel has no concept of a container ID, so this likely comes down to heuristics specific to each container runtime (docker, k8s, podman, systemd, etc.), or to simply being out of luck if nothing is exposed inside the container.

Gil, you mentioned perhaps having assist from a host-local APM server.

What breaks when a container.id is missing? Can hostname be a (poor) fallback?

@graphaelli
Member Author

graphaelli commented Oct 14, 2021

assist from a host-local APM server.

Good point. I'm not sure how that would work, but it is worth considering if, as you wrote, the ID is not reliably discoverable from within the container.

What breaks when a container.id is missing?

Workflows that pivot on that data are impacted. For example, an application service's logs are either not shown at all or are scoped only to the host/node level, which may (likely!) be running various unrelated containers. That is sometimes useful, but usually you want to start at the container and zoom out to the host if needed. That's a really simple example, but I hope it demonstrates the type of issue that missing this information causes.

@graphaelli
Member Author

I was reminded that this is still a problem.

@graphaelli
Member Author

One workaround for anyone coming across this issue is to start containers with --cgroupns=host. I've confirmed that container.id is picked up under cgroups v2 with Docker when using that option. The option is not yet available via Docker Compose; that is tracked in compose-spec/compose-spec#148.
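With --cgroupns=host, the cgroup path in /proc/self/cgroup is no longer shown as just "/", so the container ID can be read from it again. A minimal sketch, assuming the two Docker path shapes listed in the code comment (cgroupfs and systemd cgroup drivers); the function and pattern names are hypothetical, and other runtimes may use different paths.

```python
import re
from typing import Optional

# Assumed path shapes for Docker with a host cgroup namespace:
#   cgroupfs driver: /docker/<64-hex id>
#   systemd driver:  /system.slice/docker-<64-hex id>.scope
_CGROUP_ID_RE = re.compile(r"(?:^|[/-])([0-9a-f]{64})(?:\.scope)?$")


def container_id_from_cgroup(proc_self_cgroup: str) -> Optional[str]:
    """Return a container ID found at the end of a cgroup path, or None."""
    for line in proc_self_cgroup.splitlines():
        # Each line is "<hierarchy-id>:<controllers>:<path>"; v2 uses "0::<path>".
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        match = _CGROUP_ID_RE.search(parts[2].strip())
        if match:
            return match.group(1)
    return None
```

Under the default private cgroup namespace the v2 entry is just "0::/", so this returns None; that is why the workaround only helps when the host namespace is shared. The same parsing also accepts v1 lines such as "12:pids:/docker/<id>".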

@Nacoma

Nacoma commented Mar 7, 2023

This is reportedly an issue with at least three of the APM agents so far, with two of the three waiting for a decision in this thread before taking any action.

What breaks when a container.id is missing?

  • Infrastructure inventory is polluted by the containers that are incorrectly reported as hosts by the agent.
  • The association between actual hosts and traces is lost.

@trentm
Member

trentm commented Mar 27, 2023

The current state of the art (Stack Overflow, Jenkins, OpenTelemetry JS) seems to be reading and parsing /proc/self/mountinfo for the container ID, as I saw back in Oct 2021.

opencontainers/runtime-spec#1105 seems to be the issue to follow for a possible future standardized mechanism for this. Until then, we should update our spec to fall back to parsing /proc/self/mountinfo.
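A minimal sketch of such a fallback, loosely in the spirit of the approaches mentioned above but not a port of any of them. It assumes Docker's layout, where the container's /etc/hostname is bind-mounted from a per-container host directory containing a /containers/<64-hex id>/ path segment; other runtimes (podman, containerd under Kubernetes) may use different paths. The function names are hypothetical.

```python
import re
from typing import Optional

# Assumed Docker layout: /etc/hostname inside the container is a bind mount of
# /var/lib/docker/containers/<64-hex id>/hostname on the host.
_MOUNTINFO_ID_RE = re.compile(r"/containers/([0-9a-f]{64})/")


def container_id_from_mountinfo(mountinfo: str) -> Optional[str]:
    """Return the container ID from the hostname bind-mount entry, if present."""
    for line in mountinfo.splitlines():
        if "/hostname" not in line:
            continue
        match = _MOUNTINFO_ID_RE.search(line)
        if match:
            return match.group(1)
    return None


def read_container_id(path: str = "/proc/self/mountinfo") -> Optional[str]:
    """Hypothetical fallback used when /proc/self/cgroup yields nothing."""
    try:
        with open(path, encoding="utf-8") as f:
            return container_id_from_mountinfo(f.read())
    except OSError:
        return None
```

Keying on the hostname entry rather than the overlay root mount is what makes this return Docker's container ID rather than a storage-layer ID.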

@SylvainJuge
Member

On the OpenTelemetry Java side, cgroups v2 container ID discovery is currently implemented by parsing /proc/self/mountinfo.

There is no mention of the pod ID, however.
