Rootless Docker CDI Injection: error modifying OCI spec: failed to inject CDI devices: unresolvable CDI devices nvidia.com/gpu=all: unknown. #434
Comments
The error:
Indicates that rootless Docker cannot find the CDI specifications that were generated. As far as I am aware, rootless Docker modifies the path used for Since you're using a Docker version that supports CDI (as an opt-in feature, I believe), could you try the native CDI injection here? Running:
and restarting the docker daemon should enable this feature. (Note that the command may need to be adjusted for rootless mode to specify the config file path explicitly as per https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#rootless-mode). Then with the CDI feature enabled in docker you should be able to run:
and have the devices injected without using the |
Hey @elezar, thank you for taking the time! CDI injection seems to be a mainline feature in Docker 26.0.0. Though it is still experimental, it no longer requires the user to set DOCKER_CLI_EXPERIMENTAL, as was the case in 25.x. The native injection worked on rootful after configuring the daemon as suggested, though rootless Docker still runs into the issues listed below. Before applying the suggested configuration I tested the following on rootless:
After applying the configuration with
Restarting Docker and testing the CDI injections again led to the following, regardless of the cgroup setting:
I checked the location of the configurations for both Docker clients:
rootless
rootful
Both point to:
However, it looks like nothing was created under
The Docker docs for enabling CDI devices suggest manually setting the spec location, but it does not seem to make a difference in this case.
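For reference, the daemon settings that the Docker docs describe can be sketched roughly as below. This is a minimal sketch, not the configuration used in this thread: the function name write_cdi_daemon_config is hypothetical, the features.cdi and cdi-spec-dirs keys are taken from the Docker docs, and the two directories are the ones discussed in this issue. It overwrites the target file, so try it on a scratch path first. For rootless Docker the daemon config is usually read from ~/.config/docker/daemon.json, which is an assumption about the setup here.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal daemon.json that enables CDI and
# lists the directories the daemon should search for CDI specs.
# WARNING: overwrites the target file; merge by hand if one already exists.
write_cdi_daemon_config() {
    target="$1"
    mkdir -p "$(dirname "$target")" || return 1
    cat > "$target" <<'EOF'
{
  "features": {
    "cdi": true
  },
  "cdi-spec-dirs": ["/etc/cdi", "/var/run/cdi"]
}
EOF
}

# Example (assumed rootless config path, not confirmed in this thread):
# write_cdi_daemon_config "$HOME/.config/docker/daemon.json"
```

The daemon has to be restarted after the file changes before the new spec directories are picked up.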
|
Could you try generating (or copying) a CDI spec to /var/run/cdi in addition to /etc/cdi and see if this fixes the rootless case? |
I copied the yaml to
|
I think the key is the following: https://github.com/moby/moby/blob/8599f2a3fb884afcbbf1471ec793fbcbc327cd35/cmd/dockerd/docker.go#L65C1-L72C1 I would assume that for the docker daemon running with the rootless kit, the path where it is trying to resolve the CDI device specifications is not It may be sufficient to copy the spec file to a location that is readable by the daemon to confirm. Note that plugins are also handled differently for rootless mode: https://github.com/moby/moby/blob/8599f2a3fb884afcbbf1471ec793fbcbc327cd35/pkg/plugins/discovery_unix.go#L11 |
I wonder if this implies that the "correct" location for rootless is |
I just tested @klueska's idea, by copying the yaml to
With this change, the native CDI injection does indeed run on rootless.
|
It's good to know there is a path to making this work. I'd be interested to know if these are the "default" locations if you remove |
I would be surprised if this is the case since iirc we explicitly set |
You can see the
The choice of |
That seems like a bug that should be filed against moby/docker then. |
It might also be worth noting in the CDI documentation that a rootless Docker client requires the yaml to be generated in, or moved to, a location the daemon has access to, wherever that may end up being. |
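A quick way to see which spec files a given user can read is sketched below. It is a minimal sketch under stated assumptions: list_readable_cdi_specs is a hypothetical helper, it defaults to the two directories named in this thread, and it only looks at .yaml and .json files. It checks file permissions for the user running it and nothing more, so it cannot show how the rootless daemon's own view of the filesystem differs from the host's.

```shell
#!/bin/sh
# Hypothetical helper: print every CDI spec file (*.yaml, *.json) that the
# current user can read in the given directories.
# With no arguments it checks the two directories mentioned in this thread.
list_readable_cdi_specs() {
    if [ "$#" -eq 0 ]; then
        set -- /etc/cdi /var/run/cdi
    fi
    for dir in "$@"; do
        for spec in "$dir"/*.yaml "$dir"/*.json; do
            # Unmatched globs stay literal, so test for a real, readable file.
            if [ -f "$spec" ] && [ -r "$spec" ]; then
                printf '%s\n' "$spec"
            fi
        done
    done
}
```

Run it as the user that owns the rootless daemon, optionally passing extra candidate directories as arguments. An empty result for a directory means no readable spec was found there.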
Hello everyone,
we have recently set up a rootless docker instance alongside our existing docker on one of our servers, but ran into issues mounting host GPUs into the rootless containers. A workaround was presented in issue #85 (toggling no-cgroups to switch between rootful and rootless) with a mention of a better solution in the form of Nvidia CDI coming as an experimental feature in Docker 25.
After updating to the newest Docker releases and setting up CDI, our regular Docker instance behaved as we expected based on the documentation, but the rootless instance still runs into issues.
Setup to reproduce:
config.toml
nvidia.yaml
The issue:
When no-cgroups = false, CDI injection works fine for the regular Docker instance:
but produces the following errors for the rootless version:
Running docker run --rm --gpus all ubuntu nvidia-smi results in the same error as without OCI. This seems to be consistent across all variations listed on the Specialized Configurations for Docker page:
Interestingly, setting no-cgroups = true disables the regular use of GPUs with rootful Docker:
but still allows for CDI injections:
With control groups disabled, the rootless daemon is able to use exposed GPUs as outlined in the Docker docs:
TLDR
Disabling cgroups allows the rootless containers to use exposed GPUs via the regular docker run --gpus flag. This in turn disables the rootful container's GPU access. Leaving control groups enabled reverses the effect, as outlined in #85.
Disabling c-groups and using Nvidia CDI, the rootful Docker can still use GPU injection, even though regular GPU access is barred, while the rootless container uses the exposed GPUs. CDI injection for rootless fails in both cases, however.
This seems like a definite improvement, but I'm not sure it's intended behavior. The CDI injection failing with rootless regardless of control group setting leads me to believe this is unintended, unless rootless is not yet supported by Nvidia CDI.
Any insights would be greatly appreciated!