Publish NVIDIA ubi-9 version of container-toolkit docker image #412

Open
faiq opened this issue Mar 15, 2024 · 7 comments
Comments

@faiq

faiq commented Mar 15, 2024

Hey folks,

I am requesting that a container-toolkit docker image be published for ubi-9 based systems. From my understanding, this would be straightforward: add a commit like 05dd438 to create targets for ubi9. The base image seems to be available on NGC as nvcr.io/nvidia/cuda:12.3.2-base-ubi9

This would enable us to run gpu-operator on operating systems such as rocky-9, because there seems to be a dependency on the host; see NVIDIA/gpu-operator#72 (comment)

Let me know if I can start working on it.

@cdesiniotis
Contributor

@faiq as a matter of interest, have you tried running a ubi8 tag of the toolkit image on Rocky 9?

@faiq
Author

faiq commented Mar 15, 2024

I have not. I can give it a try today.

@elezar
Member

elezar commented Mar 15, 2024

@faiq the main reason to select a different image is the GLIBC version on the host. In the case of rocky8, I would assume that any of the images that we publish should be compatible. Otherwise, I would recommend the ubi8 image as @cdesiniotis suggests.

If you're seeing specific failures when using this, please let us know.

@faiq
Author

faiq commented Mar 16, 2024

@elezar I'm trying to target rocky9, not 8.

@faiq
Author

faiq commented Mar 17, 2024

Checking with containers, it looks like the glibc versions differ, and I think this would likely not work.

This is the result of ldd --version for a rockylinux:9 container:

$ docker run -v $(pwd):$(pwd) -w $(pwd)  -it   --entrypoint /bin/sh rockylinux:9
Unable to find image 'rockylinux:9' locally
9: Pulling from library/rockylinux
489e1be6ce56: Pull complete
Digest: sha256:c944604c0c759f5d164ffbdf0bbab2fac582b739938937403c067ab634a0518a
Status: Downloaded newer image for rockylinux:9
sh-5.1# ldd --version
ldd (GNU libc) 2.34
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
sh-5.1# 
exit

This is the result of running ldd --version on the ubi8 toolkit image:

$ docker run -v $(pwd):$(pwd) -w $(pwd)  -it   --entrypoint /bin/sh nvcr.io/nvidia/k8s/container-toolkit:v1.14.6-ubi8
Unable to find image 'nvcr.io/nvidia/k8s/container-toolkit:v1.14.6-ubi8' locally
v1.14.6-ubi8: Pulling from nvidia/k8s/container-toolkit
Digest: sha256:59a3875e7a37eb370385e654184efa3a1b193c9ea352165818496b19cbe14aa4
Status: Downloaded newer image for nvcr.io/nvidia/k8s/container-toolkit:v1.14.6-ubi8
sh-4.4# ldd --version
ldd (GNU libc) 2.28
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

There's a version mismatch and I think I would get an error similar to the one described in the issue NVIDIA/gpu-operator#72
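To compare the two images without reading the full banner each time, the version number can be pulled out of the first line of the `ldd --version` output. This is a minimal sketch, assuming the first line always ends with the version as in the transcripts above; the function name `glibc_version` is hypothetical.

```shell
# Hypothetical helper: reads `ldd --version` output on stdin and prints the
# glibc version. Assumes the first line looks like "ldd (GNU libc) 2.34",
# as in the transcripts above.
glibc_version() {
    head -n 1 | awk '{print $NF}'
}

# Example using the first lines reported by the ubi8 toolkit image.
printf 'ldd (GNU libc) 2.28\nCopyright (C) 2018 Free Software Foundation, Inc.\n' | glibc_version
```

On a real host or inside a container shell, the same helper could be fed directly with `ldd --version | glibc_version`.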

It looks like these toolkit images are built on the base CUDA images. A ubi9 base CUDA image is available, and ldd --version there reports the same glibc version as found in Rocky 9.

$ docker run -v $(pwd):$(pwd) -w $(pwd)  -it   --entrypoint /bin/sh nvcr.io/nvidia/cuda:12.3.2-base-ubi9
Digest: sha256:4ac64f369699d5816a1f7fe5f88cb90f116dd67a4396b86c72823c1357f81fc0
Status: Downloaded newer image for nvcr.io/nvidia/cuda:12.3.2-base-ubi9
sh-5.1# ldd --version
ldd (GNU libc) 2.34
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

It seems like adding support for this OS is straightforward: add some Dockerfiles and Make targets. Is this correct, and can I work on this?

Thanks.

@elezar
Member

elezar commented Mar 18, 2024

The issue is the MINIMUM glibc version. As long as the glibc version on the host is newer than that in the container, the NVIDIA Container Toolkit is expected to work.
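Read as a check, this rule is a plain version comparison between host and image. The following is a minimal sketch, assuming GNU `sort -V` is available and that equal versions also count as compatible (the comment above only says "newer"); the function name `glibc_compatible` is hypothetical.

```shell
# Hypothetical helper: prints "compatible" when the host glibc is at least as
# new as the glibc in the toolkit image, and "incompatible" otherwise.
# Assumption: equal versions are treated as compatible.
glibc_compatible() {
    host="$1"
    image="$2"
    # sort -V (GNU coreutils) orders version strings; the oldest comes first.
    oldest=$(printf '%s\n%s\n' "$host" "$image" | sort -V | head -n 1)
    if [ "$oldest" = "$image" ]; then
        echo "compatible"
    else
        echo "incompatible"
    fi
}

# Versions reported in this thread: Rocky 9 host (2.34), ubi8 toolkit image (2.28).
glibc_compatible 2.34 2.28
```

With the numbers from this thread, the ubi8 image (2.28) sits below the Rocky 9 host (2.34), which is the direction the rule allows; a ubi9-based image on a Rocky 8 host would be the failing case.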

@faiq
Author

faiq commented Mar 18, 2024

I see. The previous comments didn't make it clear that this was a minimum version requirement.

I'm going to try the ubi-8 container image today to see if it works.

However, it's still very confusing to have toolkit images that map to the host OS the binaries get copied onto. It led me to believe that matching the OS major version, in this case ubi9 to rocky9, was a strict requirement for getting it to work.

Is it possible to consolidate these images into just one now? There was talk about static linking as a solution. However, since many older OSes (centos7.9/rhel7.9) are being deprecated in just a few months, would the minimum glibc version available in the ubuntu toolkit container be compatible with all the other systems?
