1. Describe IN DETAIL the feature/behavior/change you would like to see.
Bump the nvidia driver for CUDA 12.1 support.
The kops source currently defaults to nvidia-headless-515-server.
The latest version currently available in the Ubuntu repository is 550.
We are currently running our 1.26 cluster configured with DriverPackage nvidia-driver-535 (with CUDA 12.0 support).
Note that we moved away from nvidia-headless-XXX-server because it does not install nvidia-smi and some other binaries on the host system, which caused issues for us with some CUDA Docker images that rely on the host having them available. The full driver package uses barely any additional disk space, so that was a quick and effective fix.
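For reference, the override described above looks roughly like the following cluster spec fragment. This is a sketch from memory, not copied from our cluster: the `containerd.nvidiaGPU` block and its `enabled` and `package` keys are my assumption of how the DriverPackage setting is exposed in the kops cluster spec, so please check them against the kops GPU documentation for your version.

```yaml
# Sketch of a kops Cluster spec fragment; field names are assumed, verify against the kops docs.
spec:
  containerd:
    nvidiaGPU:
      enabled: true
      # Full driver package, which also installs nvidia-smi on the host,
      # instead of the nvidia-headless-XXX-server default.
      package: nvidia-driver-535
```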
When we tried bumping to nvidia-driver-550 on EC2, the nodes stopped registering in the cluster. Unfortunately, the change was reverted before I could pull logs from the host system. I hope that your CI catches this.
2. Feel free to provide a design supporting your feature request.
/kind feature