
cuda: add module #422

Draft: wants to merge 3 commits into base: main
Conversation

@bobvanderlinden (Contributor)

As discussed on Discord, this configuration is needed to run pytorch in devenv on Linux. It was confirmed to work.

I don't have much knowledge of CUDA itself, so I'm unsure what exactly other libraries need. I did find that CUDA_HOME and CUDA_PATH are used by tensorflow.

Confirmations that this works for real projects are welcome!
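For reference, a minimal sketch of what such a module could export (assuming devenv's `env` and `packages` options and `pkgs.cudatoolkit`; the exact shape in this PR may differ):

```nix
{ pkgs, ... }:

{
  # Hypothetical sketch: make the toolkit available and expose the
  # variables tensorflow reportedly reads (CUDA_HOME, CUDA_PATH).
  packages = [ pkgs.cudatoolkit ];

  env.CUDA_HOME = "${pkgs.cudatoolkit}";
  env.CUDA_PATH = "${pkgs.cudatoolkit}";
}
```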

@domenkozar (Member)

We'll need to add toolkit folder to top-level.nix imports.

Is this ever supposed to work on macOS?

@bobvanderlinden (Contributor, Author)

> We'll need to add toolkit folder to top-level.nix imports.

👍

> Is this ever supposed to work on macOS?

I don't think so, but I'm not sure 😅
https://developer.nvidia.com/nvidia-cuda-toolkit-11_6_0-developer-tools-mac-hosts
Apparently it can be used remotely, so yes, the toolkit itself does support macOS. I doubt that works for pytorch, though, as it needs libcuda.so, which is in the nvidia_x11 package.

@domenkozar (Member)

Let's add an assertion then if !pkgs.stdenv.isLinux.
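A hedged sketch of such an assertion, using the `assertions` pattern common in NixOS-style modules (whether devenv exposes `assertions` in exactly this way is an assumption here):

```nix
{ pkgs, ... }:

{
  # Illustrative only: fail module evaluation on non-Linux platforms,
  # since CUDA in nixpkgs does not target Darwin.
  assertions = [
    {
      assertion = pkgs.stdenv.isLinux;
      message = "The cuda module is only supported on Linux.";
    }
  ];
}
```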

Signed-off-by: Bob van der Linden <bobvanderlinden@gmail.com>
@bobvanderlinden (Contributor, Author)

Requires #383, as cuda is unfree.

@domenkozar (Member)

Would we want to incorporate any feedback from NixOS/nixpkgs#217780 (comment)? Leaving it for the future is also fine :)

@SomeoneSerge

> Is this ever supposed to work on macOS?

Is CUDA available on macOS?
Darwin hasn't been included in meta.platforms for most of cudaPackages, but that's mostly because people focus on Linux.

@bobvanderlinden (Contributor, Author)

Hmm, this PR is not really ready. It probably shouldn't support macOS if CUDA in nixpkgs doesn't support it. On Discord it was mentioned that this method did work for pytorch, but because of the LD_LIBRARY_PATH change pulling in pkgs.gcc-unwrapped, the Rust compiler breaks down.

I think LD_LIBRARY_PATH is mostly a workaround for a pytorch problem, not so much a CUDA problem.

Using the pytorch package from nixpkgs (and thus the nixpkgs CUDA maintainers) doesn't play nicely with poetry (pyproject.toml), so there is no perfect solution yet.

I am interested in looking into this further to get a good setup for CUDA + pytorch + Rust, but it's not high on my todo list at the moment.

I can leave this PR in draft to keep the devenv discussion centralized, but I can also open a new issue if that's more appropriate.

@tfmoraes

Adding /run/opengl-driver/lib to $LD_LIBRARY_PATH makes CUDA work for me:

```
$ python -c "import torch; print(torch.cuda.is_available())"
True
```
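For anyone who wants to try this workaround locally, a minimal sketch of what it could look like in a `devenv.nix` (an assumption, not part of this PR; note that `/run/opengl-driver/lib` exists only on NixOS):

```nix
{ pkgs, lib, ... }:

{
  # NixOS-only workaround: the graphics driver's userspace libraries
  # (including libcuda.so) live under /run/opengl-driver/lib.
  env.LD_LIBRARY_PATH = lib.mkIf pkgs.stdenv.isLinux "/run/opengl-driver/lib";
}
```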

@bobvanderlinden (Contributor, Author)

> Adding /run/opengl-driver/lib to $LD_LIBRARY_PATH makes CUDA work for me

I think that is a fix that should be in NixOS itself. It makes no sense for other distros or macOS, so I'm not sure it belongs in devenv.

@SomeoneSerge

> I think that is a fix that should be in NixOS. (@bobvanderlinden)

There's no need for that on NixOS: as long as you use a nix-built pytorch, /run/opengl-driver/lib would already be in the binaries' Runpaths

@bobvanderlinden (Contributor, Author)

Indeed. However, most people want to use pytorch from poetry (having it be part of pyproject.toml). When doing so, you'll run into the LD_LIBRARY_PATH problem, but only on NixOS. Other systems have the OpenGL driver libraries (like CUDA) globally available.

The best of both worlds might be to use poetry2nix instead of poetry itself, to make all poetry-defined packages available as Nix packages. That way the torch package can be overridden to link against a different CUDA library explicitly.

This avoids the need for LD_LIBRARY_PATH as well as pkgs.gcc-unwrapped.lib.

It does have its own downside: it probably will not play very nicely with poetry CLI commands like poetry add. I haven't tried this yet, though; it might not be so bad when using direnv properly.
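A rough sketch of what that poetry2nix override could look like (the specific override here, swapping in the nix-built CUDA-enabled torch, is illustrative and untested):

```nix
{ pkgs, ... }:
let
  # Assumes poetry2nix is available as pkgs.poetry2nix.
  pythonEnv = pkgs.poetry2nix.mkPoetryEnv {
    projectDir = ./.;
    overrides = pkgs.poetry2nix.defaultPoetryOverrides.extend (final: prev: {
      # Illustrative: replace the wheel-based torch with a nix-built,
      # CUDA-enabled package whose runpath already points at the driver.
      torch = pkgs.python3Packages.torchWithCuda;
    });
  };
in
{
  packages = [ pythonEnv ];
}
```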

@SomeoneSerge (commented Apr 7, 2023)

> Indeed. However, most people want to use pytorch from poetry (having it be part of pyproject.toml)

Dunno, I haven't seen these people 😆

> Other systems have the OpenGL driver libraries (like Cuda) globally available

This is not exactly correct. Most other systems do indeed merge all libraries into one location. But the reason their pytorch manages to discover e.g. libcudart.so, and through it libcuda.so, is that their python binary has its ELF .interp header set to a system-specific path like /lib64/ld-linux-x86-64.so (or something), and that dynamic linker is configured to read /etc/ld.so.conf (or something), which is also a system-specific path. And that ld.so.conf specifically enumerates system-specific paths like /lib, /usr/lib, and /opt/some-nonsense/cuda/lib.

In other words, their libcuda.so is as "globally available" as ours. That said, maybe we could make integration easier, at the risk of occasionally facing library version mismatches.

@lizelive

> Dunno, I haven't seen these people

I use torch from poetry most of the time, because the CUDA libs are in PyPI now, and the fewer package managers, the better.

@lizelive

Also, because ML packages are updating so fast, it's not viable to do this with Nix system packages.

@domenkozar (Member)

Could we somehow detect whether the OpenGL stuff is wired up, and error out with a nice message about what to do?
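One hedged way to do that detection at shell entry (a sketch assuming devenv's `enterShell` hook; not something from this PR):

```nix
{
  enterShell = ''
    # Warn if the NixOS graphics-driver path is missing: libcuda.so comes
    # from the host driver, not from nix, so without it CUDA cannot load.
    if [ ! -e /run/opengl-driver/lib ]; then
      echo "warning: /run/opengl-driver/lib not found;" \
           "CUDA programs may fail to load libcuda.so." \
           "On NixOS, enable hardware.opengl and the NVIDIA driver." >&2
    fi
  '';
}
```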

Review comment on:

```nix
enable = lib.mkEnableOption "CUDA toolkit";

package = lib.mkOption {
  type = lib.types.package;
```

(Member) Sometimes CUDA can't come from Nix, so we'll have to allow a way to set up an FHS environment in those cases. It's tricky to get this right (and it won't work on macOS), but it's often required.

> Sometimes CUDA can't come from Nix

Interesting. Any specific examples in mind?

Review comment on:

```nix
package = lib.mkOption {
  type = lib.types.package;
  description = "Which package of the CUDA toolkit to use.";
  default = pkgs.cudatoolkit;
```

Note: this attribute is almost unmaintained; it's better to use the splayed packages.

Review comment on:

```nix
env.LD_LIBRARY_PATH = lib.mkIf pkgs.stdenv.isLinux (
  lib.makeLibraryPath [
    pkgs.gcc-unwrapped.lib
    pkgs.linuxPackages.nvidia_x11
```

Hm. How is this to be synchronized with config.boot.kernelPackages?
