
Using custom thrust repo #105

Open
garymm opened this issue Jun 2, 2023 · 14 comments
@garymm
Contributor

garymm commented Jun 2, 2023

I'm trying to use the latest Thrust release and running into issues.
I can reproduce this in the examples of this repo. See here: https://github.com/garymm/rules_cuda/tree/0b47488/examples

It seems that the CUDA toolkit's bundled Thrust is used, even though the -isystem path for my custom Thrust comes first on the nvcc command line.

To reproduce:

# First, install any CUDA toolkit at version 12.1 or below
# Then
git clone https://github.com/garymm/rules_cuda.git
cd rules_cuda
git switch garymm/custom-thrust
cd examples
bazel build -s //thrust:version_test

produces:

➜  bazel build -s //thrust:version_test                
SUBCOMMAND: # //thrust:version_test_cu [action 'Compiling thrust/version_test.cu', configuration: c92f7800966ee88d4403d82e9237650892518949abe87c60f2a9a688842ef3b8, execution platform: @local_config_platform//:host]
(cd /home/garymm/.cache/bazel/_bazel_garymm/40b8e895884343ff3f99cfc29b0e57bd/execroot/rules_cuda_examples && \
  exec env - \
    PATH=/usr/bin \
  /usr/local/cuda/bin/nvcc -x cu -Xcompiler -fPIC -ccbin /usr/bin/gcc -I . -I bazel-out/k8-fastbuild/bin -I external/thrust -I bazel-out/k8-fastbuild/bin/external/thrust -I external/cub -I bazel-out/k8-fastbuild/bin/external/cub -I external/local_cuda -I bazel-out/k8-fastbuild/bin/external/local_cuda -isystem external/thrust -isystem bazel-out/k8-fastbuild/bin/external/thrust -isystem external/cub -isystem bazel-out/k8-fastbuild/bin/external/cub -isystem external/local_cuda/cuda/include -isystem bazel-out/k8-fastbuild/bin/external/local_cuda/cuda/include -Xcompiler -g1 -c thrust/version_test.cu -o bazel-out/k8-fastbuild/bin/thrust/_objs/version_test_cu/version_test.pic.o --expt-relaxed-constexpr --expt-extended-lambda)
# Configuration: c92f7800966ee88d4403d82e9237650892518949abe87c60f2a9a688842ef3b8
# Execution platform: @local_config_platform//:host
ERROR: /home/garymm/src/bazel-contrib/rules_cuda/examples/thrust/BUILD.bazel:17:13: Compiling thrust/version_test.cu failed: (Exit 2): nvcc failed: error executing command (from target //thrust:version_test_cu) /usr/local/cuda/bin/nvcc -x cu -Xcompiler -fPIC -ccbin /usr/bin/gcc -I . -I bazel-out/k8-fastbuild/bin -I external/thrust -I bazel-out/k8-fastbuild/bin/external/thrust -I external/cub -I ... (remaining 25 arguments skipped)

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
thrust/version_test.cu(4): error: static assertion failed
  static_assert((200001 / 100 % 1000) == 1);

This is on Ubuntu, with CUDA 12.1 installed here:

➜  ls /usr/local/cuda*
/usr/local/cuda@  /usr/local/cuda-12@  /usr/local/cuda-12.1
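For reference on the failing assertion: Thrust packs its version macro as THRUST_VERSION = major * 100000 + minor * 100 + subminor, so 200001 is the toolkit-bundled Thrust 2.0.1, while the test's expression (THRUST_VERSION / 100 % 1000) extracts the minor component and expects 1 (a custom Thrust 2.1.x). A quick sanity check of the arithmetic:

```python
def decode_thrust_version(v: int) -> tuple[int, int, int]:
    """Decode a THRUST_VERSION value into (major, minor, subminor)."""
    return v // 100000, (v // 100) % 1000, v % 100

# 200001 is what the bundled headers report: Thrust 2.0.1.
print(decode_thrust_version(200001))  # (2, 0, 1)

# The static_assert checks the minor component: (200001 // 100) % 1000
# evaluates to 0, not the expected 1, so the compiler picked up the
# toolkit's Thrust 2.0.1 rather than the custom 2.1.x headers.
print((200001 // 100) % 1000)  # 0
```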
@cloudhan
Collaborator

cloudhan commented Jun 2, 2023

See #38 (comment).

Won't fix if you want to upgrade the bundled version to a newer one. In that case, you need to roll your own thrust target with a repository rule.

@cloudhan cloudhan closed this as completed Jun 2, 2023
@cloudhan cloudhan reopened this Jun 2, 2023
@cloudhan
Collaborator

cloudhan commented Jun 2, 2023

The CHAR_MIN issue seems to come from a missing <climits> header. https://en.cppreference.com/w/cpp/types/climits

@cloudhan
Collaborator

cloudhan commented Jun 2, 2023

The root cause is https://github.com/garymm/rules_cuda/blob/d66063672a7e3ea397f40a841b2411e0e4d1ef11/examples/WORKSPACE.bazel#LL33C11-L33C11
in your code. And this file screwed everything up: https://github.com/NVIDIA/thrust/blob/main/thrust/limits.h

You owe me a cup of coffee for debugging this. 😈

@garymm
Contributor Author

garymm commented Jun 2, 2023

Removing includes does indeed fix it.
I'm a little confused as to why users of the library are able to use system-include syntax without it, though, i.e. #include <thrust/device_vector.h>. I thought angle-bracket includes required -isystem, and -isystem is only set if the cc_library has includes set?
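For context on the mechanics in question: it is the includes attribute on a cc_library that turns a directory into an -isystem path for the target and everything that depends on it. A hypothetical sketch (illustrative names and globs, not the actual rules_cuda target):

```starlark
cc_library(
    name = "thrust",
    hdrs = glob(["thrust/**"]),
    # includes = ["."] makes Bazel pass -isystem external/thrust (and
    # the matching bazel-out path) to every target depending on :thrust,
    # which is what lets them write #include <thrust/device_vector.h>.
    includes = ["."],
    visibility = ["//visibility:public"],
)
```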

@cloudhan
Collaborator

cloudhan commented Jun 2, 2023

Because there is <cuda_toolkit_root>/include/thrust. The system include path <cuda_toolkit_root>/include is transitively added due to the deps chain @thrust -> @cub -> @rules_cuda//:cuda -> @rules_cuda//:cuda_headers.

@garymm
Contributor Author

garymm commented Jun 2, 2023

If I change it to includes = [""], which I think is correct, the toolchain's local_cuda includes clobber my custom thrust's includes.

You can see the behavior here: https://github.com/garymm/rules_cuda/tree/3b973fa/examples

So I think @rules_cuda//:cuda_headers should not include the thrust and cub headers. Would you be open to a PR removing them from the cuda_headers target?

@cloudhan
Collaborator

cloudhan commented Jun 2, 2023

Reasonable. Would you like to contribute a PR for this? All you need to change is to exclude thrust and cub from the glob and add two separate libraries for them.

filegroup(
    name = "_cuda_header_files",
    srcs = glob(["cuda/include/**"]),
    visibility = ["//visibility:private"],
)

cc_library(
    name = "cuda_headers",
    hdrs = [":_cuda_header_files"],
    includes = ["cuda/include"],
)
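A minimal sketch of the suggested change, assuming the bundled headers live under cuda/include/thrust and cuda/include/cub (globs and target names are illustrative, not the repo's actual code):

```starlark
filegroup(
    name = "_cuda_header_files",
    srcs = glob(
        ["cuda/include/**"],
        # Exclude the bundled Thrust and CUB so cuda_headers no longer
        # exposes them to every CUDA target.
        exclude = [
            "cuda/include/thrust/**",
            "cuda/include/cub/**",
        ],
    ),
    visibility = ["//visibility:private"],
)

cc_library(
    name = "cuda_headers",
    hdrs = [":_cuda_header_files"],
    includes = ["cuda/include"],
)

# Separate opt-in targets for the bundled copies, so users can either
# depend on these or substitute their own thrust/cub repositories.
cc_library(
    name = "thrust",
    hdrs = glob(["cuda/include/thrust/**"]),
    includes = ["cuda/include"],
)

cc_library(
    name = "cub",
    hdrs = glob(["cuda/include/cub/**"]),
    includes = ["cuda/include"],
)
```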

@garymm
Contributor Author

garymm commented Jun 2, 2023

I just tried that and it doesn't work for some reason... will investigate.

@cloudhan
Collaborator

cloudhan commented Jun 2, 2023

OK, this will be tricky, because the CUDA toolkit include directory is symlinked by Bazel into the sandbox. All of its subdirectories will then be accessible...

@garymm
Contributor Author

garymm commented Jun 2, 2023

I updated the issue description with the latest status.
I'm really not sure what's going wrong. I tried using the -M and -H options for the compiler, and I also tried passing --generate-dependencies-with-compile to nvcc. In all cases the output suggests that my custom thrust's header should be the one that's included, but the compiled code actually uses the version bundled with the CUDA toolkit.

When I run the exact command printed out by bazel build -s, it seems to work. So maybe some environment variable is set that causes the toolkit's include directory to be searched in a different order than the command line suggests?

@garymm
Contributor Author

garymm commented Jun 2, 2023

Hmm, indeed there's something about the sandbox that changes the behavior. Building with --sandbox_strategy=local works.

@garymm
Contributor Author

garymm commented Jun 2, 2023

And for some reason, removing includes = ["."] from the thrust and cub rules works, even though the resulting change to the nvcc command seems like the opposite of what I would expect to want.
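One plausible reading of why dropping includes helps, sketched as a hypothetical @thrust target (GCC-style search order: -I directories are searched before -isystem directories, even for angle-bracket includes):

```starlark
cc_library(
    name = "thrust",
    hdrs = glob(["thrust/**"]),
    # No includes = ["."]: Bazel still passes -I external/thrust, and
    # since -I paths are searched before -isystem paths, a dependent's
    # #include <thrust/...> resolves here before the toolkit copy under
    # the transitively added -isystem .../cuda/include.
    visibility = ["//visibility:public"],
)
```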

@garymm
Contributor Author

garymm commented Jun 5, 2023

I can contribute an example that uses custom thrust and cub with no includes = line and then close this, but I'm a bit at a loss as to why includes breaks things. If you have some idea, please let me know. Or let me know if you want me to just contribute the example and then close this.
