WIP: Fixes to compile CUDA with nvc++ #910

forrestglines · 2023-07-15T16:56:17Z

PR Summary

Quick fix to the cmake to allow compiling CUDA code with the nvc++. Although in past experience nvc++ hasn't worked for Kokkos based codes, supposedly it will become NVIDIA's favored compiler and nvcc deprecated.

While nvcc is likely preferred, nvc++ and the nvhpc stack turned out to be easier to compile AthenaPK on Chicoma at the moment.

PR Checklist

Code passes cpplint
New features are documented.
Adds a test for any bugs fixed. Adds tests for new features.
Code is formatted
Changes are summarized in CHANGELOG.md
CI has been triggered on Darwin for performance regression tests.
Docs build
(@lanl.gov employees) Update copyright on changed files

BenWibking · 2023-07-15T21:01:28Z

Just for reference, it appears this is only currently supported on Cray systems at the moment: https://gitlab.kitware.com/cmake/cmake/-/issues/23003.

Yurlungur · 2023-07-16T22:57:59Z

Before merging this, I'd like to try compiling Phoebus with nvhpc with this fix on chicoma. Previously, I found that simply removing the relaxed constexpr flag did not allow me to compile with nvc++ but maybe things have changed

forrestglines · 2023-07-17T15:28:22Z

I'm using these modules to build AthenaPK and Parthenon on Chicoma. AthenaPK seems to work correctly across multiple GPU nodes. Parthenon mostly works but isn't passing all tests.

Here's my modules:

module purge
module load PrgEnv-nvhpc
module load nvhpc/22.3  cray-mpich/8.1.21 craype-accel-nvidia80
module load cmake/3.22.3 python/3.10-anaconda-2023.03  cray-hdf5-parallel/1.12.2.1
export MPICH_GPU_SUPPORT_ENABLED=1

And I'm compiling Parthenon with

cmake \
  -D CMAKE_BUILD_TYPE=RelWithDebInfo \
  -D Kokkos_ENABLE_CUDA=ON -D Kokkos_ENABLE_CUDA_LAMBDA=ON \
  -D Kokkos_ARCH_AMPERE80=ON \
  -D SERIAL_WITH_MPIEXEC=ON -D TEST_MPIEXEC=srun \
  -D CMAKE_CXX_FLAGS="${PE_MPICH_GTL_DIR_nvidia80} ${PE_MPICH_GTL_LIBS_nvidia80}" \
  $SOURCE_DIR

While AthenaPK compiles and passes all tests with this setup, Parthenon compiles but does not pass all ctests

The following tests FAILED:
         16 - Device Object Allocation (Subprocess aborted)
         30 - MeshData works as expected for simple packs (Subprocess aborted)
         33 - Swarm memory management (Subprocess aborted)
         49 - regression_test:particle_leapfrog (Failed)
         50 - regression_mpi_test:particle_leapfrog (Failed)
         51 - regression_test:particle_leapfrog_outflow (Failed)
         52 - regression_mpi_test:particle_leapfrog_outflow (Failed)
         58 - regression_mpi_test:advection_convergence (Failed)
         59 - regression_test:output_hdf5 (Failed)
         60 - regression_mpi_test:output_hdf5 (Failed)
         63 - regression_test:bvals (Failed)
         64 - regression_mpi_test:bvals (Failed)
Errors while running CTest

I haven't looked too thoroughly into why these tests fail.

BenWibking · 2023-07-17T18:25:52Z

I am not able to build this with NVHPC 23.5. I get:

NVC++-S-0053-Illegal use of void type (/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/flux_correction.cpp)
NVC++-S-0053-Illegal use of void type (/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/flux_correction.cpp)
NVC++/x86-64 Linux 23.5-0: compilation completed with severe errors
make[2]: *** [parthenon/src/CMakeFiles/parthenon.dir/bvals/comms/flux_correction.cpp.o] Error 2
make[2]: *** Waiting for unfinished jobs....
NVC++-S-0053-Illegal use of void type (/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/boundary_communication.cpp)
NVC++-S-0053-Illegal use of void type (/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/boundary_communication.cpp)
NVC++/x86-64 Linux 23.5-0: compilation completed with severe errors
make[2]: *** [parthenon/src/CMakeFiles/parthenon.dir/bvals/comms/boundary_communication.cpp.o] Error 2

I built this with:
cmake .. -DKokkos_ARCH_AMPERE80=ON -DKokkos_ENABLE_CUDA=ON -DCMAKE_C_COMPILER=nvc -DCMAKE_CXX_COMPILER=$HOME/bin/CC -DCMAKE_CUDA_COMPILER=$HOME/bin/CC -DCMAKE_CUDA_COMPILER_ID=NVHPC

with a custom wrapper script to get it working on a non-Cray system:

#!/bin/bash
nvc++ -cuda -gpu=cc80 --gcc-toolchain=$(which gcc) "$@"

BenWibking · 2023-07-17T18:27:18Z

There are lots of warnings like this:

"/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/flux_correction.cpp", line 84: warning: function "lambda []" captures local object "koff" by reference, will likely cause an illegal memory access when run on the device [no_device_stack]
              Kokkos::TeamThreadRange<>(team_member, idxer.size()), [&](const int idx) {
                                                                       ^

"/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/flux_correction.cpp", line 180: warning: function "lambda []" captures local object "b" by reference, will likely cause an illegal memory access when run on the device [no_device_stack]
                               [&](const int idx) {
                                  ^

forrestglines · 2023-07-17T18:34:44Z

There are lots of warnings like this:

"/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/flux_correction.cpp", line 84: warning: function "lambda []" captures local object "koff" by reference, will likely cause an illegal memory access when run on the device [no_device_stack]
              Kokkos::TeamThreadRange<>(team_member, idxer.size()), [&](const int idx) {
                                                                       ^

"/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/flux_correction.cpp", line 180: warning: function "lambda []" captures local object "b" by reference, will likely cause an illegal memory access when run on the device [no_device_stack]
                               [&](const int idx) {
                                  ^

I get a lot of those warnings too but AthenaPK compiles and runs just fine

BenWibking · 2023-07-17T19:04:26Z

Ok, the "illegal use of void type" looks like it's an internal compiler error in NVHPC 23.5 (https://forums.developer.nvidia.com/t/nvc-openmp-error-inside-llc/178950).

BenWibking · 2023-07-17T19:28:37Z

I get a lot of those warnings too but AthenaPK compiles and runs just fine

With NVHPC 22.3 on NCSA Delta, I am able to build and run AthenaPK successfully. It looks like NVHPC 23.5 just fails due to a compiler bug.

I didn't try to run the Parthenon tests.

BenWibking · 2023-07-17T20:04:27Z

I get a lot of those warnings too but AthenaPK compiles and runs just fine

With NVHPC 22.3 on NCSA Delta, I am able to build and run AthenaPK successfully. It looks like NVHPC 23.5 just fails due to a compiler bug.

I didn't try to run the Parthenon tests.

I take this back. When running one of the AthenaPK problems (the rand_blast problem), I get different numerical behavior, such that it crashes with a positivity failure at timestep 664:

Mon Jul 17 14:48:42 CDT 2023:  cycle=664 time=6.2577120848689291e-03 dt=1.0307579117559152e-05 zone-cycles/wsec_step=2.75e+08 wsec_total=8.08e+00 wsec_step=7.61e-03
Mon Jul 17 14:48:42 CDT 2023:  ### PARTHENON ERROR
Mon Jul 17 14:48:42 CDT 2023:    Condition:   w_p > 0.0 || pressure_floor_ > 0.0 || e_floor_ > 0.0
Mon Jul 17 14:48:42 CDT 2023:    Message:     Got negative pressure. Consider enabling first-order flux correction or setting a reasonble pressure or temperature floor.
Mon Jul 17 14:48:42 CDT 2023:    File:        /projects/cvz/bwibking/athenapk-nvhpc-test/src/eos/../eos/adiabatic_glmmhd.hpp
Mon Jul 17 14:48:42 CDT 2023:    Line number: 116

I don't think this compiler is ready for production use.

forrestglines · 2023-07-17T23:11:06Z

I also noticed that with a magnetic jet problem I was running, my NVHPC build failed at a slightly different time than my clang CPU build. I think I'm going to also give up on nvc++ for now, too unreliable

BenWibking · 2023-10-13T18:51:16Z

I tried again with NVHPC 23.9 and get an internal compiler error when compiling bnd_info.cpp:

[ 13%] Building CXX object parthenon/src/CMakeFiles/parthenon.dir/bvals/comms/bnd_info.cpp.o
"/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/bnd_info.cpp", line 309: warning: variable "idx" was declared but never referenced [declared_but_not_referenced]
        int idx = static_cast<int>(el) % 3;
            ^

Remark: individual warnings can be suppressed with "--diag_suppress <warning-name>"

"/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/bnd_info.cpp", line 349: warning: variable "idx" was declared but never referenced [declared_but_not_referenced]
      int idx = static_cast<int>(el) % 3;
          ^

NVC++-F-0000-Internal compiler error. size of unknown type       0  (/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/bnd_info.cpp)
NVC++/x86-64 Linux 23.9-0: compilation aborted

BenWibking · 2023-10-15T18:29:00Z

I also noticed that with a magnetic jet problem I was running, my NVHPC build failed at a slightly different time than my clang CPU build. I think I'm going to also give up on nvc++ for now, too unreliable

It turns out that nvc++ does not preserve denormals by default (whereas nvcc does). This may explain the weird numerical behavior we were both seeing. You can force it to preserve denormals with the nvc++ compiler option: -⁠Mnodaz.

BenWibking · 2023-11-19T18:33:54Z

@forrestglines @Yurlungur Even though nvc++ -cuda can't currently compile Parthenon, I would be in favor of merging this in, since this PR will eventually be necessary once the compiler works.

forrestglines · 2023-11-19T20:54:47Z

@forrestglines @Yurlungur Even though nvc++ -cuda can't currently compile Parthenon, I would be in favor of merging this in, since this PR will eventually be necessary once the compiler works.

Sounds good to me. The changes are minimal and shouldn't affect any other compiler.

BenWibking

Looks good to me.

I think it should be considered a compiler bug that it doesn't accept a variable named restrict, since the usage of restrict as a reserved keyword is only part of the ISO C standard, and not part of the ISO C++ standard at all. However, changing it avoids confusion in any case.

Yurlungur

I'm willing to merge this, but it does make me a little uncomfortable, see below.

Yurlungur · 2023-11-20T03:33:35Z

example/sparse_advection/sparse_advection_driver.cpp

@@ -122,12 +122,12 @@ TaskCollection SparseAdvectionDriver::MakeTaskCollection(BlockList_t &blocks,
                             mdudt.get(), beta * dt, mc1.get());

    // do boundary exchange
-    auto restrict =
+    auto restrict_task =


seems totally reasonable to change this

Yurlungur · 2023-11-20T03:34:27Z

src/CMakeLists.txt

+# make sure the flag is not added when compiling with Clang for Cuda. The flag
+# is also not necessary (and not recognized) when using nvc++ to compile Cuda
+# code
+if (Kokkos_ENABLE_CUDA AND NOT CMAKE_CXX_COMPILER_ID STREQUAL "Clang" AND NOT CMAKE_CXX_COMPILER_ID STREQUAL "NVHPC")


This makes me a little nervous, as it may cause problems with downstream codes.

Maybe this condition could be changed to check whether CMAKE_CUDA_COMPILER_ID is not equal to NVHPC or Clang instead?

Sure---that seems like it might be safer. My main worry here is will the jury-rigged insanity that I (and KHARMA) are currently using to build with nvhpc fail?

Hmm, I don't know. I'm not sure how that works at all without removing this flag, unless you mean you're using nvc++ as the host compiler and nvcc as the device compiler? In that case, I think it should still report NVIDIA as the CUDA compiler, rather than NVHPC.

I don't think so. I feel like we've discussed this before. The option --expt-relaxed-constexpr is specific to the nvcc compiler, it's not an option for nvc++. I'm double checking that nvc++ can compile CUDA code that needs constexpr.

Does CMAKE_CUDA_COMPILER_ID get set to NVHPC or CLANG? The current commit also only changes the behavior of nvc++ builds, I don't think we want to change the behavior of clang builds.

If you use clang as the device compiler, it doesn't accept this option either (it doesn't need it). So IMO it might be good to keep that.

I'm not sure if you can force nvc++ to use nvcc as the CUDA compiler, nvc++ already includes the CUDA toolchain.

I also verified that nvc++ compiles and runs code that needs nvcc --expt-relaxed-constexpr.

By default, at least on non-Cray systems, it uses nvcc as the CUDA compiler. It only uses nvc++ for device code when nvc++ -cuda (or nvc++ -stdpar or nvc++ -openacc) is used.

Forrest Glines added 2 commits July 15, 2023 10:07

Skip --expt-relaxed-constexpr with nvc++/nvhpc

2bbce53

Rename restrict to non C-keyword

81ba353

Merge branch 'develop' into forrestglines/cuda-with-nvc++-fix

dacc2a1

forrestglines requested review from Yurlungur and BenWibking November 19, 2023 20:55

BenWibking approved these changes Nov 19, 2023

View reviewed changes

Yurlungur approved these changes Nov 20, 2023

View reviewed changes

Merge branch 'develop' into forrestglines/cuda-with-nvc++-fix

8657717

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: Fixes to compile CUDA with nvc++ #910

WIP: Fixes to compile CUDA with nvc++ #910

forrestglines commented Jul 15, 2023

BenWibking commented Jul 15, 2023

Yurlungur commented Jul 16, 2023 •

edited

forrestglines commented Jul 17, 2023

BenWibking commented Jul 17, 2023

BenWibking commented Jul 17, 2023

forrestglines commented Jul 17, 2023

BenWibking commented Jul 17, 2023

BenWibking commented Jul 17, 2023

BenWibking commented Jul 17, 2023

forrestglines commented Jul 17, 2023

BenWibking commented Oct 13, 2023 •

edited

BenWibking commented Oct 15, 2023

BenWibking commented Nov 19, 2023

forrestglines commented Nov 19, 2023

BenWibking left a comment •

edited

Yurlungur left a comment

Yurlungur Nov 20, 2023

Yurlungur Nov 20, 2023

BenWibking Nov 20, 2023 •

edited

Yurlungur Nov 20, 2023

BenWibking Nov 20, 2023

forrestglines Nov 20, 2023

BenWibking Nov 20, 2023

forrestglines Nov 20, 2023

BenWibking Nov 20, 2023

WIP: Fixes to compile CUDA with nvc++ #910

Are you sure you want to change the base?

WIP: Fixes to compile CUDA with nvc++ #910

Conversation

forrestglines commented Jul 15, 2023

PR Summary

PR Checklist

BenWibking commented Jul 15, 2023

Yurlungur commented Jul 16, 2023 • edited

forrestglines commented Jul 17, 2023

BenWibking commented Jul 17, 2023

BenWibking commented Jul 17, 2023

forrestglines commented Jul 17, 2023

BenWibking commented Jul 17, 2023

BenWibking commented Jul 17, 2023

BenWibking commented Jul 17, 2023

forrestglines commented Jul 17, 2023

BenWibking commented Oct 13, 2023 • edited

BenWibking commented Oct 15, 2023

BenWibking commented Nov 19, 2023

forrestglines commented Nov 19, 2023

BenWibking left a comment • edited

Choose a reason for hiding this comment

Yurlungur left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

BenWibking Nov 20, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Yurlungur commented Jul 16, 2023 •

edited

BenWibking commented Oct 13, 2023 •

edited

BenWibking left a comment •

edited

BenWibking Nov 20, 2023 •

edited