Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Fixes to compile CUDA with nvc++ #910

Open
wants to merge 4 commits into
base: develop
Choose a base branch
from

Conversation

forrestglines
Copy link
Collaborator

PR Summary

Quick fix to the cmake to allow compiling CUDA code with the nvc++. Although in past experience nvc++ hasn't worked for Kokkos based codes, supposedly it will become NVIDIA's favored compiler and nvcc deprecated.

While nvcc is likely preferred, nvc++ and the nvhpc stack turned out to be easier to compile AthenaPK on Chicoma at the moment.

PR Checklist

  • Code passes cpplint
  • New features are documented.
  • Adds a test for any bugs fixed. Adds tests for new features.
  • Code is formatted
  • Changes are summarized in CHANGELOG.md
  • CI has been triggered on Darwin for performance regression tests.
  • Docs build
  • (@lanl.gov employees) Update copyright on changed files

@BenWibking
Copy link
Collaborator

Just for reference, it appears this is only currently supported on Cray systems at the moment: https://gitlab.kitware.com/cmake/cmake/-/issues/23003.

@Yurlungur
Copy link
Collaborator

Yurlungur commented Jul 16, 2023

Before merging this, I'd like to try compiling Phoebus with nvhpc with this fix on chicoma. Previously, I found that simply removing the relaxed constexpr flag did not allow me to compile with nvc++ but maybe things have changed

@forrestglines
Copy link
Collaborator Author

I'm using these modules to build AthenaPK and Parthenon on Chicoma. AthenaPK seems to work correctly across multiple GPU nodes. Parthenon mostly works but isn't passing all tests.

Here's my modules:

module purge
module load PrgEnv-nvhpc
module load nvhpc/22.3  cray-mpich/8.1.21 craype-accel-nvidia80
module load cmake/3.22.3 python/3.10-anaconda-2023.03  cray-hdf5-parallel/1.12.2.1
export MPICH_GPU_SUPPORT_ENABLED=1

And I'm compiling Parthenon with

cmake \
  -D CMAKE_BUILD_TYPE=RelWithDebInfo \
  -D Kokkos_ENABLE_CUDA=ON -D Kokkos_ENABLE_CUDA_LAMBDA=ON \
  -D Kokkos_ARCH_AMPERE80=ON \
  -D SERIAL_WITH_MPIEXEC=ON -D TEST_MPIEXEC=srun \
  -D CMAKE_CXX_FLAGS="${PE_MPICH_GTL_DIR_nvidia80} ${PE_MPICH_GTL_LIBS_nvidia80}" \
  $SOURCE_DIR

While AthenaPK compiles and passes all tests with this setup, Parthenon compiles but does not pass all ctests

The following tests FAILED:
         16 - Device Object Allocation (Subprocess aborted)
         30 - MeshData works as expected for simple packs (Subprocess aborted)
         33 - Swarm memory management (Subprocess aborted)
         49 - regression_test:particle_leapfrog (Failed)
         50 - regression_mpi_test:particle_leapfrog (Failed)
         51 - regression_test:particle_leapfrog_outflow (Failed)
         52 - regression_mpi_test:particle_leapfrog_outflow (Failed)
         58 - regression_mpi_test:advection_convergence (Failed)
         59 - regression_test:output_hdf5 (Failed)
         60 - regression_mpi_test:output_hdf5 (Failed)
         63 - regression_test:bvals (Failed)
         64 - regression_mpi_test:bvals (Failed)
Errors while running CTest

I haven't looked too thoroughly into why these tests fail.

@BenWibking
Copy link
Collaborator

I am not able to build this with NVHPC 23.5. I get:

NVC++-S-0053-Illegal use of void type (/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/flux_correction.cpp)
NVC++-S-0053-Illegal use of void type (/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/flux_correction.cpp)
NVC++/x86-64 Linux 23.5-0: compilation completed with severe errors
make[2]: *** [parthenon/src/CMakeFiles/parthenon.dir/bvals/comms/flux_correction.cpp.o] Error 2
make[2]: *** Waiting for unfinished jobs....
NVC++-S-0053-Illegal use of void type (/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/boundary_communication.cpp)
NVC++-S-0053-Illegal use of void type (/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/boundary_communication.cpp)
NVC++/x86-64 Linux 23.5-0: compilation completed with severe errors
make[2]: *** [parthenon/src/CMakeFiles/parthenon.dir/bvals/comms/boundary_communication.cpp.o] Error 2

I built this with:
cmake .. -DKokkos_ARCH_AMPERE80=ON -DKokkos_ENABLE_CUDA=ON -DCMAKE_C_COMPILER=nvc -DCMAKE_CXX_COMPILER=$HOME/bin/CC -DCMAKE_CUDA_COMPILER=$HOME/bin/CC -DCMAKE_CUDA_COMPILER_ID=NVHPC

with a custom wrapper script to get it working on a non-Cray system:

#!/bin/bash
nvc++ -cuda -gpu=cc80 --gcc-toolchain=$(which gcc) "$@"

@BenWibking
Copy link
Collaborator

There are lots of warnings like this:

"/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/flux_correction.cpp", line 84: warning: function "lambda []" captures local object "koff" by reference, will likely cause an illegal memory access when run on the device [no_device_stack]
              Kokkos::TeamThreadRange<>(team_member, idxer.size()), [&](const int idx) {
                                                                       ^

"/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/flux_correction.cpp", line 180: warning: function "lambda []" captures local object "b" by reference, will likely cause an illegal memory access when run on the device [no_device_stack]
                               [&](const int idx) {
                                  ^

@forrestglines
Copy link
Collaborator Author

There are lots of warnings like this:

"/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/flux_correction.cpp", line 84: warning: function "lambda []" captures local object "koff" by reference, will likely cause an illegal memory access when run on the device [no_device_stack]
              Kokkos::TeamThreadRange<>(team_member, idxer.size()), [&](const int idx) {
                                                                       ^

"/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/flux_correction.cpp", line 180: warning: function "lambda []" captures local object "b" by reference, will likely cause an illegal memory access when run on the device [no_device_stack]
                               [&](const int idx) {
                                  ^

I get a lot of those warnings too but AthenaPK compiles and runs just fine

@BenWibking
Copy link
Collaborator

Ok, the "illegal use of void type" looks like it's an internal compiler error in NVHPC 23.5 (https://forums.developer.nvidia.com/t/nvc-openmp-error-inside-llc/178950).

@BenWibking
Copy link
Collaborator

I get a lot of those warnings too but AthenaPK compiles and runs just fine

With NVHPC 22.3 on NCSA Delta, I am able to build and run AthenaPK successfully. It looks like NVHPC 23.5 just fails due to a compiler bug.

I didn't try to run the Parthenon tests.

@BenWibking
Copy link
Collaborator

I get a lot of those warnings too but AthenaPK compiles and runs just fine

With NVHPC 22.3 on NCSA Delta, I am able to build and run AthenaPK successfully. It looks like NVHPC 23.5 just fails due to a compiler bug.

I didn't try to run the Parthenon tests.

I take this back. When running one of the AthenaPK problems (the rand_blast problem), I get different numerical behavior, such that it crashes with a positivity failure at timestep 664:

Mon Jul 17 14:48:42 CDT 2023:  cycle=664 time=6.2577120848689291e-03 dt=1.0307579117559152e-05 zone-cycles/wsec_step=2.75e+08 wsec_total=8.08e+00 wsec_step=7.61e-03
Mon Jul 17 14:48:42 CDT 2023:  ### PARTHENON ERROR
Mon Jul 17 14:48:42 CDT 2023:    Condition:   w_p > 0.0 || pressure_floor_ > 0.0 || e_floor_ > 0.0
Mon Jul 17 14:48:42 CDT 2023:    Message:     Got negative pressure. Consider enabling first-order flux correction or setting a reasonble pressure or temperature floor.
Mon Jul 17 14:48:42 CDT 2023:    File:        /projects/cvz/bwibking/athenapk-nvhpc-test/src/eos/../eos/adiabatic_glmmhd.hpp
Mon Jul 17 14:48:42 CDT 2023:    Line number: 116

I don't think this compiler is ready for production use.

@forrestglines
Copy link
Collaborator Author

I also noticed that with a magnetic jet problem I was running, my NVHPC build failed at a slightly different time than my clang CPU build. I think I'm going to also give up on nvc++ for now, too unreliable

@BenWibking
Copy link
Collaborator

BenWibking commented Oct 13, 2023

I tried again with NVHPC 23.9 and get an internal compiler error when compiling bnd_info.cpp:

[ 13%] Building CXX object parthenon/src/CMakeFiles/parthenon.dir/bvals/comms/bnd_info.cpp.o
"/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/bnd_info.cpp", line 309: warning: variable "idx" was declared but never referenced [declared_but_not_referenced]
        int idx = static_cast<int>(el) % 3;
            ^

Remark: individual warnings can be suppressed with "--diag_suppress <warning-name>"

"/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/bnd_info.cpp", line 349: warning: variable "idx" was declared but never referenced [declared_but_not_referenced]
      int idx = static_cast<int>(el) % 3;
          ^

NVC++-F-0000-Internal compiler error. size of unknown type       0  (/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/bnd_info.cpp)
NVC++/x86-64 Linux 23.9-0: compilation aborted

@BenWibking
Copy link
Collaborator

I also noticed that with a magnetic jet problem I was running, my NVHPC build failed at a slightly different time than my clang CPU build. I think I'm going to also give up on nvc++ for now, too unreliable

It turns out that nvc++ does not preserve denormals by default (whereas nvcc does). This may explain the weird numerical behavior we were both seeing. You can force it to preserve denormals with the nvc++ compiler option: -⁠Mnodaz.

@BenWibking
Copy link
Collaborator

@forrestglines @Yurlungur Even though nvc++ -cuda can't currently compile Parthenon, I would be in favor of merging this in, since this PR will eventually be necessary once the compiler works.

@forrestglines
Copy link
Collaborator Author

@forrestglines @Yurlungur Even though nvc++ -cuda can't currently compile Parthenon, I would be in favor of merging this in, since this PR will eventually be necessary once the compiler works.

Sounds good to me. The changes are minimal and shouldn't affect any other compiler.

Copy link
Collaborator

@BenWibking BenWibking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

I think it should be considered a compiler bug that it doesn't accept a variable named restrict, since the usage of restrict as a reserved keyword is only part of the ISO C standard, and not part of the ISO C++ standard at all. However, changing it avoids confusion in any case.

Copy link
Collaborator

@Yurlungur Yurlungur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm willing to merge this, but it does make me a little uncomfortable, see below.

@@ -122,12 +122,12 @@ TaskCollection SparseAdvectionDriver::MakeTaskCollection(BlockList_t &blocks,
mdudt.get(), beta * dt, mc1.get());

// do boundary exchange
auto restrict =
auto restrict_task =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems totally reasonable to change this

# make sure the flag is not added when compiling with Clang for Cuda. The flag
# is also not necessary (and not recognized) when using nvc++ to compile Cuda
# code
if (Kokkos_ENABLE_CUDA AND NOT CMAKE_CXX_COMPILER_ID STREQUAL "Clang" AND NOT CMAKE_CXX_COMPILER_ID STREQUAL "NVHPC")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me a little nervous, as it may cause problems with downstream codes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this condition could be changed to check whether CMAKE_CUDA_COMPILER_ID is not equal to NVHPC or Clang instead?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure---that seems like it might be safer. My main worry here is will the jury-rigged insanity that I (and KHARMA) are currently using to build with nvhpc fail?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't know. I'm not sure how that works at all without removing this flag, unless you mean you're using nvc++ as the host compiler and nvcc as the device compiler? In that case, I think it should still report NVIDIA as the CUDA compiler, rather than NVHPC.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. I feel like we've discussed this before. The option --expt-relaxed-constexpr is specific to the nvcc compiler, it's not an option for nvc++. I'm double checking that nvc++ can compile CUDA code that needs constexpr.

Does CMAKE_CUDA_COMPILER_ID get set to NVHPC or CLANG? The current commit also only changes the behavior of nvc++ builds, I don't think we want to change the behavior of clang builds.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use clang as the device compiler, it doesn't accept this option either (it doesn't need it). So IMO it might be good to keep that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if you can force nvc++ to use nvcc as the CUDA compiler, nvc++ already includes the CUDA toolchain.

I also verified that nvc++ compiles and runs code that needs nvcc --expt-relaxed-constexpr.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default, at least on non-Cray systems, it uses nvcc as the CUDA compiler. It only uses nvc++ for device code when nvc++ -cuda (or nvc++ -stdpar or nvc++ -openacc) is used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants