Internal compiler error with Cuda/11.4 GCC/10.3.0 #4334

Closed
pgrete opened this issue Sep 15, 2021 · 23 comments
Labels
Bug: Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos)
Compiler Issue: An issue that Kokkos cannot / should not fix; Kokkos must communicate to relevant vendor
Will Not Fix: An issue that the Kokkos Team cannot / will not address

Comments

@pgrete
Contributor

pgrete commented Sep 15, 2021

@kurtsansom reported this for our downstream code parthenon-hpc-lab/parthenon#585 (comment) and I was able to confirm for a plain Kokkos build using current develop:

$ cmake       -DKokkos_ENABLE_CUDA:BOOL=ON   ..
-- Setting default Kokkos CXX standard to 14
-- The CXX compiler identification is GNU 10.3.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/software/GCCcore/10.3.0/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Setting build type to 'RelWithDebInfo' as none was specified.
-- Setting policy CMP0074 to use <Package>_ROOT variables
-- The project name is: Kokkos
-- Compiler Version: 11.4.48
-- kokkos_launch_compiler (/mnt/home/gretephi/src/kokkos/bin/kokkos_launch_compiler) is enabled...
-- SERIAL backend is being turned on to ensure there is at least one Host space. To change this, you must enable another host execution space and configure with -DKokkos_ENABLE_SERIAL=OFF or change CMakeCache.txt
-- Using -std=c++14 for C++14 standard as feature
-- CUDA auto-detection of architecture failed with /opt/software/GCCcore/10.3.0/bin/c++. Enabling CUDA language ONLY to auto-detect architecture...
-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - /opt/software/CUDAcore/11.4.0/bin/nvcc
-- The CUDA compiler identification is NVIDIA 11.4.48
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /opt/software/CUDAcore/11.4.0/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Detected CUDA Compute Capability 70
-- Setting Kokkos_ARCH_VOLTA70=ON
-- Built-in Execution Spaces:
--     Device Parallel: Kokkos::Cuda
--     Host Parallel: NoTypeDefined
--       Host Serial: SERIAL
-- 
-- Architectures:
--  VOLTA70
-- Found CUDAToolkit: /opt/software/CUDAcore/11.4.0/include (found version "11.4.48") 
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Found TPLCUDA: TRUE  
-- Found TPLLIBDL: /usr/lib64/libdl.so  
-- Kokkos Devices: CUDA;SERIAL, Kokkos Backends: CUDA;SERIAL
-- Configuring done
-- Generating done
-- Build files have been written to: /mnt/home/gretephi/src/kokkos/build-cuda
[19:03:21][gretephi@dev-amd20-v100: ~/src/kokkos/build-cuda]
$ make 
[  3%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_CPUDiscovery.cpp.o
[  7%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_Core.cpp.o
/opt/software/GCCcore/10.3.0/include/c++/10.3.0/chrono: In substitution of ‘template<class _Rep, class _Period> template<class _Period2> using __is_harmonic = std::__bool_constant<(std::ratio<((_Period2::num / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::num, _Period::num)) * (_Period::den / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::den, _Period::den))), ((_Period2::den / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::den, _Period::den)) * (_Period::num / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::num, _Period::num)))>::den == 1)> [with _Period2 = _Period2; _Rep = _Rep; _Period = _Period]’:
/opt/software/GCCcore/10.3.0/include/c++/10.3.0/chrono:473:154:   required from here
/opt/software/GCCcore/10.3.0/include/c++/10.3.0/chrono:428:27: internal compiler error: Segmentation fault
  428 |  _S_gcd(intmax_t __m, intmax_t __n) noexcept
      |                           ^~~~~~
0xcbb3ff crash_signal
	../../gcc/toplev.c:328
0x7b259d tsubst(tree_node*, tree_node*, int, tree_node*)
	../../gcc/cp/pt.c:15310
0x7c5596 tsubst_template_args(tree_node*, tree_node*, int, tree_node*)
	../../gcc/cp/pt.c:13225
0x7bdf96 tsubst_aggr_type
	../../gcc/cp/pt.c:13428
0x7c827f tsubst_function_decl
	../../gcc/cp/pt.c:13816
0x7bec39 tsubst_decl
	../../gcc/cp/pt.c:14267
0x7acc21 tsubst_copy
	../../gcc/cp/pt.c:16512
0x7b051a tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
	../../gcc/cp/pt.c:20707
0x7af076 tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
	../../gcc/cp/pt.c:19274
0x7af076 tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
	../../gcc/cp/pt.c:19896
0x7ae4ad tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
	../../gcc/cp/pt.c:19274
0x7ae4ad tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
	../../gcc/cp/pt.c:19588
0x7ae476 tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
	../../gcc/cp/pt.c:19274
0x7ae476 tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
	../../gcc/cp/pt.c:19587
0x7c0a54 tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
	../../gcc/cp/pt.c:19274
0x7c0a54 tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool)
	../../gcc/cp/pt.c:18886
0x7c5596 tsubst_template_args(tree_node*, tree_node*, int, tree_node*)
	../../gcc/cp/pt.c:13225
0x7bdf96 tsubst_aggr_type
	../../gcc/cp/pt.c:13428
0x7ade97 tsubst_qualified_id
	../../gcc/cp/pt.c:16215
0x7afbbd tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
	../../gcc/cp/pt.c:19625
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
make[2]: *** [core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_Core.cpp.o] Error 1
make[1]: *** [core/src/CMakeFiles/kokkoscore.dir/all] Error 2
make: *** [all] Error 2

Environment:

$ module list

Currently Loaded Modules:
  1) GCC/10.3.0       3) zlib/1.2.11       5) CUDAcore/11.4.0   7) bzip2/1.0.8   9) cURL/7.76.0       11) CMake/3.20.1
  2) GCCcore/10.3.0   4) binutils/2.36.1   6) ncurses/6.2       8) OpenSSL/1.1  10) libarchive/3.5.1

$ lsb_release -a
LSB Version:	:core-4.1-amd64:core-4.1-noarch
Distributor ID:	CentOS
Description:	CentOS Linux release 7.8.2003 (Core)
Release:	7.8.2003
Codename:	Core

Using GCC 10.2 works just fine.

@ajpowelsnl added the Bug and Compiler Issue labels Sep 20, 2021
@crtrott
Member

crtrott commented Sep 20, 2021

I think that bug just comes from using Chrono itself?

@ajpowelsnl self-assigned this Sep 20, 2021
@ajpowelsnl
Contributor

ajpowelsnl commented Sep 20, 2021

Simple reproducer:

#include <chrono> in a source (.cpp) file

nvcc -x cu source.cpp

@ajpowelsnl
Contributor

ajpowelsnl commented Sep 21, 2021

@ajpowelsnl will contact Max Katz and Matt Stack to tell them about the bug, find out which other gcc / cuda combinations fail in this manner, and file a bug report if necessary.

@dalg24
Member

dalg24 commented Sep 21, 2021

You probably meant to tag @maxpkatz

@maxpkatz

This issue has already been reported upstream to gcc (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100102) and is reported there as fixed in 10.4 and 11.0+. But since it's present in gcc 10.3, and it seems to be always exposed when using gcc as nvcc's host compiler, I think that basically means gcc 10.3 is unusable with CUDA for Kokkos (or any other application that uses std::chrono).

@ajpowelsnl
Contributor

> This issue has already been reported upstream to gcc (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100102) and is reported there as fixed in 10.4 and 11.0+. But since it's present in gcc 10.3, and it seems to be always exposed when using gcc as nvcc's host compiler, I think that basically means gcc 10.3 is unusable with CUDA for Kokkos (or any other application that uses std::chrono).

Many thanks for the info, @maxpkatz. It sounds like the "fix" is to not use gcc-10.3. I'll check with @crtrott on this point.

@crtrott
Member

crtrott commented Sep 21, 2021

Put it on the agenda for tomorrow. We could in theory avoid using chrono for this combo, and fall back to what we had before we used chrono (i.e. timespec structs).

@masterleinad
Contributor

@pgrete Can you please check that #4348 fixes this issue?

@pgrete
Contributor Author

pgrete commented Sep 24, 2021

Looks like the error is still there:

$ make
[  3%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_Core.cpp.o
/opt/software/GCCcore/10.3.0/include/c++/10.3.0/chrono: In substitution of ‘template<class _Rep, class _Period> template<class _Period2> using __is_harmonic = std::__bool_constant<(std::ratio<((_Period2::num / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::num, _Period::num)) * (_Period::den / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::den, _Period::den))), ((_Period2::den / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::den, _Period::den)) * (_Period::num / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::num, _Period::num)))>::den == 1)> [with _Period2 = _Period2; _Rep = _Rep; _Period = _Period]’:
/opt/software/GCCcore/10.3.0/include/c++/10.3.0/chrono:473:154:   required from here
/opt/software/GCCcore/10.3.0/include/c++/10.3.0/chrono:428:27: internal compiler error: Segmentation fault
  428 |  _S_gcd(intmax_t __m, intmax_t __n) noexcept
      |                           ^~~~~~
0xcbb3ff crash_signal
	../../gcc/toplev.c:328
0x7b259d tsubst(tree_node*, tree_node*, int, tree_node*)
	../../gcc/cp/pt.c:15310
0x7c5596 tsubst_template_args(tree_node*, tree_node*, int, tree_node*)
	../../gcc/cp/pt.c:13225
0x7bdf96 tsubst_aggr_type
	../../gcc/cp/pt.c:13428
0x7c827f tsubst_function_decl
	../../gcc/cp/pt.c:13816
0x7bec39 tsubst_decl
	../../gcc/cp/pt.c:14267
0x7acc21 tsubst_copy
	../../gcc/cp/pt.c:16512
0x7b051a tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
	../../gcc/cp/pt.c:20707
0x7af076 tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
	../../gcc/cp/pt.c:19274
0x7af076 tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
	../../gcc/cp/pt.c:19896
0x7ae4ad tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
	../../gcc/cp/pt.c:19274
0x7ae4ad tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
	../../gcc/cp/pt.c:19588
0x7ae476 tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
	../../gcc/cp/pt.c:19274
0x7ae476 tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
	../../gcc/cp/pt.c:19587
0x7c0a54 tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
	../../gcc/cp/pt.c:19274
0x7c0a54 tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool)
	../../gcc/cp/pt.c:18886
0x7c5596 tsubst_template_args(tree_node*, tree_node*, int, tree_node*)
	../../gcc/cp/pt.c:13225
0x7bdf96 tsubst_aggr_type
	../../gcc/cp/pt.c:13428
0x7ade97 tsubst_qualified_id
	../../gcc/cp/pt.c:16215
0x7afbbd tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
	../../gcc/cp/pt.c:19625
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
make[2]: *** [core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_Core.cpp.o] Error 1
make[1]: *** [core/src/CMakeFiles/kokkoscore.dir/all] Error 2
make: *** [all] Error 2
[05:03:24][gretephi@dev-amd20-v100: ~/src/kokkos/build-fix]
$ git status
# HEAD detached at masterleinad/timer_fallback_gcc1030_nvcc
nothing to commit, working directory clean

@masterleinad
Contributor

Hmmm... I'm having problems reproducing this. I'm getting

# cmake -DKokkos_ENABLE_CUDA=ON ..
-- Setting default Kokkos CXX standard to 14
-- The CXX compiler identification is GNU 10.3.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++-10 - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Setting build type to 'RelWithDebInfo' as none was specified.
-- Setting policy CMP0074 to use <Package>_ROOT variables
-- The project name is: Kokkos
-- Compiler Version: 11.4.120
-- kokkos_launch_compiler (/tmp/kokkos/bin/kokkos_launch_compiler) is enabled...
-- SERIAL backend is being turned on to ensure there is at least one Host space. To change this, you must enable another host execution space and configure with -DKokkos_ENABLE_SERIAL=OFF or change CMakeCache.txt
-- Using -std=c++14 for C++14 standard as feature
-- CUDA auto-detection of architecture failed with /usr/bin/g++-10. Enabling CUDA language ONLY to auto-detect architecture...
-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - /usr/local/cuda/bin/nvcc
-- The CUDA compiler identification is NVIDIA 11.4.120
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Detected CUDA Compute Capability 70
-- Setting Kokkos_ARCH_VOLTA70=ON
-- Built-in Execution Spaces:
--     Device Parallel: Kokkos::Cuda
--     Host Parallel: NoTypeDefined
--       Host Serial: SERIAL
-- 
-- Architectures:
--  VOLTA70
-- Found CUDAToolkit: /usr/local/cuda/include (found version "11.4.120") 
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Found TPLCUDA: TRUE  
-- Found TPLLIBDL: /usr/lib/x86_64-linux-gnu/libdl.so  
-- Kokkos Devices: CUDA;SERIAL, Kokkos Backends: CUDA;SERIAL
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/kokkos/build_gcc_10_3
root@fetnat03:/tmp/kokkos/build_gcc_10_3# make
Scanning dependencies of target kokkoscore
[  3%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_CPUDiscovery.cpp.o
[  7%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_Core.cpp.o
[ 11%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_Error.cpp.o
[ 14%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_ExecPolicy.cpp.o
[ 18%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_HostBarrier.cpp.o
[ 22%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_HostSpace.cpp.o
[ 25%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_HostSpace_deepcopy.cpp.o
[ 29%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_HostThreadTeam.cpp.o
[ 33%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_MemoryPool.cpp.o
[ 37%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_MemorySpace.cpp.o
[ 40%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_NumericTraits.cpp.o
[ 44%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_Profiling.cpp.o
[ 48%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_Serial.cpp.o
[ 51%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_Serial_Task.cpp.o
[ 55%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_SharedAlloc.cpp.o
[ 59%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_Spinwait.cpp.o
[ 62%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_Stacktrace.cpp.o
[ 66%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_hwloc.cpp.o
[ 70%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/Cuda/Kokkos_CudaSpace.cpp.o
[ 74%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/Cuda/Kokkos_Cuda_Instance.cpp.o
[ 77%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/Cuda/Kokkos_Cuda_Locks.cpp.o
[ 81%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/Cuda/Kokkos_Cuda_Task.cpp.o
[ 85%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/desul/src/Lock_Array_CUDA.cpp.o
[ 88%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/desul/src/Lock_Array_HIP.cpp.o
[ 92%] Linking CXX static library libkokkoscore.a
[ 92%] Built target kokkoscore
Scanning dependencies of target kokkoscontainers
[ 96%] Building CXX object containers/src/CMakeFiles/kokkoscontainers.dir/impl/Kokkos_UnorderedMap_impl.cpp.o
[100%] Linking CXX static library libkokkoscontainers.a
[100%] Built target kokkoscontainers

@pgrete
Contributor Author

pgrete commented Sep 27, 2021

I just tried again from scratch and it still fails:

[08:05:16][gretephi@dev-amd20-v100: ~/src]
$ git clone -b timer_fallback_gcc1030_nvcc https://github.com/masterleinad/kokkos.git kokkos-gcc1030
Cloning into 'kokkos-gcc1030'...
remote: Enumerating objects: 77211, done.
remote: Counting objects: 100% (127/127), done.
remote: Compressing objects: 100% (96/96), done.
remote: Total 77211 (delta 77), reused 55 (delta 31), pack-reused 77084
Receiving objects: 100% (77211/77211), 21.32 MiB | 12.93 MiB/s, done.
Resolving deltas: 100% (63317/63317), done.
[08:06:29][gretephi@dev-amd20-v100: ~/src]
$ cd kokkos-gcc1030
[08:06:36][gretephi@dev-amd20-v100: ~/src/kokkos-gcc1030]
$ cmake -Bbuild -DKokkos_ENABLE_CUDA=ON .
-- Setting default Kokkos CXX standard to 14
-- The CXX compiler identification is GNU 10.3.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/software/GCCcore/10.3.0/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Setting build type to 'RelWithDebInfo' as none was specified.
-- Setting policy CMP0074 to use <Package>_ROOT variables
-- The project name is: Kokkos
-- Compiler Version: 11.4.48
-- kokkos_launch_compiler (/mnt/home/gretephi/src/kokkos-gcc1030/bin/kokkos_launch_compiler) is enabled...
-- SERIAL backend is being turned on to ensure there is at least one Host space. To change this, you must enable another host execution space and configure with -DKokkos_ENABLE_SERIAL=OFF or change CMakeCache.txt
-- Using -std=c++14 for C++14 standard as feature
-- CUDA auto-detection of architecture failed with /opt/software/GCCcore/10.3.0/bin/c++. Enabling CUDA language ONLY to auto-detect architecture...
-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - /opt/software/CUDAcore/11.4.0/bin/nvcc
-- The CUDA compiler identification is NVIDIA 11.4.48
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /opt/software/CUDAcore/11.4.0/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Detected CUDA Compute Capability 70
-- Setting Kokkos_ARCH_VOLTA70=ON
-- Built-in Execution Spaces:
--     Device Parallel: Kokkos::Cuda
--     Host Parallel: NoTypeDefined
--       Host Serial: SERIAL
-- 
-- Architectures:
--  VOLTA70
-- Found CUDAToolkit: /opt/software/CUDAcore/11.4.0/include (found version "11.4.48") 
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Found TPLCUDA: TRUE  
-- Found TPLLIBDL: /usr/lib64/libdl.so  
-- Kokkos Devices: CUDA;SERIAL, Kokkos Backends: CUDA;SERIAL
-- Configuring done
-- Generating done
-- Build files have been written to: /mnt/home/gretephi/src/kokkos-gcc1030/build
[08:07:40][gretephi@dev-amd20-v100: ~/src/kokkos-gcc1030]
$ cmake --build build
[  3%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_CPUDiscovery.cpp.o
[  7%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_Core.cpp.o
/opt/software/GCCcore/10.3.0/include/c++/10.3.0/chrono: In substitution of ‘template<class _Rep, class _Period> template<class _Period2> using __is_harmonic = std::__bool_constant<(std::ratio<((_Period2::num / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::num, _Period::num)) * (_Period::den / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::den, _Period::den))), ((_Period2::den / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::den, _Period::den)) * (_Period::num / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::num, _Period::num)))>::den == 1)> [with _Period2 = _Period2; _Rep = _Rep; _Period = _Period]’:
/opt/software/GCCcore/10.3.0/include/c++/10.3.0/chrono:473:154:   required from here
/opt/software/GCCcore/10.3.0/include/c++/10.3.0/chrono:428:27: internal compiler error: Segmentation fault
  428 |  _S_gcd(intmax_t __m, intmax_t __n) noexcept
      |                           ^~~~~~
0xcbb3ff crash_signal
	../../gcc/toplev.c:328
0x7b259d tsubst(tree_node*, tree_node*, int, tree_node*)
	../../gcc/cp/pt.c:15310
0x7c5596 tsubst_template_args(tree_node*, tree_node*, int, tree_node*)
	../../gcc/cp/pt.c:13225
0x7bdf96 tsubst_aggr_type
	../../gcc/cp/pt.c:13428
0x7c827f tsubst_function_decl
	../../gcc/cp/pt.c:13816
0x7bec39 tsubst_decl
	../../gcc/cp/pt.c:14267
0x7acc21 tsubst_copy
	../../gcc/cp/pt.c:16512
0x7b051a tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
	../../gcc/cp/pt.c:20707
0x7af076 tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
	../../gcc/cp/pt.c:19274
0x7af076 tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
	../../gcc/cp/pt.c:19896
0x7ae4ad tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
	../../gcc/cp/pt.c:19274
0x7ae4ad tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
	../../gcc/cp/pt.c:19588
0x7ae476 tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
	../../gcc/cp/pt.c:19274
0x7ae476 tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
	../../gcc/cp/pt.c:19587
0x7c0a54 tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
	../../gcc/cp/pt.c:19274
0x7c0a54 tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool)
	../../gcc/cp/pt.c:18886
0x7c5596 tsubst_template_args(tree_node*, tree_node*, int, tree_node*)
	../../gcc/cp/pt.c:13225
0x7bdf96 tsubst_aggr_type
	../../gcc/cp/pt.c:13428
0x7ade97 tsubst_qualified_id
	../../gcc/cp/pt.c:16215
0x7afbbd tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
	../../gcc/cp/pt.c:19625
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
gmake[2]: *** [core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_Core.cpp.o] Error 1
gmake[1]: *** [core/src/CMakeFiles/kokkoscore.dir/all] Error 2
gmake: *** [all] Error 2

I noticed that the CUDA version on our machine is slightly older (11.4.48 vs. 11.4.120). I wonder if that could be related (unfortunately, this is a shared cluster, so there's no easy update path I could follow).

@masterleinad
Contributor

masterleinad commented Sep 27, 2021

Can you check if commenting out #include <chrono> in core/src/impl/Kokkos_ClockTic.hpp helps? AFAICT, that's the only other file that includes chrono and it might not be necessary for the architecture you are interested in.

@ajpowelsnl
Contributor

Hi @masterleinad --

To reproduce, I just included the header (i.e., "#include <chrono>") in a file and compiled with this command:

nvcc -x cu <source_file.cpp>

You see the error during compilation.

Here is my environment:

[ajpowel@kokkos-dev-2 kokkos]$ module list
Currently Loaded Modulefiles:

  1) sems-env 2) sems-cuda/11.1 3) sems-cmake/3.19.1 4) sems-openmpi/4.0.2 5) sems-gcc/9.2.0 6) /clang/8.0

@masterleinad
Contributor

masterleinad commented Sep 28, 2021

@ajpowelsnl Since you can reproduce the issue, can you check if commenting out #include <chrono> in core/src/impl/Kokkos_ClockTic.hpp fixes the error for you?

@masterleinad
Contributor

@pgrete Since you can reproduce the issue, can you check if commenting out #include <chrono> in core/src/impl/Kokkos_ClockTic.hpp fixes the error for you?

@ajpowelsnl
Contributor

ajpowelsnl commented Sep 28, 2021

> @pgrete Since you can reproduce the issue, can you check if commenting out #include <chrono> in core/src/impl/Kokkos_ClockTic.hpp fixes the error for you?

@pgrete and @masterleinad -- commenting out "#include <chrono>" in the file you mention does appear to fix the issue. What would you like me to do next?

@pgrete
Contributor Author

pgrete commented Sep 29, 2021

$ git diff
diff --git a/core/src/impl/Kokkos_ClockTic.hpp b/core/src/impl/Kokkos_ClockTic.hpp
index 4e46b8d..9085314 100644
--- a/core/src/impl/Kokkos_ClockTic.hpp
+++ b/core/src/impl/Kokkos_ClockTic.hpp
@@ -47,7 +47,7 @@
 
 #include <Kokkos_Macros.hpp>
 #include <stdint.h>
-#include <chrono>
+//#include <chrono>
 #ifdef KOKKOS_ENABLE_OPENMPTARGET
 #include <omp.h>
 #endif

The error still persists (fails with an identical error message), but I'm also not sure where chrono is picked up.
It's also included in ./algorithms/unit_tests/TestRandom.hpp but that should not be picked up.

@ajpowelsnl
Contributor

Morning @pgrete - I will bring this issue up for discussion again with @dalg24 and @crtrott today. Many thanks for the additional info.

@ckhroulev

> The error still persists (fails with an identical error message), but I'm also not sure where chrono is picked up.

Same here.

Looks like <chrono> is included by (at least) <mutex> (directly and via <bits/unique_lock.h>) and <thread>. As far as I can tell both <mutex> and <thread> are included by multiple Kokkos headers.

(I ran make VERBOSE=1 to extract the compiler command, added -E -dI to it, then inspected the output.)

@masterleinad
Copy link
Contributor

NVIDIA/nccl#494 mentions some workarounds for chrono, but at this point I believe there is not much we can do about it if simply including chrono implicitly triggers this problem. I would propose to just blacklist gcc-10.3 for nvcc.

@aprokop
Collaborator

aprokop commented Nov 2, 2021

Observed an internal compiler error last week on Perlmutter with gcc/10.3.0 and cuda/11.3.0 when trying to compile ArborX. Can't recall the exact message right now.

@Spudz76

Spudz76 commented Nov 24, 2021

CUDA Toolkit 11.4.0 clearly states gcc-9 is maximum supported.

CUDA Toolkit 11.4.1 supports up to gcc-11. That would explain the inability to reproduce on 11.4.1, while the original post was on 11.4.0 (which does not support gcc-10 or gcc-11 yet).

@crtrott added the Will Not Fix label Dec 8, 2021
@crtrott
Member

crtrott commented Dec 8, 2021

There is no reasonable workaround for us, and this is not officially supported by NVCC anyway.

@crtrott closed this as completed Dec 8, 2021