cannot build NVHPC on ubuntu (and workaround) #20375

Open
pescobar opened this issue Apr 16, 2024 · 11 comments

Comments

@pescobar
Member

Hi,

I am using EasyBuild 4.9.1 on Ubuntu Jammy (22.04), but I couldn't build NVHPC until I patched the easyblock.

With the default easyblock I get this error in the sanity_check_step:

== 2024-04-16 16:56:44,636 build_log.py:171 ERROR EasyBuild crashed with an error (at easybuild/apps/EasyBuild/4.9.1/lib/python3.10/site-packages/easybuild/base/exceptions.py:126 in __init__): Sanity check failed: sanity check command cd /tmp/eb-bk_6m8dc/tmp7fpol1sc && nvc++ -std=c++20 minimal.cpp -o minimal exited with code 2 (output: /scicore/soft/easybuild/apps/binutils/2.40/bin/ld: cannot find -ldl: No such file or directory
/scicore/soft/easybuild/apps/binutils/2.40/bin/ld: cannot find -lpthread: No such file or directory
/scicore/soft/easybuild/apps/binutils/2.40/bin/ld: cannot find -lc: No such file or directory
) (at easybuild/apps/EasyBuild/4.9.1/lib/python3.10/site-packages/easybuild/framework/easyblock.py:3669 in _sanity_check_step)

After some debugging I managed to build it by patching the easyblock like this (I had to add -L/lib/x86_64-linux-gnu):

diff -ru $EBROOTEASYBUILD/lib/python3.10/site-packages/easybuild/easyblocks/n/nvhpc.py /tmp/nvhpc.py
--- /scicore/soft/easybuild/apps/EasyBuild/4.9.1/lib/python3.10/site-packages/easybuild/easyblocks/n/nvhpc.py   2024-04-05 09:30:35.000000000 +0200
+++ /tmp/nvhpc.py       2024-04-16 16:47:59.268066132 +0200
@@ -221,7 +221,7 @@
             # see: https://github.com/easybuilders/easybuild-easyblocks/pull/3240
             tmpdir = tempfile.mkdtemp()
             write_file(os.path.join(tmpdir, 'minimal.cpp'), NVHPC_MINIMAL_EXAMPLE)
-            minimal_compiler_cmd = "cd %s && nvc++ -std=c++20 minimal.cpp -o minimal" % tmpdir
+            minimal_compiler_cmd = "cd %s && nvc++ -L/lib/x86_64-linux-gnu -std=c++20 minimal.cpp -o minimal" % tmpdir
             custom_commands.append(minimal_compiler_cmd)

         super(EB_NVHPC, self).sanity_check_step(custom_paths=custom_paths, custom_commands=custom_commands)

I am not sure how to fix it upstream. If you give me some advice, I can send a proper PR.
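
A less intrusive variant of the patch above might add the extra -L flag only when the multiarch directory actually exists, so that non-Debian systems keep the original command. This is a minimal sketch, not a proposed upstream fix: `build_minimal_cmd` is a hypothetical helper that is not part of the easyblock, and the default directory is simply the one from the workaround.

```python
import os


def build_minimal_cmd(tmpdir, extra_libdirs=("/lib/x86_64-linux-gnu",)):
    """Build the nvc++ sanity-check command, adding -L only for existing dirs.

    Hypothetical helper for illustration; not part of the NVHPC easyblock.
    """
    lib_flags = " ".join("-L%s" % d for d in extra_libdirs if os.path.isdir(d))
    parts = ["nvc++", lib_flags, "-std=c++20", "minimal.cpp", "-o", "minimal"]
    # drop the empty flag string so no double spaces end up in the command
    compile_cmd = " ".join(p for p in parts if p)
    return "cd %s && %s" % (tmpdir, compile_cmd)
```

On a system without the directory, the returned string is identical to the command currently used in sanity_check_step.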

@pescobar
Member Author

CCing those who contributed to this easyblock in case they can give any feedback: @AndiH @jfgrimm @appolloford @boegel

@jfgrimm
Member

jfgrimm commented Apr 16, 2024

hmm, don't think I've seen that issue before. I'll have a go at building on ubuntu 22.04

@jfgrimm
Member

jfgrimm commented Apr 16, 2024

I can't reproduce this on any of my Ubuntu 22.04 systems

@cgross95
Contributor

We see this exact same issue on Ubuntu 22.04 building NVHPC-23.7-CUDA-12.1.1.eb. We haven't tried the workaround yet. We can also provide any other information from our configuration if it would be helpful for reproducing.

@cgross95
Contributor

Here is a test report.

@AndiH
Contributor

AndiH commented Apr 17, 2024

I'm really no EB expert, just one of the authors of the EasyBlock.

To me, it sounds like the compiler doesn't look in the right directories for the library objects; i.e. the LD_LIBRARY_PATH may be incomplete. Not sure, though, if this falls into EasyBuild's responsibility…

@pescobar
Member Author

@jfgrimm can you try this to check what libraries you link?

$> module purge
$> module load NVHPC/23.7-CUDA-12.3.0
$> cat <<EOF > minimal.cpp
#include <ranges>
int main(){ return 0; }
EOF
$> nvc++ minimal.cpp -o minimal
$> ldd minimal

This is what I get on my system. The only difference is that I had to compile the binary using nvc++ -L/lib/x86_64-linux-gnu minimal.cpp -o minimal.

$> ldd minimal
        linux-vdso.so.1 (0x00007ffdf57f9000)
        libatomic.so.1 => /scicore/soft/easybuild/apps/GCCcore/12.3.0/lib64/libatomic.so.1 (0x00007f1ff6e24000)
        libnvhpcatm.so => /scicore/soft/easybuild/apps/NVHPC/23.7-CUDA-12.3.0/Linux_x86_64/23.7/compilers/lib/libnvhpcatm.so (0x00007f1ff6c00000)
        libstdc++.so.6 => /scicore/soft/easybuild/apps/GCCcore/12.3.0/lib64/libstdc++.so.6 (0x00007f1ff69d8000)
        libnvomp.so => /scicore/soft/easybuild/apps/NVHPC/23.7-CUDA-12.3.0/Linux_x86_64/23.7/compilers/lib/libnvomp.so (0x00007f1ff5800000)
        libnvcpumath.so => /scicore/soft/easybuild/apps/NVHPC/23.7-CUDA-12.3.0/Linux_x86_64/23.7/compilers/lib/libnvcpumath.so (0x00007f1ff5200000)
        libnvc.so => /scicore/soft/easybuild/apps/NVHPC/23.7-CUDA-12.3.0/Linux_x86_64/23.7/compilers/lib/libnvc.so (0x00007f1ff4e00000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1ff4bd8000)
        libgcc_s.so.1 => /scicore/soft/easybuild/apps/GCCcore/12.3.0/lib64/libgcc_s.so.1 (0x00007f1ff69b7000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f1ff68d0000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f1ff6e30000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f1ff68cb000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f1ff68c6000)
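
To compare ldd output between systems, it can help to list only the libraries resolved from a given directory. The following is a minimal sketch; `libs_under` is a hypothetical helper that works on the text printed by ldd.

```python
def libs_under(ldd_output, prefix):
    """Return names of libraries that ldd resolved to a path under prefix."""
    found = []
    for line in ldd_output.splitlines():
        # resolved entries look like: "libc.so.6 => /path/libc.so.6 (0x...)"
        if "=>" not in line:
            continue
        name, _, rest = line.partition("=>")
        fields = rest.split()
        if fields and fields[0].startswith(prefix):
            found.append(name.strip())
    return found
```

Applied to the output above with the prefix /lib/x86_64-linux-gnu, this selects libc, libm, libdl and libpthread. Three of these (libc, libdl, libpthread) are the libraries the linker could not find in the original error.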

@jfgrimm
Member

jfgrimm commented Apr 17, 2024

oh, I just realised that although I installed EB 4.9.1, I was still building with 4.9.0...

I now get the same error 🤦

@jfgrimm
Member

jfgrimm commented Apr 17, 2024

I'm guessing that the issue is with the generated localrc. I don't think this is actually EasyBuild doing something weird: I get the same issue if I manually run makelocalrc -x -gcc $(which gcc) -gpp $(which g++) -g77 $(which gfortran)

Adding the following line to the generated localrc fixes the issue for me:

set DEFLIBDIR=/lib/x86_64-linux-gnu;
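
The manual edit can be scripted. The following is a minimal sketch, assuming the localrc file already exists at a known path; `ensure_deflibdir` is a hypothetical helper, and it does not handle a localrc that already sets DEFLIBDIR to a different value.

```python
def ensure_deflibdir(localrc_path, libdir="/lib/x86_64-linux-gnu"):
    """Append 'set DEFLIBDIR=<libdir>;' to localrc unless already present.

    Returns True if the file was modified, False otherwise.
    """
    line = "set DEFLIBDIR=%s;" % libdir
    with open(localrc_path) as handle:
        content = handle.read()
    # nothing to do if the exact setting is already in the file
    if any(existing.strip() == line for existing in content.splitlines()):
        return False
    with open(localrc_path, "a") as handle:
        # make sure the new setting starts on its own line
        if content and not content.endswith("\n"):
            handle.write("\n")
        handle.write(line + "\n")
    return True
```

Running it twice on the same file leaves a single DEFLIBDIR line, so it is safe to call after every makelocalrc run.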

@pescobar
Member Author

I noticed this line, which is specific to Debian systems, but judging by the version check in the diff below it is only applied for versions < 21.3. See here.

Patching the easyblock like this also works around the issue, but to be honest I am not sure what the right fix would be:

diff -ru /scicore/soft/easybuild/apps/EasyBuild/4.9.1/lib/python3.10/site-packages/easybuild/easyblocks/n/nvhpc.py ~/tmp/nvhpc.py
--- /scicore/soft/easybuild/apps/EasyBuild/4.9.1/lib/python3.10/site-packages/easybuild/easyblocks/n/nvhpc.py   2024-04-17 11:23:00.892718404 +0200
+++ /scicore/home/scicore/easybuild/tmp/nvhpc.py        2024-04-17 12:51:54.115386938 +0200
@@ -54,9 +54,6 @@
 # contents for siterc file to make PGI/NVHPC pick up $LIBRARY_PATH
 # cfr. https://www.pgroup.com/support/link.htm#lib_path_ldflags
 SITERC_LIBRARY_PATH = """
-# get the value of the environment variable LIBRARY_PATH
-variable LIBRARY_PATH is environment(LIBRARY_PATH);
-
 # split this value at colons, separate by -L, prepend 1st one by -L
 variable library_path is
 default($if($LIBRARY_PATH,-L$replace($LIBRARY_PATH,":", -L)));
@@ -188,12 +185,11 @@
                 if os.path.islink(path):
                     os.remove(path)

-        if LooseVersion(self.version) < LooseVersion('21.3'):
-            # install (or update) siterc file to make NVHPC consider $LIBRARY_PATH
-            siterc_path = os.path.join(compilers_subdir, 'bin', 'siterc')
-            write_file(siterc_path, SITERC_LIBRARY_PATH, append=True)
-            self.log.info("Appended instructions to pick up $LIBRARY_PATH to siterc file at %s: %s",
-                          siterc_path, SITERC_LIBRARY_PATH)
+        # install (or update) siterc file to make NVHPC consider $LIBRARY_PATH
+        siterc_path = os.path.join(compilers_subdir, 'bin', 'siterc')
+        write_file(siterc_path, SITERC_LIBRARY_PATH, append=True)
+        self.log.info("Appended instructions to pick up $LIBRARY_PATH to siterc file at %s: %s",
+                      siterc_path, SITERC_LIBRARY_PATH)

         # The cuda nvvp tar file has broken permissions
         adjust_permissions(self.installdir, stat.S_IWUSR, add=True, onlydirs=True)
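
For readers unfamiliar with the siterc syntax, the rule in SITERC_LIBRARY_PATH turns a colon-separated $LIBRARY_PATH into a series of -L flags, as its own comment describes. A rough Python equivalent follows, assuming the replacement inserts a space before each additional -L (the exact whitespace produced by the compiler driver is not verified here); `library_path_to_ldflags` is a hypothetical name.

```python
def library_path_to_ldflags(library_path):
    """Mimic the siterc rule: split at colons, prefix every entry with -L."""
    if not library_path:
        # mirrors the $if(...) guard: an empty variable yields no flags
        return ""
    return "-L" + library_path.replace(":", " -L")
```

With LIBRARY_PATH set to `/opt/a/lib:/usr/lib/x86_64-linux-gnu`, the function returns `-L/opt/a/lib -L/usr/lib/x86_64-linux-gnu`.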

@pescobar
Member Author

To summarize, the problem is that on Debian-based systems the directory /usr/lib/x86_64-linux-gnu/ needs to be in the LIBRARY_PATH environment variable. This is needed to compile the example in the sanity_check step and also to build any other easyconfig using the NVHPC toolchain.

possible solutions are:

  • update the easyblock so that the generated rc file fixes it

  • update the easyblock to use the right LIBRARY_PATH for the sanity_check step and define the required env vars in the NVHPC easyconfigs, e.g. add this to the easyconfigs:

allow_append_abs_path = True
modextrapaths = {
    'LD_LIBRARY_PATH':  'Linux_%(arch)s/%(version)s/compilers/extras/qd/lib/',
    'LIBRARY_PATH':  'Linux_%(arch)s/%(version)s/compilers/extras/qd/lib/'
    }

modextrapaths_append = {
    'LD_LIBRARY_PATH':  '/usr/lib/x86_64-linux-gnu',
    'LIBRARY_PATH':  '/usr/lib/x86_64-linux-gnu'
    }
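
Either option would have to avoid hardcoding the x86_64 path on other architectures or distributions. One way to detect the directory is sketched below, assuming Python's sysconfig reports a MULTIARCH triplet (it may be missing on some builds) and that the directory lives under /usr/lib; both function names are hypothetical.

```python
import os
import sysconfig


def multiarch_libdir(root="/usr/lib", triplet=None):
    """Return the multiarch lib dir (e.g. /usr/lib/x86_64-linux-gnu) or None."""
    # MULTIARCH may be unset, so fall through to None in that case
    triplet = triplet or sysconfig.get_config_var("MULTIARCH")
    if not triplet:
        return None
    candidate = os.path.join(root, triplet)
    return candidate if os.path.isdir(candidate) else None


def append_to_path_var(value, extra):
    """Append extra to a colon-separated path string unless already listed."""
    entries = [entry for entry in value.split(":") if entry]
    if extra and extra not in entries:
        entries.append(extra)
    return ":".join(entries)
```

On systems without such a directory, `multiarch_libdir()` returns None and LIBRARY_PATH can be left untouched.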
