TensorFlow-2.11.0-foss-2022a-CUDA-11.7.0.eb #20150

Open
bernipuig opened this issue Mar 19, 2024 · 1 comment


bernipuig commented Mar 19, 2024

When I try to build AlphaFold-2.3.4-foss-2022a-CUDA-11.7.0-ColabFold.eb, the installation of its TensorFlow dependency fails with the error below. Installing TensorFlow-2.11.0-foss-2022a-CUDA-11.7.0.eb on its own fails with the same error. It seems to be a problem with the ld_wrapper: tools such as ar are not found there.
If I look in the folder "/tmp/eb-p7v_88bc/tmpbsizsqfc/rpath_wrappers/ld_wrapper/", it contains only ld.

EasyBuild version: 4.8.2

PWD=/proc/self/cwd
PYTHONNOUSERSITE=1
PYTHONPATH=/shared/software/easybuild/x86_64/software/TensorFlow/2.11.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages:/shared/software/easybuild/x86_64/software/TensorFlow/2.11.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages:/shared/software/easybuild/x86_64/software/networkx/2.8.4-foss-2022a/lib/python3.10/site-packages:/shared/software/easybuild/x86_64/software/protobuf-python/3.19.4-GCCcore-11.3.0/lib/python3.10/site-packages:/shared/software/easybuild/x86_64/software/flatbuffers/2.0.7-GCCcore-11.3.0/lib/python3.10/site-packages:/shared/software/easybuild/x86_64/software/dill/0.3.6-GCCcore-11.3.0/lib/python3.10/site-packages:/shared/software/easybuild/x86_64/software/h5py/3.7.0-foss-2022a/lib/python3.10/site-packages:/shared/software/easybuild/x86_64/software/SciPy-bundle/2022.05-foss-2022a/lib/python3.10/site-packages:/shared/software/easybuild/x86_64/software/pybind11/2.9.2-GCCcore-11.3.0/lib/python3.10/site-packages:/shared/software/easybuild/x86_64/software/Python/3.10.4-GCCcore-11.3.0/easybuild/python
PYTHON_BIN_PATH=/shared/software/easybuild/x86_64/software/Python/3.10.4-GCCcore-11.3.0/bin/python
PYTHON_LIB_PATH=/shared/software/easybuild/x86_64/software/TensorFlow/2.11.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages
TF2_BEHAVIOR=1
TF_CUBLAS_VERSION=11.10.1
TF_CUDA_COMPUTE_CAPABILITIES=7.5
TF_CUDA_PATHS=/shared/software/easybuild/x86_64/software/CUDA/11.7.0
TF_CUDA_VERSION=11.7
TF_CUDNN_VERSION=8.4.1
TF_NCCL_VERSION=2.12.12
TF_SYSTEM_LIBS=absl_py,astor_archive,astunparse_archive,boringssl,com_google_protobuf,curl,cython,dill_archive,double_conversion,flatbuffers,functools32_archive,gast_archive,gif,hwloc,icu,jsoncpp_git,libjpeg_turbo,lmdb,nasm,nsync,opt_einsum_archive,org_sqlite,pasta,png,pybind11,six_archive,snappy,tblib_archive,termcolor_archive,typing_extensions_archive,wrapt,zlib
/tmp/eb-p7v_88bc/tmpbsizsqfc/rpath_wrappers/ld_wrapper/ar @bazel-out/k8-opt/bin/external/com_google_absl/absl/hash/liblow_level_hash.pic.a-2.params)
==# Configuration: 67a8ea0106922cbf5001f49e5a2e5014bbf7f43dab5714724497ab171ad44040
==# Execution platform: @local_execution_config_platform//:platform
src/main/tools/process-wrapper-legacy.cc:80: "execvp(/tmp/eb-p7v_88bc/tmpbsizsqfc/rpath_wrappers/ld_wrapper/ar, ...)": No such file or directory
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 42.853s, Critical Path: 1.60s
INFO: 148 processes: 144 internal, 4 local.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully
(at easybuild/common/software/EasyBuild/4.8.2/lib/python3.6/site-packages/easybuild/tools/run.py:681 in parse_cmd_output)
== 2024-03-19 12:43:25,077 build_log.py:267 INFO ... (took 50 secs)
== 2024-03-19 12:43:25,078 build_log.py:267 INFO ... (took 8 mins 2 secs)
== 2024-03-19 12:43:25,078 config.py:695 DEBUG software install path as specified by 'installpath' and 'subdir_software': /shared/software/easybuild/x86_64/software
== 2024-03-19 12:43:25,079 filetools.py:2012 INFO Removing lock /shared/software/easybuild/x86_64/software/.locks/_shared_software_easybuild_x86_64_software_TensorFlow_2.11.0-foss-2022a-CUDA-11.7.0.lock...
== 2024-03-19 12:43:25,080 filetools.py:383 INFO Path /shared/software/easybuild/x86_64/software/.locks/_shared_software_easybuild_x86_64_software_TensorFlow_2.11.0-foss-2022a-CUDA-11.7.0.lock successfully removed.
== 2024-03-19 12:43:25,080 filetools.py:2016 INFO Lock removed: /shared/software/easybuild/x86_64/software/.locks/_shared_software_easybuild_x86_64_software_TensorFlow_2.11.0-foss-2022a-CUDA-11.7.0.lock
== 2024-03-19 12:43:25,081 easyblock.py:4277 WARNING build failed (first 300 chars): cmd " bazel --output_user_root=/dev/shm/easybuild/TensorFlow/2.11.0/foss-2022a-CUDA-11.7.0/TensorFlow/bazel-root --local_startup_timeout_secs=300 --host_jvm_args=-Xms512m --host_jvm_args=-Xmx4096m build --config=noaws --config=nogcp --config=nohdfs --compilation_mode=opt --config=opt --subcommands -
== 2024-03-19 12:43:25,081 easyblock.py:328 INFO Closing log for application name TensorFlow version 2.11.0

@casparvl
Contributor

Hm, I don't think the ar command is supposed to be copied to the rpath_wrappers tempdir. This is the code that is responsible for putting the RPATH wrappers in place. I'd say only compilers and linkers are supposed to be wrapped. My bet is that TensorFlow just assumes that the ar command is in the same directory where it found e.g. gcc or ld or something. That would be wrong: which gcc (or which ld) would return the wrapper, whereas the ar command isn't in that same directory.
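
To make that hypothesis concrete, here is a minimal Python sketch. It is not TensorFlow's or Bazel's actual toolchain detection code (which is not shown in this thread). The sibling_tool helper and the temporary directory layout are hypothetical stand-ins that only mimic a wrapper directory containing ld, placed ahead of a directory with the real ld and ar on the search path.

```python
import os
import shutil
import stat
import tempfile


def make_executable(path):
    """Create a dummy executable shell script at `path`."""
    with open(path, "w") as fh:
        fh.write("#!/bin/sh\nexit 0\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)


def sibling_tool(found_tool, wanted):
    """Hypothetical naive lookup: assume `wanted` sits next to `found_tool`."""
    return os.path.join(os.path.dirname(found_tool), wanted)


with tempfile.TemporaryDirectory() as tmp:
    # Assumed layout: a wrapper dir holding only `ld`, and a real bin dir
    # holding both `ld` and `ar`.
    wrapper_dir = os.path.join(tmp, "rpath_wrappers", "ld_wrapper")
    real_bin = os.path.join(tmp, "binutils", "bin")
    os.makedirs(wrapper_dir)
    os.makedirs(real_bin)
    make_executable(os.path.join(wrapper_dir, "ld"))
    make_executable(os.path.join(real_bin, "ld"))
    make_executable(os.path.join(real_bin, "ar"))

    # The wrapper dir comes first, as it would with RPATH wrappers enabled.
    search_path = os.pathsep.join([wrapper_dir, real_bin])

    ld_found = shutil.which("ld", path=search_path)   # resolves to the wrapper
    ar_guessed = sibling_tool(ld_found, "ar")          # wrapper_dir/ar
    guess_exists = os.path.exists(ar_guessed)          # nothing was put there
    ar_found = shutil.which("ar", path=search_path)   # proper search finds it

print("ld resolved to:  ", ld_found)
print("ar guessed at:   ", ar_guessed, "(exists: %s)" % guess_exists)
print("ar found via path:", ar_found)
```

If the build derives the ar location the way sibling_tool does, it ends up with a path inside the wrapper directory that does not exist, which would match the execvp "No such file or directory" error in the log. Searching the full path for ar finds the real tool.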

It's a bit puzzling, because on our system, we have this TensorFlow installed, and we do also use RPATH support. So it did work at some point... Ours was installed with EasyBuild 4.7.0. You might want to see if either the EasyConfig or the EasyBlock changed since then, and if any of those changes might have caused this issue...
