At a user's request, I am trying to install LAMMPS with Allegro on two different generations of Nvidia GPU nodes. We use Rocky 8 as the OS and Nvidia driver version 560.x.x:
Nvidia A100 GPUs on Intel Icelake nodes (CUDA compute capability 8.0)
Nvidia H100 GPUs on AMD Zen4 nodes (hence kokkos_arch='ZEN3', with CUDA compute capability set to 9.0)
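For reference, here is a minimal sketch of how these two node types would map onto LAMMPS/Kokkos CMake options. It assumes the Kokkos architecture names `AMPERE80` (compute capability 8.0) and `HOPPER90` (9.0) exist in the Kokkos version bundled with this LAMMPS release. `kokkos_cmake_flags` is a hypothetical helper and not part of LAMMPS or EasyBuild.

```python
from typing import List, Optional

# Hypothetical mapping from CUDA compute capability to Kokkos arch names
# (used as -DKokkos_ARCH_<NAME>=ON); verify against your bundled Kokkos.
GPU_ARCH = {
    "8.0": "AMPERE80",  # A100
    "9.0": "HOPPER90",  # H100
}


def kokkos_cmake_flags(compute_capability: str, cpu_arch: Optional[str] = None) -> List[str]:
    """Return CMake -D options enabling KOKKOS with CUDA for one node type."""
    if compute_capability not in GPU_ARCH:
        raise ValueError(f"no Kokkos arch mapping for compute capability {compute_capability}")
    flags = [
        "-DPKG_KOKKOS=ON",
        "-DKokkos_ENABLE_CUDA=ON",
        f"-DKokkos_ARCH_{GPU_ARCH[compute_capability]}=ON",
    ]
    if cpu_arch is not None:
        # Optional host architecture, e.g. "ZEN3" on the Zen4 nodes.
        flags.append(f"-DKokkos_ARCH_{cpu_arch}=ON")
    return flags


if __name__ == "__main__":
    print(" ".join(kokkos_cmake_flags("8.0")))          # A100 on Icelake
    print(" ".join(kokkos_cmake_flags("9.0", "ZEN3")))  # H100 on Zen4
```

The host architecture is kept optional because only the GPU architecture is required to target the device.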
In both cases, I get the same compilation error later in the build. I have heavily trimmed the error message, but the essence of the issue is:
I should mention that an unpatched installation of exactly the same LAMMPS release, with the same toolchain on the same node, went very smoothly. For reference, the EasyBuild easyconfig file used for the installation and the EasyBuild compilation logfile are in the attachment.
The following error also occurs, e.g. when compiling src/force.cpp (see the logfile):
nvcc_wrapper - *warning* you have set multiple optimization flags (-O*), only the last is used because nvcc can only accept a single optimization setting.
/vsc-hard-mounts/leuven-apps/rocky8/icelake/2023a/software/GCCcore/12.3.0/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/avx512fp16intrin.h(38): error: vector_size attribute requires an arithmetic or enum type
typedef __half __v8hf __attribute__ ((__vector_size__ (16)));
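The warning says nvcc honors only the last `-O` flag, and the failing file is GCC's AVX-512 FP16 intrinsics header. One way to compare the patched and unpatched builds is to list which optimization and host-ISA flags actually reach `nvcc_wrapper`. The sketch below is a hypothetical helper that scans one compile command copied from the build log. Whether host ISA flags are what pulls `avx512fp16intrin.h` into the CUDA compilation is an assumption to test, not an established cause.

```python
import re
import shlex

# Optimization flags: -O, -O0..-O9, -Os, -Og, -Ofast
OPT_RE = re.compile(r"^-O(\d|s|g|fast)?$")
# Host ISA flags: -march=..., -mtune=..., -mavx512*
ISA_RE = re.compile(r"^-m(arch|tune|avx512\S*)(=.*)?$")


def scan_flags(command: str) -> dict:
    """Hypothetical diagnostic: extract -O and host-ISA flags from one compile command."""
    tokens = shlex.split(command)
    opts = [t for t in tokens if OPT_RE.match(t)]
    isa = [t for t in tokens if ISA_RE.match(t)]
    return {
        "opt_flags": opts,
        # nvcc_wrapper keeps only the last -O flag, per the warning in the log
        "effective_opt": opts[-1] if opts else None,
        "host_isa_flags": isa,
    }


if __name__ == "__main__":
    # Illustrative command line, not copied from the actual log
    cmd = "nvcc_wrapper -O2 -march=icelake-server -c src/force.cpp -O3 -o force.o"
    print(scan_flags(cmd))
```

Running it on the `src/force.cpp` command from both the patched and unpatched logs and comparing the results would show whether the flag sets differ.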
Given that this issue happens only when patching with Allegro and building with Kokkos/CUDA, I decided to post it here. I hope this is the right place for it.
Please let me know if any additional information is needed.
Attachment: lammps-torch.tar.gz
Sorry for the late reply! This issue didn't look fun. The fact that it fails on compiler header files is a bad sign and points to an environment issue.
I don't have a direct answer, but here are a few random thoughts:
I have no experience with EasyBuild. Could you try the usual build sequence interactively?
How do you obtain PyTorch? Was it built with the CXX11 ABI? Your version appears to be 2.1, which is worrying: 1.11 works, 1.12 and 1.13 don't (at least on NVIDIA), and the early 2.x versions don't either (I'm not sure exactly which ones). I would recommend something recent (2.4/2.5).
Setting the CPU arch for Kokkos is unnecessary (it won't affect performance unless you rely heavily on CPU functionality) and complicates things with extra flags.
You're enabling a lot of LAMMPS packages; maybe these affect the compiler flags? Try a base build of LAMMPS with Allegro and no extra packages.
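Here is a minimal sketch of the two PyTorch checks suggested above. The version guidance from this thread is encoded as a hypothetical `torch_version_hint` helper, and PyTorch's own `torch.compiled_with_cxx11_abi()` reports the ABI. The version boundaries only restate this thread (1.11 works, 1.12/1.13 fail on NVIDIA, early 2.x is uncertain, 2.4/2.5 recommended). They are not an official compatibility matrix.

```python
def torch_version_hint(version: str) -> str:
    """Hypothetical helper: classify a PyTorch version per the guidance in this thread."""
    # Drop local suffixes such as "+cu121" and keep only major.minor
    major, minor = (int(p) for p in version.split("+")[0].split(".")[:2])
    if (major, minor) == (1, 11):
        return "known to work"
    if (major, minor) in {(1, 12), (1, 13)}:
        return "known broken on NVIDIA"
    if major == 2 and minor < 4:
        return "early 2.x: possibly broken (exact range unknown), upgrade recommended"
    if major == 2 and minor in (4, 5):
        return "recommended"
    return "not covered by this guidance"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not importable in this environment")
    else:
        print(torch.__version__, "->", torch_version_hint(torch.__version__))
        # True if this PyTorch build uses the CXX11 ABI (_GLIBCXX_USE_CXX11_ABI=1)
        print("CXX11 ABI:", torch.compiled_with_cxx11_abi())
```

Running this inside the same module environment that EasyBuild uses would confirm which PyTorch the LAMMPS build actually links against.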