Compilation error with type mismatch, when building with PyTorch and Kokkos #55

moravveji · 2024-10-25T12:31:29Z

Dear developers,

Upon a user request, I am trying to install LAMMPS-allegro on two different generations of Nvidia GPU nodes; we use Rocky 8 as the OS and the Nvidia driver version 560.x.x:

Nvidia A100 GPU on Intel Icelake node (cuda compute capability is fixed to 8.0)
Nvidia H100 GPU on AMD Zen4 node (hence kokkos_arch='ZEN3' and cuda compute capability is set to 9.0)

In both cases, I get the same compilation error down the road. I am heavily trimming off the error message, but the essence of the issue is:

            function "__half::operator unsigned long long() const" (declared at line 250 of /apps/leuven/rocky8/icelake/2023a/software/CUDA/12.1.1/include/cuda_fp16.hpp)
            function "__half::operator bool() const" (declared at line 254 of /apps/leuven/rocky8/icelake/2023a/software/CUDA/12.1.1/include/cuda_fp16.hpp)
          __A28, __A29, __A30, __A31 };
                               ^

/vsc-hard-mounts/leuven-apps/rocky8/icelake/2023a/software/GCCcore/12.3.0/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/avx512fp16intrin.h(2765): error: argument of type "const __half *" is incompatible with parameter of type "const unsigned *"
    return __builtin_ia32_loadsh_mask (__C, __A, __B);
                                       ^

nvcc error   : 'cudafe++' died due to signal 9 (Kill signal)
make[2]: *** [CMakeFiles/lammps.dir/build.make:2981: CMakeFiles/lammps.dir/dev/shm/x0090231/eb/LAMMPS/2Aug2023_update2/foss-2023a-pair_allegro-kokkos-PyTorch-2.1.2-CUDA-12.1.1/lammps-stable_2Aug2023_update2/src/force.cpp.o] Error 9

I should mention that a non-patched installation of exactly the same LAMMPS release with the same toolchain on the same node went very smoothly. For clarity, I have attached the EasyBuild easyconfig file used for the installation, together with the EasyBuild compilation logfile.

Furthermore, the following error also occurs, e.g. when compiling src/force.cpp (please see the logfile):

nvcc_wrapper - *warning* you have set multiple optimization flags (-O*), only the last is used because nvcc can only accept a single optimization setting.
/vsc-hard-mounts/leuven-apps/rocky8/icelake/2023a/software/GCCcore/12.3.0/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/avx512fp16intrin.h(38): error: vector_size attribute requires an arithmetic or enum type
  typedef __half __v8hf __attribute__ ((__vector_size__ (16)));

Given that this issue happens only when patching with allegro and eventually building against Kokkos/CUDA, I decided to post it here. I hope this is the right place for it.

Please let me know if any additional information is needed.
lammps-torch.tar.gz

anjohan · 2024-11-04T18:25:29Z

Hi,

Sorry for the late reply! This issue didn't look fun. The fact that it fails on compiler header files etc. is a bad sign and points to an environment issue.

I don't have a direct answer, but here are a few random thoughts:

Your GCC 12.3 is too new for CUDA 12.1 (see the CUDA/GCC compatibility table).
Your LAMMPS version is quite old.
I have no experience with EasyBuild. Could you try to just do the usual building sequence interactively?
How do you get your PyTorch? Does it have the CXX11 ABI? It looks like your version is 2.1, which is scary. 1.11 works, 1.12 and 1.13 don't (at least on NVIDIA), and the early 2.x versions also don't (but I'm not sure exactly which ones). I would recommend using something recent (2.4/2.5).
Setting the CPU arch for Kokkos is unnecessary (won't affect your performance unless you're using a lot of CPU functionality) and complicates things with extra flags.
You're enabling a lot of LAMMPS packages, maybe these affect compiler flags? Try a base version of LAMMPS w/Allegro and no extra packages.
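
Two of the points above (the GCC/CUDA pairing and the PyTorch ABI) can be checked before rebuilding. The following is only a minimal sketch: the helper names are hypothetical, and the single entry in `MAX_GCC_FOR_CUDA` is an assumption that should be verified against NVIDIA's host compiler support table for the CUDA release in use. The only PyTorch call used is `torch.compiled_with_cxx11_abi()`.

```python
"""Rough toolchain sanity checks (illustrative sketch, not an official tool)."""
import re

# ASSUMPTION: newest GCC accepted as host compiler for a given CUDA release.
# This value is illustrative; verify it against NVIDIA's documentation.
MAX_GCC_FOR_CUDA = {
    (12, 1): (12, 2),
}


def parse_version(text):
    """Return the first dotted version number found in `text` as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def gcc_too_new(gcc, cuda, table=MAX_GCC_FOR_CUDA):
    """True if `gcc` exceeds the table limit for `cuda`; None if `cuda` is not listed."""
    limit = table.get(parse_version(cuda)[:2])
    if limit is None:
        return None
    return parse_version(gcc)[:len(limit)] > limit


def torch_cxx11_abi():
    """Return True/False for the installed PyTorch build, or None if torch is not importable."""
    try:
        import torch
    except ImportError:
        return None
    return torch.compiled_with_cxx11_abi()


if __name__ == "__main__":
    # Versions taken from the build paths in the report above.
    print("GCC too new for CUDA:", gcc_too_new("12.3.0", "12.1.1"))
    print("PyTorch built with CXX11 ABI:", torch_cxx11_abi())
```

Running this with the Python interpreter from the same environment that provides the PyTorch module shows whether that build uses the CXX11 ABI; a `None` result means either PyTorch could not be imported or the CUDA release is missing from the table.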
