Kokkos and Cabana installation issues on GPU #660

Closed
dineshadepu opened this issue Aug 8, 2023 · 4 comments
Labels
question Further information is requested

Comments

@dineshadepu
Contributor

Hello,

I'm currently installing Cabana and Kokkos with GPU support, but I've run into some difficulties. I used the following cmake command:

    cmake \
      -D CMAKE_BUILD_TYPE="Release" \
      -D CMAKE_CXX_COMPILER=$KOKKOS_GPU_SRC_DIR/bin/nvcc_wrapper \
      -D CMAKE_INSTALL_PREFIX=$KOKKOS_GPU_INSTALL_DIR \
      -D Kokkos_ENABLE_SERIAL=ON \
      -D Kokkos_ENABLE_OPENMP=ON \
      -D Kokkos_ENABLE_CUDA=ON \
      -D Kokkos_ENABLE_CUDA_LAMBDA=ON \
      -D Kokkos_ARCH_AMPERE86=ON \
      \
      .. ;

Here is the output of the above command:

----UoS WORKSTATION ----------------------------------
 dinesh@UoSWorkstation (master) /home/dinesh/post_doc/softwares/kokkos_cuda_try/build $  
|  Workstation=>     cmake \
      -D CMAKE_BUILD_TYPE="Release" \
      -D CMAKE_CXX_COMPILER=$KOKKOS_GPU_SRC_DIR/bin/nvcc_wrapper \
      -D CMAKE_INSTALL_PREFIX=$KOKKOS_GPU_INSTALL_DIR \
      -D Kokkos_ENABLE_SERIAL=ON \
      -D Kokkos_ENABLE_OPENMP=ON \
      -D Kokkos_ENABLE_CUDA=ON \
      -D Kokkos_ENABLE_CUDA_LAMBDA=ON \
      -D Kokkos_ARCH_AMPERE86=ON \
      \
      .. ;
-- Setting default Kokkos CXX standard to 17
-- The project name is: Kokkos
-- Using internal gtest for testing
-- Compiler Version: 11.5.119
-- kokkos_launch_compiler (/home/dinesh/post_doc/softwares/kokkos_cuda_try/bin/kokkos_launch_compiler) is enabled...
-- Using -std=c++17 for C++17 standard as feature
-- Built-in Execution Spaces:
--     Device Parallel: Kokkos::Cuda
--     Host Parallel: Kokkos::OpenMP
--       Host Serial: SERIAL
-- 
-- Architectures:
--  AMPERE86
-- Found CUDAToolkit: /usr/include (found version "11.5.119") 
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Found TPLCUDA: TRUE  
-- Using internal desul_atomics copy
-- Kokkos Devices: OPENMP;CUDA;SERIAL, Kokkos Backends: OPENMP;CUDA;SERIAL
-- Configuring done
You have changed variables that require your cache to be deleted.
Configure will be re-run and you may have to reset some variables.
The following variables have changed:
CMAKE_CXX_COMPILER= /home/dinesh/post_doc/softwares/kokkos_cuda_try/bin/nvcc_wrapper

-- Setting default Kokkos CXX standard to 17
-- The CXX compiler identification is GNU 11.4.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /home/dinesh/post_doc/softwares/kokkos_cuda_try/bin/nvcc_wrapper - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Setting build type to 'RelWithDebInfo' as none was specified.
-- The project name is: Kokkos
-- Using internal gtest for testing
-- SERIAL backend is being turned on to ensure there is at least one Host space. To change this, you must enable another host execution space and configure with -DKokkos_ENABLE_SERIAL=OFF or change CMakeCache.txt
-- Using -std=gnu++17 for C++17 extensions as feature
-- Built-in Execution Spaces:
--     Device Parallel: NoTypeDefined
--     Host Parallel: NoTypeDefined
--       Host Serial: SERIAL
-- 
-- Architectures:
-- Found TPLLIBDL: /usr/include  
-- Using internal desul_atomics copy
-- Kokkos Devices: SERIAL, Kokkos Backends: SERIAL
-- Configuring done
-- Generating done
-- Build files have been written to: /home/dinesh/post_doc/softwares/kokkos_cuda_try/build

However, when I proceeded with the 'make' command, I encountered the following error:

|  Workstation=> make
[  0%] Built target AlwaysCheckGit
[  3%] Building CXX object CMakeFiles/impl_git_version.dir/generated/Kokkos_Version_Info.cpp.o
[  7%] Linking CXX static library libimpl_git_version.a
[  7%] Built target impl_git_version
[ 11%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_CPUDiscovery.cpp.o
[ 14%] Building CXX object core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_Command_Line_Parsing.cpp.o
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
  435 |         function(_Functor&& __f)
      |                                                                                                                                                 ^ 
/usr/include/c++/11/bits/std_function.h:435:145: note:         ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
  530 |         operator=(_Functor&& __f)
      |                                                                                                                                                  ^ 
/usr/include/c++/11/bits/std_function.h:530:146: note:         ‘_ArgTypes’
make[2]: *** [core/src/CMakeFiles/kokkoscore.dir/build.make:90: core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_Command_Line_Parsing.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1041: core/src/CMakeFiles/kokkoscore.dir/all] Error 2
make: *** [Makefile:146: all] Error 2

Here is my configuration:

My deviceQuery produces the following output:


Device 1: "NVIDIA RTX A5500"
  CUDA Driver Version / Runtime Version          12.0 / 12.2
  CUDA Capability Major/Minor version number:    8.6
  Total amount of global memory:                 24248 MBytes (25425608704 bytes)
  (080) Multiprocessors, (128) CUDA Cores/MP:    10240 CUDA Cores
  GPU Max Clock rate:                            1665 MHz (1.66 GHz)

so I used AMPERE86 as the arch option.

My gcc version is as follows:

  Workstation=> gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

My nvcc version is as follows:

  Workstation=> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0

Any idea where I am going wrong?

@streeve
Member

streeve commented Aug 8, 2023

It looks like this wasn't a clean build. Try removing all build folders entirely, reconfiguring, and then rebuilding. You also shouldn't need to specify nvcc_wrapper at all; it will be added automatically.
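
The configure log in the original post shows two passes: the first reports `Kokkos Devices: OPENMP;CUDA;SERIAL`, but after CMake detects the changed `CMAKE_CXX_COMPILER` it re-runs and finishes with `Kokkos Devices: SERIAL`. Below is a minimal shell sketch of the clean-rebuild advice, assuming a POSIX shell. The helper names `fresh_build_dir` and `final_kokkos_devices` are hypothetical, and the commented `cmake` call simply reuses the flags from the original post without the compiler override.

```shell
# Remove a build directory (including any stale CMakeCache.txt) and recreate it empty.
fresh_build_dir() {
  dir="$1"
  # Refuse an empty argument so `rm -rf` is never called without a target.
  [ -n "$dir" ] || return 1
  rm -rf -- "$dir"
  mkdir -p -- "$dir"
}

# Print the backend list from the LAST "Kokkos Devices:" line of a configure log on stdin.
final_kokkos_devices() {
  grep 'Kokkos Devices:' | tail -n 1 | sed 's/.*Kokkos Devices: \([^,]*\),.*/\1/'
}

# Usage (not run here):
#   fresh_build_dir build && cd build
#   cmake -D CMAKE_BUILD_TYPE=Release \
#         -D CMAKE_INSTALL_PREFIX="$KOKKOS_GPU_INSTALL_DIR" \
#         -D Kokkos_ENABLE_SERIAL=ON -D Kokkos_ENABLE_OPENMP=ON \
#         -D Kokkos_ENABLE_CUDA=ON -D Kokkos_ENABLE_CUDA_LAMBDA=ON \
#         -D Kokkos_ARCH_AMPERE86=ON .. 2>&1 | tee configure.log
#   final_kokkos_devices < configure.log   # should include CUDA
```

Checking the last summary line rather than the first matters here, because only the final configure pass determines what `make` builds.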

In general, questions like this are much easier to discuss on Slack (Kokkos or Cabana).

@streeve added the "question" label (Further information is requested) on Aug 8, 2023
@dineshadepu
Contributor Author

After installing the latest CUDA version (CUDA 12), I am able to run GPU cases. Thank you. Closing this.
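
The original post paired nvcc 11.5 with GCC 11.4, while deviceQuery reported a 12.x driver and runtime, so it can help to confirm which toolkit the build actually picks up. Here is a small sketch, assuming the `--version` output formats shown earlier in this thread. `nvcc_release` and `gcc_major` are hypothetical helper names, and no particular version threshold is implied.

```shell
# Extract the "X.Y" release number from `nvcc --version` text on stdin.
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Extract the major version from the first line of `gcc --version` text on stdin.
gcc_major() {
  head -n 1 | sed -n 's/.* \([0-9][0-9]*\)\.[0-9][0-9]*\.[0-9][0-9]*$/\1/p'
}

# Usage (not run here):
#   command -v nvcc                 # which nvcc is first on PATH
#   nvcc --version | nvcc_release   # toolkit release used by the build
#   gcc --version | gcc_major       # host compiler major version
```

Comparing the two numbers against the host-compiler table in the CUDA installation guide for that toolkit release is one way to spot an unsupported pairing before running `make`.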

@dineshadepu
Contributor Author

Hi @streeve, one last question regarding the installation. I succeeded in installing Cabana for GPU and it is blazingly fast! Thank you for developing this.

I have searched through the documentation of Cabana and its dependencies, but couldn't find instructions for running code on multi-GPU machines. I have two GPU cards, so I am wondering whether Cabana can compile and run code on such machines (I believe it can, but I couldn't find any documentation on how to run it). Can you please clarify this? Also, are there any instructions available for running Cabana-based code?

@streeve
Member

streeve commented Aug 11, 2023

You will need to run using MPI and use the particle communication features in Cabana.

We assume that GPU-aware MPI is available for GPU-direct communication. If that's not the case then you would need to copy to the host prior to MPI communication.
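
On a single machine with two GPUs, this usually means launching one MPI rank per GPU. Below is a hedged sketch of a launcher wrapper, assuming Open MPI (which exports `OMPI_COMM_WORLD_LOCAL_RANK`) or Slurm (`SLURM_LOCALID`). The function name `gpu_for_local_rank`, the script name `bind_gpu.sh`, and the executable `./my_cabana_app` are hypothetical. Kokkos may already assign devices per rank on its own, so treat this only as an explicit illustration of the mapping.

```shell
# Map this process's node-local MPI rank onto one of N GPUs (round-robin).
gpu_for_local_rank() {
  ngpus="$1"
  # Fall back to rank 0 when no known launcher variable is set.
  local_rank="${OMPI_COMM_WORLD_LOCAL_RANK:-${SLURM_LOCALID:-0}}"
  echo $(( local_rank % ngpus ))
}

# Usage (not run here), e.g. saved as bind_gpu.sh together with the function above:
#   export CUDA_VISIBLE_DEVICES="$(gpu_for_local_rank 2)"
#   exec "$@"
# and launched with one rank per GPU:
#   mpirun -np 2 ./bind_gpu.sh ./my_cabana_app
```

Restricting `CUDA_VISIBLE_DEVICES` per rank means each process sees only its own GPU, which keeps the application code free of device-selection logic.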
