A starting point for GPU-accelerated Python libraries.
Adapted from the original work at https://github.com/PWhiddy/pybind11-cuda
The present work uses a modern CMake/CUDA approach.
Prerequisites:
- CUDA
- Python 3.6 or greater
- CMake >= 3.12 (for CUDA support and the new FindPython3 module)
If you use CMake version >= 3.18, you can use the variable CMAKE_CUDA_ARCHITECTURES instead of CUDAFLAGS:
```bash
mkdir build; cd build
# provide a default CUDA hardware architecture to build for
cmake -DCMAKE_CUDA_ARCHITECTURES="75" -DPython3_EXECUTABLE=`which python` ..
make
```
Please note that specifying Python3_EXECUTABLE is not required, but it is recommended if you have multiple Python executables on your system (e.g. one from the OS, another from conda, etc.); this way you control which Python installation will be used.
If you have an older version of CMake, you can pass nvcc flags to CMake through the environment variable CUDAFLAGS:
```bash
mkdir build; cd build
# change CUDAFLAGS according to your target GPU architecture
export CUDAFLAGS="-arch=sm_75"
cmake -DPython3_EXECUTABLE=`which python` ..
make
```
Test it with:
```bash
cd src
python3 test_mul.py
```
gpu_library.so and test_mul.py must be in the same folder. Alternatively, you can add the directory containing gpu_library.so to your PYTHONPATH environment variable.
- Compiles out of the box with CMake
- NumPy integration
- C++ templating for composable kernels with generic data types (a sketch of this idea follows below)
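To make the last bullet concrete, here is a minimal, hypothetical sketch of how a templated CUDA kernel can be exposed to NumPy through pybind11 for several data types. The kernel, wrapper, and module names used here (multiply_kernel, multiply_with_scalar, gpu_sketch) are illustrative assumptions, not the actual contents of gpu_library; see the sources under src for the real implementation.

```cuda
// multiply_sketch.cu -- illustrative sketch only; names and signatures are
// assumptions, not the actual gpu_library API. Error checking omitted for brevity.
#include <cuda_runtime.h>
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>

namespace py = pybind11;

// Generic element-wise kernel: works for any arithmetic type T.
template <typename T>
__global__ void multiply_kernel(T *data, T scalar, size_t n) {
  size_t i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= scalar;
}

// Host wrapper: copies a NumPy array to the device, runs the kernel,
// and copies the result back into the array's buffer (in-place update).
template <typename T>
void multiply_with_scalar(py::array_t<T> vec, T scalar) {
  py::buffer_info buf = vec.request();
  size_t n = static_cast<size_t>(buf.size);
  T *host_ptr = static_cast<T *>(buf.ptr);

  T *dev_ptr = nullptr;
  cudaMalloc(&dev_ptr, n * sizeof(T));
  cudaMemcpy(dev_ptr, host_ptr, n * sizeof(T), cudaMemcpyHostToDevice);

  const int block = 256;
  const int grid = static_cast<int>((n + block - 1) / block);
  multiply_kernel<T><<<grid, block>>>(dev_ptr, scalar, n);

  cudaMemcpy(host_ptr, dev_ptr, n * sizeof(T), cudaMemcpyDeviceToHost);
  cudaFree(dev_ptr);
}

// One module, several concrete instantiations: the same templated kernel
// is exposed for float32 and float64 NumPy arrays.
PYBIND11_MODULE(gpu_sketch, m) {
  m.def("multiply_with_scalar", &multiply_with_scalar<float>);
  m.def("multiply_with_scalar", &multiply_with_scalar<double>);
}
```

With bindings like these, a float32 or float64 NumPy array can be passed straight from Python and is modified in place; only the template instantiations listed in the module definition are available at runtime. The module actually built by this repository is gpu_library, and src/test_mul.py shows how it is exercised.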
Originally based on https://github.com/torstem/demo-cuda-pybind11