hipTensor is AMD's C++ library for accelerating tensor primitives. It is built on the Composable Kernel library and written in general-purpose kernel languages such as HIP C++.
- AMD CDNA class GPU featuring matrix core support: gfx908, gfx90a as 'gfx9'
Note: Double precision FP64 datatype support requires gfx90a
- ROCm stack minimum version 5.7
- ROCm-cmake minimum version 0.8.0 for ROCm 5.7
- C++ 17
- CMake >=3.6
- Composable Kernel
Optional:
- doxygen (for building documentation)
Run the steps below to build documentation locally.
cd docs
pip3 install -r .sphinx/requirements.txt
python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html
- Operations: tensor contraction
- Data types: FP32, FP64
- Create and track a hipTensor fork.
- Clone your fork:
git clone -b develop https://github.com/<your_fork>/hipTensor.git .
.githooks/install
git checkout -b <new_branch>
...
git add <new_work>
git commit -m "What was changed"
git push origin <new_branch>
...
- Create a pull request to ROCmSoftwarePlatform/hipTensor develop branch.
- Await CI and approval feedback.
- Once approved, merge!
Note: Please don't forget to install the githooks as there are triggers for clang formatting in commits.
Option | Description | Default Value |
---|---|---|
AMDGPU_TARGETS | Build code for specific GPU target(s) | gfx908:xnack-;gfx90a:xnack-;gfx90a:xnack+ |
HIPTENSOR_BUILD_TESTS | Build Tests | ON |
HIPTENSOR_BUILD_SAMPLES | Build Samples | ON |
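
The AMDGPU_TARGETS value is a semicolon-separated list in which each entry names an architecture, optionally followed by a feature suffix such as xnack- or xnack+. As a rough illustration of that format, the sketch below splits such a list and flags which entries meet the FP64 requirement noted earlier (gfx90a). The names `GpuTarget`, `parseTargets`, and `supportsFp64` are hypothetical helpers written for this example; they are not part of hipTensor or ROCm.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper type (not part of hipTensor): one parsed AMDGPU_TARGETS entry.
struct GpuTarget {
    std::string arch;      // e.g. "gfx90a"
    std::string features;  // e.g. "xnack-" (empty if no suffix was given)
};

// Split a CMake-style list such as "gfx908:xnack-;gfx90a:xnack+" on ';',
// then split each entry at its first ':' into architecture and feature suffix.
inline std::vector<GpuTarget> parseTargets(std::string_view list) {
    std::vector<GpuTarget> out;
    while (!list.empty()) {
        const auto semi = list.find(';');
        const std::string_view entry = list.substr(0, semi);
        if (!entry.empty()) {
            const auto colon = entry.find(':');
            GpuTarget t;
            t.arch = std::string(entry.substr(0, colon));
            if (colon != std::string_view::npos) {
                t.features = std::string(entry.substr(colon + 1));
            }
            out.push_back(std::move(t));
        }
        if (semi == std::string_view::npos) break;
        list.remove_prefix(semi + 1);
    }
    return out;
}

// Assumption taken from the requirements note: FP64 support needs gfx90a.
inline bool supportsFp64(const GpuTarget& t) { return t.arch == "gfx90a"; }
```

Applied to the default value `gfx908:xnack-;gfx90a:xnack-;gfx90a:xnack+`, this yields three entries, of which only the two gfx90a entries pass `supportsFp64`.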
By default, the project is configured in Release mode. Some example configurations:
Configuration | Command |
---|---|
Basic | CC=hipcc CXX=hipcc cmake -B<build_dir> . |
Targeting gfx908 | CC=hipcc CXX=hipcc cmake -B<build_dir> . -DAMDGPU_TARGETS=gfx908:xnack- |
Debug build | CC=hipcc CXX=hipcc cmake -B<build_dir> . -DCMAKE_BUILD_TYPE=Debug |
Build without tests (default on) | CC=hipcc CXX=hipcc cmake -B<build_dir> . -DHIPTENSOR_BUILD_TESTS=OFF |
After configuration, build with cmake --build <build_dir> -- -j<nproc>
- Target a specific GPU (e.g. gfx908:xnack-)
- Use lots of threads (e.g. -j64)
Tests the API implementation of logger verbosity and functionality.
- <build_dir>/bin/logger_test

Tests the API implementation of the bilinear contraction algorithm with validation.
- <build_dir>/bin/bilinear_contraction_f32_test
- <build_dir>/bin/bilinear_contraction_f64_test

Tests the API implementation of the scale contraction algorithm with validation.
- <build_dir>/bin/scale_contraction_f32_test
- <build_dir>/bin/scale_contraction_f64_test
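
To make concrete what the two contraction tests validate, here is a minimal CPU reference sketch. It assumes the conventional definitions: scale contraction computes D = alpha * (A contracted with B), and bilinear contraction computes D = alpha * (A contracted with B) + beta * C. For brevity it contracts row-major matrices over a single shared mode k, whereas the library operates on general multi-mode tensors. The functions `contractBilinear`, `contractScale`, and `allClose` are illustrative names only, not hipTensor API, and the tolerance check is one plausible way to validate results rather than the method the tests actually use.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference bilinear contraction on row-major matrices:
//   D[m][n] = alpha * sum_k A[m][k] * B[k][n] + beta * C[m][n]
template <typename T>
std::vector<T> contractBilinear(const std::vector<T>& A, const std::vector<T>& B,
                                const std::vector<T>& C, std::size_t M,
                                std::size_t N, std::size_t K, T alpha, T beta) {
    assert(A.size() == M * K && B.size() == K * N && C.size() == M * N);
    std::vector<T> D(M * N);
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t n = 0; n < N; ++n) {
            T acc = T(0);
            for (std::size_t k = 0; k < K; ++k) {
                acc += A[m * K + k] * B[k * N + n];
            }
            D[m * N + n] = alpha * acc + beta * C[m * N + n];
        }
    }
    return D;
}

// Reference scale contraction: the bilinear form without the C term,
//   D[m][n] = alpha * sum_k A[m][k] * B[k][n]
template <typename T>
std::vector<T> contractScale(const std::vector<T>& A, const std::vector<T>& B,
                             std::size_t M, std::size_t N, std::size_t K, T alpha) {
    const std::vector<T> zeros(M * N, T(0));
    return contractBilinear<T>(A, B, zeros, M, N, K, alpha, T(0));
}

// Element-wise comparison with a relative tolerance (floored at magnitude 1),
// as a device result might be checked against the CPU reference.
template <typename T>
bool allClose(const std::vector<T>& x, const std::vector<T>& y, T tol) {
    if (x.size() != y.size()) return false;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const T scale = std::max({T(1), std::abs(x[i]), std::abs(y[i])});
        if (std::abs(x[i] - y[i]) > tol * scale) return false;
    }
    return true;
}
```

In a validation run, the output copied back from the GPU would be compared against such a reference with `allClose`, using a looser tolerance for FP32 than for FP64.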
These are stand-alone use-cases of the hipTensor contraction operations.
Demonstrates the API implementation of the bilinear contraction operation without validation.
- <build_dir>/bin/simple_contraction_bilinear_f32

Demonstrates the API implementation of the scale contraction operation without validation.
- <build_dir>/bin/simple_contraction_scale_f32
Client applications link against the hipTensor library, so hipTensor must be installed before building them.
mkdir -p samples/build
cd samples/build
cmake \
-D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
-D CMAKE_PREFIX_PATH="/opt/rocm;${PATH_TO_HIPTENSOR_INSTALL_DIRECTORY};${PATH_TO_CK_INSTALL_DIRECTORY}" \
..
make