[SPARSE] Add support for cuSPARSE backend #527

Rbiessy · 2024-07-04T15:22:14Z

Description

Add support for the cuSPARSE backend.

Fixes #25

Depends on #500. All of the changes specific to cuSPARSE are in the last commit.

Rendered documentation: docs.zip

Checklist

All Submissions

Do all unit tests pass locally?
- cusparse_log.txt
- Tested on A100 with oneAPI 2024.2 compiler + Codeplay Nvidia plugin + CUDA 12.5.0
Have you formatted the code using clang-format?

docs/domains/sparse_linear_algebra.rst

src/sparse_blas/backends/cusparse/cusparse_task.hpp

Rbiessy · 2024-08-29T13:44:36Z

I have pushed changes that merge with the develop branch and apply relevant feedback from the previous sparse PR, and I have worked on adding support for the enqueue_native_command extension.
I have tested this branch with oneAPI 2024.2 + CUDA 12.5 (not using the extension): log_oneapi_2024_2_cuda_12_5.txt
and with a oneAPI 2025.0 RC + CUDA 12.2: log_oneapi_2025_0_cuda_12_2.txt

src/sparse_blas/backends/cusparse/CMakeLists.txt

examples/sparse_blas/compile_time_dispatching/CMakeLists.txt

examples/sparse_blas/compile_time_dispatching/sparse_blas_spmv_usm_mklcpu_cusparse.cpp

src/sparse_blas/backends/cusparse/cusparse_error.hpp

src/sparse_blas/backends/cusparse/cusparse_global_handle.hpp

src/sparse_blas/backends/cusparse/cusparse_handles.cpp

tests/unit_tests/sparse_blas/source/sparse_spsv_usm.cpp

src/sparse_blas/common_op_verification.hpp

src/sparse_blas/backends/mkl_common/mkl_handles.cxx

src/sparse_blas/macros.hpp

src/sparse_blas/backends/cusparse/cusparse_scope_handle.hpp

Rbiessy · 2024-10-07T12:14:49Z

FYI we're holding off on this PR for a little bit. We have found some tests failing when running on A100, in particular when the machine is busy or multiple tests are run in parallel. There seem to be race conditions which we don't understand yet.

spencerpatty · 2024-10-14T16:57:36Z

Do we need to test with more modern versions of the CUDA libs? For instance, the oneMath Interfaces project currently tests with the 11.8 and 12.0 SDKs, but do we also need to test with the latest SDK (12.6)? Are there features of 12.6 that we exploit that differ from, say, 12.0?

Rbiessy · 2024-10-15T08:48:58Z

> Do we need to test with more modern versions of the CUDA libs? For instance, the oneMath Interfaces project currently tests with the 11.8 and 12.0 SDKs, but do we also need to test with the latest SDK (12.6)? Are there features of 12.6 that we exploit that differ from, say, 12.0?

The oneAPI Core Team at Codeplay is testing oneMKL Interface with our CUDA plugins and CUDA 12.4. I am personally using 12.5 for the test logs attached here.

Note that the minimum CUDA version currently required to use the cuSPARSE backend is CUDA 12.2, see the related discussion here: https://github.com/oneapi-src/oneMKL/pull/527/files#r1750927886. The internal CI would need to be bumped to at least that version to test cuSPARSE!
There is an important feature in CUDA 12.4 to enable cusparseSpMV_preprocess. Other than this, using more recent CUDA versions mostly fixes some issues for the features that we support.
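To make the version thresholds above concrete, here is a minimal sketch of how they could be expressed as compile-time checks. It assumes the usual `CUDA_VERSION` encoding from `cuda.h` (major * 1000 + minor * 10, so 12040 for CUDA 12.4) and reads the comment above as meaning that `cusparseSpMV_preprocess` can be used from CUDA 12.4 onwards. The helper names are hypothetical and are not taken from this PR.

```cpp
#include <cassert>

// Hypothetical helper: mirrors the CUDA_VERSION encoding used by cuda.h,
// i.e. major * 1000 + minor * 10 (CUDA 12.4 -> 12040).
constexpr int encode_cuda_version(int major, int minor) {
    return major * 1000 + minor * 10;
}

// Thresholds taken from the discussion: CUDA 12.2 is the minimum for the
// cuSPARSE backend, CUDA 12.4 is assumed to be the first release where
// cusparseSpMV_preprocess can be enabled.
constexpr int min_backend_version = encode_cuda_version(12, 2);
constexpr int spmv_preprocess_version = encode_cuda_version(12, 4);

// Hypothetical predicate: can the cuSPARSE backend be built at all?
constexpr bool backend_supported(int cuda_version) {
    return cuda_version >= min_backend_version;
}

// Hypothetical predicate: can the SpMV optimize step call the preprocess routine?
constexpr bool can_use_spmv_preprocess(int cuda_version) {
    return cuda_version >= spmv_preprocess_version;
}

// Compile-time sanity checks on the encoding and the thresholds.
static_assert(encode_cuda_version(12, 4) == 12040, "unexpected encoding");
static_assert(backend_supported(encode_cuda_version(12, 2)), "12.2 is the minimum");
```

In a real build the argument would be the `CUDA_VERSION` macro itself, so the preprocess call could be guarded with an `#if` or `if constexpr` on the same threshold.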

Rbiessy · 2024-10-25T13:45:55Z

@oneapi-src/onemkl-sparse-write I have updated the PR with the needed fixes and resolved the conflicts. To summarize:

- The main issue we have observed is that cuSPARSE seems to require the same CUStream to be used across all the steps of an operation. @gajanan-choudhary's suggestion to cache the CUStream and cusparseHandle fits this requirement well, so this is now done in c5ba2c4, 1f5c80c and most importantly 58f08c9.
  - This had further consequences for the extension used in the 2025.0 RC, which are fixed in 956ae49 and 43428ca.
- We also found some dependency issues when running multiple tests in parallel, which are fixed in c0eae1e, 2149e39 and bad6bfb. The last commit was also needed for rocSPARSE. It seems the backends need the data to be available on the device before the optimize step is run. I have not seen this clearly documented in the backends' documentation. ~~I'll look into clarifying this in our specification.~~ We clarified this in our spec, see "[oneMKL][spblas] Restrict features not supported by any backends", uxlfoundation/oneAPI-spec#542 (comment).
- We hit a cuSPARSE bug, which is now worked around in 6318d53 (more details in the link added in that commit).
- I found some issues with the compile-time (CT) dispatching example, fixed in 2888232, and the README output was updated in b4f553c.
- 97eef5c is a small improvement to clarify which symbols are part of the public library and which are implementation details.
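The caching idea from the first point can be illustrated with a small, toolkit-free sketch. It is not the code from the commits above: the types are hypothetical stand-ins for `CUcontext`, `CUstream` and `cusparseHandle_t`, and keying the cache by context is an assumption. The point is only that every step of an operation that asks the cache gets the same stream/handle pair back.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <unordered_map>

// Hypothetical stand-ins so the sketch compiles without the CUDA toolkit.
using fake_context = std::uintptr_t;  // plays the role of CUcontext
struct fake_stream {                  // plays the role of CUstream
    int id;
};
struct fake_handle {                  // plays the role of cusparseHandle_t
    int id;
    const fake_stream* bound_stream;  // stream the handle was bound to
};

struct cached_resources {
    fake_stream stream;
    fake_handle handle;
};

// Hypothetical cache: one stream/handle pair per context, created on first use.
class handle_cache {
public:
    cached_resources& get(fake_context ctx) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = cache_.find(ctx);
        if (it == cache_.end()) {
            const int id = next_id_++;
            it = cache_.emplace(ctx, cached_resources{fake_stream{id}, fake_handle{id, nullptr}})
                     .first;
            // Bind the handle to its stream exactly once. A real backend would
            // do this with cusparseSetStream; here it is just a pointer.
            it->second.handle.bound_stream = &it->second.stream;
        }
        // References to unordered_map elements stay valid across rehashing,
        // so later lookups return the very same objects.
        return it->second;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return cache_.size();
    }

private:
    mutable std::mutex mutex_;
    std::unordered_map<fake_context, cached_resources> cache_;
    int next_id_ = 0;
};
```

With such a cache, a buffer-size query, an optimize step and the final compute step that all call `get` with the same context operate on one stream, which is the behaviour cuSPARSE appears to require.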

New test logs:

@gajanan-choudhary let me know if you have any concerns with these changes.

gajanan-choudhary

Latest changes look fine to me. Feel free to merge in. Thanks for the great work, I'm sure this PR took monumental effort and patience on your end!

Rbiessy · 2024-10-29T09:48:58Z

Thanks all for the reviews!

Rbiessy requested review from spencerpatty and gajanan-choudhary July 4, 2024 15:24

Rbiessy commented Jul 5, 2024

docs/domains/sparse_linear_algebra.rst (outdated)

Rbiessy force-pushed the romain/cusparse branch from e0fa63f to 52becf2 Compare July 18, 2024 18:10

Rbiessy mentioned this pull request Jul 24, 2024

[SPARSE] Add support for rocSPARSE backend #544

pen-and-papers reviewed Aug 26, 2024

src/sparse_blas/backends/cusparse/cusparse_task.hpp

[SPARSE] Add support for cuSPARSE backend

fac276d

Rbiessy force-pushed the romain/cusparse branch from 1ca0cc4 to fac276d Compare September 6, 2024 15:59

gajanan-choudhary reviewed Sep 9, 2024

src/sparse_blas/backends/cusparse/CMakeLists.txt

gajanan-choudhary reviewed Sep 9, 2024

examples/sparse_blas/compile_time_dispatching/CMakeLists.txt

gajanan-choudhary reviewed Sep 9, 2024

examples/sparse_blas/compile_time_dispatching/sparse_blas_spmv_usm_mklcpu_cusparse.cpp (outdated)

gajanan-choudhary assigned Rbiessy Sep 10, 2024

gajanan-choudhary added the feature and backend labels Sep 10, 2024