Master release 4.3.00 (#2163)
* HIP: since Kokkos has moved it out of experimental we should clean up

Just reflecting the move of HIP and HIPSpace out of experimental
so that we do not get deprecation warning and even failures down
the road. This was really done in Kokkos Core 4.0.0 so it is time
to catch up...

* Applying clang-format

* Sparse: fix cusparse spgemm hang properly

The issue is fixed by disabling the TPL in spec_avail
when a problematic version of CUDA/cuSPARSE is being used.

* Sparse: fix logic for bad cusparse spgemm version.

Just inverted the logic statement to enable the TPL when it is
known to work correctly.

* Improvements on the unification attempt logic for axpby(), including new tests

* Addressing feedback from Luc, plus some small changes here and there:

In KokkosBlas1_axpby_unification_attempt.hpp:
- Removed unnecessary variables, routines, and checks
- Imposed terminology consistency: variable names begin with lower case letters, type names begin with upper case letters
- Using static_assert as much as possible
- Using 'public' and 'private' keywords accordingly
- Improved some explanations and error messages

In KokkosBlas1_axpby_spec.hpp:
- Replace 'a' and 'b' by 'scalar_x' and 'scalar_y' where appropriate, to keep consistency with the terminology used in the 'impl' and 'mv_impl' files of the axpby operation.
- Not using the 'KOKKOSBLAS_OPTIMIZATION_LEVEL_AXPBY' define anymore. Code is now consistent with the 'old' value 3 for such define.

In KokkosBlas1_axpby_impl.hpp and KokkosBlas1_axpby_mv_impl.hpp:
- Not using the 'KOKKOSBLAS_OPTIMIZATION_LEVEL_AXPBY' define anymore. Code is now consistent with the 'old' value 3 for such define.
- Using 'if constexpr' whenever possible
- Checking that -1 <= scalar_x <= 2 and that -1 <= scalar_y <= 2
- Replaced '} else {' by '} else if (scalar_x == 2)' or by '} else if (scalar_y == 2)', whenever possible
- Improved error messages
- Improved explanation headers a bit

In KokkosBlas1_axpby.hpp:
- Renamed some variables to more meaningful names
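
The compile-time dispatch described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in KokkosBlas1_axpby_impl.hpp: `axpby_sketch` and its `std::vector` arguments are hypothetical, and the sketch assumes the convention that the codes -1, 0 and 1 stand for the literal coefficient while 2 means "use the coefficient passed at run time".

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an axpby kernel computing y = a*x + b*y.
// Assumed convention: scalar_x / scalar_y in {-1, 0, 1} name the literal
// coefficient; the value 2 means "use the run-time coefficient a or b".
template <int scalar_x, int scalar_y>
void axpby_sketch(double a, const std::vector<double>& x, double b,
                  std::vector<double>& y) {
  static_assert(-1 <= scalar_x && scalar_x <= 2, "scalar_x must be in [-1, 2]");
  static_assert(-1 <= scalar_y && scalar_y <= 2, "scalar_y must be in [-1, 2]");
  (void)a;  // only read when scalar_x == 2
  (void)b;  // only read when scalar_y == 2

  // Assumes x.size() >= y.size().
  for (std::size_t i = 0; i < y.size(); ++i) {
    double x_term = 0.0;  // covers scalar_x == 0
    if constexpr (scalar_x == -1) {
      x_term = -x[i];
    } else if constexpr (scalar_x == 1) {
      x_term = x[i];
    } else if constexpr (scalar_x == 2) {
      x_term = a * x[i];
    }

    double y_term = 0.0;  // covers scalar_y == 0
    if constexpr (scalar_y == -1) {
      y_term = -y[i];
    } else if constexpr (scalar_y == 1) {
      y_term = y[i];
    } else if constexpr (scalar_y == 2) {
      y_term = b * y[i];
    }

    y[i] = x_term + y_term;
  }
}
```

Calling `axpby_sketch<2, 1>(3.0, x, 0.0, y)` computes y = 3*x + y; because each branch is selected with `if constexpr`, only the arithmetic for the chosen codes is compiled, and out-of-range codes are rejected by the `static_assert`s.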

* Formatting

* Using 'ifdef HAVE_KOKKOSKERNELS_DEBUG', per Luc's suggestion

* Addressing feedback from Luc

* Correcting compilation errors on my Mac

* Backup

* SYR2: fix unit-test type issue

On KokkosEco_Trilinos_Weaver_CUDA112_opt-uvm the SYR2 test
generates a compile time error probably due to a mixed use of host
and device views when comparing implemented vs. reference results.

* CUDA 11.0.1 / cuSPARSE 11.0.0 changed SpMM enums

* SYR2: applying clang-format

* CUDA 11.2.1 / cuSPARSE 11.4.0 changed SpMV

* KokkosBlas1_axpby: include <iostream> for debug builds

Resolve compilation errors in debug mode:
"error: no member named 'cout' in namespace 'std';"

* Backup

* Backup

* Backup

* Backup

* Backup

* Backup

* Backup

* Backup

* Backup

* Backup

* Backup

* Backup

* Address CI build errors

* Some cleanup on current pull request, making it more related to 'just' the creation of the lapack subdirectory and the moving of some files to there

* More cleanup

* Re-enabling gesv unit tests under the lapack subdirectory

* Adding BLAS routines back, for backwards compatibility

* Formatting

* Small cleaning

* Correcting error in Jenkins

* Fixing compilation error on Jenkins when dealing with HIP

* Add required rtd conf file

* README.md: Use correct project slug

* docs/requirements.txt: Add sphinx-rtd-theme

* Addressing latest feedback from Luc.

* Formatting

* KokkosKernelsConfig.cmake: add all_libs target and necessary aliases

* Intent of these changes is to allow for building Trilinos with
  KokkosKernels as an external TPL

* hide native merge-path SpMV behind "native-merge"

* test native-merge algorithm

* Quick fix for night compilation with Trilinos

* SPTRSV: check if cusparse is available before calling TPL path

Since SpTRSV does not implement the TPL layer the usual way we need
to be extra careful before calling the TPL implementation path. If
cusparse is not available then we definitely want to revert back to
calling the native implementation. Similarly, if the execution space
is not Kokkos::CUDA, let's use the native implementation.

* SpTRSV: more strictly check prerequisites in SptrsvHandle

Check that CUSPARSE is enabled and that HandleExecSpace is
Kokkos::CUDA before allowing users to set the implementation to use
the CUSPARSE TPL.
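
The guard described in the two SpTRSV commits above can be sketched as a compile-time check that falls back to the native path unless both conditions hold. Everything here is hypothetical: `SKETCH_HAVE_VENDOR_TPL` stands in for a "TPL enabled" configuration define, and the two tag types stand in for execution spaces; none of these names come from Kokkos Kernels.

```cpp
#include <string>
#include <type_traits>

// Hypothetical tags standing in for execution spaces.
struct HostExecSketch {};
struct DeviceExecSketch {};

// SKETCH_HAVE_VENDOR_TPL is a made-up macro playing the role of a
// "TPL enabled" configuration define; it is left undefined in this sketch.
template <class ExecSpace>
constexpr bool vendor_tpl_usable() {
#if defined(SKETCH_HAVE_VENDOR_TPL)
  // The TPL path is only valid on the matching device execution space.
  return std::is_same_v<ExecSpace, DeviceExecSketch>;
#else
  // Without the TPL, no execution space may take the TPL path.
  return false;
#endif
}

// Picks the implementation path; "native" is the safe default.
template <class ExecSpace>
std::string choose_solve_path() {
  if constexpr (vendor_tpl_usable<ExecSpace>()) {
    return "tpl";
  } else {
    return "native";
  }
}
```

With the macro undefined, every execution space resolves to the native path; defining it would enable the TPL path for `DeviceExecSketch` only.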

* SpTRSV: fix some type definition and variable usage for cuSPARSE

Since we are guarding the cusparse path a bit better we need to be
careful when some types are defined and to mark some variables
(void) when they do not get used by an implementation...

* SpTRSV: applying clang-format

* SpTRSV: more fixes

* SpTRSV: apply clang-format

* SYCL: fix for Trilinos build with MKL

* Apply clang-format to non-cmake files

* SYR2: fix issue with bad type in test function

After comparing various function signatures and view types, the change
allows tests to pass correctly and seem correct based on input params.

* Update Test_Blas2_syr2.hpp

Fix mistake in host/device view argument

* LAPACK: adding rocsolver TPL

Adding the necessary CMake logic and TPL layer to support rocsolver
for LAPACK. Enabling the TPL in gesv and updating gesv test to run
by default the more common configurations and only run specific ones
when the associated TPL (MAGMA) is enabled.

* Lapack: change according to Brian's review

The SpaceAccessibility of IPIVV needs to be modified for MAGMA.
The value_type of IPIVV needs to be rocblas_int when running with
rocSOLVER.

The types used for gesv_tpl_spec_avail and the actual TPL
instantiation were mismatched, leading to a linker error.

* cmake/Dependencies.cmake: remove ROCSOLVER

Removing ROCSOLVER to prevent configuration errors with Trilinos
Will bring back when support is added in Trilinos for ROCSOLVER as TPL

* Lapack: cusolver TPL logic and support for gesv

Adding CMake logic to support cusolver and implementing gesv using
cusolver getrf and getrs. Unit-test is passing without problems!

* Lapack: updating logic in cm_generate_makefile for cusolver

There is some specific TPL logic in cm_generate_makefile and it
cannot be found for cusolver, changing that might do the trick!

* Backup

* Backup

* Backup

* Formatting

* mv_unification tests with double are failing by very small amounts, e.g. 5.9e-14 vs. 3.6e-14

* Trying one more increment on tolerance

* Putting pragma's and unrolls properly right before for loops (compilation warning at weaver)

* Giving a larger tolerance another try, after fixing the warning on pragma and unroll

* Lapack: gesv, implementing review comments

* Adding Changelog for Release 4.2.0 (#2031)

* Adding Changelog for Release 4.2.0

Part of Kokkos C++ Performance Portability Programming EcoSystem 4.2

* Formatting the changelog a bit more

Mentioning more clearly LAPACK vs BLAS, grouping PRs by logical work unit, etc...

* Remove minor revisions, improve text descriptions

* Changelog: add spmv perftest detail

---------

Co-authored-by: Luc Berger <[email protected]>
Co-authored-by: Carl Pearson <[email protected]>
Co-authored-by: brian-kelley <[email protected]>

* NRM1: refactoring TPL layer a bit with c++17 if constexpr

Hopefully this leads to simpler code, less duplication, fewer
macros and easier maintenance!
Adding support for oneapi MKL while making tpl layer changes.
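
As a rough illustration of the idea, a single template can pick the type-specific routine with `if constexpr` instead of one macro-generated specialization per scalar type. The sketch below is not the Kokkos Kernels TPL layer: `fake_sasum` and `fake_dasum` are made-up stand-ins for vendor routines, and `nrm1_sketch` is a hypothetical name.

```cpp
#include <cmath>
#include <type_traits>
#include <vector>

// Made-up stand-ins for type-specific vendor routines (think sasum/dasum).
inline float fake_sasum(int n, const float* x) {
  float sum = 0.0f;
  for (int i = 0; i < n; ++i) sum += std::fabs(x[i]);
  return sum;
}

inline double fake_dasum(int n, const double* x) {
  double sum = 0.0;
  for (int i = 0; i < n; ++i) sum += std::fabs(x[i]);
  return sum;
}

// One template replaces per-type specializations: the branch is chosen at
// compile time, so only the matching call is instantiated.
template <class Scalar>
Scalar nrm1_sketch(const std::vector<Scalar>& x) {
  const int n = static_cast<int>(x.size());
  if constexpr (std::is_same_v<Scalar, float>) {
    return fake_sasum(n, x.data());
  } else if constexpr (std::is_same_v<Scalar, double>) {
    return fake_dasum(n, x.data());
  } else {
    // Generic fallback for scalar types without a vendor routine.
    Scalar sum = Scalar(0);
    for (const Scalar& v : x) sum += (v < Scalar(0) ? -v : v);
    return sum;
  }
}
```

Adding support for another library then amounts to adding a branch (or swapping the called routine) rather than duplicating a macro body.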

* BLAS: Nrm1 implementing Brian's feedback

* Blas: nrm1, fix in tpl spec decl

* BLAS: nrm1 problems with ExecSpace template and lack of Kokkos::Threads

Fix issue with Kokkos::Threads and Kokkos::HIP

* Another attempt while waiting to get access to the solo cluster

* Formatting

* Correction error from the last commit

* Fixing the error that was happening only at the solo cluster

* Increase tolerance a bit more

* Increasing tolerances in all 4 locations

* Backup

* Backup

* Formatting

* Forgot to add ClusteringAlgorithm:: at some spots

* Formatting

* Lapack: fixing issue with Magma TPL in gesv, trtri, etc...

Adding proper support for MAGMA after having it moved to the Lapack
directory and checking it does not create issues with cuSOLVER.

* Update blas/unit_test/Test_Blas1_swap.hpp

Co-authored-by: brian-kelley <[email protected]>

* cmake: Add workaround check for CUSOLVER support with Trilinos

TPL_ENABLE_CUDA default enables CUBLAS and CUSOLVER in Trilinos, but not CUSPARSE
This PR modifies the TPL requirement checks to maintain compatibility with existing configuration options of Trilinos

Attempt to resolve/workaround issue #2047

* Addressing Brian Kelley's feedback

* Formatting

* Removing 'ClusteringAlgorithm::'

* Lapack: gesv, incorporate Brian's feedback

* Applying clang-format

* Fixing some deprecation warnings/errors for ROCm 6

* BLAS: fix bug in TPL layer of KokkosBlas::swap

The cuBLAS Kokkos::complex<float> specialization had a small bug
where the rank of the view was not specified correctly!

* CMake: fix bugs in deciding KOKKOSKERNELS_TPL_BLAS_RETURN_COMPLEX

* TPL: revise BLAS1 dot implementation

* Fix compile errors for C-linkage dot functions returning std::complex

* Use a C struct for complex numbers

to avoid error: '_Complex' is a C99 extension [-Werror,-Wc99-extensions].
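
The approach named in the commits above can be sketched as follows: a C-linkage function returns a plain struct of two doubles, and the C++ side converts it to `std::complex`. This is only an illustration under assumptions; `sketch_zcomplex`, `sketch_zdotu` and `to_std_complex` are made-up names, and the routine is implemented inline here rather than provided by a BLAS library.

```cpp
#include <complex>

extern "C" {
// Plain C struct holding real and imaginary parts as consecutive doubles.
struct sketch_zcomplex {
  double re;
  double im;
};

// Made-up C-linkage routine returning a complex value by struct.
// Computes the unconjugated dot product of two length-n arrays.
sketch_zcomplex sketch_zdotu(int n, const sketch_zcomplex* x,
                             const sketch_zcomplex* y) {
  sketch_zcomplex acc{0.0, 0.0};
  for (int i = 0; i < n; ++i) {
    acc.re += x[i].re * y[i].re - x[i].im * y[i].im;
    acc.im += x[i].re * y[i].im + x[i].im * y[i].re;
  }
  return acc;
}
}

// The struct is expected to have the same size as std::complex<double>.
static_assert(sizeof(sketch_zcomplex) == sizeof(std::complex<double>),
              "layout must match std::complex<double>");

// Conversion on the C++ side, outside the C-linkage declarations.
inline std::complex<double> to_std_complex(sketch_zcomplex z) {
  return {z.re, z.im};
}
```

The C-linkage declaration then involves neither `std::complex` nor the C99 `_Complex` keyword, which is the property the commit is after.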

* Add a workaround by disabling host MKL dot with complex numbers

* Allow KokkosKernels_ENABLE_PERFTESTS=ON to build perf_tests without KokkosKernels_ENABLE_TESTS=ON

* format sparse/tpls/KokkosSparse_spmv_tpl_spec_decl.hpp

* cmake: fix tpl check so cusolver can be disabled when needed

* Link std::filesystem for IntelLLVM in perf_test/sparse

* gemm3 perf test: use CUDA, SYCL, or HIP device for Kokkos::initialize

* Fix for rocm_version header inclusion

* fence Kokkos before timed iterations

* Deprecate KOKKOSLINALG_OPT_LEVEL

* Add CMake warning message if KokkosKernels_LINALG_OPT_LEVEL is used

* Async matrix release for MKL >= 2023.2

* Support CUBLAS_{LIBRARIES,LIBRARY_DIRS,INCLUDE_DIRS,ROOT} and KokkosKernels_CUBLAS_ROOT

* KokkosSparse_spmv_impl_merge.hpp: use capture by reference

Resolve warnings in builds with c++20 support enabled:
"kokkos-kernels/sparse/impl/KokkosSparse_spmv_impl_merge.hpp:166:81: warning: implicit capture of 'this' via '[=]' is deprecated in C++20 [-Wdeprecated]"

* KokkosSparse_par_ilut_numeric_impl.hpp: use capture by reference

Resolve warnings in builds with c++20 support enabled:
"kokkos-kernels/sparse/impl/KokkosSparse_par_ilut_numeric_impl.hpp(591):
warning #2908-D: the implicit by-copy capture of "this" is deprecated"
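
The warnings quoted in the two commits above come from a lambda inside a member function that uses `[=]` and touches a data member, which implicitly captures `this`. A minimal sketch of the by-reference alternative, using a made-up `ScaleSketch` type rather than the actual functors in those files:

```cpp
#include <vector>

struct ScaleSketch {
  double factor;

  double scaled_sum(const std::vector<double>& values) const {
    // Writing [=] here would capture `this` implicitly, which C++20
    // deprecates. [&] also reaches `factor` through `this`, but without
    // relying on the deprecated implicit by-copy form.
    auto scale = [&](double v) { return factor * v; };

    double sum = 0.0;
    for (double v : values) sum += scale(v);
    return sum;
  }
};
```

Capturing by reference is only safe while the lambda does not outlive the enclosing object and scope, as is the case in this sketch where it is used immediately.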

* Backup

* Backup

* Backup

* Backup

* Formatting

* Correcting compilation error

* Typo

* Changes for syr and syr2, to be tested at weaver

* Formatting

* Changes for axpby

* Backup

* Formatting

* Just to force new checking tests in github

* Addressing feedback from Luc.

* Don't call optimize_gemv for one-shot spmv

* Add HIPManagedSpace support

- CMake option for ETI
- Run unit tests with a Kokkos::Device, not just Kokkos::HIP
	- Like we do for Cuda
	- Still use HIPSpace unless Managed is the only enabled memspace
- Couple of minor fixes
	- Allow querying free HIPManagedSpace memory for SpGEMM
	- Disable VBD coloring (not a huge deal, had to do same on CUDA)
	- Use correct memory space in SpTRSV solve

* Backup

* Backup

* Backup

* Minor typo

* Add block support to all SPILUK algorithms (#2064)

* Interface for block iluk

* Progress. Test hooked up

* Progress on test refactoring

* More test reorg

* Fix test

* Refactor spiluk numeric a bit with a struct wrapper

* Add good logging

* progress

* Fix block test

* Progress but potential dead end

* Giving up on this approach for now

* progress

* Make verbose

* Progress

* Progress

* RP working?

* Progress on TP alg

* Bug fix

* Progress on template stuff

* Progress on block TP

* Progress

* Get rid of all the static_casts

* More cleanup. Streams now support blocks

* Tests not passing

* Serial tests all working, both algs, blocked

* Remove output coming from spiluk test

* Final fixes for CPU

* Cuda req full template specification for SerialGemm::invoke

* Don't use scratch for now

* Formatting

* Fix warnings

* Formatting

* Add tolerance to view checks. Use macro and remove redundant test util

* Fix for HIP

* formatting

* Another test reorg to fix weirdness on solo

* formatting

* Remove unused var

* Github feedback

* Remove test cout

* formatting

* Zero-size arrays can cause problems

* Fix unused var warning

* Add CUDA/HIP TPL support for KokkosSparse::spadd (#1962)

* spadd: change arguments to ctor of SPADDHandle

add a default value to input_sorted;
add a second argument input_merged to indicate unique entries;
So that we can easily know whether we can use TPLs on the input matrices

* spadd: add cuda/rocm TPL support for spadd_symbolic/numeric

* Make spiluk_handle::reset backwards compatible (#2087)

* Make spiluk_handle::reset backwards compatible

By making block_size default to -1, which means don't change
block size.
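
A sentinel default of this kind can be sketched as below. `SpilukHandleSketch` and its members are hypothetical and far simpler than the real handle; the point is only that a trailing parameter defaulting to -1 lets existing call sites keep compiling and keep their current block size.

```cpp
// Hypothetical, simplified handle; not the Kokkos Kernels class.
struct SpilukHandleSketch {
  int nrows = 0;
  int block_size = 0;  // assumption in this sketch: 0 means "not blocked"

  // new_block_size == -1 means "keep the current block size", so callers
  // written before the parameter existed behave exactly as before.
  void reset(int new_nrows, int new_block_size = -1) {
    nrows = new_nrows;
    if (new_block_size != -1) {
      block_size = new_block_size;
    }
  }
};
```

A call such as `handle.reset(20)` then changes only the row count, while `handle.reset(20, 4)` also switches to 4x4 blocks.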

* Switch default val for block_size for reset_handle

* formatting

* Fix comment

* spadd: add APIs without an execution space argument (#2090)

* Lapack - SVD: adding initial files that do not implement anything (#2092)

Adding SVD feature to Lapack component, the interface is similar
to classic Lapack and the implementation relies on the TPL layer
to provide initial capabilities. The TPL supported are LAPACK,
MKL, cuSOLVER and rocSOLVER.

Testing three analytical cases 2x2, 2x3 and 3x2 and then some
randomly generated matrices.

* Hands off namespace `Kokkos::Impl` - cleanup couple violations that snuck in (#2094)

* Do not use things from namespace Kokkos::Impl (Kokkos::{Impl:: -> }ALL_t)

* Do not use things from namespace Kokkos::Impl (Kokkos::Impl::DeepCopy)

Can achieve the same with Kokkos::deep_copy

* Fix warning `declaration of ‘std::size_t n’ shadows a parameter`

* Change name of yaml-cpp to yamlcpp

* Fix macro setting in CMakeLists

* GMRES: Add support for BSR matrices

Also, add a test for this.

* Remove all mentions of HBWSpace

* Reintroduce EXECSPACE_{SERIAL,OPENMP,THREADS}_VALID_MEM_SPACES

Drop HBWSPACE as an option

* Lapack: adding svd benchmark

Fixing unit-test for CUSOLVER and adding benchmark to check the
algorithm performance on various platforms.

* Fix Cuda TPL finding (#2098)

- Allow finding cusparse, cusolver based on manually provided paths
  - This is necessary when using an nvhpc toolchain instead of a
    standard cuda toolchain
- Set header paths correctly (this is redundant in a cuda installation,
  in which $CUDA_ROOT/include is already a system include dir, but
  needed in other cases)

* Add support for BSR matrices to some trsv routines (#2104)

* Add support for BSR matrices to some trsv routines
* Change trsv to gesv

* Lapack - SVD: adding quick return when cuSOLVER is skipped (#2107)

Currently we still run the tests on U, S and Vt which does not
make sense since we actively skip this test because cuSOLVER does
not support more columns than rows...

* Fix build error in trsv on gcc8

* Add a workaround for compilation errors with cuda-12.2.0 + gcc-12.3 (#2108)

On Perlmutter@NERSC, I met this error

/usr/lib64/gcc/x86_64-suse-linux/12/include/avx512fp16intrin.h(38): error: vector_size attribute requires an arithmetic or enum type
   typedef __half __v8hf __attribute__ ((__vector_size__ (16)));

The workaround was mentioned at https://forums.developer.nvidia.com/t/including-cub-header-breakes-compilation-with-gcc-12-and-sse2-or-better/255018

* Lapack - SVD: fix for unit-test when MKL is enabled (#2110)

This is really a problem with our implementation of the BLAS
interface when MKL is enabled since MKL redefines the function
signatures of blas functions using MKL_INT instead of int...

* Revert "Merge pull request #2037 from ndellingwood/remove-rocsolver-optional-dependency" (#2106)

This reverts commit 5a36d57, reversing
changes made to 2c66d29.

* Fixing missing inclusion in source file

* BLAS - MKL: fixing HostBlas calls to handle MKL_INT type (#2112)

MKL redefines the BLAS interface based on how MKL_INT is defined
we need to wrap that definition with our own Kokkos Kernels INT
type to make both compatible with regular BLAS.
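
One way to picture the wrapping is a single integer alias that follows the vendor's type, with conversions confined to a thin wrapper. The sketch below is an assumption-laden illustration: `SKETCH_USE_WIDE_INT`, `sketch_blas_int`, `sketch_ddot` and `dot_wrapper` are all made-up names, and the "BLAS routine" is implemented inline.

```cpp
#include <vector>

// SKETCH_USE_WIDE_INT is a made-up switch that mimics a vendor header
// redefining its integer type (as MKL does with MKL_INT).
#if defined(SKETCH_USE_WIDE_INT)
using sketch_blas_int = long long;
#else
using sketch_blas_int = int;
#endif

// Made-up stand-in for a host BLAS routine declared with the vendor's
// integer type (Fortran-style: arguments passed by pointer).
inline double sketch_ddot(const sketch_blas_int* n, const double* x,
                          const sketch_blas_int* incx, const double* y,
                          const sketch_blas_int* incy) {
  double sum = 0.0;
  for (sketch_blas_int i = 0; i < *n; ++i) {
    sum += x[i * (*incx)] * y[i * (*incy)];
  }
  return sum;
}

// Wrapper: callers keep using plain int; the conversion to the library's
// integer type happens in exactly one place.
inline double dot_wrapper(int n, const std::vector<double>& x,
                          const std::vector<double>& y) {
  const sketch_blas_int n_lib = n;
  const sketch_blas_int one = 1;
  return sketch_ddot(&n_lib, x.data(), &one, y.data(), &one);
}
```

Because only the alias changes with the configuration, the same wrapper compiles whether the library integer is `int` or a wider type.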

applying clang-format

* Fix weird Trilinos compiler error

It seemed to have a problem with these deep_copies, so just do
the copy by hand like it was being done before my recent trsv
PR.

* Update changelog

* Update changelog

* Block spiluk follow up (#2085)

* Fix for gemm
* Remove unused divide method
* Enhancements to spiluk test
* Progress. Block spiluk now checks out against analytical results
* LUPrec test with spiluk working
* Disable spiluk LU test on non-host
* Enhancements to spiluk test
* Clean up a few issues uncovered by gh review

* github workflows: update to v4 (use Node 20)

* Refactor Test_Sparse_sptrsv (#2102)

* Refactor Test_Sparse_sptrsv

* More cleanups

* Remove old commented-out code

* CMake: error out in certain case (#2115)

Graph unit tests are unique in that they use default_scalar for the
KokkosKernelsHandle. So if test-eti-only is ON, but neither float nor
double is instantiated, then error out for the graph unit tests.

Users can still build without float or double if they want, but only if
they turn off tests or the graph component.

* Wiki examples for BLAS2 functions are added (#2122)

Some small additional changes to the function headers themselves
to add some missing header file inclusions.

Applying clang-format

Removing constexpr since it won't happen before some work in Core.

* Increase tolerance on gesv test (Fix #2123) (#2124)

And uncomment the verbose output for when tolerance is exceeded,
since that helps debug this sort of issue.
This is only printed at most once so it won't spam the output if
the entire vector is wrong.

* Spmv handle (#2126)

* spmv handle, TPL reuse

* using handle in unification layer and hooking up new algorithm
enums with old Controls options

* Update spmv_merge perf test
Compare KK merge vs. default and KK native

* Small changes to help text of spmv_merge perf test

* Complete backwards compatibility with Controls interface
- copy over spmv algorithm selection correctly
- copy expert tuning parameters

* Controls spmv: accept other name for bsr algo

* bsr spmv test: disable tensor core
It was not actually being run before due to a different name
actually enabling it (experimental_bsr_tc rather than experimental_tc)

* Disable OneMKL spmv for complex types
oneapi 2023.2 throws error saying complex isn't supported

* OneMKL: call optimize_gemv during setup

* Option to apply RCM reordering to extracted CRS diagonal blocks (#2125)

* Add rcm option when extracting diagonal blocks

* Update kk_extract_diagonal_blocks_crsmatrix_sequential

* Add test for extracting diagonal blocks with rcm

* Update RCM checking

* cm_test_all_sandia: various updates

- updates for blake

* cm_test_all_sandia: drop decommissioned/unavailable machines

- remove voltrino, mayer

* Fix2130 (#2132)

* Fix #2130

- Do not call BsrMatrix spmv impl if block size is 1
- Instead, convert it to unmanaged CrsMatrix and call spmv again
  - cuSPARSE returned an error code in this case
  - Better performance
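
The dispatch described in this fix can be sketched with plain arrays: when the block size is 1, the BSR arrays already have the CRS layout, so a non-owning ("unmanaged") CRS view can reuse them without copying. The types and functions below are hypothetical and host-only; they are not the KokkosSparse classes.

```cpp
#include <cstddef>
#include <vector>

// Minimal non-owning CRS view: pointers into arrays owned elsewhere.
struct CrsViewSketch {
  int nrows;
  const int* row_ptr;
  const int* col_idx;
  const double* values;
};

// Minimal BSR matrix; blocks are stored row-major, one after another.
struct BsrSketch {
  int nblock_rows;
  int block_size;
  std::vector<int> row_ptr;  // offsets per block row
  std::vector<int> col_idx;  // block column indices
  std::vector<double> values;
};

// y = A * x for the CRS view.
void spmv(const CrsViewSketch& A, const std::vector<double>& x,
          std::vector<double>& y) {
  for (int i = 0; i < A.nrows; ++i) {
    double sum = 0.0;
    for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) {
      sum += A.values[k] * x[A.col_idx[k]];
    }
    y[i] = sum;
  }
}

// y = A * x for BSR, forwarding to the CRS path when blocks are 1x1.
void spmv(const BsrSketch& A, const std::vector<double>& x,
          std::vector<double>& y) {
  if (A.block_size == 1) {
    // Reuse the BSR arrays directly; nothing is copied.
    CrsViewSketch crs{A.nblock_rows, A.row_ptr.data(), A.col_idx.data(),
                      A.values.data()};
    spmv(crs, x, y);
    return;
  }
  const int bs = A.block_size;
  for (int bi = 0; bi < A.nblock_rows; ++bi) {
    for (int r = 0; r < bs; ++r) y[bi * bs + r] = 0.0;
    for (int k = A.row_ptr[bi]; k < A.row_ptr[bi + 1]; ++k) {
      const double* blk = &A.values[static_cast<std::size_t>(k) * bs * bs];
      const int bj = A.col_idx[k];
      for (int r = 0; r < bs; ++r) {
        for (int c = 0; c < bs; ++c) {
          y[bi * bs + r] += blk[r * bs + c] * x[bj * bs + c];
        }
      }
    }
  }
}
```

Both paths give the same result for 1x1 blocks; the forwarding simply avoids the block loops for a case where they add nothing.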

* Formatting

* Remove redundant remove_pointer_t

Handle is already a non-pointer type

* Benchmark: modifying spmv benchmark to run range of spmv tests (#2135)

This could be further automated to run on matrix from suite sparse

* Kokkos Kernels: update version guards to drop old version of Kokkos (#2133)

Since we are now in the 4.2 series we only support up to 4.1.00.
Older version of Kokkos Core will require older version of Kokkos
Kernels for compatibility. Once 4.3.00 is out we will move to
drop support for the 4.1 series and only keep 4.2 and 4.3 series.

* ODE: BDF methods (#1930)

* ODE: adding BDF algorithms

Implementing BDF formula for stiff ODEs.
Orders 1 to 5 are available and tested.
The integrators can be called on GPU to
solve multiple systems in parallel.
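
For readers unfamiliar with BDF, the order-1 formula is backward Euler: each step solves y_new = y_old + h * f(y_new) for y_new. The sketch below applies it to the logistic equation y' = y (1 - y) with a scalar Newton iteration. It is a host-only illustration with made-up function names, not the KokkosODE implementation, and it uses a fixed step size rather than the adaptive scheme added later in this PR.

```cpp
#include <cmath>

// Right-hand side of the logistic equation y' = y * (1 - y).
double logistic_rhs(double y) { return y * (1.0 - y); }

// Derivative of the right-hand side with respect to y.
double logistic_jac(double y) { return 1.0 - 2.0 * y; }

// One BDF1 (backward Euler) step: solve g(y) = y - y_old - h * f(y) = 0
// with Newton's method, starting from the previous value.
double bdf1_step(double y_old, double h) {
  double y = y_old;
  for (int it = 0; it < 50; ++it) {
    const double g = y - y_old - h * logistic_rhs(y);
    const double dg = 1.0 - h * logistic_jac(y);
    const double dy = -g / dg;
    y += dy;
    if (std::abs(dy) < 1e-14) break;
  }
  return y;
}

// Integrates from y0 with a fixed step h for the given number of steps.
double bdf1_integrate(double y0, double h, int steps) {
  double y = y0;
  for (int n = 0; n < steps; ++n) y = bdf1_step(y, h);
  return y;
}
```

Higher-order BDF formulas follow the same pattern but combine several previous values of y on the left-hand side, which is why the implementation needs a start-up phase before the higher orders can be used.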

* ODE: fixing storage handling for start-up RK stack

* ODE: clang-format

* ODE: first adaptive version of BDF

The current implementation only allows for adaptivity in time,
at this point the BDF Step actually converges as expected with
first order integration!

* ODE: fixing issues with adaptive BDF

The unit-test BDF_adaptive now shows the integration
of the logistic equation using adaptive time steps and
increasing integration order from 1 to 5.

* ODE: running BDF on StiffChemistry problem

The problem runs fine and is solved but there are oscillations
while the behavior of the solution is smooth. More investigation
is needed...

* BDF: fixing types and template parameters in batched calls

Basically we need template parameters to be more versatile
and cannot assume that all rank1 views will have the exact
same underlying type, for instance layouts can be different.

* More fixes for GPUs only in tests this time.

* ODE: BDF adaptive, fix small bug

After adding rhs and update vectors to temp the subviews taken for
other variables need to be offset appropriately...

* Revert "More fixes for GPUs only in tests this time."

This reverts commit 2f70432.

* Revert "Revert "More fixes for GPUs only in tests this time.""

This reverts commit 836012b.

* ODE: BDF small change to temporarily avoid compile time issue

True fix involving a KOKKOS_VERSION check is upcoming after more
tests on GPU side...

* ODE: BDF fix for some printf statements that will go away soon...

* ODE: adding benchmark for BDF

The benchmark helps us monitor the performance of the BDF
implementation across multiple platforms as well as impact of
changes over time.

* ODE: improve benchmark interface...

* ODE: BDF changes to use RMS norm and change some default values

Small changes to compare more closely with reference implementation.
Some of these might be reverted eventually but that's fine for now.

* ODE: BDF convergence more stable and results look pretty good now!

Changing the Newton solver convergence criteria as well as changing
a few default input parameters leads to a more stable algorithm
which can now integrate the stiff Henderson autocatalytic example
well in 66 time steps instead of 200k for fixed order integration...

* ODE: BDF fix bug in initial time step calculation

The initial step routine was overwriting the initial right hand side
which led to obvious issues further down the road... now things should
work fine. Need to figure out if I can re-initialize the variables in
the perf test while excluding that time from each iteration.

* ODE: BDF removing bad print statement...

std::cout in device code

* ODE - BDF: improving perf test

Basically adding new untimed setup within the main loop of the
benchmark to reset the initial conditions, buffers and vectors
ahead of each iteration.

* Modifying unit-test to catch proper return type

* Applying clang-format

* cm_test_all_sandia: update caraway compilers

add rocm/5.6.1 and rocm/6.0.0, and openblas/0.3.23 as tpl

* Sparse MKL: changing the location of the MKL_SAFE_CALL macro (#2134)

* Sparse MKL: changing the location of the MKL_SAFE_CALL macro

Moving the macro outside of namespaces to ensure that it will be
interpreted correctly when called from any other location in the
library.

It does not make much sense to guard Impl code in the Experimental
namespace and in this case it cleans up a problem with namespace
disambiguation for the compiler...

* Sparse BsrSpMV: removing Experimental namespace from Impl namespace

* Applying clang-format

* Sparse SpMV: fixing more namespace issues!

* Fixing missing descriptor for bsr spmv

* Kokkos Kernels: change the default offset ETI from size_t to int (#2140)

This change makes it easier for customer to leverage TPL support
which almost always requires offset=int, ordinal=int to be enabled
meaning that no TPL support is available with our default ETI...

* KokkosSparse_spmv_bsrmatrix_spec: fix Bsr_TC_Precision namespacing

Resolve compilation errors in nightly cuda/12.2 A100 build

* Drop comment for cleaner clang-format fix

* Fix usage of RAII to set cusparse/rocsparse stream (#2141)

Temporary objects like "A(args)" get destructed immediately.
For the object to have scope lifetime, it needs a name, like "A a(args);".
This was causing cusparse/rocsparse spmv to always execute on the default stream,
causing incorrect timing in the spmv perf test.
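
The lifetime difference behind this fix can be shown with a small guard that logs when it is constructed and destroyed. `StreamGuardSketch` and the event log are made up for this illustration and do not call cuSPARSE or rocSPARSE.

```cpp
#include <string>
#include <vector>

// Global event log so the order of construction, work and destruction
// can be inspected afterwards.
std::vector<std::string>& event_log() {
  static std::vector<std::string> log;
  return log;
}

// Made-up RAII guard: "sets a stream" on construction and "resets" it on
// destruction.
struct StreamGuardSketch {
  explicit StreamGuardSketch(int stream) {
    event_log().push_back("set stream " + std::to_string(stream));
  }
  ~StreamGuardSketch() { event_log().push_back("reset stream"); }
};

void run_with_temporary() {
  StreamGuardSketch(7);  // unnamed temporary: destroyed at the end of this statement
  event_log().push_back("work");
}

void run_with_named_guard() {
  StreamGuardSketch guard(7);  // named object: lives until the end of the scope
  event_log().push_back("work");
}
```

In `run_with_temporary` the reset is logged before the work, so the work runs without the guard in effect; in `run_with_named_guard` the reset comes after the work, which is the intended behavior.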

* Use execution space operator== (#2136)

It actually is part of the public interface

* cm_test_all_sandia: more caraway module updates and cleanup (#2145)

* Spmv perftest improvements (#2146)

* Spmv perf test improvements

- Add option to flush caches by filling a dummy buffer between
iterations
- Add option to call the non-reuse interface instead of handle/reuse
interface
- Fix modes T, H in nonsquare case (make x,y the correct length)
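
For the last item, the vector lengths depend on the mode: for y = op(A) * x with A of size nrows x ncols, x has ncols entries and y has nrows entries unless op transposes A, in which case the two swap. A small sketch, with a hypothetical helper name and assuming the mode letters N, C, T and H:

```cpp
#include <utility>

// Returns {length of x, length of y} for y = op(A) * x, where A is
// nrows x ncols. Assumed mode letters: N (none), C (conjugate),
// T (transpose), H (conjugate transpose).
std::pair<int, int> spmv_vector_lengths(char mode, int nrows, int ncols) {
  const bool transposed = (mode == 'T' || mode == 'H');
  return transposed ? std::make_pair(nrows, ncols)
                    : std::make_pair(ncols, nrows);
}
```

For a square matrix the two cases coincide, which is why a mistake of this kind only shows up in the nonsquare case.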

* Fix mode help text

* Update version to 4.3.0

* Revert "Kokkos Kernels: change the default offset ETI from size_t to int (#2140)"

This reverts commit 3a5498d.

* Fix signed/unsigned comparison warnings (#2150)

This is only hit when spmv is called with integer scalars,
which doesn't happen in our CI but does often in Tpetra.

* SPMV tpl fixes, cusparse workaround (#2152)

* SPMV tpl fixes, workaround

* Avoid possible integer conversion warnings

* Document cusparseSpMM algos that were tested

* Merge pull request #2147 from lucbv/KK_Utils_cleanup

KokkosKernels Utils: cleaning the zero_vector interface

(cherry picked from commit 363868e)

* KokkosBlas1_axpby.hpp: change debug macro guard for printInformation (#2157)

* KokkosBlas1_axpby.hpp: change debug macro guard for printInformation

- resolves test failures in Trilinos (MueLu) that rely on gold file diff
comparisons by removing extra output in debug builds

* fix compilation error

* Update changelog for 4.3.00 (#2148)

* Update changelog for 4.3.00

* Update CHANGELOG.md

---------

Co-authored-by: Luc Berger <[email protected]>

* Fix changelog typo

* Fix merge artifacts

* CMakeLists.txt: fix Kokkos_VERSION check

* Merge pull request #2165 from ndellingwood/test-updates

Updates from feedback running Trilinos testing

(cherry picked from commit cacba80)

* Update master_history.txt for 4.3.0

* KokkosLapack_svd_tpl_spec_decl: defer to MKL spec when LAPACK also enabled

Resolves redefinition of struct SVD compilation errors when both MKL and LAPACK are enabled
Reported by @maartenarnst in trilinos/Trilinos#12891

Co-authored-by: brian-kelley <[email protected]>
(cherry picked from commit 5bf5474)

---------

Co-authored-by: Luc Berger-Vergiat <[email protected]>
Co-authored-by: Ernesto Prudencio <[email protected]>
Co-authored-by: Carl Pearson <[email protected]>
Co-authored-by: Evan Harvey <[email protected]>
Co-authored-by: Carl Pearson <[email protected]>
Co-authored-by: brian-kelley <[email protected]>
Co-authored-by: Sean Miller <[email protected]>
Co-authored-by: Junchao Zhang <[email protected]>
Co-authored-by: Junchao Zhang <[email protected]>
Co-authored-by: Brian Kelley <[email protected]>
Co-authored-by: James Foucar <[email protected]>
Co-authored-by: Damien L-G <[email protected]>
Co-authored-by: Caleb Schilly <[email protected]>
Co-authored-by: Damien L-G <[email protected]>
Co-authored-by: Vinh Dang <[email protected]>
16 people authored Apr 8, 2024
1 parent afd65f0 commit 1b0a15f
Showing 210 changed files with 25,302 additions and 10,548 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/docs.yml
@@ -23,12 +23,12 @@ jobs:
doxygen --version
- name: checkout_kokkos_kernels
-uses: actions/checkout@v3
+uses: actions/checkout@v4
with:
path: kokkos-kernels

- name: checkout_kokkos
-uses: actions/checkout@v3
+uses: actions/checkout@v4
with:
repository: kokkos/kokkos
ref: develop
2 changes: 1 addition & 1 deletion .github/workflows/format.yml
@@ -13,7 +13,7 @@ jobs:
clang-format-check:
runs-on: ubuntu-20.04
steps:
-- uses: actions/checkout@v3
+- uses: actions/checkout@v4

- name: Install Dependencies
run: sudo apt install clang-format-8
4 changes: 2 additions & 2 deletions .github/workflows/osx.yml
Original file line number Diff line number Diff line change
Expand Up @@ -50,12 +50,12 @@ jobs:

steps:
- name: checkout_kokkos_kernels
-uses: actions/checkout@v3
+uses: actions/checkout@v4
with:
path: kokkos-kernels

- name: checkout_kokkos
-uses: actions/checkout@v3
+uses: actions/checkout@v4
with:
repository: kokkos/kokkos
ref: ${{ github.base_ref }}
5 changes: 4 additions & 1 deletion .gitignore
@@ -12,4 +12,7 @@ TAGS
#Clangd indexing
compile_commands.json
.cache/
-.vscode/
+.vscode/
+
+#MacOS hidden files
+.DS_Store
35 changes: 35 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,35 @@
# Read the Docs configuration file for Sphinx projects
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the OS, Python version and other tools you might need
build:
  os: ubuntu-22.04
  tools:
    python: "3.12"
    # You can also specify other tool versions:
    # nodejs: "20"
    # rust: "1.70"
    # golang: "1.20"

# Build documentation in the "docs/" directory with Sphinx
sphinx:
  configuration: docs/conf.py
  # You can configure Sphinx to use a different builder, for instance use the dirhtml builder for simpler URLs
  # builder: "dirhtml"
  # Fail on all warnings to avoid broken references
  # fail_on_warning: true

# Optionally build your docs in additional formats such as PDF and ePub
# formats:
#   - pdf
#   - epub

# Optional but recommended, declare the Python requirements required
# to build your documentation
# See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
python:
  install:
    - requirements: docs/requirements.txt
2 changes: 1 addition & 1 deletion BUILD.md
@@ -227,7 +227,7 @@ endif()
* KokkosKernels_LAPACK_ROOT: PATH
* Location of LAPACK install root.
* Default: None or the value of the environment variable LAPACK_ROOT if set
-* KokkosKernels_LINALG_OPT_LEVEL: BOOL
+* KokkosKernels_LINALG_OPT_LEVEL: BOOL **DEPRECATED**
* Optimization level for KokkosKernels computational kernels: a nonnegative integer. Higher levels result in better performance that is more uniform for corner cases, but increase build time and library size. The default value is 1, which should give performance within ten percent of optimal on most platforms, for most problems.
* Default: 1
* KokkosKernels_MAGMA_ROOT: PATH
94 changes: 94 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,99 @@
# Change Log

## [4.3.00](https://github.com/kokkos/kokkos-kernels/tree/4.3.00) (2024-03-19)
[Full Changelog](https://github.com/kokkos/kokkos-kernels/compare/4.2.01...4.3.00)

### New Features

#### BLAS updates
- Syr2 [\#1942](https://github.com/kokkos/kokkos-kernels/pull/1942)

#### LAPACK updates
- Adding cuSOLVER [\#2038](https://github.com/kokkos/kokkos-kernels/pull/2038)
- Fix for MAGMA with CUDA [\#2044](https://github.com/kokkos/kokkos-kernels/pull/2044)
- Adding rocSOLVER [\#2034](https://github.com/kokkos/kokkos-kernels/pull/2034)
- Fix rocSOLVER issue with Trilinos dependency [\#2037](https://github.com/kokkos/kokkos-kernels/pull/2037)
- Lapack - SVD [\#2092](https://github.com/kokkos/kokkos-kernels/pull/2092)
- Adding benchmark for SVD [\#2103](https://github.com/kokkos/kokkos-kernels/pull/2103)
- Quick return to fix cuSOLVER and improve performance [\#2107](https://github.com/kokkos/kokkos-kernels/pull/2107)
- Fix Intel MKL tolerance for SVD tests [\#2110](https://github.com/kokkos/kokkos-kernels/pull/2110)

#### Sparse updates
- Add block support to all SPILUK algorithms [\#2064](https://github.com/kokkos/kokkos-kernels/pull/2064)
- Block spiluk follow up [\#2085](https://github.com/kokkos/kokkos-kernels/pull/2085)
- Make spiluk_handle::reset backwards compatible [\#2087](https://github.com/kokkos/kokkos-kernels/pull/2087)
- Sptrsv improvements
  - Add sptrsv execution space overloads [\#1982](https://github.com/kokkos/kokkos-kernels/pull/1982)
  - Refactor Test_Sparse_sptrsv [\#2102](https://github.com/kokkos/kokkos-kernels/pull/2102)
  - Add support for BSR matrices to some trsv routines [\#2104](https://github.com/kokkos/kokkos-kernels/pull/2104)
- GMRES: Add support for BSR matrices [\#2097](https://github.com/kokkos/kokkos-kernels/pull/2097)
- Spmv handle [\#2126](https://github.com/kokkos/kokkos-kernels/pull/2126)
- Option to apply RCM reordering to extracted CRS diagonal blocks [\#2125](https://github.com/kokkos/kokkos-kernels/pull/2125)

#### ODE updates
- Adding adaptive BDF methods [\#1930](https://github.com/kokkos/kokkos-kernels/pull/1930)

#### Misc updates
- Add HIPManagedSpace support [\#2079](https://github.com/kokkos/kokkos-kernels/pull/2079)

### Enhancements:

#### BLAS
- Axpby: improvement on unification attempt logic and on the execution of a diversity of situations [\#1895](https://github.com/kokkos/kokkos-kernels/pull/1895)

#### Misc updates
- Use execution space operator== [\#2136](https://github.com/kokkos/kokkos-kernels/pull/2136)

#### TPL support
- Add TPL support for KokkosBlas::dot [\#1949](https://github.com/kokkos/kokkos-kernels/pull/1949)
- Add CUDA/HIP TPL support for KokkosSparse::spadd [\#1962](https://github.com/kokkos/kokkos-kernels/pull/1962)
- Don't call optimize_gemv for one-shot MKL spmv [\#2073](https://github.com/kokkos/kokkos-kernels/pull/2073)
- Async matrix release for MKL >= 2023.2 in SpMV [\#2074](https://github.com/kokkos/kokkos-kernels/pull/2074)
- BLAS - MKL: fixing HostBlas calls to handle MKL_INT type [\#2112](https://github.com/kokkos/kokkos-kernels/pull/2112)

### Build System:
- Support CUBLAS_{LIBRARIES,LIBRARY_DIRS,INCLUDE_DIRS,ROOT} and KokkosKernels_CUBLAS_ROOT CMake options [\#2075](https://github.com/kokkos/kokkos-kernels/pull/2075)
- Link std::filesystem for IntelLLVM in perf_test/sparse [\#2055](https://github.com/kokkos/kokkos-kernels/pull/2055)
- Fix Cuda TPL finding [\#2098](https://github.com/kokkos/kokkos-kernels/pull/2098)
- CMake: error out in certain case [\#2115](https://github.com/kokkos/kokkos-kernels/pull/2115)

### Documentation and Testing:
- par_ilut: Update documentation for fill_in_limit [\#2001](https://github.com/kokkos/kokkos-kernels/pull/2001)
- Wiki examples for BLAS2 functions are added [\#2122](https://github.com/kokkos/kokkos-kernels/pull/2122)
- github workflows: update to v4 (use Node 20) [\#2119](https://github.com/kokkos/kokkos-kernels/pull/2119)

### Benchmarks:
- gemm3 perf test: user CUDA, SYCL, or HIP device for kokkos:initialize [\#2058](https://github.com/kokkos/kokkos-kernels/pull/2058)
- Lapack: adding svd benchmark [\#2103](https://github.com/kokkos/kokkos-kernels/pull/2103)
- Benchmark: modifying spmv benchmark to fix interface and run range of spmv tests [\#2135](https://github.com/kokkos/kokkos-kernels/pull/2135)

### Cleanup:
- Experimental hip cleanup [\#1999](https://github.com/kokkos/kokkos-kernels/pull/1999)
- iostream clean-up in benchmarks [\#2004](https://github.com/kokkos/kokkos-kernels/pull/2004)
- Update: implicit capture of 'this' via '[=]' is deprecated in C++20 warnings [\#2076](https://github.com/kokkos/kokkos-kernels/pull/2076)
- Deprecate KOKKOSLINALG_OPT_LEVEL [\#2072](https://github.com/kokkos/kokkos-kernels/pull/2072)
- Remove all mentions of HBWSpace [\#2101](https://github.com/kokkos/kokkos-kernels/pull/2101)
- Change name of yaml-cpp to yamlcpp (trilinos/Trilinos#12710) [\#2099](https://github.com/kokkos/kokkos-kernels/pull/2099)
- Hands off namespace Kokkos::Impl - cleanup couple violations that snuck in [\#2094](https://github.com/kokkos/kokkos-kernels/pull/2094)
- Kokkos Kernels: update version guards to drop old version of Kokkos [\#2133](https://github.com/kokkos/kokkos-kernels/pull/2133)
- Sparse MKL: changing the location of the MKL_SAFE_CALL macro [\#2134](https://github.com/kokkos/kokkos-kernels/pull/2134)

### Bug Fixes:
- Bspgemm cusparse hang [\#2008](https://github.com/kokkos/kokkos-kernels/pull/2008)
- bhalf_t fix for isnan function [\#2007](https://github.com/kokkos/kokkos-kernels/pull/2007)
- Fence Kokkos before timed iterations [\#2066](https://github.com/kokkos/kokkos-kernels/pull/2066)
- CUDA 11.2.1 / cuSPARSE 11.4.0 changed SpMV enums [\#2011](https://github.com/kokkos/kokkos-kernels/pull/2011)
- Fix the spadd API [\#2090](https://github.com/kokkos/kokkos-kernels/pull/2090)
- Axpby reduce deep copy calls [\#2081](https://github.com/kokkos/kokkos-kernels/pull/2081)
- Correcting BLAS test failures with cuda when ETI_ONLY = OFF (issue #2061) [\#2077](https://github.com/kokkos/kokkos-kernels/pull/2077)
- Fix weird Trilinos compiler error [\#2117](https://github.com/kokkos/kokkos-kernels/pull/2117)
- Fix for missing STL inclusion [\#2113](https://github.com/kokkos/kokkos-kernels/pull/2113)
- Fix build error in trsv on gcc8 [\#2111](https://github.com/kokkos/kokkos-kernels/pull/2111)
- Add a workaround for compilation errors with cuda-12.2.0 + gcc-12.3 [\#2108](https://github.com/kokkos/kokkos-kernels/pull/2108)
- Increase tolerance on gesv test (Fix #2123) [\#2124](https://github.com/kokkos/kokkos-kernels/pull/2124)
- Fix usage of RAII to set cusparse/rocsparse stream [\#2141](https://github.com/kokkos/kokkos-kernels/pull/2141)
- Spmv bsr matrix fix missing matrix descriptor (rocsparse) [\#2138](https://github.com/kokkos/kokkos-kernels/pull/2138)

## [4.2.01](https://github.com/kokkos/kokkos-kernels/tree/4.2.01) (2024-01-17)
[Full Changelog](https://github.com/kokkos/kokkos-kernels/compare/4.2.00...4.2.01)

23 changes: 16 additions & 7 deletions CMakeLists.txt
@@ -10,8 +10,8 @@ SET(KOKKOSKERNELS_TOP_BUILD_DIR ${CMAKE_CURRENT_BINARY_DIR})
SET(KOKKOSKERNELS_TOP_SOURCE_DIR ${CMAKE_CURRENT_SOURCE_DIR})

SET(KokkosKernels_VERSION_MAJOR 4)
-SET(KokkosKernels_VERSION_MINOR 2)
-SET(KokkosKernels_VERSION_PATCH 1)
+SET(KokkosKernels_VERSION_MINOR 3)
+SET(KokkosKernels_VERSION_PATCH 0)
SET(KokkosKernels_VERSION "${KokkosKernels_VERSION_MAJOR}.${KokkosKernels_VERSION_MINOR}.${KokkosKernels_VERSION_PATCH}")

#Set variables for config file
@@ -127,13 +127,13 @@ ELSE()
IF (NOT KOKKOSKERNELS_HAS_TRILINOS AND NOT KOKKOSKERNELS_HAS_PARENT)
# This is a standalone build
FIND_PACKAGE(Kokkos REQUIRED)
-IF((${Kokkos_VERSION} VERSION_EQUAL "4.1.00") OR (${Kokkos_VERSION} VERSION_GREATER_EQUAL "4.2.00"))
+IF((${Kokkos_VERSION} VERSION_GREATER_EQUAL "4.1.0") AND (${Kokkos_VERSION} VERSION_LESS_EQUAL "4.3.0"))
MESSAGE(STATUS "Found Kokkos version ${Kokkos_VERSION} at ${Kokkos_DIR}")
-IF((${Kokkos_VERSION} VERSION_GREATER "4.2.99"))
+IF((${Kokkos_VERSION} VERSION_GREATER "4.3.99"))
MESSAGE(WARNING "Configuring with Kokkos ${Kokkos_VERSION} which is newer than the expected develop branch - version check may need update")
ENDIF()
ELSE()
-MESSAGE(FATAL_ERROR "Kokkos Kernels ${KokkosKernels_VERSION} requires 4.1.00, 4.2.00, 4.2.01 or develop")
+MESSAGE(FATAL_ERROR "Kokkos Kernels ${KokkosKernels_VERSION} requires Kokkos_VERSION 4.1.0, 4.2.0, 4.2.1 or 4.3.0")
ENDIF()
ENDIF()

@@ -156,9 +156,16 @@ ELSE()
KOKKOSKERNELS_ADD_OPTION_AND_DEFINE(
LINALG_OPT_LEVEL
KOKKOSLINALG_OPT_LEVEL
-"Optimization level for KokkosKernels computational kernels: a nonnegative integer. Higher levels result in better performance that is more uniform for corner cases, but increase build time and library size. The default value is 1, which should give performance within ten percent of optimal on most platforms, for most problems. Default: 1"
+"DEPRECATED. Optimization level for KokkosKernels computational kernels: a nonnegative integer. Higher levels result in better performance that is more uniform for corner cases, but increase build time and library size. The default value is 1, which should give performance within ten percent of optimal on most platforms, for most problems. Default: 1"
"1")

+if (KokkosKernels_LINALG_OPT_LEVEL AND NOT KokkosKernels_LINALG_OPT_LEVEL STREQUAL "1")
+message(WARNING "KokkosKernels_LINALG_OPT_LEVEL is deprecated!")
+endif()
+if(KokkosKernels_KOKKOSLINALG_OPT_LEVEL AND NOT KokkosKernels_KOKKOSLINALG_OPT_LEVEL STREQUAL "1")
+message(WARNING "KokkosKernels_KOKKOSLINALG_OPT_LEVEL is deprecated!")
+endif()

# Enable experimental features of KokkosKernels if set at configure
# time. Default is no.
KOKKOSKERNELS_ADD_OPTION_AND_DEFINE(
@@ -375,8 +382,10 @@ ELSE()
KOKKOSKERNELS_LINK_TPL(kokkoskernels PUBLIC MKL)
KOKKOSKERNELS_LINK_TPL(kokkoskernels PUBLIC CUBLAS)
KOKKOSKERNELS_LINK_TPL(kokkoskernels PUBLIC CUSPARSE)
+KOKKOSKERNELS_LINK_TPL(kokkoskernels PUBLIC CUSOLVER)
KOKKOSKERNELS_LINK_TPL(kokkoskernels PUBLIC ROCBLAS)
KOKKOSKERNELS_LINK_TPL(kokkoskernels PUBLIC ROCSPARSE)
+KOKKOSKERNELS_LINK_TPL(kokkoskernels PUBLIC ROCSOLVER)
KOKKOSKERNELS_LINK_TPL(kokkoskernels PUBLIC METIS)
KOKKOSKERNELS_LINK_TPL(kokkoskernels PUBLIC ARMPL)
KOKKOSKERNELS_LINK_TPL(kokkoskernels PUBLIC MAGMA)
@@ -425,7 +434,7 @@ ELSE()
IF (KOKKOSKERNELS_ALL_COMPONENTS_ENABLED)
IF (KokkosKernels_ENABLE_PERFTESTS)
MESSAGE(STATUS "Enabling perf tests.")
-KOKKOSKERNELS_ADD_TEST_DIRECTORIES(perf_test)
+add_subdirectory(perf_test) # doesn't require KokkosKernels_ENABLE_TESTS=ON
ENDIF ()
IF (KokkosKernels_ENABLE_EXAMPLES)
MESSAGE(STATUS "Enabling examples.")
8 changes: 4 additions & 4 deletions CheckHostBlasReturnComplex.cmake
@@ -21,8 +21,8 @@ FUNCTION(CHECK_HOST_BLAS_RETURN_COMPLEX VARNAME)
extern \"C\" {
void F77_BLAS_MANGLE(zdotc,ZDOTC)(
-std::complex<double>* result, const int* n,
-const std::complex<double> x[], const int* incx,
+std::complex<double>* result, const int* n,
+const std::complex<double> x[], const int* incx,
const std::complex<double> y[], const int* incy);
}
@@ -49,9 +49,9 @@ int main() {
CHECK_CXX_SOURCE_RUNS("${SOURCE}" KK_BLAS_RESULT_AS_POINTER_ARG)

IF(${KK_BLAS_RESULT_AS_POINTER_ARG})
-SET(VARNAME OFF)
+SET(${VARNAME} OFF PARENT_SCOPE)
ELSE()
-SET(VARNAME ON)
+SET(${VARNAME} ON PARENT_SCOPE)
ENDIF()

ENDFUNCTION()
2 changes: 1 addition & 1 deletion README.md
@@ -1,4 +1,4 @@
-[![Generic badge](https://readthedocs.org/projects/pip/badge/?version=latest&style=flat)](https://kokkos-kernels.readthedocs.io/en/latest/)
+[![Generic badge](https://readthedocs.org/projects/kokkos-kernels/badge/?version=latest)](https://kokkos-kernels.readthedocs.io/en/latest/)

![KokkosKernels](https://avatars2.githubusercontent.com/u/10199860?s=200&v=4)

24 changes: 0 additions & 24 deletions batched/KokkosBatched_Util.hpp
@@ -626,18 +626,6 @@ KOKKOS_INLINE_FUNCTION auto subview_wrapper(ViewType v, IdxType1 i1,
const Trans::NoTranspose) {
return subview_wrapper(v, i1, i2, i3, layout_tag);
}
-#if KOKKOS_VERSION < 40099
-template <class ViewType, class IdxType1>
-KOKKOS_INLINE_FUNCTION auto subview_wrapper(ViewType v, IdxType1 i1,
-Kokkos::Impl::ALL_t i2,
-Kokkos::Impl::ALL_t i3,
-const BatchLayout::Left &layout_tag,
-const Trans::Transpose) {
-auto sv_nt = subview_wrapper(v, i1, i3, i2, layout_tag);
-
-return transpose_2d_view(sv_nt, layout_tag);
-}
-#else
template <class ViewType, class IdxType1>
KOKKOS_INLINE_FUNCTION auto subview_wrapper(ViewType v, IdxType1 i1,
Kokkos::ALL_t i2, Kokkos::ALL_t i3,
@@ -647,7 +635,6 @@ KOKKOS_INLINE_FUNCTION auto subview_wrapper(ViewType v, IdxType1 i1,

return transpose_2d_view(sv_nt, layout_tag);
}
-#endif
template <class ViewType, class IdxType1, class IdxType2, class IdxType3>
KOKKOS_INLINE_FUNCTION auto subview_wrapper(ViewType v, IdxType1 i1,
IdxType2 i2, IdxType3 i3,
@@ -671,16 +658,6 @@ KOKKOS_INLINE_FUNCTION auto subview_wrapper(
const BatchLayout::Right &layout_tag, const Trans::NoTranspose &) {
return subview_wrapper(v, i1, i2, i3, layout_tag);
}
-#if KOKKOS_VERSION < 40099
-template <class ViewType, class IdxType1>
-KOKKOS_INLINE_FUNCTION auto subview_wrapper(
-ViewType v, IdxType1 i1, Kokkos::Impl::ALL_t i2, Kokkos::Impl::ALL_t i3,
-const BatchLayout::Right &layout_tag, const Trans::Transpose &) {
-auto sv_nt = subview_wrapper(v, i1, i3, i2, layout_tag);
-
-return transpose_2d_view(sv_nt, layout_tag);
-}
-#else
template <class ViewType, class IdxType1>
KOKKOS_INLINE_FUNCTION auto subview_wrapper(
ViewType v, IdxType1 i1, Kokkos::ALL_t i2, Kokkos::ALL_t i3,
@@ -689,7 +666,6 @@ KOKKOS_INLINE_FUNCTION auto subview_wrapper(

return transpose_2d_view(sv_nt, layout_tag);
}
-#endif
template <class ViewType, class IdxType1, class IdxType2, class IdxType3>
KOKKOS_INLINE_FUNCTION auto subview_wrapper(
ViewType v, IdxType1 i1, IdxType2 i2, IdxType3 i3,
0 comments on commit 1b0a15f
