KokkosKernels implements local computational kernels for linear
algebra and graph operations, using the Kokkos shared-memory parallel
programming model. "Local" means not using MPI, or running within a
single MPI process without knowing about MPI. "Computational kernels"
are coarse-grained operations: each does enough work that it makes
sense to parallelize it internally using Kokkos. KokkosKernels can be
a building block of a parallel linear algebra library like Tpetra that uses MPI
and threads for parallelism, or it can be used stand-alone in your
application.

Computational kernels in this subpackage include the following:

- (Multi)vector dot products, norms, and updates (AXPY-like
  operations that add vectors together entry-wise)
- Sparse matrix-vector multiply and other sparse matrix / dense
  vector kernels
- Sparse matrix-matrix multiply
- Graph coloring
- Gauss-Seidel with coloring (generalization of red-black)
- Other linear algebra and graph operations
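
To make "Gauss-Seidel with coloring" concrete, here is a minimal serial
sketch in plain C++. It is not the KokkosKernels API: the SimpleCrs
struct and the functions tridiag, greedy_color, and
colored_gauss_seidel_sweep are hypothetical names made up for this
illustration, and the coloring is a simple greedy one that assumes a
symmetric sparsity pattern. The point is only that rows sharing a color
do not depend on each other, so each color could be processed in
parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical compressed-row-storage matrix (not the KokkosSparse type).
struct SimpleCrs {
  std::vector<std::size_t> row_ptr;  // size rows()+1; row i owns [row_ptr[i], row_ptr[i+1])
  std::vector<std::size_t> col;      // column index of each stored entry
  std::vector<double> val;           // value of each stored entry
  std::size_t rows() const { return row_ptr.size() - 1; }
};

// Builds the n-by-n tridiagonal matrix with stencil [-1, 2, -1].
SimpleCrs tridiag(std::size_t n) {
  SimpleCrs A;
  A.row_ptr.push_back(0);
  for (std::size_t i = 0; i < n; ++i) {
    if (i > 0) { A.col.push_back(i - 1); A.val.push_back(-1.0); }
    A.col.push_back(i); A.val.push_back(2.0);
    if (i + 1 < n) { A.col.push_back(i + 1); A.val.push_back(-1.0); }
    A.row_ptr.push_back(A.col.size());
  }
  return A;
}

// Greedy coloring: each row takes the smallest color not used by an
// already-colored neighbor. Written for clarity, not speed.
std::vector<int> greedy_color(const SimpleCrs& A) {
  const std::size_t n = A.rows();
  std::vector<int> color(n, -1);
  for (std::size_t i = 0; i < n; ++i) {
    std::vector<bool> used(n, false);  // n colors always suffice
    for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) {
      const std::size_t j = A.col[k];
      if (j != i && color[j] >= 0) used[static_cast<std::size_t>(color[j])] = true;
    }
    std::size_t c = 0;
    while (used[c]) ++c;
    color[i] = static_cast<int>(c);
  }
  return color;
}

// One Gauss-Seidel sweep for A x = b, visiting rows color by color.
// Rows of one color only read entries of x that belong to other colors.
void colored_gauss_seidel_sweep(const SimpleCrs& A, const std::vector<int>& color,
                                int num_colors, const std::vector<double>& b,
                                std::vector<double>& x) {
  for (int c = 0; c < num_colors; ++c) {
    for (std::size_t i = 0; i < A.rows(); ++i) {  // parallelizable per color
      if (color[i] != c) continue;
      double diag = 0.0;
      double sum = b[i];
      for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) {
        if (A.col[k] == i) diag = A.val[k];
        else sum -= A.val[k] * x[A.col[k]];
      }
      assert(diag != 0.0);  // Gauss-Seidel needs a nonzero diagonal
      x[i] = sum / diag;
    }
  }
}
```

On the tridiagonal matrix from tridiag, the greedy coloring alternates
between two colors, which is exactly the classic red-black ordering;
graphs with more complicated connectivity simply need more colors.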

Kokkos is licensed under standard 3-clause BSD terms of use. For
specifics, please refer to the LICENSE file contained in the
repository or distribution.

We organize this directory as follows:

1. Public interfaces to computational kernels live in the src/
   subdirectory (kokkos-kernels/src):
   - Kokkos_Blas1_MV.hpp: (Multi)vector operations that
     Tpetra::MultiVector uses
   - Kokkos_Sparse_CrsMatrix.hpp: Declaration and definition of
     KokkosSparse::CrsMatrix, the sparse matrix data structure used
     for the computational kernels below
   - KokkosSparse_spmv.hpp: Sparse matrix-vector multiply, either with
     a single vector stored in a 1-D View or with multiple vectors at
     a time (a multivector) stored in a 2-D View
2. Implementations of computational kernels live in the src/impl/
   subdirectory (kokkos-kernels/src/impl)
3. Correctness tests live in the unit_test/ subdirectory, and
   performance tests live in the perf_test/ subdirectory
4. Compact batched BLAS: refer to the README in src/stage/batched/.
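
The difference between the single-vector and multivector forms of sparse
matrix-vector multiply can be sketched in plain C++ as follows. This is
an illustration only, not the KokkosSparse interface: the CrsMatrix
struct here is a hypothetical stand-in built on std::vector, the
function names spmv and spmv_mv are made up for the example, and the
multivector is assumed to be stored row-major in a flat array (a real
2-D View may use a different layout).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a compressed-row sparse matrix.
struct CrsMatrix {
  std::size_t num_rows;
  std::size_t num_cols;
  std::vector<std::size_t> row_map;  // size num_rows+1
  std::vector<std::size_t> entries;  // column index of each stored entry
  std::vector<double> values;        // value of each stored entry
};

// y = A * x for a single vector (the "1-D" case).
std::vector<double> spmv(const CrsMatrix& A, const std::vector<double>& x) {
  assert(x.size() == A.num_cols);
  std::vector<double> y(A.num_rows, 0.0);
  for (std::size_t i = 0; i < A.num_rows; ++i) {
    for (std::size_t k = A.row_map[i]; k < A.row_map[i + 1]; ++k) {
      y[i] += A.values[k] * x[A.entries[k]];
    }
  }
  return y;
}

// Y = A * X for num_vecs vectors at once (the "2-D" case).
// Assumed layout: X(j, v) is stored at X[j * num_vecs + v].
std::vector<double> spmv_mv(const CrsMatrix& A, const std::vector<double>& X,
                            std::size_t num_vecs) {
  assert(X.size() == A.num_cols * num_vecs);
  std::vector<double> Y(A.num_rows * num_vecs, 0.0);
  for (std::size_t i = 0; i < A.num_rows; ++i) {
    for (std::size_t k = A.row_map[i]; k < A.row_map[i + 1]; ++k) {
      const double a = A.values[k];
      const std::size_t j = A.entries[k];
      // Each matrix entry is read once and reused for every vector.
      for (std::size_t v = 0; v < num_vecs; ++v) {
        Y[i * num_vecs + v] += a * X[j * num_vecs + v];
      }
    }
  }
  return Y;
}
```

The multivector form reads each matrix entry once and applies it to all
vectors, which is one reason to treat several vectors together rather
than calling the single-vector kernel in a loop.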

Do NOT use or rely on anything in the KokkosBlas::Impl namespace, or
on anything in the impl/ subdirectory.

This separation of interface and implementation lets the interface
assign the users' Views to View types with the desired attributes
(e.g., read-only, RandomRead). This also makes it easier to provide
full specializations of the implementation. "Full specialization"
means that all the template parameters are fixed, so that the compiler
can compile the code ahead of time. This technique keeps your library's
or application's build times down, since kernels are already precompiled
for certain template parameter combinations. It can also improve
performance, since compilers have an easier time optimizing code in
shorter .cpp files.
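
The interface/implementation split and the idea of a full specialization
can be sketched in a few lines of plain C++. Everything here is
hypothetical and chosen for illustration: the Sketch namespace, the
Impl::Dot struct, and the use of raw const pointers in place of
read-only Views are assumptions, not the actual KokkosKernels code. In a
real project the three parts marked below would sit in separate files;
they are shown together so the example is self-contained.

```cpp
#include <cassert>
#include <cstddef>

namespace Sketch {
namespace Impl {

// (impl header) Generic implementation: compiled in every translation
// unit that uses it with some Scalar type.
template <class Scalar>
struct Dot {
  static Scalar run(const Scalar* x, const Scalar* y, std::size_t n) {
    Scalar sum = Scalar();
    for (std::size_t i = 0; i < n; ++i) sum += x[i] * y[i];
    return sum;
  }
};

// (impl header) Full specialization for double: every template
// parameter is fixed, and only the declaration of run() is visible.
// User code that includes the header does not recompile the body.
template <>
struct Dot<double> {
  static double run(const double* x, const double* y, std::size_t n);
};

}  // namespace Impl

// (public header) The interface converts the arguments to read-only
// form and forwards to the implementation. Users call only this.
template <class Scalar>
Scalar dot(const Scalar* x, const Scalar* y, std::size_t n) {
  assert(n == 0 || (x != nullptr && y != nullptr));
  return Impl::Dot<Scalar>::run(x, y, n);
}

}  // namespace Sketch

// (one .cpp file) The body of the full specialization, compiled once
// and linked into every program that calls dot with double.
double Sketch::Impl::Dot<double>::run(const double* x, const double* y,
                                      std::size_t n) {
  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i) sum += x[i] * y[i];
  return sum;
}
```

Calling Sketch::dot on double arrays picks up the precompiled
specialization, while calling it on float arrays falls back to the
generic template. User code never names anything in Impl, which is why
the implementation can be reorganized without breaking callers.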