- Mark Hoemmen ([email protected]) (Sandia National Laboratories)
- D. S. Hollman ([email protected]) (Sandia National Laboratories)
- Christian Trott ([email protected]) (Sandia National Laboratories)
- Daniel Sunderland ([email protected]) (Sandia National Laboratories)
- Nevin Liber ([email protected]) (Argonne National Laboratory)
- Siva Rajamanickam ([email protected]) (Sandia National Laboratories)
- Li-Ta Lo ([email protected]) (Los Alamos National Laboratory)
- Damien Lebrun-Grandie ([email protected]) (Oak Ridge National Laboratory)
- Graham Lopez ([email protected]) (Oak Ridge National Laboratory)
- Peter Caday ([email protected]) (Intel)
- Sarah Knepper ([email protected]) (Intel)
- Piotr Luszczek ([email protected]) (University of Tennessee)
- Timothy Costa ([email protected]) (NVIDIA)
- Chip Freitag ([email protected]) (AMD)
- Bryce Lelbach ([email protected]) (NVIDIA)
- Srinath Vadlamani ([email protected]) (ARM)
- Rene Vanoostrum ([email protected]) (AMD)
- Existing standard (not ISO) developed in the 90s
- Widely used in science and engineering code
- Many hardware vendors provide optimized implementations as part of system libraries
  - Intel: Math Kernel Library (MKL)
  - NVIDIA: cuBLAS
  - IBM: Engineering and Scientific Subroutine Library (ESSL)
  - ARM: Arm Performance Libraries
  - HPE: Cray LibSci
- BUT: it is in Fortran, maybe with a C interface
```cpp
// Matrix
int nRows = ...;
int nCols = ...;
double* A = new double[nRows*nCols];
// Vectors
double* y = new double[nRows];
double* x = new double[nCols];
// y = 1.0*A*x
dgemv('N',nRows,nCols,1.0,A,nRows,x,1,0.0,y,1);
```
Let's unpack the 11 (!) parameters to compute y = A*x:

- `N`: the matrix is not transposed
- `nRows`: number of rows (also length of `y`)
- `nCols`: number of columns (also length of `x`)
- `1.0`: scaling factor for `A`
- `A`: pointer to the matrix values
- `nRows`: stride of the rows of `A` (in our case the stride is the number of rows)
- `x`: right-hand side vector
- `1`: stride of `x`
- `0.0`: scaling of `y`
- `y`: left-hand side vector
- `1`: stride of `y`
- 11 (!) parameters to do `y = A*x`
- Hopefully you declared the extern "C" function `dgemv` correctly to get the Fortran linkage right
  - The actual Fortran functions need to take all the scalar parameters as pointers ...
- The type of the scalars is part of the function name (`dgemv` == `double`, `sgemv` == `float`)
  - No mixed support, e.g. `A` and `x` are `float` and `y` is `double`
- No compile time size propagation
  - Machine learning is all about small operations with compile-time known sizes
- Storage layout of matrix `A` better be Fortran layout
- Scaling parameters change behavior: `0` means to "ignore" the argument (important for NaN propagation, for example)
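To make the Fortran linkage issue concrete, here is a hand-written reference version of the call above (a sketch only: `dgemv_ref` is a hypothetical name, and a real program would link against a vendor BLAS rather than define this itself). Note that every scalar arrives by pointer and the matrix is column-major:

```cpp
#include <cstddef>

// Hypothetical reference implementation mimicking the Fortran dgemv
// calling convention: all scalars passed by pointer, column-major matrix,
// transpose option as a char flag. NOT a real BLAS binding.
extern "C" void dgemv_ref(const char* trans, const int* m, const int* n,
                          const double* alpha, const double* A, const int* lda,
                          const double* x, const int* incx,
                          const double* beta, double* y, const int* incy) {
  if (*trans == 'N') {
    for (int i = 0; i < *m; ++i) {
      double sum = 0.0;
      for (int j = 0; j < *n; ++j)
        sum += A[i + j * (*lda)] * x[j * (*incx)];  // column-major access
      y[i * (*incy)] = (*alpha) * sum + (*beta) * y[i * (*incy)];
    }
  }
  // 'T' and 'C' (transposed) cases omitted for brevity.
}
```

Getting any of the eleven arguments wrong (a value where a pointer is expected, a stride off by one) compiles in C but silently corrupts the result.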
- Reviewed by SGs 6/19 in Cologne (Rev 0) and Belfast (Rev 1)
- Reviewed by LEWGi in Cologne/Belfast
- Linear Algebra Functions are just another set of algorithms
- Propose a free function interface similar to standard algorithms
- Encapsulate data in the minimal fundamental representation of multidimensional arrays while still providing the necessary flexibility
  - Use `mdspan` as parameters
- Ensure that all functionality of the BLAS standard is covered
  - Propose an equivalent of every function in BLAS
- Ensure that new areas of linear algebra such as machine learning are covered
  - Turns out `mdspan` gives that to us naturally
```cpp
// Matrix
mdspan<const double,dynamic_extent,dynamic_extent> A(A_ptr,nRows,nCols);
mdspan<const double,dynamic_extent> x(x_ptr,nCols);
mdspan<double,dynamic_extent> y(y_ptr,nRows);
matrix_vector_product(y,A,x);
```
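To show the shape of such a free-function algorithm, here is a minimal sketch of a `matrix_vector_product` over hand-rolled views (`view2d`/`view1d` are illustrative stand-ins for `mdspan`, not part of the proposal). Because the element types are independent template parameters, the scalar type is no longer encoded in the function name, and mixed-precision calls fall out naturally:

```cpp
#include <cstddef>

// Toy non-owning views, standing in for mdspan in this sketch.
template <class T>
struct view2d {
  T* data; std::size_t rows, cols;  // row-major
  T& operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

template <class T>
struct view1d {
  T* data; std::size_t n;
  T& operator()(std::size_t i) const { return data[i]; }
};

// Sketch of the algorithm: y = A * x, accumulating in the output's
// element type (so float inputs with a double output sum in double).
template <class YT, class AT, class XT>
void matrix_vector_product(view1d<YT> y, view2d<AT> A, view1d<XT> x) {
  for (std::size_t i = 0; i < A.rows; ++i) {
    YT sum{};
    for (std::size_t j = 0; j < A.cols; ++j)
      sum += A(i, j) * x(j);
    y(i) = sum;
  }
}
```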
- Only the 3 parameters you would expect
- Function name doesn't encode scalar type
- Just use the desired scalar types for each argument
```cpp
// Matrix
mdspan<const float,dynamic_extent,dynamic_extent> A(A_ptr,nRows,nCols);
mdspan<const float,dynamic_extent> x(x_ptr,nCols);
mdspan<double,dynamic_extent> y(y_ptr,nRows);
// Can infer from the output argument y to sum up in double precision
matrix_vector_product(y,A,x);
```
- `mdspan` supports compile-time extents
```cpp
// Matrix
mdspan<const float,8,4> A(A_ptr);
mdspan<const float,4> x(x_ptr);
mdspan<double,8> y(y_ptr);
// All the dimensions are known at compile time
// Enables full unrolling, vectorization etc.
matrix_vector_product(y,A,x);
```
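The compile-time-extents benefit can be sketched without `mdspan` by making the sizes template parameters (hypothetical helper `matrix_vector_product_fixed`, not the proposal's interface). Since both trip counts are constants, the compiler can fully unroll and vectorize:

```cpp
#include <array>
#include <cstddef>

// Sketch: extents as template parameters. The loop bounds are compile-time
// constants, so the optimizer sees fixed 8x4-style trip counts.
template <std::size_t Rows, std::size_t Cols, class YT, class AT, class XT>
void matrix_vector_product_fixed(std::array<YT, Rows>& y,
                                 const std::array<AT, Rows * Cols>& A,  // row-major
                                 const std::array<XT, Cols>& x) {
  for (std::size_t i = 0; i < Rows; ++i) {
    YT sum{};
    for (std::size_t j = 0; j < Cols; ++j)
      sum += A[i * Cols + j] * x[j];
    y[i] = sum;
  }
}
```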
- `basic_mdspan` is flexible enough to represent a scaling factor
  - use a 'scaling accessor' which scales values inline upon access
```cpp
// Matrix
mdspan<const float,8,4> A(A_ptr);
mdspan<const float,4> x(x_ptr);
mdspan<double,8> y(y_ptr);
// scaled_view returns a new mdspan with an accessor which
// multiplies elements by 2.0 before returning a value
matrix_vector_product(y,scaled_view(2.0,A),x);
```
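The scaling-accessor idea can be sketched in a few lines (the names `scaled`, `make_scaled`, and `raw2d` are hypothetical stand-ins; the proposal spells this `scaled_view` on `basic_mdspan`). No data is copied, and the consuming algorithm needs no scaling-aware overload:

```cpp
#include <cstddef>

// Minimal 2-D view for demonstration (stand-in for basic_mdspan).
struct raw2d {
  const double* data;
  std::size_t cols;  // row-major
  double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Sketch of a scaling accessor: every element is multiplied by alpha
// on access, so y = alpha*A*x needs no extra parameter in the algorithm.
template <class View, class Scalar>
struct scaled {
  Scalar alpha;
  View v;
  auto operator()(std::size_t i, std::size_t j) const { return alpha * v(i, j); }
};

template <class View, class Scalar>
scaled<View, Scalar> make_scaled(Scalar alpha, View v) { return {alpha, v}; }
```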
- Note: no need to write a different implementation of `matrix_vector_product`
- Same as with a scaling factor: `basic_mdspan` can accommodate that simply through a change of `layout`
```cpp
// Matrix
mdspan<const float,8,4> A(A_ptr);
mdspan<const float,4> x(x_ptr);
mdspan<double,8> y(y_ptr);
// transpose_view returns a new basic_mdspan with a transposed layout
matrix_vector_product(y,transpose_view(A),x);
```
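The transpose-as-layout idea is equally small to sketch (hypothetical names `transposed` and `raw2d`; the proposal spells this `transpose_view`): the wrapper just swaps the indices on access, moving no data.

```cpp
#include <cstddef>

// Minimal 2-D view for demonstration (stand-in for basic_mdspan).
struct raw2d {
  const double* data;
  std::size_t cols;  // row-major
  double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Sketch of a transpose view: conceptually a change of layout.
// Accessing (i,j) reads the underlying (j,i); no copy is made.
template <class View>
struct transposed {
  View v;
  auto operator()(std::size_t i, std::size_t j) const { return v(j, i); }
};
```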
- Covered by `basic_mdspan` and its `layout`s, e.g. `layout_stride`
  - We also propose more specific linear algebra layouts
- This covers use cases such as building blocks for tensor algebra, where both dimensions of a matrix need to have a stride
  - The BLAS standard doesn't support that
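The general-stride idea can be illustrated with a tiny index mapping (hypothetical `stride2d`, sketching the `layout_stride` concept): with an independent stride per dimension, the same mapping expresses row-major, column-major, or a sub-block of a larger tensor, which BLAS's single leading-dimension parameter cannot.

```cpp
#include <cstddef>

// Sketch of a general strided layout mapping (the layout_stride idea):
// element (i,j) lives at offset i*s0 + j*s1 in the underlying storage.
struct stride2d {
  std::size_t s0, s1;  // stride of dimension 0 and dimension 1
  std::size_t operator()(std::size_t i, std::size_t j) const {
    return i * s0 + j * s1;
  }
};
```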
- Glancing over variants here and some minor helper things
  - `layout_blas_packed`
  - `layout_blas_general`
  - `scaled_view`
  - `transpose_view`
  - `conjugate_view`
  - `conjugate_transpose_view`
- Matrix and vector algorithms: `linalg_swap`, `scale`, `linalg_copy`, `linalg_add`
- Vector operations: `givens_rotation_apply`, `dot`, `vector_norm2`, `vector_abs_sum`, `idx_abs_max`
- Matrix-vector algorithms: `matrix_vector_product`, `symmetric_matrix_vector_product`, `hermitian_matrix_vector_product`, `triangular_matrix_vector_product`, `triangular_matrix_vector_solve`, `matrix_rank_1_update`, `symmetric_matrix_rank_1_update`, `hermitian_matrix_rank_1_update`, `symmetric_matrix_rank_2_update`, `hermitian_matrix_rank_2_update`
- Matrix-matrix algorithms: `matrix_product`, `symmetric_matrix_right_product`, `hermitian_matrix_right_product`, `triangular_matrix_right_product`, `symmetric_matrix_left_product`, `hermitian_matrix_left_product`, `triangular_matrix_left_product`, `symmetric_matrix_rank_k_update`, `hermitian_matrix_rank_k_update`, `symmetric_matrix_rank_2k_update`, `hermitian_matrix_rank_2k_update`, `triangular_matrix_matrix_left_solve`, `triangular_matrix_matrix_right_solve`
- Under preparation at https://github.com/kokkos/stdBlas
  - Didn't have much time last year
  - Project plan has work scheduled for it for the next three months
- Questions? Things LEWG wants to look at in detail?
- Walk through algorithm requirements
- Look at matrix-vector multiply and dot
  - https://github.com/ORNL/cpp-proposals-pub/blob/master/D1673/blas_interface.md#dot-product-of-two-vectors-linalgalgsblas1dot
  - https://github.com/ORNL/cpp-proposals-pub/blob/master/D1673/blas_interface.md#general-matrix-vector-product-linalgalgsblas2gemv
  - https://github.com/ORNL/cpp-proposals-pub/blob/master/D1673/blas_interface.md#symmetric-matrix-vector-product-linalgalgsblas2symv