Swap in more performant kernels for vector divergence #66

fluidnumerics-joe · 2024-10-31T18:39:06Z

Vector divergence is by far our most expensive kernel for forward stepping conservation laws. In a separate repository, I've put together several implementations of the vector divergence kernels for 2-D and 3-D and demonstrated that a hand-written kernel that uses shared memory substantially outperforms our current hipBLAS implementation in both cases.

In 2-D, the kernels are valid for polynomial degrees 2-15. At polynomial degree 15, however, the hipBLAS implementation is more performant (for N < 15, the hand-written kernels are faster by an order of magnitude). See the bandwidth graphics here. We'll want to bring these kernels in and add logic to switch between the hipBLAS implementation and our hand-written kernels. Ideally, we'd also add a convergence test that varies the polynomial degree between 2 and 15 to exercise each implementation and confirm spectral accuracy.
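As a rough illustration of what such a convergence test could check, here is a minimal, library-independent sketch in plain Python. It is not the project's test: it assumes Chebyshev-Gauss-Lobatto collocation on the reference square as a stand-in for whatever nodes the real kernels use, and `cheb`, `divergence_error`, and the test field are hypothetical names and choices made for illustration. A real test would call the actual divergence routine in place of the inner sums.

```python
import math

def cheb(n):
    """Chebyshev-Gauss-Lobatto nodes and differentiation matrix for degree n."""
    x = [math.cos(math.pi * j / n) for j in range(n + 1)]
    c = [(2.0 if j in (0, n) else 1.0) * (-1.0) ** j for j in range(n + 1)]
    d = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(n + 1):
            if i != j:
                d[i][j] = (c[i] / c[j]) / (x[i] - x[j])
        # Rows must sum to zero so that constants differentiate to zero.
        d[i][i] = -sum(d[i])
    return x, d

def divergence_error(n):
    """Max nodal error of the collocation divergence of a smooth 2-D field."""
    x, d = cheb(n)
    pts = range(n + 1)
    # Hypothetical test field F = (sin(pi x) cos(pi y), cos(pi x) sin(pi y)),
    # whose exact divergence is 2 pi cos(pi x) cos(pi y).
    fx = [[math.sin(math.pi * x[i]) * math.cos(math.pi * x[j]) for j in pts] for i in pts]
    fy = [[math.cos(math.pi * x[i]) * math.sin(math.pi * x[j]) for j in pts] for i in pts]
    err = 0.0
    for i in pts:
        for j in pts:
            dfx_dx = sum(d[i][k] * fx[k][j] for k in pts)
            dfy_dy = sum(d[j][k] * fy[i][k] for k in pts)
            exact = 2.0 * math.pi * math.cos(math.pi * x[i]) * math.cos(math.pi * x[j])
            err = max(err, abs(dfx_dx + dfy_dy - exact))
    return err

# Sweep the degree range named in the issue (2 through 15).
errors = {n: divergence_error(n) for n in range(2, 16)}
for n in sorted(errors):
    print(f"N = {n:2d}  max error = {errors[n]:.3e}")
```

For a smooth field like this one, the error should fall off roughly exponentially with the degree rather than at a fixed algebraic rate, which is the signature of spectral accuracy the test is meant to confirm.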

In 3-D, the kernels are valid for polynomial degrees 2-7. Beyond polynomial degree 7, we'll want to fall back on the hipBLAS implementation, since it produces valid results even though its performance is not ideal. As with 2-D, we'll want to add a convergence test that varies the polynomial degree between 2 and 15 to exercise each implementation and confirm spectral accuracy.
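The switching logic described above could look something like the following sketch. The function and constant names are hypothetical, and the thresholds simply encode the ranges quoted in this issue: hand-written kernels for degrees 2-14 in 2-D (since hipBLAS wins at degree 15) and 2-7 in 3-D. Falling back to hipBLAS for degrees below 2 is an assumption, made because the hand-written kernels are not stated to be valid there.

```python
# Highest polynomial degree at which the hand-written kernel is preferred,
# keyed by spatial dimension. Values are taken from the ranges in this issue.
HANDWRITTEN_MAX_DEGREE = {2: 14, 3: 7}
HANDWRITTEN_MIN_DEGREE = 2

def select_divergence_backend(ndim, degree):
    """Return 'handwritten' or 'hipblas' for the given dimension and degree."""
    if ndim not in HANDWRITTEN_MAX_DEGREE:
        raise ValueError(f"unsupported dimension: {ndim}")
    if HANDWRITTEN_MIN_DEGREE <= degree <= HANDWRITTEN_MAX_DEGREE[ndim]:
        return "handwritten"
    # Outside the hand-written range, hipBLAS still produces valid results.
    return "hipblas"

# Show which backend each degree in the proposed test sweep would exercise.
for ndim in (2, 3):
    for degree in range(2, 16):
        print(ndim, degree, select_divergence_backend(ndim, degree))
```

Keeping the thresholds in one table means a convergence test can loop over degrees 2 through 15 and know which implementation each degree exercises.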
