Vector divergence is by far our most expensive kernel for forward stepping conservation laws. In a separate repository, I've put together several implementations of the vector divergence kernels for 2-D and 3-D and demonstrated that a hand-written kernel that leverages shared memory substantially outperforms our current hipblas implementation in both cases.
In 2-D, the kernels are valid for polynomial degrees 2-15. For N < 15, the hand-written kernels are faster by an order of magnitude; at polynomial degree 15, however, the hipblas implementation is more performant. See the bandwidth graphics here. We'll want to bring these kernels in and add logic to switch between the hipblas implementation and our hand-written kernels. It'd be ideal to put in a convergence test that varies the polynomial degree between 2 and 15 to exercise each implementation and confirm spectral accuracy.
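To make the convergence test idea concrete, here is a rough sketch in Python (for illustration only, not the project's language). It is a pure-Python reference rather than either GPU kernel: in the real test, the call to `divergence_2d` would be replaced by the hipblas or hand-written implementation under test. The Chebyshev-Gauss-Lobatto nodes, the test vector field, and all function names are assumptions made for this sketch.

```python
import math


def chebyshev_gauss_lobatto_nodes(degree):
    # Assumed node choice for this sketch; the production quadrature may differ.
    return [math.cos(math.pi * j / degree) for j in range(degree + 1)]


def derivative_matrix(nodes):
    # Polynomial derivative matrix built from barycentric weights
    # w_j = 1 / prod_{k != j} (x_j - x_k); valid for any distinct nodes.
    n = len(nodes)
    weights = []
    for j in range(n):
        prod = 1.0
        for k in range(n):
            if k != j:
                prod *= nodes[j] - nodes[k]
        weights.append(1.0 / prod)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                D[i][j] = (weights[j] / weights[i]) / (nodes[i] - nodes[j])
        # Negative-sum trick: each row of D must annihilate constants.
        D[i][i] = -sum(D[i][j] for j in range(n) if j != i)
    return D


def divergence_2d(fx, fy, D):
    # Reference divergence on a tensor-product grid:
    # first index is x, second index is y.
    n = len(D)
    div = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dfx_dx = sum(D[i][k] * fx[k][j] for k in range(n))
            dfy_dy = sum(D[j][k] * fy[i][k] for k in range(n))
            div[i][j] = dfx_dx + dfy_dy
    return div


def max_divergence_error(degree):
    # Test field F = (sin x cos y, cos x sin y), so div F = 2 cos x cos y.
    x = chebyshev_gauss_lobatto_nodes(degree)
    D = derivative_matrix(x)
    fx = [[math.sin(xi) * math.cos(yj) for yj in x] for xi in x]
    fy = [[math.cos(xi) * math.sin(yj) for yj in x] for xi in x]
    div = divergence_2d(fx, fy, D)
    return max(
        abs(div[i][j] - 2.0 * math.cos(x[i]) * math.cos(x[j]))
        for i in range(len(x))
        for j in range(len(x))
    )


# Sweep the degrees named in the issue and record the max-norm error.
errors = {degree: max_divergence_error(degree) for degree in range(2, 16)}
for degree, err in errors.items():
    print(degree, err)
```

For a smooth field like this one, the error should fall off faster than any fixed power of the degree until it reaches round-off, which is the behavior the test would assert on for each implementation.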
In 3-D, the kernels are valid for polynomial degrees 2-7. Beyond polynomial degree 7, we'll want to fall back on the hipblas implementation, since it still produces valid results even though its performance is not ideal. As with 2-D, we'll want to put in a convergence test that varies the polynomial degree between 2 and 15 to exercise each implementation and confirm spectral accuracy.
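The switching logic described above could be sketched roughly as follows. This is Python for illustration only; the enum, the lookup table, and the function name are hypothetical and not part of the existing code base. The degree cutoffs are the ones stated in this issue: hand-written kernels for degrees 2-14 in 2-D (hipblas wins at 15) and 2-7 in 3-D.

```python
from enum import Enum


class DivergenceBackend(Enum):
    # Hypothetical labels for the two implementations discussed in this issue.
    HAND_WRITTEN = "hand_written"
    HIPBLAS = "hipblas"


# Smallest degree the hand-written kernels support, per the issue text.
HAND_WRITTEN_MIN_DEGREE = 2

# Largest degree at which the hand-written kernel should be chosen,
# keyed by spatial dimension. In 2-D the kernels are valid up to 15,
# but hipblas is faster there, so the cutoff is 14.
HAND_WRITTEN_MAX_DEGREE = {2: 14, 3: 7}


def select_divergence_backend(ndim, degree):
    """Pick a backend for the vector divergence at a given polynomial degree."""
    if ndim not in HAND_WRITTEN_MAX_DEGREE:
        raise ValueError(f"unsupported dimension: {ndim}")
    if HAND_WRITTEN_MIN_DEGREE <= degree <= HAND_WRITTEN_MAX_DEGREE[ndim]:
        return DivergenceBackend.HAND_WRITTEN
    # Outside the tuned range, fall back on hipblas, which stays valid.
    return DivergenceBackend.HIPBLAS
```

Keeping the cutoffs in a single table would let the convergence test sweep degrees 2 through 15 and hit both branches without hard-coding the thresholds in more than one place.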