The OpenCL transpose fails validation for matrices of order 1296 or greater with the NVIDIA OpenCL implementation, while order 1295 still validates. This appears to be NVIDIA-specific, because the Intel OpenCL implementation handles much larger matrices correctly.
There may be a device property that I can query to know in advance that this problem will appear. CL_DEVICE_ADDRESS_BITS exists, but if the problem were 32-bit indexing, it should not manifest at order 1296, where the matrix occupies only 12.8 MiB.
jrhammon@klondike:~/Work/PRK/github-official/Cxx11$ ./transpose-opencl 10 1295
Parallel Research Kernels version 2.16
C++11/OpenCL Matrix transpose: B = A^T
Available OpenCL platform: NVIDIA CUDA
Available OpenCL platform: Intel(R) OpenCL
Matrix order = 1295
Number of iterations = 10
Solution validates
Rate (MB/s): 12611.9 Avg time (s): 0.00106378
jrhammon@klondike:~/Work/PRK/github-official/Cxx11$ ./transpose-opencl 10 1296
Parallel Research Kernels version 2.16
C++11/OpenCL Matrix transpose: B = A^T
Available OpenCL platform: NVIDIA CUDA
Available OpenCL platform: Intel(R) OpenCL
Matrix order = 1296
Number of iterations = 10
ERROR: Aggregate squared error 1896 exceeds threshold 1e-08