[NDTensorsCUDAExt] Remove and test for scalar indexing #1245
Conversation
…lled which scales very poorly on CUDA.
The issue there turned out to be that one of the tensors was actually on CPU (since the
Codecov Report

All modified and coverable lines are covered by tests ✅

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

```
@@            Coverage Diff             @@
##             main    #1245       +/-   ##
===========================================
- Coverage   85.44%   54.81%   -30.63%
===========================================
  Files          89       88        -1
  Lines        8401     8348       -53
===========================================
- Hits         7178     4576     -2602
- Misses       1223     3772     +2549
===========================================
```

☔ View full report in Codecov by Sentry.
@mtfishman, yes, the direct problem in the issue was related to the delta tensor, but there was also a warning thrown about scalar indexing. When I ran the test with
I also tried with just
I see, can you add tests with those cases? Ideally it would directly use
Yes, I can add the tests, and it does show the same error with
And the code throws an error.
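One way such tests can surface the problem without GPU hardware is to disallow scalar indexing globally, so any method that falls back to element-wise iteration throws an error instead of printing a warning. This is a hypothetical sketch (not the PR's actual test code), assuming the `JLArrays` reference backend from the GPUArrays ecosystem, which mimics GPU-array semantics on the CPU:

```julia
# Hypothetical sketch: turn scalar indexing from a warning into a hard error,
# so any fallback that iterates a GPU-style array element-by-element fails loudly.
using GPUArraysCore: allowscalar, @allowscalar
using JLArrays: JLArray  # CPU-hosted reference backend with GPU-array semantics

allowscalar(false)        # scalar getindex/setindex! on GPU-style arrays now throws

A = JLArray(randn(2, 2))
x = @allowscalar A[1, 1]  # explicit, scoped opt-in for a single scalar access
```

With scalar indexing disallowed like this, a `mul!` call that hits the generic scalar-indexing fallback would raise an error at the point of the fallback rather than emitting the warning shown below.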
I guess Metal has the same issue:

```julia
julia> using Metal

julia> A, B, C = mtl.((randn(2, 2), randn(2, 2), randn(2, 2)))
(Float32[1.9648244 -0.14690858; 0.841998 0.8000668], Float32[0.29232308 -1.0040092; -0.30180508 -0.26494592], Float32[1.9440461 -0.0030814873; -0.13209581 0.10974641])

julia> using LinearAlgebra

julia> mul!(transpose(C), A, B)
┌ Warning: Performing scalar indexing on task Task (runnable) @0x000000010ae4c010.
│ Invocation of getindex resulted in scalar indexing of a GPU array.
│ This is typically caused by calling an iterating implementation of a method.
│ Such implementations *do not* execute on the GPU, but very slowly on the CPU,
│ and therefore are only permitted from the REPL for prototyping purposes.
│ If you did intend to index this array, annotate the caller with @allowscalar.
└ @ GPUArraysCore ~/.julia/packages/GPUArraysCore/uOYfN/src/GPUArraysCore.jl:106
2×2 transpose(::MtlMatrix{Float32, Metal.MTL.MTLResourceStorageModePrivate}) with eltype Float32:
 0.618701    -1.93378
 0.00467122  -1.05735
```
Also Hermitian transposition:

```julia
julia> using Metal

julia> A, B, C = mtl.((randn(2, 2), randn(2, 2), randn(2, 2)))
(Float32[0.418296 -0.7982497; 2.1851687 0.10532443], Float32[0.028218202 0.96657676; -1.4152029 -0.11327304], Float32[1.0778706 0.3015457; -0.34520924 -0.62605405])

julia> using LinearAlgebra

julia> mul!(C', A, B)
┌ Warning: Performing scalar indexing on task Task (runnable) @0x000000010ad68010.
│ Invocation of getindex resulted in scalar indexing of a GPU array.
│ This is typically caused by calling an iterating implementation of a method.
│ Such implementations *do not* execute on the GPU, but very slowly on the CPU,
│ and therefore are only permitted from the REPL for prototyping purposes.
│ If you did intend to index this array, annotate the caller with @allowscalar.
└ @ GPUArraysCore ~/.julia/packages/GPUArraysCore/uOYfN/src/GPUArraysCore.jl:106
2×2 adjoint(::MtlMatrix{Float32, Metal.MTL.MTLResourceStorageModePrivate}) with eltype Float32:
  1.14149    0.494735
 -0.0873939  2.1002
```
So please add a similar definition for
Co-authored-by: Matt Fishman <[email protected]>
…rs.jl into kmp5/debug/scalar_indexing
Thanks @kmp5VT, this will be very useful for ensuring performance of GPU backends going forward. Is this ready to merge from your end once tests pass?
@mtfishman not a problem. Yes, this is ready to merge; I have no additional changes this round. Thanks!
Description

@mtfishman I thought it was weird that there was a scalar indexing warning in this bug report https://itensor.discourse.group/t/cuda-issue-when-converting-mps-to-mpo/1274/2, so I looked into it. I believe Julia determines which `mul!` method to use based on the `C` matrix. When `C` is a `Transpose`, the generic matmul `mul!` implementation is called, which causes a scalar indexing problem. It's completely fine if `A` or `B` is a `Transpose`; the CUDA `mul!` kernel will still be called. So I created this exposed `mul!` function to fix the problem.
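The general shape of such a fix can be illustrated with a small standalone sketch (hypothetical helper name `mul_unwrapped!`; this is not the PR's actual code): when the destination is wrapped in `Transpose` or `Adjoint`, rewrite the product via the identity `(A*B)ᵀ = Bᵀ*Aᵀ` so that `mul!` runs on the unwrapped parent array, where the backend's matmul kernel dispatch applies:

```julia
using LinearAlgebra

# Hypothetical helper: route a wrapped destination through its parent array
# so the backend-specific mul! kernel is used instead of the generic fallback.
function mul_unwrapped!(C::Transpose, A::AbstractMatrix, B::AbstractMatrix)
    # C = A*B  ⇔  parent(C) = (A*B)ᵀ = Bᵀ * Aᵀ
    mul!(parent(C), transpose(B), transpose(A))
    return C
end

function mul_unwrapped!(C::Adjoint, A::AbstractMatrix, B::AbstractMatrix)
    # C = A*B  ⇔  parent(C) = (A*B)' = B' * A'
    mul!(parent(C), adjoint(B), adjoint(A))
    return C
end
```

Because the rewrite is a mathematical identity, it is also correct on plain CPU arrays; the benefit on GPU arrays is that `mul!` on the unwrapped parent hits the backend's native matmul rather than the scalar-indexing generic implementation.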