
[NDTensorsCUDAExt] Remove and test for scalar indexing #1245

Merged · 62 commits merged into ITensor:main · Nov 15, 2023

Conversation

kmp5VT
Collaborator

@kmp5VT kmp5VT commented Nov 9, 2023

Description

@mtfishman I thought it was weird that there was a scalar indexing warning in this bug report https://itensor.discourse.group/t/cuda-issue-when-converting-mps-to-mpo/1274/2, so I looked into it. I believe Julia determines which mul! method to use based on the type of the C matrix. When C is a Transpose, the generic matmul mul! implementation is called, which causes a scalar indexing problem. It's completely fine if A or B is a Transpose; the CUDA mul! kernel will still be called. So I created this exposed mul! function to fix the problem.
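For context, a minimal sketch of the kind of specialization described above (an illustration under assumptions, not necessarily the exact definition added to NDTensorsCUDAExt): dispatch on a Transpose-wrapped CuMatrix destination and rewrite the product so the plain CuMatrix is the destination, which lets the CUBLAS-backed mul! be selected.

using CUDA, LinearAlgebra

# Sketch only: transpose(C) = α*A*B + β*transpose(C) is equivalent to
# C = α*transpose(B)*transpose(A) + β*C, so dispatch sees a plain CuMatrix
# destination and the CUBLAS kernel is used instead of the generic
# (scalar-indexing) fallback. The three-argument mul!(C, A, B) forwards to
# this five-argument method as well.
function LinearAlgebra.mul!(
  CT::Transpose{<:Any,<:CuMatrix}, A::AbstractMatrix, B::AbstractMatrix, α::Number, β::Number
)
  mul!(parent(CT), transpose(B), transpose(A), α, β)
  return CT
end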

@kmp5VT kmp5VT requested a review from mtfishman November 9, 2023 21:15
@mtfishman
Member

The issue there turned out to be that one of the tensors was actually on CPU (since the delta tensor got allocated there by default); if you look at the stack trace, there is a Matrix instead of a CuMatrix. So are you certain this PR actually fixes a known issue?

@codecov-commenter

codecov-commenter commented Nov 9, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison: base (d9265a2) 85.44% vs. head (ae147c7) 54.81%.
Report is 1 commit behind head on main.

❗ Current head ae147c7 differs from pull request most recent head 21aa398. Consider uploading reports for the commit 21aa398 to get more accurate results

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1245       +/-   ##
===========================================
- Coverage   85.44%   54.81%   -30.63%     
===========================================
  Files          89       88        -1     
  Lines        8401     8348       -53     
===========================================
- Hits         7178     4576     -2602     
- Misses       1223     3772     +2549     


@kmp5VT
Collaborator Author

kmp5VT commented Nov 10, 2023

The issue there turned out to be that one of the tensors was actually on CPU (since the delta tensor got allocated there by default); if you look at the stack trace, there is a Matrix instead of a CuMatrix. So are you certain this PR actually fixes a known issue?

@mtfishman, yes, the direct problem in that issue was related to the delta tensor, but there was also a warning about scalar indexing. When I ran the test with CUDA.allowscalar(false), it pointed me to scalar indexing in mul! with A and C being Transpose-wrapped, so I made this test:

using ITensors, NDTensors
i,j,k = Index.((2,2,2))
A = NDTensors.cu(randomTensor(j,i)); B = NDTensors.cu(randomTensor(j,k)); C = NDTensors.cu(randomTensor(k,i));
contract(C,(3,1), A, (-2,1), B, (-2,3))

I also tried with just transpose(C) = A * B and transpose(C) = A * transpose(B), and all three scenarios fail (use scalar indexing); it looks like it's because of the Transpose wrapper around C. But this change fixes all three.
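For concreteness, the three failing combinations look like the following (a hedged sketch; the square CuMatrix placeholders A, B, C are assumed here so every wrapped product is dimensionally valid):

using CUDA, LinearAlgebra

# Square placeholders so each wrapped combination has compatible sizes.
A, B, C = ntuple(_ -> CUDA.randn(Float64, 2, 2), 3)
CUDA.allowscalar(false)

# Each call selects the generic fallback (scalar indexing) when the destination
# is wrapped in Transpose, regardless of whether A or B is also wrapped:
mul!(transpose(C), A, B)            # transpose(C) = A * B
mul!(transpose(C), A, transpose(B)) # transpose(C) = A * transpose(B)
mul!(transpose(C), transpose(A), B) # transpose(C) = transpose(A) * B (the case hit by contract above)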

@mtfishman
Member

I see. Can you add tests for those cases? Ideally they would use CuMatrix and mul! directly rather than testing it through contract!.

@kmp5VT
Collaborator Author

kmp5VT commented Nov 10, 2023

I see. Can you add tests for those cases? Ideally they would use CuMatrix and mul! directly rather than testing it through contract!.

Yes, I can add the tests, and it shows the same error with:

using CUDA, LinearAlgebra
A = fill!(CuMatrix{Float64, CUDA.Mem.DeviceBuffer}(undef, (2,3)), randn(Float64))
B = fill!(CuMatrix{Float64, CUDA.Mem.DeviceBuffer}(undef, (3,4)), randn(Float64))
C = fill!(CuMatrix{Float64, CUDA.Mem.DeviceBuffer}(undef, (4,2)), randn(Float64))
CUDA.allowscalar(false)
mul!(transpose(C), A, B, 1.0, 0.0)

And the code throws an error.
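A hypothetical test along these lines (names and values assumed here, not the PR's actual test file) could check that mul! into a wrapped GPU destination runs with scalar indexing disallowed and matches the CPU result:

using CUDA, LinearAlgebra, Test

CUDA.allowscalar(false)
A = CUDA.randn(Float64, 2, 3)
B = CUDA.randn(Float64, 3, 4)
C = CUDA.randn(Float64, 4, 2)
# Passes only if mul! into a Transpose-wrapped CuMatrix avoids the generic
# scalar-indexing fallback; compare against the CPU product.
mul!(transpose(C), A, B, 1.0, 0.0)
@test collect(transpose(C)) ≈ collect(A) * collect(B)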

@mtfishman
Member

I guess Metal has the same issue:

julia> using Metal

julia> A, B, C = mtl.((randn(2, 2), randn(2, 2), randn(2, 2)))
(Float32[1.9648244 -0.14690858; 0.841998 0.8000668], Float32[0.29232308 -1.0040092; -0.30180508 -0.26494592], Float32[1.9440461 -0.0030814873; -0.13209581 0.10974641])

julia> using LinearAlgebra

julia> mul!(transpose(C), A, B)
┌ Warning: Performing scalar indexing on task Task (runnable) @0x000000010ae4c010.
│ Invocation of getindex resulted in scalar indexing of a GPU array.
│ This is typically caused by calling an iterating implementation of a method.
│ Such implementations *do not* execute on the GPU, but very slowly on the CPU,
│ and therefore are only permitted from the REPL for prototyping purposes.
│ If you did intend to index this array, annotate the caller with @allowscalar.
└ @ GPUArraysCore ~/.julia/packages/GPUArraysCore/uOYfN/src/GPUArraysCore.jl:106
2×2 transpose(::MtlMatrix{Float32, Metal.MTL.MTLResourceStorageModePrivate}) with eltype Float32:
 0.618701    -1.93378
 0.00467122  -1.05735

@mtfishman
Member

Also with adjoint (Hermitian transposition):

julia> using Metal

julia> A, B, C = mtl.((randn(2, 2), randn(2, 2), randn(2, 2)))
(Float32[0.418296 -0.7982497; 2.1851687 0.10532443], Float32[0.028218202 0.96657676; -1.4152029 -0.11327304], Float32[1.0778706 0.3015457; -0.34520924 -0.62605405])

julia> using LinearAlgebra

julia> mul!(C', A, B)
┌ Warning: Performing scalar indexing on task Task (runnable) @0x000000010ad68010.
│ Invocation of getindex resulted in scalar indexing of a GPU array.
│ This is typically caused by calling an iterating implementation of a method.
│ Such implementations *do not* execute on the GPU, but very slowly on the CPU,
│ and therefore are only permitted from the REPL for prototyping purposes.
│ If you did intend to index this array, annotate the caller with @allowscalar.
└ @ GPUArraysCore ~/.julia/packages/GPUArraysCore/uOYfN/src/GPUArraysCore.jl:106
2×2 adjoint(::MtlMatrix{Float32, Metal.MTL.MTLResourceStorageModePrivate}) with eltype Float32:
  1.14149    0.494735
 -0.0873939  2.1002

@mtfishman
Member

So please add a similar definition for Adjoint.
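For reference, a minimal sketch of what such an Adjoint definition could look like (assumed for illustration, not the exact code added in this PR), mirroring the Transpose case: taking the adjoint of both sides of C' = α*A*B + β*C' gives C = conj(α)*B'*A' + conj(β)*C, so the plain CuMatrix stays the destination and CUBLAS handles the call.

using CUDA, LinearAlgebra

# Sketch only: route mul! with an Adjoint-wrapped CuMatrix destination through
# the unwrapped parent so the CUBLAS kernel is dispatched.
function LinearAlgebra.mul!(
  CA::Adjoint{<:Any,<:CuMatrix}, A::AbstractMatrix, B::AbstractMatrix, α::Number, β::Number
)
  mul!(parent(CA), adjoint(B), adjoint(A), conj(α), conj(β))
  return CA
end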

@kmp5VT kmp5VT marked this pull request as ready for review November 15, 2023 00:15
@kmp5VT kmp5VT changed the title [NDTensorsCUDAExt] More Scalar indexing fixes [NDTensorsCUDAExt] Address and remove all scalar indexing from NDTensors/test Nov 15, 2023
@mtfishman mtfishman changed the title [NDTensorsCUDAExt] Address and remove all scalar indexing from NDTensors/test [NDTensorsCUDAExt] Remove and test for scalar indexing Nov 15, 2023
NDTensors/test/dense.jl (review thread, outdated, resolved)
NDTensors/Project.toml (review thread, outdated, resolved)
@mtfishman
Member

Thanks @kmp5VT, this will be very useful for ensuring performance of GPU backends going forward. Is this ready to merge from your end once tests pass?

@kmp5VT
Collaborator Author

kmp5VT commented Nov 15, 2023

Thanks @kmp5VT, this will be very useful for ensuring performance of GPU backends going forward. Is this ready to merge from your end once tests pass?

@mtfishman, not a problem. Yes, this is ready to merge; I have no additional changes this round. Thanks!

@mtfishman mtfishman merged commit 61caa1b into ITensor:main Nov 15, 2023
8 checks passed
@kmp5VT kmp5VT deleted the kmp5/debug/scalar_indexing branch November 15, 2023 22:53