[NDTensorsCUDAExt] Remove and test for scalar indexing #1245

Merged · 62 commits · Nov 15, 2023
Changes from 50 commits

Commits
7576c3b
When calling `mul!` with a `Transpose(CuArray)` generic CPU mul is ca…
kmp5VT Nov 9, 2023
efcde0c
format
kmp5VT Nov 9, 2023
ebb1713
Replace undef with rand
kmp5VT Nov 10, 2023
077527b
Implement Adjoint and Transpose expose mul! functions for cuda and Metal
kmp5VT Nov 10, 2023
65ea45a
Add transpose and adjoint tests to Unwrap
kmp5VT Nov 10, 2023
0b6b07e
Merge branch 'main' into kmp5/debug/scalar_indexing
kmp5VT Nov 10, 2023
3ad5551
Changes per Matt's comments
kmp5VT Nov 10, 2023
256276f
Merge branch 'kmp5/debug/scalar_indexing' of github.com:kmp5VT/ITenso…
kmp5VT Nov 10, 2023
dd31b94
Fix typo and mark mul! tests as broken to keep record in case they a…
kmp5VT Nov 10, 2023
d7893e9
add svd to using LinearAlgebra
kmp5VT Nov 12, 2023
b8e732b
Add GPUArraysCore to NDTensors
kmp5VT Nov 13, 2023
a567d28
Start removing scalar and update combiner test
kmp5VT Nov 13, 2023
5d103cf
Move using GPUArraysCore to devicelist.jl
kmp5VT Nov 13, 2023
77411a9
Allowscalar in dense and reorder to put allowscalar together in code …
kmp5VT Nov 13, 2023
8f4505a
Use GPUArraysCore. because things are failing without it
kmp5VT Nov 13, 2023
21aa74e
only run `test_broken` if not using CPU
kmp5VT Nov 13, 2023
0deb8c0
Add a few definitions in NDTensorsCUDAExt
kmp5VT Nov 13, 2023
686e3d6
Add allowscalar to blocksparse tests
kmp5VT Nov 13, 2023
1c54aa1
Add some fixes for scalar indexing block dims
kmp5VT Nov 13, 2023
7fa60f6
No need to expose here since they are Tensors not arrays
kmp5VT Nov 13, 2023
9667482
Add allowscalar to unwrap test
kmp5VT Nov 13, 2023
1897e2e
Fix linearalgebra with allowscalar
kmp5VT Nov 13, 2023
6eec341
format
kmp5VT Nov 13, 2023
c40a76f
Add allowscalar, there's an issue in Zygote
kmp5VT Nov 13, 2023
53fac06
return unexposed(dest)
kmp5VT Nov 13, 2023
a55745b
Remove unnecessary GPUArraysCore and add @allowscalar
kmp5VT Nov 13, 2023
6f7c9b0
remove extra parenthesis
kmp5VT Nov 13, 2023
f3f5e18
Remove unnecessary code
kmp5VT Nov 13, 2023
6afb6db
Remove using GPUArraysCore from top level test
kmp5VT Nov 13, 2023
2c01b12
need `Base.setindex!`, was missing the `Base` portion
kmp5VT Nov 13, 2023
23256a9
Remove CPU check and fix expose setindex call
kmp5VT Nov 13, 2023
6f2b593
Remove CUDA.allowscalar()
kmp5VT Nov 13, 2023
24da742
Update the Dense function to be no-copy
kmp5VT Nov 13, 2023
3ee4f70
Allowscalar checks and silence inner with an H
kmp5VT Nov 13, 2023
afd718c
format
kmp5VT Nov 13, 2023
b61d17c
Use array over Array
kmp5VT Nov 13, 2023
3b87a47
Move GPUArraysCore: @allowscalar to top of files which use it
kmp5VT Nov 13, 2023
a74dc1c
Fix CuArray scalar index issue in copyto! in the same way as metal
kmp5VT Nov 13, 2023
657ef3e
Revert and remove H
kmp5VT Nov 13, 2023
df4dde9
Todo message
kmp5VT Nov 13, 2023
0299eda
format
kmp5VT Nov 13, 2023
1412eb6
Move Base.copy version to `copyto.jl`
kmp5VT Nov 14, 2023
b1ace97
Revert changes to Dense constructor
kmp5VT Nov 14, 2023
76a5c8f
Add a mul! function to address CUDA problem
kmp5VT Nov 14, 2023
44696d3
Copy parent then transpose in mul! call
kmp5VT Nov 14, 2023
3956a58
format
kmp5VT Nov 14, 2023
3aaabec
Merge branch 'main' into kmp5/debug/scalar_indexing
kmp5VT Nov 14, 2023
9d020ce
Add mul!! test to `diag.jl` to test issue in `CUDA.jl`
kmp5VT Nov 14, 2023
b1ec109
format
kmp5VT Nov 14, 2023
da6aca0
Add tests for complicated wrapper (breaks in CUDA.jl)
kmp5VT Nov 15, 2023
78d4d06
Consistently annotate loops as @allowscalar
kmp5VT Nov 15, 2023
b40f227
test mul! instead of mul!!
kmp5VT Nov 15, 2023
ae3fb94
format
kmp5VT Nov 15, 2023
180da60
remove outdated todo comment
kmp5VT Nov 15, 2023
2e05392
change .+ to +, comment out code and add todo
kmp5VT Nov 15, 2023
7e99a57
Don't comment out testset
kmp5VT Nov 15, 2023
4d330e5
add mul!! test as a second verification
kmp5VT Nov 15, 2023
5c019bc
Update NDTensors/ext/NDTensorsCUDAExt/NDTensorsCUDAExt.jl
kmp5VT Nov 15, 2023
c76a6bb
Update NDTensors/ext/NDTensorsCUDAExt/mul.jl
kmp5VT Nov 15, 2023
2650f60
Update NDTensors/ext/NDTensorsCUDAExt/copyto.jl
kmp5VT Nov 15, 2023
85aee99
Alphabetize and add compat
kmp5VT Nov 15, 2023
21aa398
Merge branch 'kmp5/debug/scalar_indexing' of github.com:kmp5VT/ITenso…
kmp5VT Nov 15, 2023
1 change: 1 addition & 0 deletions NDTensors/Project.toml
@@ -23,6 +23,7 @@ Strided = "5e0ebb24-38b0-5f93-81fe-25c709ecae67"
StridedViews = "4db3bf67-4bd7-4b4e-b153-31dc3fb37143"
TimerOutputs = "a759f4b9-e2f1-59dc-863e-4aeb61b1ea8f"
TupleTools = "9d95972d-f1c8-5527-a6e0-b4b365fa01f6"
GPUArraysCore="46192b85-c4d5-4398-a991-12ede77f4527"

[weakdeps]
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
7 changes: 6 additions & 1 deletion NDTensors/ext/NDTensorsCUDAExt/NDTensorsCUDAExt.jl
@@ -5,16 +5,21 @@ using NDTensors.SetParameters
using NDTensors.Unwrap
using Adapt
using Functors
using LinearAlgebra
using LinearAlgebra: LinearAlgebra, Adjoint, Transpose, mul!, svd
using CUDA
using CUDA.CUBLAS
using CUDA.CUSOLVER

## TODO: I added copyto and permutedims, which match the functions in
## NDTensorsMetalExt, because I found similar issues in CUDA
include("imports.jl")
include("default_kwargs.jl")
include("copyto.jl")
include("set_types.jl")
include("iscu.jl")
include("adapt.jl")
include("indexing.jl")
include("linearalgebra.jl")
include("mul.jl")
include("permutedims.jl")
end
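For context, a minimal sketch (not the NDTensors implementation) of why these files dispatch on `Exposed{<:CuArray,...}` from `NDTensors.Unwrap` rather than on `CuArray` directly: lazy wrappers such as `Transpose` and `Base.ReshapedArray` hide the device array from dispatch, so a wrapper-aware type is needed to route these cases to GPU-friendly methods. The `MyExposed`/`myexpose` names below are illustrative stand-ins, and the snippet assumes CUDA.jl with a working GPU.

using CUDA
using LinearAlgebra

# Toy stand-in for the Exposed/expose pattern from NDTensors.Unwrap.
struct MyExposed{Unwrapped,Object}
  object::Object
end
myexpose(x) = MyExposed{typeof(parent(x)),typeof(x)}(x)

A = CUDA.rand(Float32, 3, 3)
At = transpose(A)

At isa CuArray                        # false: the Transpose wrapper hides the CuArray
myexpose(At) isa MyExposed{<:CuArray} # true: dispatch can now see the device type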
28 changes: 28 additions & 0 deletions NDTensors/ext/NDTensorsCUDAExt/copyto.jl
@@ -0,0 +1,28 @@
## It looks like CuArray suffers from the same issues as MtlArray.
## To fix this SubArray copyto! problem I copied the same code from MetalExt.
## This means we can probably write a generic implementation for GPUArrays.
function Base.copy(src::Exposed{<:CuArray,<:Base.ReshapedArray})
return reshape(copy(parent(src)), size(unexpose(src)))
end

function Base.copy(
src::Exposed{
<:CuArray,<:SubArray{<:Any,<:Any,<:Base.ReshapedArray{<:Any,<:Any,<:Adjoint}}
},
)
return copy(@view copy(expose(parent(src)))[parentindices(unexpose(src))...])
end

# Catches a bug in `copyto!` in CUDA backend.
function Base.copyto!(dest::Exposed{<:CuArray}, src::Exposed{<:CuArray,<:SubArray})
copyto!(dest, expose(copy(src)))
return unexpose(dest)
end

# Catches a bug in `copyto!` in CUDA backend.
function Base.copyto!(
dest::Exposed{<:CuArray}, src::Exposed{<:CuArray,<:Base.ReshapedArray}
)
copyto!(dest, expose(parent(src)))
return unexpose(dest)
end
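A hedged usage sketch of the case these methods cover, mirroring the Unwrap tests added later in this PR (assumes CUDA.jl and a working GPU):

using CUDA
using NDTensors
using NDTensors.Unwrap   # provides `expose`

a = CUDA.rand(Float32, 4, 4)
x = reshape(a', 16)           # ReshapedArray wrapping an Adjoint of a CuArray
y = CUDA.zeros(Float32, 16)

# Routed to the ReshapedArray method above, which copies from `parent(x)`
# instead of iterating the wrapper element by element.
copyto!(expose(y), expose(x))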
2 changes: 0 additions & 2 deletions NDTensors/ext/NDTensorsCUDAExt/imports.jl
@@ -2,5 +2,3 @@ import NDTensors: cu, set_ndims, set_eltype, set_eltype_if_unspecified, similart
import NDTensors:
ContractionProperties, _contract!, GemmBackend, auto_select_backend, _gemm!, iscu
import NDTensors.SetParameters: nparameters, get_parameter, set_parameter, default_parameter

import .CUDA: CuArrayAdaptor
6 changes: 1 addition & 5 deletions NDTensors/ext/NDTensorsCUDAExt/indexing.jl
@@ -2,7 +2,7 @@ function Base.getindex(E::Exposed{<:CuArray})
return CUDA.@allowscalar unexpose(E)[]
end

function setindex!(E::Exposed{<:CuArray}, x::Number)
function Base.setindex!(E::Exposed{<:CuArray}, x::Number)
CUDA.@allowscalar unexpose(E)[] = x
return unexpose(E)
end
@@ -11,10 +11,6 @@ function Base.getindex(E::Exposed{<:CuArray,<:Adjoint}, i, j)
return (expose(parent(E))[j, i])'
end

function Base.copy(E::Exposed{<:CuArray,<:Base.ReshapedArray})
return reshape(copy(expose(parent(E))), size(unexpose(E)))
end

Base.any(f, E::Exposed{<:CuArray,<:NDTensors.Tensor}) = any(f, data(unexpose(E)))

function Base.print_array(io::IO, E::Exposed{<:CuArray})
45 changes: 45 additions & 0 deletions NDTensors/ext/NDTensorsCUDAExt/mul.jl
@@ -0,0 +1,45 @@
# This was calling generic matrix multiplication.
# TODO: Raise an issue with `CUDA.jl`.
function LinearAlgebra.mul!(
CM::Exposed{<:CuArray,<:LinearAlgebra.Transpose},
AM::Exposed{<:CuArray},
BM::Exposed{<:CuArray},
α,
β,
)
mul!(transpose(CM), transpose(BM), transpose(AM), α, β)
return unexpose(CM)
end

# This was calling generic matrix multiplication.
# TODO: Raise an issue with `CUDA.jl`.
function LinearAlgebra.mul!(
CM::Exposed{<:CuArray,<:LinearAlgebra.Adjoint},
AM::Exposed{<:CuArray},
BM::Exposed{<:CuArray},
α,
β,
)
mul!(CM', BM', AM', α, β)
return unexpose(CM)
end

## TODO: I wasn't sure of the best route to take here; if there is a better route than
## copy, please let me know!
## Fix an issue in CUDA.jl where it cannot recognize Transpose{Reshape{Adjoint{CuArray}}}
## as a CuArray and calls generic matmul
function LinearAlgebra.mul!(
CM::Exposed{<:CuArray},
AM::Exposed{<:CuArray},
BM::Exposed{
<:CuArray,
<:LinearAlgebra.Transpose{
<:Any,<:Base.ReshapedArray{<:Any,<:Any,<:LinearAlgebra.Adjoint}
},
},
α,
β,
)
mul!(CM, AM, expose(transpose(copy(expose(parent(BM))))), α, β)
return unexpose(CM)
end
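A hedged usage sketch of the transpose-destination workaround, mirroring the Unwrap tests added below (assumes CUDA.jl and a working GPU):

using CUDA
using LinearAlgebra: mul!
using NDTensors
using NDTensors.Unwrap   # provides `expose`

A = CUDA.rand(Float32, 3, 2)
B = CUDA.rand(Float32, 3, 4)
C = CUDA.rand(Float32, 4, 2)

# Plain `mul!(transpose(C), transpose(A), B, true, false)` can dispatch to the
# generic (scalar-indexing) matmul; the method above rewrites the call as
# `mul!(C, transpose(B), A, ...)` so it stays on the CUBLAS path.
mul!(expose(transpose(C)), expose(transpose(A)), expose(B), true, false)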
7 changes: 7 additions & 0 deletions NDTensors/ext/NDTensorsCUDAExt/permutedims.jl
@@ -0,0 +1,7 @@
function Base.permutedims!(
Edest::Exposed{<:CuArray,<:Base.ReshapedArray}, Esrc::Exposed{<:CuArray}, perm
)
Aperm = permutedims(Esrc, perm)
copyto!(expose(parent(Edest)), expose(Aperm))
return unexpose(Edest)
end
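A hedged sketch of the case this method handles, again mirroring the Unwrap tests (assumes CUDA.jl and a working GPU):

using CUDA
using NDTensors
using NDTensors.Unwrap   # provides `expose`

x = CUDA.rand(Float32, 4, 4)
y = Base.ReshapedArray(CUDA.rand(Float32, 16), (4, 4), ())

# The method above permutes the source into a temporary and copies that into
# `parent(y)`, avoiding element-by-element writes through the reshape wrapper.
permutedims!(expose(y), expose(x), (2, 1))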
6 changes: 4 additions & 2 deletions NDTensors/ext/NDTensorsMetalExt/copyto.jl
@@ -12,12 +12,14 @@ end

# Catches a bug in `copyto!` in Metal backend.
function Base.copyto!(dest::Exposed{<:MtlArray}, src::Exposed{<:MtlArray,<:SubArray})
return copyto!(dest, expose(copy(src)))
copyto!(dest, expose(copy(src)))
return unexpose(dest)
end

# Catches a bug in `copyto!` in Metal backend.
function Base.copyto!(
dest::Exposed{<:MtlArray}, src::Exposed{<:MtlArray,<:Base.ReshapedArray}
)
return copyto!(dest, expose(parent(src)))
copyto!(dest, expose(parent(src)))
return unexpose(dest)
end
9 changes: 9 additions & 0 deletions NDTensors/ext/NDTensorsMetalExt/mul.jl
@@ -10,3 +10,12 @@ function LinearAlgebra.mul!(
mul!(transpose(CM), transpose(BM), transpose(AM), α, β)
return unexpose(CM)
end

# This was calling generic matrix multiplication.
# TODO: Raise an issue with `Metal.jl`.
function LinearAlgebra.mul!(
CM::Exposed{<:MtlArray,<:Adjoint}, AM::Exposed{<:MtlArray}, BM::Exposed{<:MtlArray}, α, β
)
mul!(CM', BM', AM', α, β)
return unexpose(CM)
end
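As with the CUDA version, a hedged usage sketch of the new adjoint method (assumes Metal.jl on Apple silicon; shapes mirror the Unwrap tests):

using Metal
using LinearAlgebra: mul!
using NDTensors
using NDTensors.Unwrap   # provides `expose`

A = MtlArray(randn(Float32, 3, 2))
B = MtlArray(randn(Float32, 3, 4))
C = MtlArray(randn(Float32, 4, 2))

# Routed to the adjoint method above, which rewrites C' = A' * B as
# mul!(C, B', A, ...) so the destination is a plain MtlArray.
mul!(expose(C'), expose(A'), expose(B), true, false)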
6 changes: 3 additions & 3 deletions NDTensors/ext/examples/NDTensorCUDA.jl
@@ -61,9 +61,9 @@ function main()
# Currently this code fails with CUDA.allowscalar(false)
# because `outer` calls the `_gemm!` function, which calls a
# generic implementation
grad = gradient(f, cA, cB, cC, cD)
@test NDTensors.cpu(cB * cC * cD) ≈ NDTensors.cpu(grad[1])
@test (cB * cC * cD) ≈ grad[1]
@allowscalar grad = gradient(f, cA, cB, cC, cD)
@allowscalar @test NDTensors.cpu(cB * cC * cD) ≈ NDTensors.cpu(grad[1])
@allowscalar @test (cB * cC * cD) ≈ grad[1]
# Create a tuple of indices
decomp = (
dim(NDTensors.ind(grad[1], 1)),
1 change: 1 addition & 0 deletions NDTensors/src/NDTensors.jl
@@ -17,6 +17,7 @@ using SplitApplyCombine
using Strided
using TimerOutputs
using TupleTools
using GPUArraysCore

include("SetParameters/src/SetParameters.jl")
using .SetParameters
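For reference, a minimal sketch of the scalar-indexing guard this new dependency provides (assumes CUDA.jl and a working GPU): `GPUArraysCore.@allowscalar` explicitly permits element-wise access that is otherwise an error, or at best very slow, on device arrays.

using CUDA
using GPUArraysCore: @allowscalar

CUDA.allowscalar(false)           # make accidental scalar indexing an error
x = CUDA.rand(Float32, 4)

# x[1]                            # would throw a scalar-indexing error here
first_element = @allowscalar x[1] # explicitly allowed; copies one element to the CPU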
2 changes: 2 additions & 0 deletions NDTensors/src/Unwrap/src/functions/abstractarray.jl
@@ -2,6 +2,8 @@ parent(E::Exposed) = parent(unexpose(E))

transpose(E::Exposed) = transpose(unexpose(E))

adjoint(E::Exposed) = adjoint(unexpose(E))

cpu(E::Exposed) = cpu(unexpose(E))

getindex(E::Exposed) = unexpose(E)[]
1 change: 1 addition & 0 deletions NDTensors/src/Unwrap/src/import.jl
@@ -1,4 +1,5 @@
import Base:
adjoint,
permutedims,
permutedims!,
copy,
57 changes: 51 additions & 6 deletions NDTensors/src/Unwrap/test/runtests.jl
@@ -2,6 +2,7 @@ using Test
using NDTensors.Unwrap
using NDTensors
using LinearAlgebra
using GPUArraysCore: @allowscalar

include("../../../test/device_list.jl")
@testset "Testing Unwrap $dev, $elt" for dev in devices_list(ARGS),
@@ -24,8 +25,8 @@ include("../../../test/device_list.jl")
@test parent(Et) == v
@test parent(Ea) == v
@test transpose(E) == vt
@test cpu(E) == v
@test cpu(Et) == vt
@test cpu(E) == cpu(v)
@test cpu(Et) == cpu(vt)

m = reshape(v, (5, 2))
mt = transpose(m)
@@ -125,17 +126,61 @@
y = dev(randn(elt, 16))
x = reshape(dev(randn(elt, 4, 4))', 16)
copyto!(expose(y), expose(x))
@test y == x
@test copy(x) == x
@allowscalar begin
@test y == x
@test copy(x) == x
end

y = dev(randn(elt, 8))
x = @view reshape(dev(randn(elt, 8, 8))', 64)[1:8]
copyto!(expose(y), expose(x))
@test y == x
@test copy(x) == x
@allowscalar begin
@test y == x
@test copy(x) == x
end

y = Base.ReshapedArray(dev(randn(elt, 16)), (4, 4), ())
x = dev(randn(elt, 4, 4))
permutedims!(expose(y), expose(x), (2, 1))
@test NDTensors.cpu(y) == transpose(NDTensors.cpu(x))

##########################################
### Testing an issue with CUDA & Metal transpose/adjoint mul
A = dev(randn(elt, (3, 2)))
B = dev(randn(elt, (3, 4)))
C = dev(randn(elt, (4, 2)))
Cp = copy(C)

## This fails with scalar indexing
if dev != NDTensors.cpu
@test_broken mul!(transpose(C), transpose(A), B, true, false)
end
mul!(C, transpose(B), A, true, false)
mul!(expose(transpose(Cp)), expose(transpose(A)), expose(B), true, false)
@test C ≈ Cp
Cp = zero(C)
## Try calling mul!! with transposes to verify that code works
Cpt = NDTensors.mul!!(transpose(Cp), transpose(A), B, true, false)
@test transpose(Cpt) ≈ C

Cp = zero(C)
## This fails with scalar indexing
if dev != NDTensors.cpu
@test_broken mul!(C', A', B, true, false)
end
mul!(C, B', A, true, false)
mul!(expose(Cp'), expose(A'), expose(B), true, false)
@test C ≈ Cp
Cp = zero(C)
Cpt = NDTensors.mul!!(Cp', A', B, true, false)
@test Cpt' ≈ C

##################################
### Add test for transpose(reshape(adjoint)) failure in CUDA
A = dev(transpose(reshape(randn(elt, 2, 12)', (12, 2))))
B = dev(randn(elt, 2, 2))
C = dev(zeros(elt, (2, 12)))
NDTensors.mul!!(C, B, A, true, false)
Cp = B * copy(A)
@test @allowscalar C ≈ Cp
end
8 changes: 6 additions & 2 deletions NDTensors/src/blocksparse/blocksparsetensor.jl
@@ -335,11 +335,15 @@ view(T::BlockSparseTensor, b::Block) = blockview(T, b)
# convert to Dense
function dense(T::TensorT) where {TensorT<:BlockSparseTensor}
R = zeros(dense(TensorT), inds(T))
## Here this failed with scalar indexing (R[blockindices] = blockview).
## We can fix this by using copyto! on the underlying arrays.
r = array(R)
for block in keys(blockoffsets(T))
# TODO: make sure this assignment is efficient
R[blockindices(T, block)] = blockview(T, block)
rview = @view r[blockindices(T, block)]
copyto!(expose(rview), expose(array(blockview(T, block))))
end
return R
return tensor(Dense(r), inds(T))
end

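A hedged sketch of the pattern the `dense` fix uses: copy each block into a view of the dense buffer with `copyto!(expose(...), expose(...))` rather than assigning through tensor-level `setindex!`, which is where the scalar indexing arose. The shapes and index ranges below are illustrative, not the actual `blockindices` (assumes CUDA.jl and a working GPU):

using CUDA
using NDTensors
using NDTensors.Unwrap   # provides `expose`

r = CUDA.zeros(Float32, 6, 6)          # dense buffer being filled
block = CUDA.rand(Float32, 2, 3)       # one block of data

rview = @view r[1:2, 1:3]              # region corresponding to this block
copyto!(expose(rview), expose(block))  # block-wise copy, no per-element writes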
2 changes: 1 addition & 1 deletion NDTensors/src/blocksparse/linearalgebra.jl
@@ -200,7 +200,7 @@ function svd(
if (sV * sVP) == -1
Vb *= -1
end
copyto!(expose(blockview(V, blockV)), expose(Vb))
copyto!(blockview(V, blockV), Vb)
end
return U, S, V, Spectrum(d, truncerr)
end
1 change: 1 addition & 0 deletions NDTensors/test/Project.toml
@@ -11,6 +11,7 @@ SafeTestsets = "1bc83da4-3b8d-516f-aca4-4fe02f6d838f"
TBLIS = "48530278-0828-4a49-9772-0f3830dfa1e9"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"
GPUArraysCore="46192b85-c4d5-4398-a991-12ede77f4527"

[extras]
Metal = "dde4c033-4e86-420c-a63e-0dd931031962"