[NDTensors] Fix scalar indexing issue for Diag broadcast on GPU #1497
Conversation
Thanks for looking into this @kmp5VT. It seems better to me to just change the definition of:

```julia
function permutedims!(
  R::DiagTensor{<:Number,N},
  T::DiagTensor{<:Number,N},
  perm::NTuple{N,Int},
  f::Function=(r, t) -> t,
) where {N}
  # ...
end
```

to the one you've defined for GPU (or something similar, see my next comment) so we don't have to support two implementations of that function. Also I see for loops/scalar indexing in a number of other functions:

```julia
function diag(tensor::DiagTensor)
  # ...
end

# I see this one is using some ad-hoc dispatch
# for GPU but it seems like we should either use
# `expose` or come up with a better code pattern,
# say use a slicing operation, that is generic
# to CPU and GPU.
function dense(::Type{<:Array}, T::DiagTensor)
  # ...
end

function permutedims!(
  R::DenseTensor{ElR,N}, T::DiagTensor{ElT,N}, perm::NTuple{N,Int}, f::Function=(r, t) -> t
) where {ElR,ElT,N}
  # ...
end
```

Can you look into those as well?
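For illustration, here is a minimal sketch of the kind of generic "slicing" pattern suggested above, written against plain `AbstractMatrix` inputs; the name `permute_diag!` and its signature are hypothetical and not the NDTensors API:

```julia
using LinearAlgebra: diagind

# Hypothetical sketch, not the NDTensors implementation: update the diagonal
# of `r` from the diagonal of `t` in one broadcast over strided views, so the
# same method works for CPU and GPU arrays without scalar indexing.
function permute_diag!(r::AbstractMatrix, t::AbstractMatrix, f=(r, t) -> t)
  rd = view(r, diagind(r))  # strided view of the diagonal of r
  td = view(t, diagind(t))  # strided view of the diagonal of t
  rd .= f.(rd, td)          # single vectorized operation (one kernel on GPU backends)
  return r
end

# Usage on CPU (the same call is expected to work for GPU arrays):
# permute_diag!(zeros(3, 3), ones(3, 3), (r, t) -> 2t)
```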
One recommendation I have would be to look into the code patterns I came up with in NDTensors.DiagonalArrays for more conveniently working with diagonal values. For example: https://github.com/ITensor/ITensors.jl/blob/v0.6.11/NDTensors/src/lib/DiagonalArrays/src/diaginterface/diaginterface.jl. As a demonstration:

```julia
using Compat: allequal

function diaglength(a::AbstractArray)
  return minimum(size(a))
end

diaglength(a::AbstractArray{<:Any,0}) = 1

function diagstride(a::AbstractArray)
  s = 1
  p = 1
  for i in 1:(ndims(a) - 1)
    p *= size(a, i)
    s += p
  end
  return s
end

function diagindices(a::AbstractArray)
  maxdiag = LinearIndices(a)[CartesianIndex(ntuple(Returns(diaglength(a)), ndims(a)))]
  return 1:diagstride(a):maxdiag
end

function diagindices(a::AbstractArray{<:Any,0})
  return Base.OneTo(1)
end

function diagview(a::AbstractArray)
  return @view a[diagindices(a)]
end

using LinearAlgebra: Diagonal
function diagview(a::Diagonal)
  return a.diag
end

x = Diagonal(randn(4))
y = randn(4, 4)

using Metal: mtl
x = mtl(x)
y = mtl(y)
diagview(y) .= diagview(x)
```

That appears to avoid scalar indexing but I'm not sure if it is actually fast on GPU. But I think if …
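As a quick sanity check of the index arithmetic in the snippet above (CPU only, using the definitions just shown):

```julia
# For a 3×4 column-major matrix: diaglength = 3, diagstride = 1 + 3 = 4,
# so diagindices picks the linear indices of (1,1), (2,2), (3,3).
a = reshape(collect(1:12), 3, 4)
diagstride(a)   # == 4
diagindices(a)  # == 1:4:9
diagview(a)     # == [1, 5, 9]
```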
Codecov Report: All modified and coverable lines are covered by tests ✅

```
@@            Coverage Diff             @@
##             main    #1497       +/-   ##
===========================================
- Coverage   78.05%   60.84%   -17.21%
===========================================
  Files         148      148
  Lines        9679     9672        -7
===========================================
- Hits         7555     5885     -1670
- Misses       2124     3787     +1663
```
@kmp5VT I think it could simplify the code to define overloads of …
I think it is a good idea to remake the function like you are suggesting. One previous point about these functions is that

```julia
function dense(T::DiagTensor, ArrayT::Type{<:AbstractArray})
  R = adapt(ArrayT, zeros(dense(typeof(T)), inds(T)))
  diagview(R) .= diagview(T)
  return R
end
```

Also with the code as it is now we convert …
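For context, `adapt` here refers to Adapt.jl's `adapt(ArrayT, x)`, which converts the storage of `x` to the target array type. Below is a self-contained sketch of the same pattern, with an illustrative name (`dense_from_diag`) rather than the actual NDTensors signature:

```julia
using Adapt: adapt
using LinearAlgebra: diagind

# Illustrative only: allocate a dense n×n matrix with the requested storage
# type and fill its diagonal through a strided view, avoiding scalar indexing.
function dense_from_diag(ArrayT::Type{<:AbstractArray}, diagvals::AbstractVector)
  n = length(diagvals)
  R = adapt(ArrayT, zeros(eltype(diagvals), n, n))  # allocate on CPU, move to ArrayT
  view(R, diagind(R)) .= adapt(ArrayT, diagvals)    # one broadcast over the diagonal
  return R
end

# dense_from_diag(Array, [1.0, 2.0, 3.0])      # CPU
# dense_from_diag(MtlArray, [1f0, 2f0, 3f0])   # GPU, with Metal.jl loaded
```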
What if in those places in the code where we were using that more complicated version of …
@mtfishman Actually the code you wrote works fine without using adapt:

```julia
julia> using NDTensors, Metal

julia> A = tensor(Diag(3,3), (3,3))

julia> dense(A)
Dim 1: 3
Dim 2: 3
Dense{Int64, Vector{Int64}}
 3×3
 3  0  0
 0  3  0
 0  0  3

julia> dense(mtl(A))
Dim 1: 3
Dim 2: 3
Dense{Int64, MtlVector{Int64, Private}}
 3×3
 3  0  0
 0  3  0
 0  0  3
```
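One way to double-check that such calls really avoid scalar indexing (a suggestion, assuming GPUArraysCore's `allowscalar` switch is available):

```julia
using GPUArraysCore: allowscalar
using NDTensors, Metal

allowscalar(false)              # make any scalar GPU indexing throw an error
A = mtl(tensor(Diag(3,3), (3,3)))
dense(A)                        # should complete without a scalar-indexing error
```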
@kmp5VT could you try adding …
Besides the last two comments I left, this looks good to me. I think with this change (and also JuliaGPU/Metal.jl#374), we are probably getting very close to having every NDTensors/ITensors operation working on GPU, which is a great step to get to! I'm sure a few lingering issues will come up, and there will be continued work on improving performance (particularly for block sparse operations), but the support for GPU operations seems quite good now (at least for our supported backends CUDA, Metal, and ROCm), assuming the lack of user reports of issues is a good indicator of that. Probably the biggest missing piece (as summarized in https://itensor.github.io/ITensors.jl/dev/RunningOnGPUs.html) is support for Intel GPUs through oneAPI.jl, which presumably would not be difficult to add and could follow the design used in #1325 to add support for AMD GPUs; in fact, that PR could probably just be copied and translated to oneAPI.jl, at least as a good start.
Not sure why the …
Can you add tests for …
Description

Fixes #1482.

Checklist:
- Add `NDTensors.map_diag!` and `NDTensors.map_diag` functions.
- Move `NDTensorsGPUArraysCoreExt/diag.jl` to `NDTensorsGPUArraysCoreExt/blocksparsetensor.jl`.
- Add tests for `Diag` and `DiagBlockSparse`.
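For readers unfamiliar with these helpers, here is a rough, hypothetical illustration of the kind of operation `map_diag!`/`map_diag` perform (the names and signatures below are assumptions for demonstration, not the actual NDTensors definitions):

```julia
using LinearAlgebra: diagind

# Apply `f` elementwise along the diagonal of `t`, writing into `r` (in place)
# or into a copy (out of place); off-diagonal entries are left untouched.
function map_diag_sketch!(f, r::AbstractMatrix, t::AbstractMatrix)
  view(r, diagind(r)) .= f.(view(t, diagind(t)))
  return r
end

map_diag_sketch(f, t::AbstractMatrix) = map_diag_sketch!(f, copy(t), t)

# map_diag_sketch(x -> 2x, [1.0 2.0; 3.0 4.0]) == [2.0 2.0; 3.0 8.0]
```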