
[NDTensors] Fix scalar indexing issue for Diag broadcast on GPU #1497

Merged · 49 commits · Jun 21, 2024

Conversation

@kmp5VT (Collaborator) commented Jun 13, 2024

Description

Fixes #1482.

Checklist:

  • Tests for the new NDTensors.map_diag! and NDTensors.map_diag functions.
  • Fix Jenkins CI (see comment below for a suggestion).
  • Rename NDTensorsGPUArraysCoreExt/diag.jl to NDTensorsGPUArraysCoreExt/blocksparsetensor.jl.
  • Verify that broadcasting over the Diag datatype produces correct results on GPU backends.
  • Create unit tests for Diag and DiagBlockSparse.

@mtfishman mtfishman changed the title [NDTensorsGPUArraysCoreExt] Address issue 1482, scalar indexing for Diag Broadcast on GPU [NDTensorsGPUArraysCoreExt] Fix scalar indexing issue for Diag Broadcast on GPU Jun 13, 2024
@mtfishman (Member) commented Jun 13, 2024

Thanks for looking into this @kmp5VT. It seems better to me to just change the definition of:

function permutedims!(
  R::DiagTensor{<:Number,N},
  T::DiagTensor{<:Number,N},
  perm::NTuple{N,Int},
  f::Function=(r, t) -> t,
) where {N}
  # ...
end

to the one you've defined for GPU (or something similar, see my next comment) so we don't have to support two implementations of that function.

Also, I see for loops/scalar indexing in a number of other functions:

function diag(tensor::DiagTensor)
  # ...
end

# I see this one is using some ad-hoc dispatch
# for GPU but it seems like we should either use
# `expose` or come up with a better code pattern,
# say use a slicing operation, that is generic
# to CPU and GPU.
function dense(::Type{<:Array}, T::DiagTensor)
  # ...
end

function permutedims!(
  R::DenseTensor{ElR,N}, T::DiagTensor{ElT,N}, perm::NTuple{N,Int}, f::Function=(r, t) -> t
) where {ElR,ElT,N}
  # ...
end

Can you look into those as well?

@mtfishman (Member) commented:

One recommendation I have would be to look into the code patterns I came up with in NDTensors.DiagonalArrays for more conveniently working with diagonal values. For example: https://github.com/ITensor/ITensors.jl/blob/v0.6.11/NDTensors/src/lib/DiagonalArrays/src/diaginterface/diaginterface.jl.

As a demonstration:

function diaglength(a::AbstractArray)
  return minimum(size(a))
end
diaglength(a::AbstractArray{<:Any,0}) = 1

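# Linear-index distance between consecutive diagonal elements
# (for an m × n matrix this is m + 1).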
function diagstride(a::AbstractArray)
  s = 1
  p = 1
  for i in 1:(ndims(a) - 1)
    p *= size(a, i)
    s += p
  end
  return s
end

function diagindices(a::AbstractArray)
  maxdiag = LinearIndices(a)[CartesianIndex(ntuple(Returns(diaglength(a)), ndims(a)))]
  return 1:diagstride(a):maxdiag
end
function diagindices(a::AbstractArray{<:Any,0})
  return Base.OneTo(1)
end

function diagview(a::AbstractArray)
  return @view a[diagindices(a)]
end

using LinearAlgebra: Diagonal
function diagview(a::Diagonal)
  return a.diag
end

x = Diagonal(randn(4))
y = randn(4, 4)

using Metal: mtl
x = mtl(x)
y = mtl(y)

diagview(y) .= diagview(x)

That appears to avoid scalar indexing, but I'm not sure if it is actually fast on GPU. If diagview were defined in a similar way for Tensor objects, such as DiagTensor and DenseTensor, we could use a code pattern like that to implement most of the functions listed above that currently use scalar indexing/explicit for loops in a simple way, just in terms of broadcasting over diagview (see the sketch below).
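
For example, here is a hypothetical sketch (not the code merged in this PR) of two of the functions listed above, rewritten purely in terms of broadcasting over diagview, assuming diagview overloads exist for the relevant Tensor types:

# Extract the diagonal as a vector without scalar indexing.
diag(tensor::DiagTensor) = copy(diagview(tensor))

# Permuting a diagonal tensor only touches its diagonal (the diagonal
# indices are invariant under any permutation of the dimensions), so
# the whole operation reduces to a broadcast over the two diagonals.
function permutedims!(
  R::DiagTensor{<:Number,N},
  T::DiagTensor{<:Number,N},
  perm::NTuple{N,Int},
  f::Function=(r, t) -> t,
) where {N}
  diagview(R) .= f.(diagview(R), diagview(T))
  return R
end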

@codecov-commenter commented Jun 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 60.84%. Comparing base (82cfd76) to head (cb8d766).
Report is 9 commits behind head on main.

Current head cb8d766 differs from pull request most recent head 51323fc

Please upload reports for the commit 51323fc to get more accurate results.


Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1497       +/-   ##
===========================================
- Coverage   78.05%   60.84%   -17.21%     
===========================================
  Files         148      148               
  Lines        9679     9672        -7     
===========================================
- Hits         7555     5885     -1670     
- Misses       2124     3787     +1663     


@mtfishman mtfishman changed the title [NDTensorsGPUArraysCoreExt] Fix scalar indexing issue for Diag Broadcast on GPU [NDTensors] Fix scalar indexing issue for Diag Broadcast on GPU Jun 13, 2024
@mtfishman mtfishman changed the title [NDTensors] Fix scalar indexing issue for Diag Broadcast on GPU [NDTensors] Fix scalar indexing issue for Diag broadcast on GPU Jun 13, 2024
Review thread on NDTensors/test/test_diag.jl (outdated, resolved)
@mtfishman (Member) commented:

@kmp5VT I think it could simplify the code to define overloads of DiagonalArrays.diagview for DenseTensor and DiagTensor.
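
For illustration, a minimal sketch of what those overloads might look like (the accessor names array and data are assumptions about NDTensors internals here, not necessarily what this PR ends up using):

using NDTensors.DiagonalArrays: DiagonalArrays, diagview

# A DenseTensor's diagonal is a strided view into its underlying array.
DiagonalArrays.diagview(T::DenseTensor) = diagview(array(T))

# A (nonuniform) DiagTensor stores exactly its diagonal as its data.
DiagonalArrays.diagview(T::DiagTensor) = data(T)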

@kmp5VT (Collaborator, Author) commented Jun 19, 2024

Using the new functionality in this PR, can we rewrite that as something like:

function dense(T::DiagTensor)
  R = zeros(dense(typeof(T)), inds(T))
  diagview(R) .= diagview(T)
  return R  
end

i.e. get rid of the dispatch on the unwrapped array type?

I think it is a good idea to remake the function like you are suggesting. One previous point about these functions is that UniformDiag storage types always convert to Array, which could cause issues with GPU code. Right now we have that fixed in other places, but I think it would be a good idea to create a function like:

function dense(T::DiagTensor, ArrayT::Type{<:AbstractArray})
  R = adapt(ArrayT, zeros(dense(typeof(T)), inds(T)))
  diagview(R) .= diagview(T)
  return R
end

Also, with the code as it is now, we convert NonuniformDiag storage types from CPU back to GPU with this line of code: return adapt(unwrap_array_type(T), D_cpu). But this would fail if T::UniformDiag, because unwrap_array_type(T) = Number.

@mtfishman (Member) commented:

What if, in those places in the code where we were using that more complicated version of dense, we instead called adapt explicitly, i.e. changed calls like dense(array_type, T) to adapt(array_type, dense(T))?
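
A runnable illustration of that suggested pattern (hypothetical call site; on CPU the target array type is just Array):

using Adapt: adapt
using NDTensors: Diag, dense, tensor

T = tensor(Diag(3, 3), (3, 3))  # uniform Diag example, as in the REPL session below
# Old pattern: R = dense(array_type, T)
# New pattern: adapt the result after densifying.
R = adapt(Array, dense(T))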

@kmp5VT (Collaborator, Author) commented Jun 19, 2024

@mtfishman Actually, the code you wrote works fine without using adapt:

julia> using NDTensors, Metal
julia> A = tensor(Diag(3,3), (3,3))
julia> dense(A)
Dim 1: 3
Dim 2: 3
Dense{Int64, Vector{Int64}}
 3×3
 3  0  0
 0  3  0
 0  0  3
julia> dense(mtl(A))
Dim 1: 3
Dim 2: 3
Dense{Int64, MtlVector{Int64, Private}}
 3×3
 3  0  0
 0  3  0
 0  0  3

@mtfishman (Member) commented Jun 19, 2024

@kmp5VT could you try adding Pkg.Registry.update(); Pkg.update(); to the Jenkins script, like I did in ITensor/ITensorGPU.jl#9? That may solve the test failures, since it will force Julia to update the registry. I think for some reason it is just seeing an old registry, and therefore only seeing old package versions, so it doesn't know that it can upgrade to BlockArrays v1.1.
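
For reference, a sketch of those calls as a Julia setup step (the actual Jenkins script layout is in ITensor/ITensorGPU.jl#9):

using Pkg
# Force a registry refresh so CI sees newly released package versions
# (e.g. BlockArrays v1.1) instead of a stale cached registry.
Pkg.Registry.update()
Pkg.update()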

@mtfishman (Member) commented Jun 19, 2024

Besides the last two comments I left, this looks good to me.

I think with this change (and also JuliaGPU/Metal.jl#374), we are probably getting very close to having every NDTensors/ITensors operation working on GPU, which is a great milestone! I'm sure a few lingering issues will come up, and there will be continued work on improving performance (particularly for block sparse operations), but support for GPU operations seems quite good now (at least for our supported backends CUDA, Metal, and ROCm), assuming the lack of user reports of issues is a good indicator.

Probably the biggest missing piece (as summarized in https://itensor.github.io/ITensors.jl/dev/RunningOnGPUs.html) is support for Intel GPUs through oneAPI.jl, which presumably would not be difficult to add and could follow the design used in #1325 to add support for AMD GPUs; in fact, that PR could probably just be copied and translated to oneAPI.jl, at least as a good start.

@mtfishman (Member) commented Jun 20, 2024

Not sure why the ITensorGaussianMPS downstream test timed out (maybe just a random failure in GitHub Actions), but it looks like that change fixed the Jenkins tests. EDIT: I reran it; those are passing now.

@mtfishman (Member) commented Jun 20, 2024

Can you add tests for NDTensors.map_diag! and NDTensors.map_diag?
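
A hypothetical sketch of such tests (the exact signatures are assumptions here: a (f, dest, src) argument order is assumed for the in-place version, and diagview is assumed to work on Tensor objects as discussed above):

using Test
using NDTensors
using NDTensors: Diag, tensor
using NDTensors.DiagonalArrays: diagview

A = tensor(Diag([1.0, 2.0, 3.0]), (3, 3))

# Out-of-place: map a function over the stored diagonal.
B = NDTensors.map_diag(x -> 2x, A)
@test collect(diagview(B)) == [2.0, 4.0, 6.0]

# In-place: write the mapped diagonal of A into destination C.
C = copy(A)
NDTensors.map_diag!(x -> x + 1, C, A)
@test collect(diagview(C)) == [2.0, 3.0, 4.0]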

@mtfishman mtfishman merged commit a1e7ec5 into ITensor:main Jun 21, 2024
15 checks passed
Development

Successfully merging this pull request may close these issues.

[NDTensors] [BUG] Scalar indexing issue when broadcasting tensors with Diag storage on GPU
3 participants