
[NDTensors] [BUG] Many basic tensor operations fail with CUDA extension due to LoadError: Scalar indexing is disallowed. #1193

Closed
wbernoudy opened this issue Sep 13, 2023 · 9 comments · Fixed by #1220

@wbernoudy (Contributor)

Description of bug

I am attempting to use the CUDA extension for NDTensors. I am finding that tensor contraction sometimes fails, as do other functions such as permute and the NDTensors.generic_randn call used in the NDTensors CUDA extension example (NDTensors/ext/examples/NDTensorCUDA.jl).

They all fail with ERROR: LoadError: Scalar indexing is disallowed. (or emit a similar warning in the REPL where scalar indexing is allowed).

Minimal code demonstrating the bug or unexpected behavior

Minimal runnable code

import CUDA
import ITensors
import NDTensors

i, j, k = ITensors.Index(3), ITensors.Index(5), ITensors.Index(2)
A, B = ITensors.randomITensor(i, j), ITensors.randomITensor(j, k)

# works when running script directly, but warns of scalar indexing in REPL
cuA, cuB = NDTensors.cu(A), NDTensors.cu(B)

contracted = cuA * cuB

@show NDTensors.cpu(contracted) ≈ A * B

# without @CUDA.allowscalar, fails with `ERROR: LoadError: Scalar indexing is disallowed.`
permuted = ITensors.permute(cuA, (j, i))
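
For prototyping only, the permute call does run if scalar indexing is explicitly allowed (it then executes element by element on the CPU, so it is slow). A minimal sketch of that workaround, assuming CUDA.jl's @allowscalar macro and the setup above; this is not a fix:

# workaround for prototyping only: permit scalar indexing for this single call
permuted = CUDA.@allowscalar ITensors.permute(cuA, (j, i))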

Expected output or behavior

NDTensors.cu() should work on an ITensors.ITensor without a warning in the REPL, and ITensors.permute should work without the scalar indexing error.

Actual output or behavior

Output of minimal runnable code

NDTensors.cpu(contracted) ≈ A * B = true
ERROR: LoadError: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] assertscalar(op::String)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/uOYfN/src/GPUArraysCore.jl:103
  [3] getindex
    @ ~/.julia/packages/GPUArrays/5XhED/src/host/indexing.jl:9 [inlined]
  [4] getindex
    @ ~/.julia/packages/StridedViews/MjjHH/src/stridedview.jl:97 [inlined]
  [5] macro expansion
    @ ~/.julia/packages/Strided/O8zfl/src/mapreduce.jl:330 [inlined]
  [6] macro expansion
    @ ./simdloop.jl:77 [inlined]
  [7] macro expansion
    @ ~/.julia/packages/Strided/O8zfl/src/mapreduce.jl:329 [inlined]
  [8] _mapreduce_kernel!(f::Strided.CaptureArgs{NDTensors.var"#69#70", Tuple{Strided.Arg, Strided.Arg}}, op::Nothing, initop::Nothing, dims::Tuple{Int64, Int64}, blocks::Tuple{Int64, Int64}, arrays::Tuple{StridedViews.StridedView{Float64, 2, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, typeof(identity)}, StridedViews.StridedView{Float64, 2, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, typeof(identity)}, StridedViews.StridedView{Float64, 2, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, typeof(identity)}}, strides::Tuple{Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}}, offsets::Tuple{Int64, Int64, Int64})
    @ Strided ~/.julia/packages/Strided/O8zfl/src/mapreduce.jl:229
  [9] _mapreduce_block!(f::Any, op::Any, initop::Any, dims::Tuple{Int64, Int64}, strides::Tuple{Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}}, offsets::Tuple{Int64, Int64, Int64}, costs::Tuple{Int64, Int64}, arrays::Tuple{StridedViews.StridedView{Float64, 2, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, typeof(identity)}, StridedViews.StridedView{Float64, 2, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, typeof(identity)}, StridedViews.StridedView{Float64, 2, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, typeof(identity)}})
    @ Strided ~/.julia/packages/Strided/O8zfl/src/mapreduce.jl:152
 [10] _mapreduce_order!(f::Any, op::Any, initop::Any, dims::Tuple{Int64, Int64}, strides::Tuple{Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}}, arrays::Tuple{StridedViews.StridedView{Float64, 2, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, typeof(identity)}, StridedViews.StridedView{Float64, 2, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, typeof(identity)}, StridedViews.StridedView{Float64, 2, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, typeof(identity)}})
    @ Strided ~/.julia/packages/Strided/O8zfl/src/mapreduce.jl:138
 [11] _mapreduce_fuse!
    @ ~/.julia/packages/Strided/O8zfl/src/mapreduce.jl:116 [inlined]
 [12] copyto!
    @ ~/.julia/packages/Strided/O8zfl/src/broadcast.jl:35 [inlined]
 [13] materialize!
    @ ./broadcast.jl:884 [inlined]
 [14] materialize!
    @ ./broadcast.jl:881 [inlined]
 [15] permutedims!(R::NDTensors.DenseTensor{Float64, 2, Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}}, NDTensors.Dense{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}}, T::NDTensors.DenseTensor{Float64, 2, Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}}, NDTensors.Dense{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}}, perm::Tuple{Int64, Int64}, f::NDTensors.var"#69#70")
    @ NDTensors ~/.julia/packages/NDTensors/olKso/src/dense/densetensor.jl:246
 [16] permutedims!!(R::NDTensors.DenseTensor{Float64, 2, Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}}, NDTensors.Dense{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}}, T::NDTensors.DenseTensor{Float64, 2, Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}}, NDTensors.Dense{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}}, perm::Tuple{Int64, Int64}, f::Function)
    @ NDTensors ~/.julia/packages/NDTensors/olKso/src/dense/densetensor.jl:188
 [17] permutedims!!
    @ ~/.julia/packages/NDTensors/olKso/src/dense/densetensor.jl:186 [inlined]
 [18] permutedims
    @ ~/.julia/packages/NDTensors/olKso/src/tensoralgebra/generic_tensor_operations.jl:5 [inlined]
 [19] permutedims
    @ ~/.julia/packages/ITensors/Qe99p/src/tensor_operations/permutations.jl:64 [inlined]
 [20] _permute(as::NDTensors.NeverAlias, T::NDTensors.DenseTensor{Float64, 2, Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}}, NDTensors.Dense{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}}, new_inds::Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}})
    @ ITensors ~/.julia/packages/ITensors/Qe99p/src/tensor_operations/permutations.jl:69
 [21] permute(as::NDTensors.NeverAlias, T::ITensors.ITensor, new_inds::Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}})
    @ ITensors ~/.julia/packages/ITensors/Qe99p/src/tensor_operations/permutations.jl:73
 [22] #permute#271
    @ ~/.julia/packages/ITensors/Qe99p/src/tensor_operations/permutations.jl:54 [inlined]
 [23] permute(T::ITensors.ITensor, new_inds::Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}})
    @ ITensors ~/.julia/packages/ITensors/Qe99p/src/tensor_operations/permutations.jl:38
 [24] top-level scope
    @ ~/it_test2/test.jl:17
in expression starting at /home/william/it_test2/test.jl:17

Output of `cuA, cuB = NDTensors.cu(A), NDTensors.cu(B)` in REPL

┌ Warning: Performing scalar indexing on task Task (runnable) @0x00007f5e6b5fc970.
│ Invocation of getindex resulted in scalar indexing of a GPU array.
│ This is typically caused by calling an iterating implementation of a method.
│ Such implementations *do not* execute on the GPU, but very slowly on the CPU,
│ and therefore are only permitted from the REPL for prototyping purposes.
│ If you did intend to index this array, annotate the caller with @allowscalar.
└ @ GPUArraysCore ~/.julia/packages/GPUArraysCore/uOYfN/src/GPUArraysCore.jl:

Also, running `NDTensors/ext/examples/NDTensorCUDA.jl` results in a similar error

ERROR: LoadError: Scalar indexing is disallowed.
Invocation of setindex! resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] assertscalar(op::String)
   @ GPUArraysCore ~/.julia/packages/GPUArraysCore/uOYfN/src/GPUArraysCore.jl:103
 [3] setindex!
   @ ~/.julia/packages/GPUArrays/5XhED/src/host/indexing.jl:17 [inlined]
 [4] generic_randn(arraytype::Type{CuArray{T, 1} where T}, dim::Int64)
   @ NDTensors ~/.julia/packages/NDTensors/olKso/src/abstractarray/fill.jl:8
 [5] main()
   @ Main ~/ITensors.jl/NDTensors/ext/examples/NDTensorCUDA.jl:21
 [6] top-level scope
   @ ~/ITensors.jl/NDTensors/ext/examples/NDTensorCUDA.jl:116
in expression starting at /home/william/ITensors.jl/NDTensors/ext/examples/NDTensorCUDA.jl:116
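
For context (this is an illustration, not the NDTensors implementation): the failure here comes from filling a CuArray element by element, where each assignment is a scalar setindex!. Filling the allocated array with a device-aware randn! avoids scalar indexing entirely. A minimal sketch, with hypothetical helper names:

using CUDA, Random

# Fails on a CuArray: each data[i] = randn() is a scalar setindex! on the GPU.
function randn_scalar(arraytype::Type{<:AbstractArray}, dim::Integer)
    data = similar(arraytype, dim)
    for i in 1:dim
        data[i] = randn()
    end
    return data
end

# Works on CPU and GPU arrays: randn! dispatches to the GPU RNG for CuArrays.
function randn_generic(arraytype::Type{<:AbstractArray}, dim::Integer)
    return randn!(similar(arraytype, dim))
end

randn_generic(CuArray{Float64, 1}, 4)  # runs without scalar indexing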

Version information

  • Output from versioninfo():
julia> versioninfo()
Julia Version 1.9.2
Commit e4ee485e909 (2023-07-05 09:39 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
  Threads: 1 on 12 virtual cores
Other relevant version info

(it_test2) pkg> status ITensors
Status `~/it_test2/Project.toml`
  [9136182c] ITensors v0.3.42

(it_test2) pkg> status NDTensors
Status `~/it_test2/Project.toml`
  [23ae76d9] NDTensors v0.2.10

(it_test2) pkg> status CUDA
Status `~/it_test2/Project.toml`
  [052768ef] CUDA v4.4.1

julia> CUDA.versioninfo()
CUDA runtime 12.1, artifact installation
CUDA driver 12.2
NVIDIA driver 535.104.5

CUDA libraries: 
- CUBLAS: 12.1.3
- CURAND: 10.3.2
- CUFFT: 11.0.2
- CUSOLVER: 11.4.5
- CUSPARSE: 12.1.0
- CUPTI: 18.0.0
- NVML: 12.0.0+535.104.5

Julia packages: 
- CUDA: 4.4.1
- CUDA_Driver_jll: 0.5.0+1
- CUDA_Runtime_jll: 0.6.0+0

Toolchain:
- Julia: 1.9.2
- LLVM: 14.0.6
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5
- Device capability support: sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

1 device:
  0: NVIDIA GeForce RTX 3090 (sm_86, 22.728 GiB / 24.000 GiB available)

I am very new to Julia (and ITensors.jl), so apologies if there is something obvious I'm missing!

wbernoudy added the bug and NDTensors labels on Sep 13, 2023
@mtfishman (Member)

Thanks for the report. @kmp5VT could you take a look?

@NuclearPowerNerd (Contributor)

Hello!

I am testing out the updated NDTensors CUDA extension. I am using ITensors.jl v0.3.46, CUDA.jl v5.0.0, and CUDA v11.8. I am still getting the error described in the OP for the following operation. I also get it any time I use contract with either the naive or zipup algorithm (see the sketch after the MWE below).

MWE - Trying to truncate after a direct sum of two MPS

using CUDA, ITensors

function main()

    gpu = cu

    d = 2
    N = 5
    n = d^N
    x = LinRange(1, 2, n)
    f = x .^ 2
    g = sin.(x)
    s = siteinds(d, N)
    fmps = gpu(MPS(f, s))
    gmps = gpu(MPS(g, s))
    hmps = gpu(MPS(zeros(n), s))
    hmps .= add(fmps, gmps; alg="directsum")
    truncate!(hmps; cutoff=1E-8)
end

main()
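
The contract calls that also trigger the error are along these lines (a sketch assuming the same gpu, s, and fmps from main() above, with a random MPO A on the same sites as a stand-in):

A = gpu(randomMPO(s))
contract(A, fmps; alg="naive")   # same scalar indexing error
contract(A, fmps; alg="zipup")   # same scalar indexing error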

@mtfishman (Member)

@NuclearPowerNerd could you double check to make sure you are using the latest version? The error you quoted in the first post should have been fixed by #1194.

@kmp5VT (Collaborator) commented Oct 23, 2023

@mtfishman I think there still might be an issue here, specifically in the truncate! call. I am looking into this now.

@NuclearPowerNerd (Contributor)

@NuclearPowerNerd could you double check to make sure you are using the latest version? The error you quoted in the first post should have been fixed by #1194.

Yes. I double-checked: I have ITensors v0.3.46 in the Project.toml, and the Manifest lists NDTensors v0.2.14 under the dependencies of ITensors.

@mtfishman (Member)

Thanks for confirming. Some of my confusion was that this appears to be specific to the CUDA.jl backend; I was running the code you shared with our Metal.jl backend and wasn't seeing any issues. We are investigating this. It looks like an easy fix, though we may make some more involved changes so that issues particular to certain GPU backends are easier for us to fix in the future, so it will take a bit of time.

@mtfishman (Member)

This will be fixed by #1220.

@NuclearPowerNerd (Contributor)

I can create a new issue if needed - please just let me know. I am still getting this error any time I try to use the zipup algorithm.

@mtfishman (Member)

Could you start a new issue reporting a particular problem, with a minimal code example as well as the full error message/stack trace? Also be sure to test against the latest versions of NDTensors and ITensors; we have been making a number of improvements and bug fixes very recently.
