
[NDTensors] [BUG] Many basic tensor operations fail with CUDA extension due to LoadError: Scalar indexing is disallowed. #1193

Closed
wbernoudy opened this issue Sep 13, 2023 · 9 comments · Fixed by #1220

@wbernoudy (Contributor)

Description of bug

I am attempting to use the CUDA extension for NDTensors. I am finding that tensor contraction sometimes fails, as do other functions such as permute and the NDTensors.generic_randn call used in the NDTensors CUDA extension example (NDTensors/ext/examples/NDTensorCUDA.jl).

They all fail with ERROR: LoadError: Scalar indexing is disallowed. (or emit a similar warning in the REPL where scalar indexing is allowed).

Minimal code demonstrating the bug or unexpected behavior

Minimal runnable code

import CUDA
import ITensors
import NDTensors

i, j, k = ITensors.Index(3), ITensors.Index(5), ITensors.Index(2)
A, B = ITensors.randomITensor(i, j), ITensors.randomITensor(j, k)

# works when running script directly, but warns of scalar indexing in REPL
cuA, cuB = NDTensors.cu(A), NDTensors.cu(B)

contracted = cuA * cuB

@show NDTensors.cpu(contracted) ≈ A * B

# without @CUDA.allowscalar, fails with `ERROR: LoadError: Scalar indexing is disallowed.`
permuted = ITensors.permute(cuA, (j, i))
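
For prototyping only, the permute call does run if scalar indexing is explicitly allowed (it then executes element by element on the CPU, so it is slow). A minimal sketch of that workaround, assuming CUDA.jl's @allowscalar macro and the setup above; this is not a fix:

# workaround for prototyping only: permit scalar indexing for this single call
permuted = CUDA.@allowscalar ITensors.permute(cuA, (j, i))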

Expected output or behavior

NDTensors.cu() should work on an ITensors.ITensor without a warning in the REPL, and ITensors.permute should work without the scalar indexing error.

Actual output or behavior

Output of minimal runnable code

NDTensors.cpu(contracted) ≈ A * B = true
ERROR: LoadError: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] assertscalar(op::String)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/uOYfN/src/GPUArraysCore.jl:103
  [3] getindex
    @ ~/.julia/packages/GPUArrays/5XhED/src/host/indexing.jl:9 [inlined]
  [4] getindex
    @ ~/.julia/packages/StridedViews/MjjHH/src/stridedview.jl:97 [inlined]
  [5] macro expansion
    @ ~/.julia/packages/Strided/O8zfl/src/mapreduce.jl:330 [inlined]
  [6] macro expansion
    @ ./simdloop.jl:77 [inlined]
  [7] macro expansion
    @ ~/.julia/packages/Strided/O8zfl/src/mapreduce.jl:329 [inlined]
  [8] _mapreduce_kernel!(f::Strided.CaptureArgs{NDTensors.var"#69#70", Tuple{Strided.Arg, Strided.Arg}}, op::Nothing, initop::Nothing, dims::Tuple{Int64, Int64}, blocks::Tuple{Int64, Int64}, arrays::Tuple{StridedViews.StridedView{Float64, 2, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, typeof(identity)}, StridedViews.StridedView{Float64, 2, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, typeof(identity)}, StridedViews.StridedView{Float64, 2, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, typeof(identity)}}, strides::Tuple{Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}}, offsets::Tuple{Int64, Int64, Int64})
    @ Strided ~/.julia/packages/Strided/O8zfl/src/mapreduce.jl:229
  [9] _mapreduce_block!(f::Any, op::Any, initop::Any, dims::Tuple{Int64, Int64}, strides::Tuple{Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}}, offsets::Tuple{Int64, Int64, Int64}, costs::Tuple{Int64, Int64}, arrays::Tuple{StridedViews.StridedView{Float64, 2, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, typeof(identity)}, StridedViews.StridedView{Float64, 2, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, typeof(identity)}, StridedViews.StridedView{Float64, 2, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, typeof(identity)}})
    @ Strided ~/.julia/packages/Strided/O8zfl/src/mapreduce.jl:152
 [10] _mapreduce_order!(f::Any, op::Any, initop::Any, dims::Tuple{Int64, Int64}, strides::Tuple{Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}}, arrays::Tuple{StridedViews.StridedView{Float64, 2, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, typeof(identity)}, StridedViews.StridedView{Float64, 2, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, typeof(identity)}, StridedViews.StridedView{Float64, 2, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, typeof(identity)}})
    @ Strided ~/.julia/packages/Strided/O8zfl/src/mapreduce.jl:138
 [11] _mapreduce_fuse!
    @ ~/.julia/packages/Strided/O8zfl/src/mapreduce.jl:116 [inlined]
 [12] copyto!
    @ ~/.julia/packages/Strided/O8zfl/src/broadcast.jl:35 [inlined]
 [13] materialize!
    @ ./broadcast.jl:884 [inlined]
 [14] materialize!
    @ ./broadcast.jl:881 [inlined]
 [15] permutedims!(R::NDTensors.DenseTensor{Float64, 2, Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}}, NDTensors.Dense{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}}, T::NDTensors.DenseTensor{Float64, 2, Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}}, NDTensors.Dense{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}}, perm::Tuple{Int64, Int64}, f::NDTensors.var"#69#70")
    @ NDTensors ~/.julia/packages/NDTensors/olKso/src/dense/densetensor.jl:246
 [16] permutedims!!(R::NDTensors.DenseTensor{Float64, 2, Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}}, NDTensors.Dense{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}}, T::NDTensors.DenseTensor{Float64, 2, Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}}, NDTensors.Dense{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}}, perm::Tuple{Int64, Int64}, f::Function)
    @ NDTensors ~/.julia/packages/NDTensors/olKso/src/dense/densetensor.jl:188
 [17] permutedims!!
    @ ~/.julia/packages/NDTensors/olKso/src/dense/densetensor.jl:186 [inlined]
 [18] permutedims
    @ ~/.julia/packages/NDTensors/olKso/src/tensoralgebra/generic_tensor_operations.jl:5 [inlined]
 [19] permutedims
    @ ~/.julia/packages/ITensors/Qe99p/src/tensor_operations/permutations.jl:64 [inlined]
 [20] _permute(as::NDTensors.NeverAlias, T::NDTensors.DenseTensor{Float64, 2, Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}}, NDTensors.Dense{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}}, new_inds::Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}})
    @ ITensors ~/.julia/packages/ITensors/Qe99p/src/tensor_operations/permutations.jl:69
 [21] permute(as::NDTensors.NeverAlias, T::ITensors.ITensor, new_inds::Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}})
    @ ITensors ~/.julia/packages/ITensors/Qe99p/src/tensor_operations/permutations.jl:73
 [22] #permute#271
    @ ~/.julia/packages/ITensors/Qe99p/src/tensor_operations/permutations.jl:54 [inlined]
 [23] permute(T::ITensors.ITensor, new_inds::Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}})
    @ ITensors ~/.julia/packages/ITensors/Qe99p/src/tensor_operations/permutations.jl:38
 [24] top-level scope
    @ ~/it_test2/test.jl:17
in expression starting at /home/william/it_test2/test.jl:17

Output of `cuA, cuB = NDTensors.cu(A), NDTensors.cu(B)` in REPL

┌ Warning: Performing scalar indexing on task Task (runnable) @0x00007f5e6b5fc970.
│ Invocation of getindex resulted in scalar indexing of a GPU array.
│ This is typically caused by calling an iterating implementation of a method.
│ Such implementations *do not* execute on the GPU, but very slowly on the CPU,
│ and therefore are only permitted from the REPL for prototyping purposes.
│ If you did intend to index this array, annotate the caller with @allowscalar.
└ @ GPUArraysCore ~/.julia/packages/GPUArraysCore/uOYfN/src/GPUArraysCore.jl:

Also, running `NDTensors/ext/examples/NDTensorCUDA.jl` results in a similar error

ERROR: LoadError: Scalar indexing is disallowed.
Invocation of setindex! resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] assertscalar(op::String)
   @ GPUArraysCore ~/.julia/packages/GPUArraysCore/uOYfN/src/GPUArraysCore.jl:103
 [3] setindex!
   @ ~/.julia/packages/GPUArrays/5XhED/src/host/indexing.jl:17 [inlined]
 [4] generic_randn(arraytype::Type{CuArray{T, 1} where T}, dim::Int64)
   @ NDTensors ~/.julia/packages/NDTensors/olKso/src/abstractarray/fill.jl:8
 [5] main()
   @ Main ~/ITensors.jl/NDTensors/ext/examples/NDTensorCUDA.jl:21
 [6] top-level scope
   @ ~/ITensors.jl/NDTensors/ext/examples/NDTensorCUDA.jl:116
in expression starting at /home/william/ITensors.jl/NDTensors/ext/examples/NDTensorCUDA.jl:116
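
For context (this is an illustration, not the NDTensors implementation): the failure here comes from filling a CuArray element by element, where each assignment is a scalar setindex!. Filling the allocated array with a device-aware randn! avoids scalar indexing entirely. A minimal sketch, with hypothetical helper names:

using CUDA, Random

# Fails on a CuArray: each data[i] = randn() is a scalar setindex! on the GPU.
function randn_scalar(arraytype::Type{<:AbstractArray}, dim::Integer)
    data = similar(arraytype, dim)
    for i in 1:dim
        data[i] = randn()
    end
    return data
end

# Works on CPU and GPU arrays: randn! dispatches to the GPU RNG for CuArrays.
function randn_generic(arraytype::Type{<:AbstractArray}, dim::Integer)
    return randn!(similar(arraytype, dim))
end

randn_generic(CuArray{Float64, 1}, 4)  # runs without scalar indexing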

Version information

  • Output from versioninfo():
julia> versioninfo()
Julia Version 1.9.2
Commit e4ee485e909 (2023-07-05 09:39 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
  Threads: 1 on 12 virtual cores
Other relevant version info

(it_test2) pkg> status ITensors
Status `~/it_test2/Project.toml`
  [9136182c] ITensors v0.3.42

(it_test2) pkg> status NDTensors
Status `~/it_test2/Project.toml`
  [23ae76d9] NDTensors v0.2.10

(it_test2) pkg> status CUDA
Status `~/it_test2/Project.toml`
  [052768ef] CUDA v4.4.1

julia> CUDA.versioninfo()
CUDA runtime 12.1, artifact installation
CUDA driver 12.2
NVIDIA driver 535.104.5

CUDA libraries: 
- CUBLAS: 12.1.3
- CURAND: 10.3.2
- CUFFT: 11.0.2
- CUSOLVER: 11.4.5
- CUSPARSE: 12.1.0
- CUPTI: 18.0.0
- NVML: 12.0.0+535.104.5

Julia packages: 
- CUDA: 4.4.1
- CUDA_Driver_jll: 0.5.0+1
- CUDA_Runtime_jll: 0.6.0+0

Toolchain:
- Julia: 1.9.2
- LLVM: 14.0.6
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5
- Device capability support: sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

1 device:
  0: NVIDIA GeForce RTX 3090 (sm_86, 22.728 GiB / 24.000 GiB available)

I am very new to Julia (and ITensors.jl), so apologies if there is something obvious I'm missing!

wbernoudy added the bug and NDTensors labels on Sep 13, 2023
@mtfishman (Member)

Thanks for the report. @kmp5VT could you take a look?

@NuclearPowerNerd (Contributor)

Hello!

I am testing out the updated NDTensors CUDA extension. I am using ITensors.jl v0.3.46, CUDA.jl v5.0.0, and CUDA v11.8. I am still getting the error described in the OP for the following operation. I also get it any time I use contract with either the naive or zipup algorithm (see the sketch after the MWE below).

MWE - Trying to truncate after a direct sum of two MPS

using CUDA, ITensors

function main()

    gpu = cu

    d = 2
    N = 5
    n = d^N
    x = LinRange(1, 2, n)
    f = x .^ 2
    g = sin.(x)
    s = siteinds(d, N)
    fmps = gpu(MPS(f, s))
    gmps = gpu(MPS(g, s))
    hmps = gpu(MPS(zeros(n), s))
    hmps .= add(fmps, gmps; alg="directsum")
    truncate!(hmps; cutoff=1E-8)
end

main()
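
The contract calls that also trigger the error are along these lines (a sketch assuming the same gpu, s, and fmps from main() above, with a random MPO A on the same sites as a stand-in):

A = gpu(randomMPO(s))
contract(A, fmps; alg="naive")   # same scalar indexing error
contract(A, fmps; alg="zipup")   # same scalar indexing error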

@mtfishman (Member)

@NuclearPowerNerd could you double check to make sure you are using the latest version? The error you quoted in the first post should have been fixed by #1194.

@kmp5VT (Collaborator) commented Oct 23, 2023

@mtfishman I think there still might be an issue here, specifically in the truncate! call. I am looking into this now.

@NuclearPowerNerd (Contributor)

@NuclearPowerNerd could you double check to make sure you are using the latest version? The error you quoted in the first post should have been fixed by #1194.

Yes. I double-checked: I have ITensors v0.3.46 in the Project.toml, and the Manifest lists NDTensors v0.2.14 under the dependencies of ITensors.

@mtfishman (Member)

Thanks for confirming. Some of my confusion was that this appears to be specific to the CUDA.jl backend; I was running the code you shared with our Metal.jl backend and wasn't seeing any issues. We are investigating this. It looks like an easy fix, though we may make some more involved changes so that issues particular to certain GPU backends are easier for us to fix in the future, so it will take a bit of time.

@mtfishman (Member)

This will be fixed by #1220.

@NuclearPowerNerd (Contributor)

I can create a new issue if needed - please just let me know. I am still getting this error any time I try to use the zipup algorithm.

@mtfishman (Member)

Could you start a new issue reporting a particular problem, with a minimal code example as well as the full error message/stack trace? Also be sure to test against the latest versions of NDTensors and ITensors; we have been making a number of improvements and bug fixes very recently.
