[NDTensorsCUDAExt] [BUG] DMRG does not converge to correct energy #1232

jtschneider · 2023-11-03T14:42:03Z

Description of bug

Hi Matt and Miles!

I was pretty excited about your announcement of the GPU capabilities super-seeding the ITensorGPU package, so I went ahead and tested it. Unfortunately, I have to report that the DMRG function does not run correctly on the GPU on my local cluster.

First, I made sure that CUDA and ITensor are properly installed and the behaviour is as expected and ran your example code:

Checking if my ITensors and CUDA installation work

julia> using ITensors

julia> using CUDA

julia> i, j, k = Index.((200, 200, 200))
((dim=200|id=17), (dim=200|id=227), (dim=200|id=137))

julia> A = randomITensor(i, j)
ITensor ord=2 (dim=200|id=17) (dim=200|id=227)
NDTensors.Dense{Float64, Vector{Float64}}

julia> B = randomITensor(j, k)
ITensor ord=2 (dim=200|id=227) (dim=200|id=137)
NDTensors.Dense{Float64, Vector{Float64}}

julia> A * B
ITensor ord=2 (dim=200|id=17) (dim=200|id=137)
NDTensors.Dense{Float64, Vector{Float64}}

julia> Agpu = cu(A)
ITensor ord=2 (dim=200|id=17) (dim=200|id=227)
NDTensors.Dense{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}

julia> Bgpu = cu(B)
ITensor ord=2 (dim=200|id=227) (dim=200|id=137)
NDTensors.Dense{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}

julia> Agpu * Bgpu
ITensor ord=2 (dim=200|id=17) (dim=200|id=137)
NDTensors.Dense{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}

julia> using BenchmarkTools

julia> @benchmark $Agpu * $Bgpu
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  28.335 μs …  21.596 ms  ┊ GC (min … max): 0.00% … 44.55%
 Time  (median):     29.286 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   32.096 μs ± 215.693 μs  ┊ GC (mean ± σ):  3.00% ±  0.45%

      ▃▆█▇▅▂                                                    
  ▁▂▃▆███████▆▅▄▂▂▂▂▃▃▄▄▄▄▅▅▄▄▄▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  28.3 μs         Histogram: frequency by time         34.1 μs <

 Memory estimate: 5.94 KiB, allocs estimate: 106.

julia> @benchmark $A * $B
BenchmarkTools.Trial: 989 samples with 1 evaluation.
 Range (min … max):  1.886 ms … 331.198 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.645 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.054 ms ±  10.846 ms  ┊ GC (mean ± σ):  0.14% ± 0.25%

  ▇                █▇▂▁              ▃▃▂                       
  █▅▁▅▄▄▄▁▁▄▁▁▁▁▁▁▁████▅▇▅▅▁▁▄▁▁▁▁▁▁▅███▇▅▅▅▁▁▁▁▅▁▁▁▁▁▁▁▁▄▁▁▄ ▇
  1.89 ms      Histogram: log(frequency) by time        11 ms <

 Memory estimate: 315.33 KiB, allocs estimate: 28.

Minimal code demonstrating the bug or unexpected behavior
The following code runs the DMRG for the critical Ising chain and does not return the correct ground state when executed on my GPU, in particular not the correct energy:

Minimal runnable code

using ITensors
using CUDA

# from  Eq. (13) in T W Burkhardt and I Guim (1985), J. Phys. A: Math. Gen. 18 L33, https://doi.org/10.1088/0305-4470/18/1/006
exact_crit_ising_energy_OBC(N::Int) =  1.0 - csc(π/(2.0*(2.0*N+1)))

platform = cu
# platform = identity



begin
  N = 80
  χ = 100
  SVDcutoff = 1e-15

  sites = siteinds("S=1/2",N)

  ampo = OpSum()
  for j=1:N-1
    ampo += -1.0,"X",j,"X",j+1
    ampo += -1.0,"Z",j
  end
  ampo += -1.0,"Z",N

  H = MPO(ampo,sites)

  psi0 = randomMPS(sites,10);

  sweeps = Sweeps(12)
  mindim!(sweeps, 5, 10, 20)
  maxdim!(sweeps, floor.(Int, LinRange(10, χ, 10))...)
  cutoff!(sweeps, SVDcutoff)


  energy, psi = dmrg(platform(H),platform(psi0), sweeps, eigsolve_krylovdim=4)

  @show energy
  @show exact_crit_ising_energy_OBC(N)
  @show abs(energy - exact_crit_ising_energy_OBC(N))
  
end

Actual output or behavior

The DMRG sweeps do not converge at all over 12 iterations from left to right and back.
The maximal estimated error is also not really converging:

Output of minimal runnable code

After sweep 1 energy=-28.0837  maxlinkdim=10 maxerr=2.99E-03 time=5.883
After sweep 2 energy=-20.88147  maxlinkdim=20 maxerr=1.48E-05 time=5.297
After sweep 3 energy=-20.74522  maxlinkdim=30 maxerr=6.61E-05 time=6.090
After sweep 4 energy=-22.907272  maxlinkdim=40 maxerr=2.51E-05 time=6.563
After sweep 5 energy=-22.044518  maxlinkdim=50 maxerr=2.25E-05 time=7.221
After sweep 6 energy=-18.909777  maxlinkdim=60 maxerr=3.81E-05 time=8.029
After sweep 7 energy=-21.543278  maxlinkdim=69 maxerr=7.64E-05 time=8.208
After sweep 8 energy=-20.000504  maxlinkdim=80 maxerr=2.99E-05 time=8.675
After sweep 9 energy=-21.142273  maxlinkdim=90 maxerr=5.27E-05 time=9.416
After sweep 10 energy=-19.738571  maxlinkdim=100 maxerr=2.19E-05 time=10.298
After sweep 11 energy=-21.7779  maxlinkdim=100 maxerr=8.57E-05 time=10.531
After sweep 12 energy=-23.711718  maxlinkdim=100 maxerr=2.44E-05 time=10.670
energy = -23.711718f0
exact_crit_ising_energy_OBC(N) = -101.49740945239314
abs(energy - exact_crit_ising_energy_OBC(N)) = 77.78569184680232
77.78569184680232

Expected output or behavior

I expect the DMRG code to converge, and even towards the exact ground state energy of the (in this case) critical Ising chain. Here is the code and the output of the same code executed on a CPU (up to changes such that it runs on a CPU):

Output of expected behaviour with runnable code

using ITensors
# using CUDA

# from  Eq. (13) in T W Burkhardt and I Guim (1985), J. Phys. A: Math. Gen. 18 L33, https://doi.org/10.1088/0305-4470/18/1/006
exact_crit_ising_energy_OBC(N::Int) =  1.0 - csc(π/(2.0*(2.0*N+1)))

# platform = cu
platform = identity



begin
  N = 80
  χ = 100
  SVDcutoff = 1e-15

  sites = siteinds("S=1/2",N)

  ampo = OpSum()
  for j=1:N-1
    ampo += -1.0,"X",j,"X",j+1
    ampo += -1.0,"Z",j
  end
  ampo += -1.0,"Z",N

  H = MPO(ampo,sites)

  psi0 = randomMPS(sites,10);

  sweeps = Sweeps(12)
  mindim!(sweeps, 5, 10, 20)
  maxdim!(sweeps, floor.(Int, LinRange(10, χ, 10))...)
  cutoff!(sweeps, SVDcutoff)


  energy, psi = dmrg(platform(H),platform(psi0), sweeps, eigsolve_krylovdim=4)

  @show energy
  @show exact_crit_ising_energy_OBC(N)
  @show abs(energy - exact_crit_ising_energy_OBC(N))
  
end


After sweep 1 energy=-101.39819345083906  maxlinkdim=10 maxerr=6.68E-04 time=0.071
After sweep 2 energy=-101.48102255851883  maxlinkdim=20 maxerr=5.85E-09 time=0.106
After sweep 3 energy=-101.4943088604344  maxlinkdim=30 maxerr=2.21E-10 time=0.214
After sweep 4 energy=-101.49670541043066  maxlinkdim=40 maxerr=1.27E-11 time=0.387
After sweep 5 energy=-101.49728675059629  maxlinkdim=50 maxerr=8.85E-13 time=0.522
After sweep 6 energy=-101.49738861976545  maxlinkdim=60 maxerr=4.55E-14 time=0.708
After sweep 7 energy=-101.49740613917584  maxlinkdim=69 maxerr=4.16E-14 time=0.920
After sweep 8 energy=-101.49740894324076  maxlinkdim=80 maxerr=5.46E-15 time=0.982
After sweep 9 energy=-101.49740941049333  maxlinkdim=90 maxerr=1.63E-15 time=0.925
After sweep 10 energy=-101.49740945007905  maxlinkdim=78 maxerr=9.99E-16 time=0.794
After sweep 11 energy=-101.49740945237563  maxlinkdim=81 maxerr=9.96E-16 time=0.582
After sweep 12 energy=-101.49740945239157  maxlinkdim=44 maxerr=9.96E-16 time=0.393
energy = -101.49740945239157
exact_crit_ising_energy_OBC(N) = -101.49740945239314
abs(energy - exact_crit_ising_energy_OBC(N)) = 1.5774048733874224e-12
1.5774048733874224e-12

Version information

Output from versioninfo():

julia> versioninfo()
Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 56 × Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, icelake-server)
  Threads: 5 on 56 virtual cores
Environment:
  LD_LIBRARY_PATH = /dragofs/sw/foss/0.2/software/ncurses/6.2-GCCcore-11.2.0/lib:/dragofs/sw/foss/0.2/software/binutils/2.37-GCCcore-11.2.0/lib:/dragofs/sw/foss/0.2/software/zlib/1.2.11-GCCcore-11.2.0/lib:/dragofs/sw/foss/0.2/software/GCCcore/11.2.0/lib64
  JULIA_NUM_THREADS = 4

Output from using Pkg; Pkg.status("ITensors"):

julia> using Pkg; Pkg.status("ITensors")
Status `~/Code/jan_codes/Ising_Model/CUDA/Project.toml`
  [9136182c] ITensors v0.3.49

Output from CUDA:

julia> CUDA.driver_version()
v"12.2.0"

julia> CUDA.system_driver_version()
v"11.6.0"

julia> CUDA.runtime_version()
v"12.2.0"

julia> CUDA.current_device()
CuDevice(0): NVIDIA A100-PCIE-40GB

The text was updated successfully, but these errors were encountered:

mtfishman · 2023-11-03T15:11:42Z

@jtschneider thanks for the report. Could you try running the GPU code with double precision? You can use NDTensors.cu instead of cu to transfer and preserve the element type.

mtfishman · 2023-11-03T15:16:22Z

Also, have you tried other GPUs, i.e. do you have reason to believe this is particular to your cluster GPU?

jtschneider · 2023-11-03T15:25:08Z

Hi @mtfishman ! Thanks for the quick reply, I just ran the code with NDTensors.cu instead of cu as follows:

Code

using ITensors
using CUDA

# from  Eq. (13) in T W Burkhardt and I Guim (1985), J. Phys. A: Math. Gen. 18 L33, https://doi.org/10.1088/0305-4470/18/1/006
exact_crit_ising_energy_OBC(N::Int) =  1.0 - csc(π/(2.0*(2.0*N+1)))

platform = NDTensors.cu
# platform = identity



begin
  N = 80
  χ = 100
  SVDcutoff = 1e-15

  sites = siteinds("S=1/2",N)

  ampo = OpSum()
  for j=1:N-1
    ampo += -1.0,"X",j,"X",j+1
    ampo += -1.0,"Z",j
  end
  ampo += -1.0,"Z",N

  H = MPO(ampo,sites)

  psi0 = randomMPS(sites,10);

  sweeps = Sweeps(12)
  mindim!(sweeps, 5, 10, 20)
  maxdim!(sweeps, floor.(Int, LinRange(10, χ, 10))...)
  cutoff!(sweeps, SVDcutoff)


  energy, psi = dmrg(platform(H),platform(psi0), sweeps, eigsolve_krylovdim=3)

  @show energy
  @show exact_crit_ising_energy_OBC(N)
  @show abs(energy - exact_crit_ising_energy_OBC(N))
  
end

and the code is still not converging:

Output

After sweep 1 energy=-34.02635739449884  maxlinkdim=10 maxerr=4.56E-03 time=26.558
After sweep 2 energy=-35.73495573205305  maxlinkdim=20 maxerr=2.36E-05 time=4.731
After sweep 3 energy=-27.673262867727896  maxlinkdim=30 maxerr=4.33E-05 time=5.202
After sweep 4 energy=-30.27702509081329  maxlinkdim=40 maxerr=4.37E-05 time=5.549
After sweep 5 energy=-29.923987348959038  maxlinkdim=50 maxerr=5.35E-05 time=6.186
After sweep 6 energy=-29.039607624499894  maxlinkdim=60 maxerr=3.23E-05 time=6.768
After sweep 7 energy=-32.17514174963293  maxlinkdim=69 maxerr=4.39E-05 time=7.341
After sweep 8 energy=-33.06125319883081  maxlinkdim=80 maxerr=2.40E-05 time=7.907
After sweep 9 energy=-33.009814490629275  maxlinkdim=90 maxerr=3.31E-05 time=8.746
After sweep 10 energy=-35.74879349714169  maxlinkdim=100 maxerr=2.65E-05 time=9.378
After sweep 11 energy=-33.335029696599555  maxlinkdim=100 maxerr=3.98E-05 time=9.584
After sweep 12 energy=-32.79446396863406  maxlinkdim=100 maxerr=6.33E-05 time=9.941
energy = -32.79446396863406
exact_crit_ising_energy_OBC(N) = -101.49740945239314
abs(energy - exact_crit_ising_energy_OBC(N)) = 68.70294548375908
68.70294548375908

Yeah perhaps the title is a bit misleading, I have access to another nVidia GPU: the model Tesla T4, but I have to queue for that quite some time and I could not get resources allocated for a quick interactive session. My PC does not have a GPU, it has an AMD CPU which is also why I cannot test against a different GPU-runnable version of the DMRG script.

mtfishman · 2023-11-03T15:46:12Z

Hmm pretty strange. I tested the Metal backend and it seems to work with this example (after fixing a few unrelated bugs in that backend).

@kmp5VT could you take a look?

kmp5VT · 2023-11-03T16:03:22Z

Hi all, @mtfishman I am looking into this now!

kmp5VT · 2023-11-03T20:57:56Z

@jtschneider I think Matt and I were able to find/address your issue. #1236 should fix this bug. Thanks!!

jtschneider added bug Something isn't working ITensorGPU Issues or pull requests related to the ITensorGPU package. labels Nov 3, 2023

jtschneider changed the title ~~[ITensorGPU] [BUG] YOUR SHORT DESCRIPTION OF THE BUG HERE~~ [ITensorGPU] [BUG] DMRG does not converge on the nVidia A100 Nov 3, 2023

mtfishman changed the title ~~[ITensorGPU] [BUG] DMRG does not converge on the nVidia A100~~ [NDTensorsCUDAExt] [BUG] DMRG does not converge on the nVidia A100 Nov 3, 2023

mtfishman changed the title ~~[NDTensorsCUDAExt] [BUG] DMRG does not converge on the nVidia A100~~ [NDTensorsCUDAExt] [BUG] DMRG does not converge to correct energy Nov 3, 2023

kmp5VT linked a pull request Nov 3, 2023 that will close this issue

[NDTensorsCUDAExt] Fix bug in slicing Adjoint CuMatrix #1236

Merged

mtfishman closed this as completed in #1236 Nov 8, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[NDTensorsCUDAExt] [BUG] DMRG does not converge to correct energy #1232

[NDTensorsCUDAExt] [BUG] DMRG does not converge to correct energy #1232

jtschneider commented Nov 3, 2023

mtfishman commented Nov 3, 2023

mtfishman commented Nov 3, 2023

jtschneider commented Nov 3, 2023

mtfishman commented Nov 3, 2023

kmp5VT commented Nov 3, 2023

kmp5VT commented Nov 3, 2023 •

edited

Loading

[NDTensorsCUDAExt] [BUG] DMRG does not converge to correct energy #1232

[NDTensorsCUDAExt] [BUG] DMRG does not converge to correct energy #1232

Comments

jtschneider commented Nov 3, 2023

mtfishman commented Nov 3, 2023

mtfishman commented Nov 3, 2023

jtschneider commented Nov 3, 2023

mtfishman commented Nov 3, 2023

kmp5VT commented Nov 3, 2023

kmp5VT commented Nov 3, 2023 • edited Loading

kmp5VT commented Nov 3, 2023 •

edited

Loading