Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NDTensorsCUDAExt] [BUG] DMRG does not converge to correct energy #1232

Closed
jtschneider opened this issue Nov 3, 2023 · 6 comments · Fixed by #1236
Closed

[NDTensorsCUDAExt] [BUG] DMRG does not converge to correct energy #1232

jtschneider opened this issue Nov 3, 2023 · 6 comments · Fixed by #1236
Labels
bug Something isn't working ITensorGPU Issues or pull requests related to the ITensorGPU package.

Comments

@jtschneider
Copy link
Contributor

Description of bug

Hi Matt and Miles!

I was pretty excited about your announcement of the GPU capabilities super-seeding the ITensorGPU package, so I went ahead and tested it. Unfortunately, I have to report that the DMRG function does not run correctly on the GPU on my local cluster.

First, I made sure that CUDA and ITensor are properly installed and the behaviour is as expected and ran your example code:

Checking if my ITensors and CUDA installation work

julia> using ITensors

julia> using CUDA

julia> i, j, k = Index.((200, 200, 200))
((dim=200|id=17), (dim=200|id=227), (dim=200|id=137))

julia> A = randomITensor(i, j)
ITensor ord=2 (dim=200|id=17) (dim=200|id=227)
NDTensors.Dense{Float64, Vector{Float64}}

julia> B = randomITensor(j, k)
ITensor ord=2 (dim=200|id=227) (dim=200|id=137)
NDTensors.Dense{Float64, Vector{Float64}}

julia> A * B
ITensor ord=2 (dim=200|id=17) (dim=200|id=137)
NDTensors.Dense{Float64, Vector{Float64}}

julia> Agpu = cu(A)
ITensor ord=2 (dim=200|id=17) (dim=200|id=227)
NDTensors.Dense{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}

julia> Bgpu = cu(B)
ITensor ord=2 (dim=200|id=227) (dim=200|id=137)
NDTensors.Dense{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}

julia> Agpu * Bgpu
ITensor ord=2 (dim=200|id=17) (dim=200|id=137)
NDTensors.Dense{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}

julia> using BenchmarkTools

julia> @benchmark $Agpu * $Bgpu
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min  max):  28.335 μs   21.596 ms  ┊ GC (min  max): 0.00%  44.55%
 Time  (median):     29.286 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   32.096 μs ± 215.693 μs  ┊ GC (mean ± σ):  3.00% ±  0.45%

      ▃▆█▇▅▂                                                    
  ▁▂▃▆███████▆▅▄▂▂▂▂▃▃▄▄▄▄▅▅▄▄▄▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  28.3 μs         Histogram: frequency by time         34.1 μs <

 Memory estimate: 5.94 KiB, allocs estimate: 106.

julia> @benchmark $A * $B
BenchmarkTools.Trial: 989 samples with 1 evaluation.
 Range (min  max):  1.886 ms  331.198 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     4.645 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.054 ms ±  10.846 ms  ┊ GC (mean ± σ):  0.14% ± 0.25%

  ▇                █▇▂▁              ▃▃▂                       
  █▅▁▅▄▄▄▁▁▄▁▁▁▁▁▁▁████▅▇▅▅▁▁▄▁▁▁▁▁▁▅███▇▅▅▅▁▁▁▁▅▁▁▁▁▁▁▁▁▄▁▁▄ ▇
  1.89 ms      Histogram: log(frequency) by time        11 ms <

 Memory estimate: 315.33 KiB, allocs estimate: 28.

Minimal code demonstrating the bug or unexpected behavior
The following code runs the DMRG for the critical Ising chain and does not return the correct ground state when executed on my GPU, in particular not the correct energy:

Minimal runnable code

using ITensors
using CUDA

# from  Eq. (13) in T W Burkhardt and I Guim (1985), J. Phys. A: Math. Gen. 18 L33, https://doi.org/10.1088/0305-4470/18/1/006
exact_crit_ising_energy_OBC(N::Int) =  1.0 - csc/(2.0*(2.0*N+1)))

platform = cu
# platform = identity



begin
  N = 80
  χ = 100
  SVDcutoff = 1e-15

  sites = siteinds("S=1/2",N)

  ampo = OpSum()
  for j=1:N-1
    ampo += -1.0,"X",j,"X",j+1
    ampo += -1.0,"Z",j
  end
  ampo += -1.0,"Z",N

  H = MPO(ampo,sites)

  psi0 = randomMPS(sites,10);

  sweeps = Sweeps(12)
  mindim!(sweeps, 5, 10, 20)
  maxdim!(sweeps, floor.(Int, LinRange(10, χ, 10))...)
  cutoff!(sweeps, SVDcutoff)


  energy, psi = dmrg(platform(H),platform(psi0), sweeps, eigsolve_krylovdim=4)

  @show energy
  @show exact_crit_ising_energy_OBC(N)
  @show abs(energy - exact_crit_ising_energy_OBC(N))
  
end

Actual output or behavior

The DMRG sweeps do not converge at all over 12 iterations from left to right and back.
The maximal estimated error is also not really converging:

Output of minimal runnable code

After sweep 1 energy=-28.0837  maxlinkdim=10 maxerr=2.99E-03 time=5.883
After sweep 2 energy=-20.88147  maxlinkdim=20 maxerr=1.48E-05 time=5.297
After sweep 3 energy=-20.74522  maxlinkdim=30 maxerr=6.61E-05 time=6.090
After sweep 4 energy=-22.907272  maxlinkdim=40 maxerr=2.51E-05 time=6.563
After sweep 5 energy=-22.044518  maxlinkdim=50 maxerr=2.25E-05 time=7.221
After sweep 6 energy=-18.909777  maxlinkdim=60 maxerr=3.81E-05 time=8.029
After sweep 7 energy=-21.543278  maxlinkdim=69 maxerr=7.64E-05 time=8.208
After sweep 8 energy=-20.000504  maxlinkdim=80 maxerr=2.99E-05 time=8.675
After sweep 9 energy=-21.142273  maxlinkdim=90 maxerr=5.27E-05 time=9.416
After sweep 10 energy=-19.738571  maxlinkdim=100 maxerr=2.19E-05 time=10.298
After sweep 11 energy=-21.7779  maxlinkdim=100 maxerr=8.57E-05 time=10.531
After sweep 12 energy=-23.711718  maxlinkdim=100 maxerr=2.44E-05 time=10.670
energy = -23.711718f0
exact_crit_ising_energy_OBC(N) = -101.49740945239314
abs(energy - exact_crit_ising_energy_OBC(N)) = 77.78569184680232
77.78569184680232

Expected output or behavior

I expect the DMRG code to converge, and even towards the exact ground state energy of the (in this case) critical Ising chain. Here is the code and the output of the same code executed on a CPU (up to changes such that it runs on a CPU):

Output of expected behaviour with runnable code

using ITensors
# using CUDA

# from  Eq. (13) in T W Burkhardt and I Guim (1985), J. Phys. A: Math. Gen. 18 L33, https://doi.org/10.1088/0305-4470/18/1/006
exact_crit_ising_energy_OBC(N::Int) =  1.0 - csc/(2.0*(2.0*N+1)))

# platform = cu
platform = identity



begin
  N = 80
  χ = 100
  SVDcutoff = 1e-15

  sites = siteinds("S=1/2",N)

  ampo = OpSum()
  for j=1:N-1
    ampo += -1.0,"X",j,"X",j+1
    ampo += -1.0,"Z",j
  end
  ampo += -1.0,"Z",N

  H = MPO(ampo,sites)

  psi0 = randomMPS(sites,10);

  sweeps = Sweeps(12)
  mindim!(sweeps, 5, 10, 20)
  maxdim!(sweeps, floor.(Int, LinRange(10, χ, 10))...)
  cutoff!(sweeps, SVDcutoff)


  energy, psi = dmrg(platform(H),platform(psi0), sweeps, eigsolve_krylovdim=4)

  @show energy
  @show exact_crit_ising_energy_OBC(N)
  @show abs(energy - exact_crit_ising_energy_OBC(N))
  
end


After sweep 1 energy=-101.39819345083906  maxlinkdim=10 maxerr=6.68E-04 time=0.071
After sweep 2 energy=-101.48102255851883  maxlinkdim=20 maxerr=5.85E-09 time=0.106
After sweep 3 energy=-101.4943088604344  maxlinkdim=30 maxerr=2.21E-10 time=0.214
After sweep 4 energy=-101.49670541043066  maxlinkdim=40 maxerr=1.27E-11 time=0.387
After sweep 5 energy=-101.49728675059629  maxlinkdim=50 maxerr=8.85E-13 time=0.522
After sweep 6 energy=-101.49738861976545  maxlinkdim=60 maxerr=4.55E-14 time=0.708
After sweep 7 energy=-101.49740613917584  maxlinkdim=69 maxerr=4.16E-14 time=0.920
After sweep 8 energy=-101.49740894324076  maxlinkdim=80 maxerr=5.46E-15 time=0.982
After sweep 9 energy=-101.49740941049333  maxlinkdim=90 maxerr=1.63E-15 time=0.925
After sweep 10 energy=-101.49740945007905  maxlinkdim=78 maxerr=9.99E-16 time=0.794
After sweep 11 energy=-101.49740945237563  maxlinkdim=81 maxerr=9.96E-16 time=0.582
After sweep 12 energy=-101.49740945239157  maxlinkdim=44 maxerr=9.96E-16 time=0.393
energy = -101.49740945239157
exact_crit_ising_energy_OBC(N) = -101.49740945239314
abs(energy - exact_crit_ising_energy_OBC(N)) = 1.5774048733874224e-12
1.5774048733874224e-12

Version information

  • Output from versioninfo():
julia> versioninfo()
Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 56 × Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, icelake-server)
  Threads: 5 on 56 virtual cores
Environment:
  LD_LIBRARY_PATH = /dragofs/sw/foss/0.2/software/ncurses/6.2-GCCcore-11.2.0/lib:/dragofs/sw/foss/0.2/software/binutils/2.37-GCCcore-11.2.0/lib:/dragofs/sw/foss/0.2/software/zlib/1.2.11-GCCcore-11.2.0/lib:/dragofs/sw/foss/0.2/software/GCCcore/11.2.0/lib64
  JULIA_NUM_THREADS = 4
  • Output from using Pkg; Pkg.status("ITensors"):
julia> using Pkg; Pkg.status("ITensors")
Status `~/Code/jan_codes/Ising_Model/CUDA/Project.toml`
  [9136182c] ITensors v0.3.49
  • Output from CUDA:
julia> CUDA.driver_version()
v"12.2.0"

julia> CUDA.system_driver_version()
v"11.6.0"

julia> CUDA.runtime_version()
v"12.2.0"

julia> CUDA.current_device()
CuDevice(0): NVIDIA A100-PCIE-40GB
@jtschneider jtschneider added bug Something isn't working ITensorGPU Issues or pull requests related to the ITensorGPU package. labels Nov 3, 2023
@jtschneider jtschneider changed the title [ITensorGPU] [BUG] YOUR SHORT DESCRIPTION OF THE BUG HERE [ITensorGPU] [BUG] DMRG does not converge on the nVidia A100 Nov 3, 2023
@mtfishman
Copy link
Member

@jtschneider thanks for the report. Could you try running the GPU code with double precision? You can use NDTensors.cu instead of cu to transfer and preserve the element type.

@mtfishman
Copy link
Member

Also, have you tried other GPUs, i.e. do you have reason to believe this is particular to your cluster GPU?

@jtschneider
Copy link
Contributor Author

Hi @mtfishman ! Thanks for the quick reply, I just ran the code with NDTensors.cu instead of cu as follows:

Code

using ITensors
using CUDA

# from  Eq. (13) in T W Burkhardt and I Guim (1985), J. Phys. A: Math. Gen. 18 L33, https://doi.org/10.1088/0305-4470/18/1/006
exact_crit_ising_energy_OBC(N::Int) =  1.0 - csc/(2.0*(2.0*N+1)))

platform = NDTensors.cu
# platform = identity



begin
  N = 80
  χ = 100
  SVDcutoff = 1e-15

  sites = siteinds("S=1/2",N)

  ampo = OpSum()
  for j=1:N-1
    ampo += -1.0,"X",j,"X",j+1
    ampo += -1.0,"Z",j
  end
  ampo += -1.0,"Z",N

  H = MPO(ampo,sites)

  psi0 = randomMPS(sites,10);

  sweeps = Sweeps(12)
  mindim!(sweeps, 5, 10, 20)
  maxdim!(sweeps, floor.(Int, LinRange(10, χ, 10))...)
  cutoff!(sweeps, SVDcutoff)


  energy, psi = dmrg(platform(H),platform(psi0), sweeps, eigsolve_krylovdim=3)

  @show energy
  @show exact_crit_ising_energy_OBC(N)
  @show abs(energy - exact_crit_ising_energy_OBC(N))
  
end

and the code is still not converging:

Output

After sweep 1 energy=-34.02635739449884  maxlinkdim=10 maxerr=4.56E-03 time=26.558
After sweep 2 energy=-35.73495573205305  maxlinkdim=20 maxerr=2.36E-05 time=4.731
After sweep 3 energy=-27.673262867727896  maxlinkdim=30 maxerr=4.33E-05 time=5.202
After sweep 4 energy=-30.27702509081329  maxlinkdim=40 maxerr=4.37E-05 time=5.549
After sweep 5 energy=-29.923987348959038  maxlinkdim=50 maxerr=5.35E-05 time=6.186
After sweep 6 energy=-29.039607624499894  maxlinkdim=60 maxerr=3.23E-05 time=6.768
After sweep 7 energy=-32.17514174963293  maxlinkdim=69 maxerr=4.39E-05 time=7.341
After sweep 8 energy=-33.06125319883081  maxlinkdim=80 maxerr=2.40E-05 time=7.907
After sweep 9 energy=-33.009814490629275  maxlinkdim=90 maxerr=3.31E-05 time=8.746
After sweep 10 energy=-35.74879349714169  maxlinkdim=100 maxerr=2.65E-05 time=9.378
After sweep 11 energy=-33.335029696599555  maxlinkdim=100 maxerr=3.98E-05 time=9.584
After sweep 12 energy=-32.79446396863406  maxlinkdim=100 maxerr=6.33E-05 time=9.941
energy = -32.79446396863406
exact_crit_ising_energy_OBC(N) = -101.49740945239314
abs(energy - exact_crit_ising_energy_OBC(N)) = 68.70294548375908
68.70294548375908

Yeah perhaps the title is a bit misleading, I have access to another nVidia GPU: the model Tesla T4, but I have to queue for that quite some time and I could not get resources allocated for a quick interactive session. My PC does not have a GPU, it has an AMD CPU which is also why I cannot test against a different GPU-runnable version of the DMRG script.

@mtfishman
Copy link
Member

Hmm pretty strange. I tested the Metal backend and it seems to work with this example (after fixing a few unrelated bugs in that backend).

@kmp5VT could you take a look?

@mtfishman mtfishman changed the title [ITensorGPU] [BUG] DMRG does not converge on the nVidia A100 [NDTensorsCUDAExt] [BUG] DMRG does not converge on the nVidia A100 Nov 3, 2023
@mtfishman mtfishman changed the title [NDTensorsCUDAExt] [BUG] DMRG does not converge on the nVidia A100 [NDTensorsCUDAExt] [BUG] DMRG does not converge to correct energy Nov 3, 2023
@kmp5VT
Copy link
Collaborator

kmp5VT commented Nov 3, 2023

Hi all, @mtfishman I am looking into this now!

@kmp5VT
Copy link
Collaborator

kmp5VT commented Nov 3, 2023

@jtschneider I think Matt and I were able to find/address your issue. #1236 should fix this bug. Thanks!!

@kmp5VT kmp5VT linked a pull request Nov 3, 2023 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working ITensorGPU Issues or pull requests related to the ITensorGPU package.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants