Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjusting example 3 to a GPU backend #21

Closed
YotamKa opened this issue Jun 2, 2024 · 7 comments · Fixed by #22
Closed

Adjusting example 3 to a GPU backend #21

YotamKa opened this issue Jun 2, 2024 · 7 comments · Fixed by #22

Comments

@YotamKa
Copy link

YotamKa commented Jun 2, 2024

Dear Matt,

I tried using Nvidia's GPU backend for ITensorMPS TDVP calculations, but encountered an issue with the data types being passed to the ODE solver.

The problem is described below using code snippets.

Do you have a quick fix in mind for this issue?

Thanks,
Yotam


Consider, for example, the code in:
https://github.com/ITensor/ITensorMPS.jl/tree/main/examples/03_tdvp_time_dependent.jl'

I changed it a bit so it would run with my current versions:
"ITensors v0.3.68"
"ITensorMPS v0.2.1"

The following code runs:

using ITensors
using ITensorMPS
using LinearAlgebra: norm
using Random: Random

include("03_models.jl")
include("03_updaters.jl")

function main()
  Random.seed!(1234)

  # Time dependent Hamiltonian is:
  # H(t) = H₁(t) + H₂(t) + …
  #      = f₁(t) H₁(0) + f₂(t) H₂(0) + …
  #      = cos(ω₁t) H₁(0) + cos(ω₂t) H₂(0) + …

  # Number of sites
  n = 6

  # How much information to output from TDVP
  # Set to 2 to get information about each bond/site
  # evolution, and 3 to get information about the
  # updater.
  outputlevel = 3

  # Frequency of time dependent terms
  ω₁ = 0.1
  ω₂ = 0.2

  # Nearest and next-nearest neighbor
  # Heisenberg couplings.
  J₁ = 1.0
  J₂ = 1.0

  time_step = 0.1
  time_stop = 1.0

  # nsite-update TDVP
  nsite = 2

  # Starting state bond/link dimension.
  # A product state starting state can
  # cause issues for TDVP without
  # subspace expansion.
  start_linkdim = 4

  # TDVP truncation parameters
  maxdim = 100
  cutoff = 1e-8

  tol = 1e-15

  @show n
  @show ω₁, ω₂
  @show J₁, J₂
  @show maxdim, cutoff, nsite
  @show start_linkdim
  @show time_step, time_stop

  ω⃗ = (ω₁, ω₂)
  f⃗ = map(ω -> (t -> cos(ω * t)), ω⃗)

  # H₀ = H(0) = H₁(0) + H₂(0) + …
  ℋ₁₀ = heisenberg(n; J=J₁, J2=0.0)
  ℋ₂₀ = heisenberg(n; J=0.0, J2=J₂)
  ℋ⃗₀ = (ℋ₁₀, ℋ₂₀)

  s = siteinds("S=1/2", n)

  H⃗₀ = map(ℋ₀ -> MPO(ℋ₀, s), ℋ⃗₀)

  # Initial state, ψ₀ = ψ(0)
  # Initialize as complex since that is what OrdinaryDiffEq.jl/DifferentialEquations.jl
  # expects.
  ψ₀ = complex.(randomMPS(s, j -> isodd(j) ? "↑" : "↓"; linkdims=start_linkdim))

  @show norm(ψ₀)

  println()
  println("#"^100)
  println("Running TDVP with ODE updater")
  println("#"^100)
  println()

  ψₜ_ode = tdvp(
    -im * TimeDependentSum(f⃗, H⃗₀),
    time_stop,
    ψ₀;
    updater=ode_updater,
    updater_kwargs=(; reltol=tol, abstol=tol),
    time_step,
    maxdim,
    cutoff,
    nsite,
    outputlevel,
  )

  println()
  println("Finished running TDVP with ODE updater")
  println()

  println()
  println("#"^100)
  println("Running TDVP with Krylov updater")
  println("#"^100)
  println()

  ψₜ_krylov = tdvp(
    -im * TimeDependentSum(f⃗, H⃗₀),
    time_stop,
    ψ₀;
    updater=krylov_updater,
    updater_kwargs=(; tol, eager=true),
    time_step,
    cutoff,
    nsite,
    outputlevel,
  )

  println()
  println("Finished running TDVP with Krylov updater")
  println()

  println()
  println("#"^100)
  println("Running full state evolution with ODE updater")
  println("#"^100)
  println()

  @disable_warn_order begin
    ψₜ_full, _ = ode_updater(
      -im * TimeDependentSum(f⃗, contract.(H⃗₀)),
      contract(ψ₀);
      internal_kwargs=(; time_step=time_stop, outputlevel),
      reltol=tol,
      abstol=tol,
    )
  end

  println()
  println("Finished full state evolution with ODE updater")
  println()

  @show norm(ψₜ_ode)
  @show norm(ψₜ_krylov)
  @show norm(ψₜ_full)

  @show 1 - abs(inner(contract(ψₜ_ode), ψₜ_full))
  @show 1 - abs(inner(contract(ψₜ_krylov), ψₜ_full))
  return nothing
end

main()

But, when I pass psi and H as cuMPS and cuMPO, i.e.:

using ITensors
using ITensorMPS
using LinearAlgebra: norm
using Random: Random
using CUDA

include("03_models.jl")
include("03_updaters.jl")

function main()
  Random.seed!(1234)

  # Time dependent Hamiltonian is:
  # H(t) = H₁(t) + H₂(t) + …
  #      = f₁(t) H₁(0) + f₂(t) H₂(0) + …
  #      = cos(ω₁t) H₁(0) + cos(ω₂t) H₂(0) + …

  # Number of sites
  n = 6

  # How much information to output from TDVP
  # Set to 2 to get information about each bond/site
  # evolution, and 3 to get information about the
  # updater.
  outputlevel = 3

  # Frequency of time dependent terms
  ω₁ = 0.1
  ω₂ = 0.2

  # Nearest and next-nearest neighbor
  # Heisenberg couplings.
  J₁ = 1.0
  J₂ = 1.0

  time_step = 0.1
  time_stop = 1.0

  # nsite-update TDVP
  nsite = 2

  # Starting state bond/link dimension.
  # A product state starting state can
  # cause issues for TDVP without
  # subspace expansion.
  start_linkdim = 4

  # TDVP truncation parameters
  maxdim = 100
  cutoff = 1e-8

  tol = 1e-15

  @show n
  @show ω₁, ω₂
  @show J₁, J₂
  @show maxdim, cutoff, nsite
  @show start_linkdim
  @show time_step, time_stop

  ω⃗ = (ω₁, ω₂)
  f⃗ = map(ω -> (t -> cos(ω * t)), ω⃗)

  # H₀ = H(0) = H₁(0) + H₂(0) + …
  ℋ₁₀ = heisenberg(n; J=J₁, J2=0.0)
  ℋ₂₀ = heisenberg(n; J=0.0, J2=J₂)
  ℋ⃗₀ = (ℋ₁₀, ℋ₂₀)

  s = siteinds("S=1/2", n)

""" Define as H as cu(MPO) for GPU calculation """
  H⃗₀ = map(ℋ₀ -> cu(MPO(ℋ₀, s)), ℋ⃗₀)

  # Initial state, ψ₀ = ψ(0)
  # Initialize as complex since that is what OrdinaryDiffEq.jl/DifferentialEquations.jl
  # expects.
""" Define as psi as cu(MPS) for GPU calculation """
  ψ₀ = cu(complex.(randomMPS(s, j -> isodd(j) ? "↑" : "↓"; linkdims=start_linkdim)))

  @show norm(ψ₀)

  println()
  println("#"^100)
  println("Running TDVP with ODE updater")
  println("#"^100)
  println()

  ψₜ_ode = tdvp(
    -im * TimeDependentSum(f⃗, H⃗₀),
    time_stop,
    ψ₀;
    updater=ode_updater,
    updater_kwargs=(; reltol=tol, abstol=tol),
    time_step,
    maxdim,
    cutoff,
    nsite,
    outputlevel,
  )

  println()
  println("Finished running TDVP with ODE updater")
  println()

  println()
  println("#"^100)
  println("Running TDVP with Krylov updater")
  println("#"^100)
  println()

  ψₜ_krylov = tdvp(
    -im * TimeDependentSum(f⃗, H⃗₀),
    time_stop,
    ψ₀;
    updater=krylov_updater,
    updater_kwargs=(; tol, eager=true),
    time_step,
    cutoff,
    nsite,
    outputlevel,
  )

  println()
  println("Finished running TDVP with Krylov updater")
  println()

  println()
  println("#"^100)
  println("Running full state evolution with ODE updater")
  println("#"^100)
  println()

  @disable_warn_order begin
    ψₜ_full, _ = ode_updater(
      -im * TimeDependentSum(f⃗, contract.(H⃗₀)),
      contract(ψ₀);
      internal_kwargs=(; time_step=time_stop, outputlevel),
      reltol=tol,
      abstol=tol,
    )
  end

  println()
  println("Finished full state evolution with ODE updater")
  println()

  @show norm(ψₜ_ode)
  @show norm(ψₜ_krylov)
  @show norm(ψₜ_full)

  @show 1 - abs(inner(contract(ψₜ_ode), ψₜ_full))
  @show 1 - abs(inner(contract(ψₜ_krylov), ψₜ_full))
  return nothing
end

main()

I get the following error:

MethodError: no method matching (::var"#f#7"{…})(::CuArray{…}, ::SciMLBase.NullParameters, ::Float64)
An arithmetic operation was performed on a NullParameters object. This means no parameters were passed
into the AbstractSciMLProblem (e.x.: ODEProblem) but the parameters object `p` was used in an arithmetic
expression.
@emstoudenmire
Copy link
Contributor

Do you get the same error when running the calculation on CPU ie not using CUDA? I ask because the error message seems more related to SciML parameters than the presence of the CuArray though I could be wrong.

@YotamKa
Copy link
Author

YotamKa commented Jun 2, 2024

Hi, thanks for the quick response.

No, with a CPU backend (the first code snippet), the integrator works and produces the correct answers (compared with ED).

The output is:

n = 4
(ω₁, ω₂) = (0.1, 0.2)
(J₁, J₂) = (1.0, 1.0)
(maxdim, cutoff, nsite) = (100, 1.0e-8, 2)
start_linkdim = 4
(time_step, time_stop) = (0.1, 1.0)
norm(ψ₀) = 1.0000000000000002

####################################################################################################
Running TDVP with ODE updater
####################################################################################################

After sweep 1: maxlinkdim=4 maxerr=0.00E+00 current_time=0.1 time=0.701
After sweep 2: maxlinkdim=4 maxerr=8.05E-17 current_time=0.2 time=0.028
After sweep 3: maxlinkdim=4 maxerr=3.34E-17 current_time=0.3 time=0.025
After sweep 4: maxlinkdim=4 maxerr=0.00E+00 current_time=0.4 time=0.025
After sweep 5: maxlinkdim=4 maxerr=6.16E-17 current_time=0.5 time=0.023
After sweep 6: maxlinkdim=4 maxerr=2.07E-17 current_time=0.6 time=0.025
After sweep 7: maxlinkdim=4 maxerr=0.00E+00 current_time=0.7 time=0.023
After sweep 8: maxlinkdim=4 maxerr=0.00E+00 current_time=0.8 time=0.025
After sweep 9: maxlinkdim=4 maxerr=0.00E+00 current_time=0.9 time=0.023
After sweep 10: maxlinkdim=4 maxerr=0.00E+00 current_time=1.0 time=0.025

Finished running TDVP with ODE updater


####################################################################################################
Running TDVP with Krylov updater
####################################################################################################

After sweep 1-: maxlinkdim=4 maxerr=0.00E+00 current_time=0.1 time=5.487
After sweep 2: maxlinkdim=4 maxerr=2.14E-16 current_time=0.2 time=0.004
After sweep 3: maxlinkdim=4 maxerr=0.00E+00 current_time=0.3 time=0.004
After sweep 4: maxlinkdim=4 maxerr=3.33E-17 current_time=0.4 time=0.004
After sweep 5: maxlinkdim=4 maxerr=6.02E-18 current_time=0.5 time=0.004
After sweep 6: maxlinkdim=4 maxerr=7.39E-17 current_time=0.6 time=0.004
After sweep 7: maxlinkdim=4 maxerr=9.73E-17 current_time=0.7 time=0.004
After sweep 8: maxlinkdim=4 maxerr=0.00E+00 current_time=0.8 time=0.004
After sweep 9: maxlinkdim=4 maxerr=6.92E-17 current_time=0.9 time=0.006
After sweep 10: maxlinkdim=4 maxerr=0.00E+00 current_time=1.0 time=0.006

Finished running TDVP with Krylov updater


####################################################################################################
Running full state evolution with ODE updater
####################################################################################################


Finished full state evolution with ODE updater

norm(ψₜ_ode) = 1.0000000000000007
norm(ψₜ_krylov) = 1.0000000000000022
norm(ψₜ_full) = 0.9999999999999999
1 - abs(inner(contract(ψₜ_ode), ψₜ_full)) = -2.220446049250313e-16
1 - abs(inner(contract(ψₜ_krylov), ψₜ_full)) = 9.623815895309917e-7

@mtfishman
Copy link
Member

Interesting, thanks for the report. Maybe worth looking at https://github.com/SciML/DiffEqGPU.jl?tab=readme-ov-file#example-of-within-method-gpu-parallelism. The only difference I see in that example from our example is that we aren't defining an in-place version of the solver time stepping function: https://github.com/ITensor/ITensorMPS.jl/blob/v0.2.3/examples/03_updaters.jl#L11-L14, or there is something more subtle about how we are calling the ODE solver that I am missing.

@mtfishman
Copy link
Member

Also note that if you are using GPU, probably you want to use single precision (i.e. Float32 or Complex{Float32}). cu(...) converts to single precision automatically but you should probably change parameters in the code to single precision, for example ω₁ = 0.1f0, tol = 1f-7, etc. Something we do to easily switch between number types is define the number type at the top of the code and then use it as a variable throughout the code:

elt = Float32

ω₁ = one(elt) / 10
tol = 10 * eps(elt)

but of course feel free to handle that however you prefer.

@mtfishman
Copy link
Member

I see the issue, this line here is too restrictive: https://github.com/ITensor/ITensorMPS.jl/blob/v0.2.3/examples/03_updaters.jl#L12 and should allow any AbstractVector rather than just a Vector. I'll make a PR, and additionally make the example easier to run on GPU and select the element type.

@mtfishman
Copy link
Member

This is being fixed in #22.

@YotamKa
Copy link
Author

YotamKa commented Jun 3, 2024

Thanks a lot! It works also on my PC over a GPU backend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants