-
Notifications
You must be signed in to change notification settings - Fork 125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ITensors] [ENHANCEMENT] Bump default value for eigsolve_krylovdim in DMRG to 4 #1234
Comments
Hmm maybe we could just add a note in the documentation about it? Choosing defaults is tough and always requires some tradeoffs. |
I would be in favour of the increased Krylov dimension, as this is not a standard parameter to be adjusted in the narrative of better convergence of DMRG algorithms such as the increased bond dimension or the reduced SVD cutoff (which is what newcomers adhere to), and this parameter feels a bit more buried behind a few layers of abstraction and is hard to find (particularily for newcomers). Surely, a note in the documentation is also helpful! |
In fact, the whooping 6 orders of magnitude improvement are not possible (in the critical Ising chain) with improving the bond dimension or the SVD cut off (or both)! So I think this has to be documented at least. |
@emstoudenmire any opinion on this? I think we would just need a more systematic analysis of a wider range of models to analyze the tradeoffs of accuracy vs. speed before making a decision. But adding a documentation note that increasing I do see that it is already mentioned here: https://itensor.github.io/ITensors.jl/dev/faq/DMRG.html#Ensuring-a-DMRG-calculation-is-converged |
Ah very good, I have missed that part of the documentation! The tradeoff vs accuracy certainly needs a much wider analysis than just this one example. |
Hi Jan, thanks for starting this discussion and providing a sample code for us to use. On my machine (M1 Macbook running Julia 1.10-rc1) I get rather different results, however, that are more in line with what we would hope, namely that results are quite good with Krylov dim of 3 and do get better with 4, but not dramatically so. Here's what I get: exact_crit_ising_energy_OBC(N) = -101.49740945239314
energy_3 = -101.49740580485742
energy_4 = -101.49740944736686
abs(energy_3 - exact_crit_ising_energy_OBC(N)) = 3.647535720574524e-6
abs(energy_4 - exact_crit_ising_energy_OBC(N)) = 5.026279836783942e-9
maxlinkdim(psi_3) = 20
maxlinkdim(psi_4) = 20 So I wonder what's accounting for the difference? Would you be willing to try Julia 1.10-rc1 to see if perhaps that's it? Otherwise it could be something like the available BLAS or LAPACK on your hardware versus mine (are you using the default BLAS?). It would be good to understand this especially if there's a set of defaults we could shift to that would give more consistent results across hardware and Julia/ITensor versions. I agree with Matt's point that we should gather more info first. And thanks for your PR improving the docs. |
Hi Miles, thanks for your reply! I have just ran the code on my PC with Julia 1.10-rc1 and the output is this:
What is absolutely bewildering to me is that your two ground states have I wonder what the hell is going on here 🤔 I read about some tricks about exchanging the BLAS system on an AMD CPU to MKL, and tested the following code where I used MKL on the AMD CPU I have by previously exporting MKL_DEBUG_CPU_TYPE=5, which sadly did not improve the convergence of the DMRG algorithm with Code with MKL
ENV["MKL_DEBUG_CPU_TYPE"] = "5"
using LinearAlgebra, MKL
using ITensors
# from Eq. (13) in T W Burkhardt and I Guim (1985), J. Phys. A: Math. Gen. 18 L33, https://doi.org/10.1088/0305-4470/18/1/006
exact_crit_ising_energy_OBC(N::Int) = 1.0 - csc(π/(2.0*(2.0*N+1)))
@time begin
J = 1.0
h = 1.0
N = 80
χ = 150
SVDcutoff = 1e-16
sites = siteinds("S=1/2",N)
ampo = OpSum()
for j=1:N-1
ampo += -J,"X",j,"X",j+1
ampo += -h,"Z",j
end
ampo += -h,"Z",N
H = MPO(ampo,sites)
psi0 = randomMPS(sites,10);
sweeps = Sweeps(12)
mindim!(sweeps, 5, 10, 20)
maxdim!(sweeps, floor.(Int, LinRange(10, χ, 10))...)
cutoff!(sweeps, SVDcutoff)
energy_3, psi_3 = dmrg(H,psi0, sweeps, eigsolve_krylovdim=3)
energy_4, psi_4 = dmrg(H,psi0, sweeps, eigsolve_krylovdim=4)
@show exact_crit_ising_energy_OBC(N)
@show energy_3
@show energy_4
@show abs(energy_3 - exact_crit_ising_energy_OBC(N))
@show abs(energy_4 - exact_crit_ising_energy_OBC(N))
@show maxlinkdim(psi_3)
@show maxlinkdim(psi_4)
end
julia> LinearAlgebra.BLAS.get_config()
LinearAlgebra.BLAS.LBTConfig
Libraries:
├ [ILP64] libmkl_rt.so
└ [ LP64] libmkl_rt.so
julia> versioninfo()
Julia Version 1.10.0-rc1
Commit 5aaa9485436 (2023-11-03 07:44 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 32 × AMD Ryzen 9 7950X 16-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 1 on 32 virtual cores
Environment:
JULIA_DEPOT_PATH = /home/jan/.julia Which outputted the same behaviour as before when using OpenBLAS: exact_crit_ising_energy_OBC(N) = -101.49740945239314
energy_3 = -101.49671243345877
energy_4 = -101.49740945210006
abs(energy_3 - exact_crit_ising_energy_OBC(N)) = 0.0006970189343746824
abs(energy_4 - exact_crit_ising_energy_OBC(N)) = 2.930846676463261e-10
maxlinkdim(psi_3) = 150
maxlinkdim(psi_4) = 95 While testing, I found that I can occasionally get a good result even with 3 krylov vectors because of an advantageous random initial state, I suppose. I ran the @time begin ... end bracket for a couple of times and got one time the following output:
which makes sense in the context that this result is also achievable by increasing the number of sweeps for example up to 20, so I have to completely retract my above statement that a better result is not so easily achieved with The below output was generated only changing After sweep 1 energy=-101.40339650748034 maxlinkdim=10 maxerr=1.55E-03 time=0.050
After sweep 2 energy=-101.46031134289957 maxlinkdim=25 maxerr=2.36E-11 time=0.079
After sweep 3 energy=-101.4704113569282 maxlinkdim=41 maxerr=2.31E-12 time=0.343
After sweep 4 energy=-101.4764805892804 maxlinkdim=56 maxerr=7.78E-13 time=0.532
After sweep 5 energy=-101.4807265475735 maxlinkdim=72 maxerr=5.01E-13 time=0.858
After sweep 6 energy=-101.48403543732168 maxlinkdim=87 maxerr=1.97E-13 time=1.351
After sweep 7 energy=-101.48694783009688 maxlinkdim=103 maxerr=1.89E-13 time=1.614
After sweep 8 energy=-101.48951174716731 maxlinkdim=118 maxerr=1.06E-13 time=2.193
After sweep 9 energy=-101.4917851523758 maxlinkdim=134 maxerr=5.12E-14 time=2.472
After sweep 10 energy=-101.49368195192015 maxlinkdim=150 maxerr=2.78E-14 time=3.150
After sweep 11 energy=-101.49508127970724 maxlinkdim=150 maxerr=3.90E-14 time=3.381
After sweep 12 energy=-101.49601481073685 maxlinkdim=150 maxerr=3.46E-14 time=3.324
After sweep 13 energy=-101.49660647651778 maxlinkdim=150 maxerr=2.33E-14 time=3.042
After sweep 14 energy=-101.49696216413135 maxlinkdim=150 maxerr=1.61E-14 time=2.808
After sweep 15 energy=-101.4971729760927 maxlinkdim=150 maxerr=1.05E-14 time=2.698
After sweep 16 energy=-101.49728451701536 maxlinkdim=150 maxerr=6.33E-15 time=2.784
After sweep 17 energy=-101.49734165321406 maxlinkdim=150 maxerr=3.45E-15 time=2.518
After sweep 18 energy=-101.49737273784434 maxlinkdim=150 maxerr=1.86E-15 time=2.760
After sweep 19 energy=-101.49738987827969 maxlinkdim=150 maxerr=1.17E-15 time=2.542
After sweep 20 energy=-101.4973992694482 maxlinkdim=150 maxerr=7.37E-16 time=2.313
After sweep 1 energy=-101.37291163356562 maxlinkdim=10 maxerr=8.95E-04 time=0.063
After sweep 2 energy=-101.46779740896534 maxlinkdim=25 maxerr=6.49E-10 time=0.268
After sweep 3 energy=-101.4924910367722 maxlinkdim=41 maxerr=6.91E-11 time=0.448
After sweep 4 energy=-101.49653748673155 maxlinkdim=56 maxerr=4.15E-12 time=0.521
After sweep 5 energy=-101.49728144111938 maxlinkdim=72 maxerr=3.06E-13 time=0.810
After sweep 6 energy=-101.49738814570195 maxlinkdim=87 maxerr=4.45E-14 time=1.394
After sweep 7 energy=-101.49740498773751 maxlinkdim=103 maxerr=5.22E-15 time=1.305
After sweep 8 energy=-101.49740853693427 maxlinkdim=118 maxerr=1.01E-15 time=1.675
After sweep 9 energy=-101.49740929514599 maxlinkdim=133 maxerr=9.99E-17 time=1.571
After sweep 10 energy=-101.49740943000546 maxlinkdim=124 maxerr=9.99E-17 time=1.294
After sweep 11 energy=-101.49740945060206 maxlinkdim=106 maxerr=9.96E-17 time=1.153
After sweep 12 energy=-101.49740945231196 maxlinkdim=72 maxerr=9.95E-17 time=0.647
After sweep 13 energy=-101.49740945239259 maxlinkdim=64 maxerr=1.00E-16 time=0.434
After sweep 14 energy=-101.49740945239311 maxlinkdim=50 maxerr=9.98E-17 time=0.347
After sweep 15 energy=-101.49740945239294 maxlinkdim=50 maxerr=9.93E-17 time=0.309
After sweep 16 energy=-101.49740945239277 maxlinkdim=50 maxerr=9.77E-17 time=0.305
After sweep 17 energy=-101.49740945239287 maxlinkdim=50 maxerr=9.76E-17 time=0.317
After sweep 18 energy=-101.49740945239294 maxlinkdim=50 maxerr=9.76E-17 time=0.304
After sweep 19 energy=-101.49740945239249 maxlinkdim=50 maxerr=9.76E-17 time=0.555
After sweep 20 energy=-101.49740945239267 maxlinkdim=50 maxerr=9.76E-17 time=0.387
exact_crit_ising_energy_OBC(N) = -101.49740945239314
energy_3 = -101.4973992694482
energy_4 = -101.49740945239267
abs(energy_3 - exact_crit_ising_energy_OBC(N)) = 1.018294494770089e-5
abs(energy_4 - exact_crit_ising_energy_OBC(N)) = 4.689582056016661e-13
maxlinkdim(psi_3) = 150
maxlinkdim(psi_4) = 50 It becomes evident with more sweeps that the SVD compression reduces the critical Ising state further down to smaller bond dimension with a more efficient representation, but it is only able to do so in my case for |
Given that the energy converges for more sweeps and that we put a note into the documentation, I would close this issue. Perhaps interestingly, I have observed essentially the same behaviour of the convergence of the energy on the nvidia A100, meaning it only converges significantly closer to the true value with more than 15 sweeps and 4 Krylov vectors: Code
using ITensors
using CUDA
# from Eq. (13) in T W Burkhardt and I Guim (1985), J. Phys. A: Math. Gen. 18 L33, https://doi.org/10.1088/0305-4470/18/1/006
exact_crit_ising_energy_OBC(N::Int) = 1.0 - csc(π/(2.0*(2.0*N+1)))
platform = NDTensors.cu
# platform = identity
begin
J = 1.0
h = 1.0
N = 80
χ = 150
SVDcutoff = 1e-16
sites = siteinds("S=1/2",N)
ampo = OpSum()
for j=1:N-1
ampo += -J,"X",j,"X",j+1
ampo += -h,"Z",j
end
ampo += -h,"Z",N
H = platform(MPO(ampo,sites))
psi0 = platform(randomMPS(sites,10))
sweeps = Sweeps(20)
mindim!(sweeps, 5, 10, 20)
maxdim!(sweeps, floor.(Int, LinRange(10, χ, 10))...)
cutoff!(sweeps, SVDcutoff)
energy_3, psi_3 = dmrg(H,psi0, sweeps, eigsolve_krylovdim=3)
energy_4, psi_4 = dmrg(H,psi0, sweeps, eigsolve_krylovdim=4)
@show exact_crit_ising_energy_OBC(N)
@show energy_3
@show energy_4
@show abs(energy_3 - exact_crit_ising_energy_OBC(N))
@show abs(energy_4 - exact_crit_ising_energy_OBC(N))
@show maxlinkdim(psi_3)
@show maxlinkdim(psi_4)
end Output: exact_crit_ising_energy_OBC(N) = -101.49740945239314
energy_3 = -101.4973974694192
energy_4 = -101.49740945239256
abs(energy_3 - exact_crit_ising_energy_OBC(N)) = 1.1982973944668629e-5
abs(energy_4 - exact_crit_ising_energy_OBC(N)) = 5.826450433232822e-13
maxlinkdim(psi_3) = 150
maxlinkdim(psi_4) = 50 This is obviously nothing dramatic, so I would just close this issue, given that the documentation also contains the note to be careful around critical points. Technical output
julia> LinearAlgebra.BLAS.get_config()
LinearAlgebra.BLAS.LBTConfig
Libraries:
└ [ILP64] libopenblas64_.so
julia> versioninfo()
Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 56 × Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, icelake-server)
Threads: 5 on 56 virtual cores
Environment:
LD_LIBRARY_PATH = /dragofs/sw/foss/0.2/software/ncurses/6.2-GCCcore-11.2.0/lib:/dragofs/sw/foss/0.2/software/binutils/2.37-GCCcore-11.2.0/lib:/dragofs/sw/foss/0.2/software/zlib/1.2.11-GCCcore-11.2.0/lib:/dragofs/sw/foss/0.2/software/GCCcore/11.2.0/lib64
JULIA_NUM_THREADS = 4
julia> CUDA.current_device()
CuDevice(0): NVIDIA A100-PCIE-40GB |
Description
In light of the spectacularly improved convergence of the ground state energy of a critical spin chain, I would suggest increasing the default dimension of the Krylov subspace in the eigensolver from KrylovKit to 4
Minimal runnable code
Output
Output of minimal runnable code
As you can see, the DMRG converges up to 4 orders of magnitude when using 3 Krylov vectors, while it converges to a spectacular 6 orders of magnitude more accurately (1e-10 absolutely) to the true ground state energy when using 4 Krylov vectors. Note also that this is not a matter of bond dimension or entanglement, as both states have the same SVD cutoff and maximum link dimension. Bizarrely, the ground state of the algorithm with 4 krylov vectors has a lower bond dimension 🤔
In this vain, one might decide on the layout as you discussed in ITensor/ITensorMPS.jl#49
Note that there is a performance cost of ≈ 16% when running the code away from the critical point. I tested h/J=0.5 and got the following timer values: (note that H is now different), while there is arguably no accuracy improvement...
Now, clearly this is a design decision on your side. I suppose "only" using 3 times as many Krylov basis vectors as eigenvalues one is asking for (in this case the first eigenvalue, i.e. of the ground state) is typically on the lower end of the recommended factor of Krylov vectors to use in an Arnoldi eigensolver [citation needed].
Let me know what you think of this design choice as well as the one in ITensor/ITensorMPS.jl#49
Cheers,
Version information
versioninfo()
:using Pkg; Pkg.status("ITensors")
:The text was updated successfully, but these errors were encountered: