Merge branch 'main' into kmp5/debug/cuda_rand
kmp5VT authored Sep 26, 2023
2 parents fa67f7c + 10f7eee commit db0a4fc
Showing 1 changed file with 67 additions and 6 deletions.
73 changes: 67 additions & 6 deletions docs/src/faq/HPC.md

## My code is using a lot of RAM - what can I do about this?

Tensor network algorithms can often use a large amount of RAM. On top
of this essential fact, the Julia programming language is "garbage collected"
which means that unused memory isn't given back to the operating system right away,
but only when the Julia runtime's garbage collector reclaims it. When your code
allocates memory very rapidly, this can lead to high memory usage overall.

Fortunately, there are several steps you can take to keep the memory usage of your code under control.

### 1. Avoid Repeatedly Allocating, Especially in Fast or "Hot" Loops

More memory gets used whenever your code "allocates", which happens most commonly
when you use dynamic storage types like `Vector` and `Matrix`. If you have a code
pattern where you allocate or resize an array or vector inside a "hot" loop,
meaning a loop that executes many times in quick succession, the memory from previous
allocations can pile up rapidly before the next garbage-collector run.

To avoid this, allocate the array once before the loop begins if possible,
then overwrite its contents during each iteration. More generally, try to estimate
the sizes of dynamic resources ahead of time. Alternatively, do a single allocation
that creates a "workspace" large enough for dynamic algorithms to reuse parts of it without
reallocating the whole thing (i.e. make a large array once, then use portions of it
when smaller arrays are needed), as in the sketch below.
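
Here is a minimal sketch of this pattern (the function names and sizes are hypothetical, chosen only for illustration):
```
using Random

# Anti-pattern: a fresh Vector is allocated on every pass through the hot loop.
function sums_allocating(n)
  results = Float64[]
  for _ in 1:n
    v = randn(1000)            # new allocation on each iteration
    push!(results, sum(v))
  end
  return results
end

# Better: allocate once up front, then overwrite the workspace in place.
function sums_preallocated(n)
  results = Vector{Float64}(undef, n)  # one allocation, sized ahead of time
  v = Vector{Float64}(undef, 1000)     # reusable workspace
  for i in 1:n
    randn!(v)                          # fill in place, no new allocation
    results[i] = sum(v)
  end
  return results
end
```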

### 2. Use the `--heap-size-hint` Flag

A simple step you can take to help with overall memory usage is to pass
the `--heap-size-hint` flag to the Julia program when you start it. For example,
you can call Julia as:
```
julia --heap-size-hint=60G
```
When you pass this heap size, Julia will try to keep the memory usage at or below this
value if possible.

In some cases, however, you may simply be running a calculation with large
bond dimensions, which may fundamentally use a lot of memory. In those cases, you can
try to use features such as the "write to disk mode" of the ITensor DMRG code or other related
techniques. (See the `write_when_maxdim_exceeds` keyword of the ITensor `dmrg` function.)
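
As a rough illustration, here is a sketch of how this keyword might be used, assuming the
ITensors.jl DMRG interface (the model and all parameter values are arbitrary, chosen only for illustration):
```
using ITensors

# Sketch: DMRG on a small Heisenberg chain, writing tensors to disk
# once the bond dimension exceeds 500 (parameters are illustrative).
N = 20
sites = siteinds("S=1/2", N)
os = OpSum()
for j in 1:(N - 1)
  os += "Sz", j, "Sz", j + 1
  os += 1 / 2, "S+", j, "S-", j + 1
  os += 1 / 2, "S-", j, "S+", j + 1
end
H = MPO(os, sites)
psi0 = randomMPS(sites; linkdims=10)
energy, psi = dmrg(H, psi0;
                   nsweeps=5, maxdim=[100, 200, 400, 800, 1000], cutoff=1e-10,
                   write_when_maxdim_exceeds=500)
```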


### 3. In Rare Cases, Force a Garbage Collection Run

In some rare cases, such as when your code cannot be optimized to avoid further allocations
or when the `--heap-size-hint` flag described above does not affect the behavior of the Julia
garbage collector, you can force the garbage collector (GC) to run at a specific point
in your code by calling:
```
GC.gc()
```
Alternatively, you can call `GC.gc(true)` to force a "full" collection rather than one that
only collects the "younger" generation of previous allocations.

While this approach works well to reduce memory usage, it has the downside of slowing
down your code each time the garbage collector runs, which can be especially
harmful to multithreaded or parallel algorithms. Therefore, if you must use this approach,
call `GC.gc()` as infrequently as possible, and ideally only in the outermost functions
and loops (the highest levels) of your code.
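
For example, here is a sketch of placing the call at the outermost loop level
(`expensive_step` below is a hypothetical stand-in for an allocation-heavy routine):
```
# Hypothetical allocation-heavy inner routine.
expensive_step(n) = sum(abs2, randn(n))

for sweep in 1:10
  expensive_step(10^6)
  GC.gc(true)   # full collection, run once per outer iteration only
end
```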


## Can Julia Be Used to Perform Parallel, Distributed Calculations on Large Clusters?

Yes. The Julia ecosystem offers multiple approaches to parallel computing across multiple
machines, including on large HPC clusters and with GPU resources.

For an overall view of some of these options, the [Julia on HPC Clusters](https://juliahpc.github.io/JuliaOnHPCClusters/) website is a good resource.

Some of the leading approaches to parallelism in Julia are:
* MPI, through the [MPI.jl](https://juliaparallel.org/MPI.jl/latest/) package. It has the advantage of optionally using an MPI backend that is optimized for a particular cluster, possibly with fast interconnects like InfiniBand. (A minimal example is sketched just after this list.)
* [Dagger](https://juliaparallel.org/Dagger.jl/dev/), a framework for parallel computing across all kinds of resources, such as CPUs and GPUs, and across multiple threads and multiple servers.
* [Distributed](https://docs.julialang.org/en/v1/stdlib/Distributed/), part of the base Julia library, providing tools to perform calculations distributed across multiple machines.
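
As a minimal point of reference, here is the classic MPI "hello world" written with MPI.jl
(the file name is arbitrary, and launching with `mpiexec` assumes a working MPI installation):
```
# hello_mpi.jl -- run with, e.g.: mpiexec -n 4 julia hello_mpi.jl
using MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
nranks = MPI.Comm_size(comm)
println("Hello from rank $rank of $nranks")
MPI.Finalize()
```
And a similarly minimal sketch using Distributed, run entirely from within Julia
(the worker count and function are arbitrary):
```
using Distributed
addprocs(4)                   # launch 4 local worker processes

@everywhere f(x) = x^2        # define f on every worker

println(sum(pmap(f, 1:100)))  # distribute the map across the workers
```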


## Does My Cluster Admin Have to Install Julia for Me? What are the Best Practices for Installing Julia on Clusters?

The most common approach to installing and using Julia on clusters is for users to install their own Julia binary and dependencies, which is quite easy to do. However, for certain libraries like MPI.jl, there may be MPI backends that are preferred by the cluster administrator. Fortunately, it is possible for admins to set global defaults for such backends and other library preferences.

For more information on best practices for installing Julia on clusters, see the [Julia on HPC Clusters](https://juliahpc.github.io/JuliaOnHPCClusters/) website.



