ChASE v1.4.0. Major release.
Introduced a new distributed GPU build of ChASE based entirely on the NVIDIA NCCL library. It avoids explicit data movement between host and device memory and leads to much faster collective communication among the participating GPUs. This release achieves a speedup of between 1.5x and 3x with respect to the traditional distributed multi-GPU build. ChASE can now be compiled and executed in the following distinct parallel configurations:
Distributed CPU only
Distributed multi-GPU (traditional build, with explicit data movement between host and device memory)
Distributed multi-GPU (using the NVIDIA NCCL library)