[NDTensors][NDTensorsCUDAExt] Improve performance of GPU backends #1194

kmp5VT · 2023-09-13T21:19:37Z

This PR addresses issue #1193. To avoid scalar operations on the GPU, tensors are now filled with random values or zeros directly on the device.
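As a rough illustration of the idea (not the code in this PR), the sketch below allocates the output with `similar` so it lives wherever the input array lives, then fills it with whole-array calls instead of element-by-element writes. The helper names `randn_like` and `zeros_like` are hypothetical and are not NDTensors functions. The sketch assumes the backend implements `similar`, `Random.randn!`, and `fill!` for its array type, and that the element type is floating point.

```julia
using Random

# Hypothetical helper (not an NDTensors function): allocate an array of the
# same array type as `A` and fill it with normally distributed values using a
# single whole-array call, so no scalar indexing is needed.
function randn_like(A::AbstractArray, dims::Integer...)
    out = similar(A, dims...)   # same array type and element type as `A`
    randn!(out)                 # in-place fill; assumes a floating-point eltype
    return out
end

# Hypothetical helper: same idea for zeros, using `fill!` on the whole array.
function zeros_like(A::AbstractArray, dims::Integer...)
    out = similar(A, dims...)
    fill!(out, zero(eltype(out)))
    return out
end

# A plain CPU array stands in for a device array here.
A = zeros(Float32, 2, 2)
R = randn_like(A, 3, 4)
Z = zeros_like(A, 3, 4)
```

With a GPU array in place of `A`, the same calls would be expected to dispatch to the backend's own `randn!` and `fill!` methods, which is what keeps the fill on the device.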

codecov-commenter · 2023-09-13T22:51:32Z

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (3cdba94) 85.37% compared to head (b7743bb) 67.34%.
Report is 2 commits behind head on main.

❗ Current head b7743bb differs from pull request most recent head 262e924. Consider uploading reports for the commit 262e924 to get more accurate results

Additional details and impacted files

@@             Coverage Diff             @@
##             main    #1194       +/-   ##
===========================================
- Coverage   85.37%   67.34%   -18.03%     
===========================================
  Files          88       88               
  Lines        8416     8388       -28     
===========================================
- Hits         7185     5649     -1536     
- Misses       1231     2739     +1508

| Files | Coverage Δ | |
|---|---|---|
| src/ITensors.jl | `100.00% <ø> (ø)` | |
| src/broadcast.jl | `87.83% <100.00%> (+0.94%)` | ⬆️ |
| src/itensor.jl | `80.99% <100.00%> (-1.26%)` | ⬇️ |
| src/tensor_operations/permutations.jl | `96.77% <100.00%> (ø)` | |
| src/mps/dmrg.jl | `61.94% <66.66%> (-22.16%)` | ⬇️ |
| src/set_types.jl | `0.00% <0.00%> (ø)` | |

... and 34 files with indirect coverage changes

NDTensors/ext/NDTensorsCUDAExt/fill.jl

NDTensors/src/abstractarray/fill.jl

NDTensors/ext/NDTensorsCUDAExt/linearalgebra.jl

NDTensors/test/arraytensor/array.jl

NDTensors/src/abstractarray/permutedims.jl

NDTensors/src/dense/densetensor.jl

NDTensors/src/array/mul.jl

NDTensors/src/array/permutedims.jl

NDTensors/src/linearalgebra/linearalgebra.jl

Co-authored-by: Matt Fishman

kmp5VT added 4 commits September 13, 2023 17:11

Create generic randn for CUDA and use randn to make on device

6438473

Make generic_zero for CUDA

682bcf3

Format

4f879d5

remove monorepo

b567ecd

mtfishman reviewed Sep 14, 2023

NDTensors/ext/NDTensorsCUDAExt/fill.jl (outdated, resolved)

kmp5VT added 2 commits September 14, 2023 10:35

Convert tests to CPU to not perform scalar operations

a4baeb7

import -> using. Can use NDTensors randn! instead of another function

03a9b27

mtfishman reviewed Sep 14, 2023

NDTensors/src/abstractarray/fill.jl (outdated, resolved)

kmp5VT marked this pull request as draft September 14, 2023 22:57

kmp5VT added 19 commits September 14, 2023 18:57

Merge branch 'main' into kmp5/debug/cuda_rand

538cf76

Merge branch 'main' into kmp5/debug/cuda_rand

7314b2f

Temporarily get contract working by making a mul!! function

154fac2

format

3da5e96

remove change to combiner

cb36f23

Use adapt to not copy elements

aef93a5

Remove commented code

89d2837

Remove CUDA fill functions and use more generic in abstractarray/fill.jl

3e93d61

Remove NDTensors.

6b42956

format

fe470b7

Add comment about gpus

65d7174

Do elementwise operations on data to avoid scalar indexing

7232d31

Force dot to return value on CPU

6197213

remove copy code

0ed96cb

format

fa67f7c

Merge branch 'main' into kmp5/debug/cuda_rand

db0a4fc

Bootleg fix for a conversion to UnifiedMemory issue

674c40c

force itensor to use data to speed up computation

79ec79f

Merge branch 'kmp5/debug/cuda_rand' of github.com:kmp5VT/ITensors.jl into kmp5/debug/cuda_rand

3a142b4

mtfishman reviewed Sep 27, 2023

NDTensors/ext/NDTensorsCUDAExt/linearalgebra.jl (outdated, resolved)

kmp5VT added 3 commits October 13, 2023 17:02

Permute in place

211eb84

Call NDTensors.permute in tests

23e7659

Format

bf82369

mtfishman reviewed Oct 13, 2023

NDTensors/test/arraytensor/array.jl (outdated, resolved)

mtfishman reviewed Oct 13, 2023

NDTensors/src/abstractarray/permutedims.jl (outdated, resolved)

kmp5VT added 5 commits October 14, 2023 18:53

Make base.permutedims(Tensor) =NDTensors.permutedims

6445af6

Make permutedims match bangbang code flow

2c18c13

Have mul call mul! then return

f1a1b61

format

b5fd9d1

Use base.permutedims not NDTensors

50eb9a8

kmp5VT marked this pull request as ready for review October 16, 2023 15:34

kmp5VT added 2 commits October 16, 2023 16:14

Use simplified functions to dispatch later on parenttype

bf86c83

format

a836b89