Revised calcHTranspose operator #68

johnmauff · 2024-07-10T21:59:56Z

This PR is a complete rewrite of the calcHTranspose operator. The previous implementation stored only the H matrix and accessed it through several indirect addressing arrays when performing the calcHTranspose operation. That storage format was non-standard, and on large problems such as the hurricane_4panel test case it actually consumed more memory than explicitly storing the H^t matrix. This PR changes the calcHTranspose operator so that the H^t matrix is stored explicitly in CSR (compressed sparse row) format. The change reduces memory usage for large problems and reduces execution time on the CPU. Because of the lower memory usage, the hurricane_4panel configuration can be run on 80 GB H100 GPUs and the hurricane case can be run on a 40 GB A100 GPU without code changes. This version of the code has been tested on the following platforms and input configurations:

Derecho CPU

beltrami
supercell
hurricane
typhoonChanthu2020
hurricane_4panel

Derecho A100 GPU

beltrami
supercell
hurricane
typhoonChanthu2020

Casper H100 GPU

beltrami
supercell
hurricane
typhoonChanthu2020
hurricane_4panel
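For readers unfamiliar with CSR, the storage scheme described above can be sketched as follows. This is a minimal illustration, not the SAMURAI implementation: the `CsrMatrix` struct, the value array name `AH`, and the function `csrMultiply` are hypothetical, and only the `IH`/`JH` names and their integer widths come from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for an explicitly stored sparse matrix in CSR form.
// IH and JH mirror the array names in this PR; AH is an assumed name.
struct CsrMatrix {
    std::vector<std::uint64_t> IH;  // row offsets, one entry per row plus one
    std::vector<std::uint32_t> JH;  // column index of each stored nonzero
    std::vector<double> AH;         // value of each stored nonzero
};

// Computes y = M * x by walking each row's contiguous run of nonzeros.
std::vector<double> csrMultiply(const CsrMatrix& m, const std::vector<double>& x) {
    assert(!m.IH.empty());
    assert(m.JH.size() == m.AH.size());
    const std::size_t nRows = m.IH.size() - 1;
    std::vector<double> y(nRows, 0.0);
    for (std::size_t row = 0; row < nRows; ++row) {
        double sum = 0.0;
        // Nonzeros of this row occupy positions IH[row] .. IH[row + 1] - 1.
        for (std::uint64_t k = m.IH[row]; k < m.IH[row + 1]; ++k) {
            sum += m.AH[k] * x[m.JH[k]];
        }
        y[row] = sum;
    }
    return y;
}
```

Each row is read with unit stride and no extra indirection beyond the column index, which is the usual reason an explicit CSR layout is cache-friendly on CPUs.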

sjsprecious

Thanks @johnmauff for working on this. It is still unclear to me what you have done to reduce the memory usage but I have some clarification questions first.

sjsprecious · 2024-07-11T03:20:19Z

src/CostFunction3D.h

+    uint64_t *IH; // uint64_t
+    uint32_t *JH; // uint32_t

Why are *IH and *JH defined with different types?

Good question, and it relates to your overall question of why these changes save memory. The JH array holds values in the range 0 to nState-1, while IH holds values in the range 0 to nnz-1. While nnz exceeds the limit of a 32-bit integer, nState does not. So we can save memory by using 32-bit integers for the JH array, which has nnz elements in total. The previous implementation had multiple 64-bit integer index arrays that had nnz elements. I have added comments to CostFunction3D.h to help clarify the differences in word size.
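The saving from the narrower JH array can be estimated with a short calculation. The sketch below is illustrative only: `csrIndexBytes` is a hypothetical helper, and it assumes a row-offset array with one entry per row plus one and a column-index array with one entry per nonzero.

```cpp
#include <cstdint>

// Bytes occupied by the two CSR index arrays for a matrix with nRows rows
// and nnz stored nonzeros. RowT is the row-offset type, ColT the
// column-index type.
template <typename RowT, typename ColT>
constexpr std::uint64_t csrIndexBytes(std::uint64_t nRows, std::uint64_t nnz) {
    return (nRows + 1) * sizeof(RowT) + nnz * sizeof(ColT);
}

// Bytes saved by switching the column indices from 64-bit to 32-bit.
constexpr std::uint64_t bytesSavedByNarrowColumns(std::uint64_t nRows, std::uint64_t nnz) {
    return csrIndexBytes<std::uint64_t, std::uint64_t>(nRows, nnz) -
           csrIndexBytes<std::uint64_t, std::uint32_t>(nRows, nnz);
}
```

Under these assumptions the difference is 4 bytes per nonzero, independent of the number of rows, so the saving grows linearly with nnz.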

Aha, I see. Thanks for your clarification and it now makes sense. As long as nState won't exceed the threshold for 32-bit integer in the future SAMURAI production test, this change seems safe to me.
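One way to guard this assumption is a size check before the JH array is filled. The following is a hedged sketch, not code from this PR: `fitsIn32BitIndex` and `requireNarrowColumnIndex` are hypothetical names, and where such a check would belong in SAMURAI is left open.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// True when every index in the range 0 to n-1 is representable as uint32_t.
constexpr bool fitsIn32BitIndex(std::uint64_t n) {
    return n == 0 || n - 1 <= std::numeric_limits<std::uint32_t>::max();
}

// Hypothetical guard: fails loudly instead of letting indices wrap around.
inline void requireNarrowColumnIndex(std::uint64_t nState) {
    if (!fitsIn32BitIndex(nState)) {
        throw std::length_error("nState does not fit in a 32-bit column index");
    }
}
```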

sjsprecious

Thanks @johnmauff for addressing my comments. Here is the second round of my questions.

sjsprecious

Thanks @johnmauff for addressing all my comments.

I can confirm that I was able to run a 4-panel hurricane case on Casper's H100 GPU with your branch (it takes about 2 hours in total; runtime: Cost3D minimize: 5222).

One minor and optional suggestion: I prefer to merge my PR #66 first (still waiting for @mmbell 's review) so that we can merge it into your PR and test if it works as expected.

johnmauff · 2024-07-11T19:52:01Z

@sjsprecious Thanks for your thorough review. I am fine with waiting for PR #66 to be committed first.

mmbell

Thank you for making these changes and for the thorough discussion about the reasoning.

codecov · 2024-07-15T19:48:25Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 32.62%. Comparing base (12e3fc1) to head (1367543).

Additional details and impacted files

@@           Coverage Diff           @@
##             main      #68   +/-   ##
=======================================
  Coverage   32.62%   32.62%           
=======================================
  Files          51       51           
  Lines       16815    16815           
=======================================
  Hits         5486     5486           
  Misses      11329    11329

johnmauff added 13 commits July 2, 2024 13:46

Initial version with revised H^t operator

a4b45fd

Next step in Memory reduction.

bb4b7c6

More explicit definitions of ints.

9933c80

Code cleanup for new CSR format for H^t matrix

ac504c2

Verified that this works for the hurricane_4panel case.

c9a61af

More bug fixes for new-H^t operator.

7b826dd

Reduced int length for JH array

5bb4f40

Change integer size

32ad127

Cleanup calculation of nnz

1bdea2a

Removed remaining old H^t code.

a6e2f5b

Minor cleanup of the code.

2c22b05

Merge remote-tracking branch 'origin/main' into revised-Ht

033a6cf

Cleanup

720e92c

johnmauff requested review from mmbell, sjsprecious and cenamiller July 10, 2024 22:00

sjsprecious requested changes Jul 11, 2024

sjsprecious assigned johnmauff Jul 11, 2024

sjsprecious added the enhancement label Jul 11, 2024

sjsprecious added this to the SAMURAI V3 with Terrain Optimization milestone Jul 11, 2024

Clarification on the size of int used in the code.

c4b683b

sjsprecious reviewed Jul 11, 2024

Fixed OpenACC directive

2e5fb52

sjsprecious reviewed Jul 11, 2024

sjsprecious approved these changes Jul 11, 2024

mmbell approved these changes Jul 15, 2024

Merge branch 'main' into revised-Ht

1367543

sjsprecious merged commit 6e2fd8e into main Jul 15, 2024
4 checks passed

sjsprecious deleted the revised-Ht branch July 15, 2024 19:52
