Revised calcHTranspose operator #68

Merged
merged 16 commits into from
Jul 15, 2024

Conversation

johnmauff
Collaborator

@johnmauff johnmauff commented Jul 10, 2024

This PR is a complete rewrite of the calcHTranspose operator. The previous implementation stored only the H matrix and accessed it through several indirect addressing arrays for the calcHTranspose operation. That storage format was non-standard and, on large problems like the hurricane_4panel test case, actually consumed more memory than explicitly storing the H^t matrix. This PR changes the calcHTranspose operator so that the H^t matrix is stored explicitly in compressed sparse row (CSR) format. The change reduces both memory usage for large problems and execution time on the CPU. Because of the reduced memory usage, the hurricane_4panel configuration can run on 80 GB H100 GPUs and the hurricane case can run on a 40 GB A100 GPU, both without code changes. This version of the code has been tested on the following platforms and input configurations:

Derecho CPU

beltrami
supercell
hurricane
typhoonChanthu2020
hurricane_4panel

Derecho A100 GPU

beltrami
supercell
hurricane
typhoonChanthu2020

Casper H100 GPU

beltrami
supercell
hurricane
typhoonChanthu2020
hurricane_4panel
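
For readers unfamiliar with the layout, the following is a minimal sketch of what explicitly storing H^t in CSR form can look like. It is not the code from this PR: the `CsrMatrix` struct and the `transposeCsr` and `multiply` functions are hypothetical names introduced for illustration, and the choice of 64-bit row offsets with 32-bit column indices assumes that both matrix dimensions fit in 32 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CSR container (an assumption for illustration, not the PR's data structure).
struct CsrMatrix {
    std::size_t nrows = 0;
    std::size_t ncols = 0;
    std::vector<std::uint64_t> rowPtr;  // nrows + 1 offsets into colIdx/val
    std::vector<std::uint32_t> colIdx;  // column of each stored entry (nnz values)
    std::vector<double> val;            // stored entries (nnz values)
};

// Build an explicit CSR copy of H^t from a CSR copy of H using a counting sort on columns.
// Assumes H.nrows fits in 32 bits, since row numbers of H become column indices of H^t.
CsrMatrix transposeCsr(const CsrMatrix& H) {
    CsrMatrix Ht;
    Ht.nrows = H.ncols;
    Ht.ncols = H.nrows;
    Ht.rowPtr.assign(Ht.nrows + 1, 0);
    Ht.colIdx.resize(H.val.size());
    Ht.val.resize(H.val.size());

    // Count the entries in each column of H, which is each row of H^t.
    for (std::uint32_t c : H.colIdx) ++Ht.rowPtr[c + 1];
    // A prefix sum turns the counts into row offsets.
    for (std::size_t r = 0; r < Ht.nrows; ++r) Ht.rowPtr[r + 1] += Ht.rowPtr[r];

    // next[c] is the next free slot in row c of H^t.
    std::vector<std::uint64_t> next(Ht.rowPtr.begin(), Ht.rowPtr.end() - 1);
    for (std::size_t r = 0; r < H.nrows; ++r) {
        for (std::uint64_t k = H.rowPtr[r]; k < H.rowPtr[r + 1]; ++k) {
            const std::uint64_t dst = next[H.colIdx[k]]++;
            Ht.colIdx[dst] = static_cast<std::uint32_t>(r);
            Ht.val[dst] = H.val[k];
        }
    }
    return Ht;
}

// y = A * x with direct, row-by-row access (no extra indirection arrays).
std::vector<double> multiply(const CsrMatrix& A, const std::vector<double>& x) {
    assert(x.size() == A.ncols);
    std::vector<double> y(A.nrows, 0.0);
    for (std::size_t r = 0; r < A.nrows; ++r) {
        for (std::uint64_t k = A.rowPtr[r]; k < A.rowPtr[r + 1]; ++k) {
            y[r] += A.val[k] * x[A.colIdx[k]];
        }
    }
    return y;
}
```

With H^t stored this way, applying the transpose is an ordinary CSR matrix-vector product, `multiply(Ht, x)`, at the cost of holding a second set of values and indices.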

Collaborator

@sjsprecious sjsprecious left a comment

Thanks @johnmauff for working on this. It is still unclear to me how these changes reduce the memory usage, so I have some clarifying questions first.

Comment on lines 157 to 158
uint64_t *IH; // uint64_t
uint32_t *JH; // uint32_t
Collaborator

Why should we define *IH and *JH with different types?

Collaborator Author

Good question, and it relates to your overall question of why these changes save memory. The JH array can have values that range from (0,nState-1) while IH can have values that range from (0,nnz-1). While nnz exceeds the threshold for 32-bit integers, nState does not. So we can save memory here by using 32-bit integers for the JH array, which has nnz elements in total. The previous implementation had multiple 64-bit integer index arrays that had nnz elements. I have added comments to CostFunction3D.h to help clarify the differences in word size.
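
The saving from mixed index widths can be estimated with a small helper. This is a sketch under stated assumptions rather than code from the PR: `csrIndexBytes` and `columnIndexFits` are hypothetical names, and the layout assumed is a standard CSR pair of one row-offset array and one column-index array.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Bytes used by the two CSR index arrays: (nrows + 1) row offsets plus nnz column indices.
// Hypothetical helper for illustration only.
template <typename RowPtrT, typename ColIdxT>
constexpr std::uint64_t csrIndexBytes(std::uint64_t nrows, std::uint64_t nnz) {
    return (nrows + 1) * sizeof(RowPtrT) + nnz * sizeof(ColIdxT);
}

// True if every column index in [0, ncols - 1] is representable in ColIdxT.
template <typename ColIdxT>
constexpr bool columnIndexFits(std::uint64_t ncols) {
    return ncols == 0 || ncols - 1 <= std::numeric_limits<ColIdxT>::max();
}
```

As an illustration with made-up sizes, an array of 3,000,000,000 column indices takes 24 GB as `uint64_t` and 12 GB as `uint32_t`, so narrowing the nnz-length array is where most of the index memory is recovered. A check such as `columnIndexFits<std::uint32_t>(ncols)` is one way to guard the assumption that the narrower type is large enough.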

Collaborator

Aha, I see. Thanks for the clarification; it makes sense now. As long as nState does not exceed the threshold for 32-bit integers in future SAMURAI production tests, this change seems safe to me.

Collaborator

@sjsprecious sjsprecious left a comment

Thanks @johnmauff for addressing my comments. Here is the second round of my questions.

Collaborator

@sjsprecious sjsprecious left a comment

Thanks @johnmauff for addressing all my comments.

I can confirm that I was able to run a 4-panel hurricane case on Casper's H100 GPU with your branch (it takes about 2 hours in total and runtime: Cost3D minimize: 5222).

One minor and optional suggestion: I would prefer to merge my PR #66 first (still waiting for @mmbell's review) so that we can merge it into your PR and test if it works as expected.

@johnmauff
Collaborator Author

@sjsprecious Thanks for your thorough review. I am fine with waiting for PR #66 to be committed first.

Owner

@mmbell mmbell left a comment

Thank you for making these changes and for the thorough discussion about the reasoning.

codecov bot commented Jul 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 32.62%. Comparing base (12e3fc1) to head (1367543).

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #68   +/-   ##
=======================================
  Coverage   32.62%   32.62%           
=======================================
  Files          51       51           
  Lines       16815    16815           
=======================================
  Hits         5486     5486           
  Misses      11329    11329           


@sjsprecious sjsprecious merged commit 6e2fd8e into main Jul 15, 2024
4 checks passed
@sjsprecious sjsprecious deleted the revised-Ht branch July 15, 2024 19:52