Memory policy #983

nksauter · 2024-04-09T23:14:40Z

I've created a failing unit test for the new Kokkos code in the CUDA context.

If the test "libtbx.python cctbx_project/simtbx/tests/tst_memory_policy.py" can be made to work, the bug has been fixed.

Background information: I'm trying to extend the Kokkos exascale_api with new behavior. The old code allocates large arrays on the GPU sized to the full detector, even when only a few pixels are calculated because of the whitelist (a list of pixels of interest). The new behavior allocates only enough memory for the whitelist pixels. I am using C++ templates, with template specializations for these two memory-allocation cases. I need this to work so Daniel can move ahead with the SPREAD project, and it is just this last detail that I cannot seem to fix.
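For readers following along, here is a minimal sketch of what specializing on the two memory-allocation cases could look like. It is not the code in this PR: the names `large_array_policy`, `small_whitelist_policy`, and `pixel_buffer` are hypothetical, and a host-side `std::vector` stands in for the device array so the example compiles without Kokkos.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tag types naming the two memory-allocation cases.
struct large_array_policy {};
struct small_whitelist_policy {};

// Primary template is declared only; each policy provides a specialization.
template <typename MemoryPolicy>
struct pixel_buffer;

// Large-array case: one slot per detector pixel, addressed by pixel index.
template <>
struct pixel_buffer<large_array_policy> {
  std::vector<double> data;  // stand-in for a device array (assumption)

  pixel_buffer(std::size_t n_detector_pixels,
               const std::vector<std::size_t>& /*whitelist*/)
      : data(n_detector_pixels, 0.0) {}

  std::size_t slot(std::size_t /*whitelist_idx*/, std::size_t pixel_idx) const {
    assert(pixel_idx < data.size());
    return pixel_idx;
  }
};

// Whitelist case: one slot per whitelist entry, addressed by position
// within the whitelist rather than by detector pixel index.
template <>
struct pixel_buffer<small_whitelist_policy> {
  std::vector<double> data;  // stand-in for a device array (assumption)

  pixel_buffer(std::size_t /*n_detector_pixels*/,
               const std::vector<std::size_t>& whitelist)
      : data(whitelist.size(), 0.0) {}

  std::size_t slot(std::size_t whitelist_idx, std::size_t /*pixel_idx*/) const {
    assert(whitelist_idx < data.size());
    return whitelist_idx;
  }
};
```

With a 100-pixel detector and a 3-entry whitelist, the large-array buffer holds 100 values and the whitelist buffer holds 3; calling code asks the buffer for a slot instead of indexing by pixel directly.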

dermen · 2024-05-09T15:53:04Z

@nksauter, just making sure I see where we are: in the latest commit the test works, but there is duplicated code that we want to avoid?

nksauter · 2024-05-09T16:36:52Z

@dermen please stand by with regard to the working tests. I'm going to apply the test to additional cases today to double-check. The duplicated code is a serious problem: my original intent was to implement polymorphism with minimal lines of code, but instead I had to duplicate an entire kernel. Further, I was unable to shrink the m_accumulate_floatimage array, as originally intended.
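One possible way to avoid duplicating the kernel, sketched under assumptions: keep a single templated loop body and let the policy supply only the buffer size and the output slot. Everything here is hypothetical and not taken from the PR (`buffer_size`, `slot`, `toy_pixel_value`, `simulate`), and a plain host loop stands in for the device kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical policies: each one answers only "how big is the output?"
// and "where does whitelist entry i (detector pixel p) get written?".
struct large_array_policy {
  static std::size_t buffer_size(std::size_t n_detector, std::size_t /*n_whitelist*/) {
    return n_detector;
  }
  static std::size_t slot(std::size_t /*whitelist_idx*/, std::size_t pixel_idx) {
    return pixel_idx;
  }
};

struct small_whitelist_policy {
  static std::size_t buffer_size(std::size_t /*n_detector*/, std::size_t n_whitelist) {
    return n_whitelist;
  }
  static std::size_t slot(std::size_t whitelist_idx, std::size_t /*pixel_idx*/) {
    return whitelist_idx;
  }
};

// Placeholder for the per-pixel physics; the real kernel is far larger.
inline double toy_pixel_value(std::size_t pixel_idx) {
  return 2.0 * static_cast<double>(pixel_idx);
}

// Single shared loop body: only the two policy calls differ between cases.
template <typename Policy>
std::vector<double> simulate(std::size_t n_detector,
                             const std::vector<std::size_t>& whitelist) {
  std::vector<double> image(Policy::buffer_size(n_detector, whitelist.size()), 0.0);
  for (std::size_t i = 0; i < whitelist.size(); ++i) {
    const std::size_t pixel = whitelist[i];
    assert(pixel < n_detector);
    image[Policy::slot(i, pixel)] += toy_pixel_value(pixel);
  }
  return image;
}
```

Calling `simulate<large_array_policy>(5, {1, 3})` yields a 5-element image with values at indices 1 and 3, while `simulate<small_whitelist_policy>(5, {1, 3})` yields a 2-element image holding the same two values. Whether this pattern carries over to the actual device kernel and to m_accumulate_floatimage is untested here.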

Baharis

After 3 weeks of testing via psii_spread's annulus worker, this PR was indeed found to significantly lower the memory footprint when simulating images on GPU. I don't have particular issues with the functionality of this code.

In the exascale_api, allow pixel values to be calculated either on a large array (all pixels) or, with low memory, on just the whitelist consisting of shoebox pixels. This commit only provides the polymorphism framework; both implementations are currently identical, giving the large-array behavior.

The script tests/tst_memory_policy.py fails with a cuda illegal access. The intention is to get help from NESAP to get a functional test.

Memory savings are achieved through code specialization for the case where pixel values are simulated on a small whitelist. The specializations are not yet optimal, as there is still a lot of code duplication. The changes give a ~4.5x reduction in memory footprint, but resizing the array m_accumulate_floatimage has not yet succeeded; attempts so far lead to a CUDA memory allocation error.

nksauter requested review from JBlaschke and Baharis April 9, 2024 23:14

phyy-nx force-pushed the memory_policy branch from 98d8720 to 510bc4d Compare April 10, 2024 16:22

nksauter requested a review from dermen April 24, 2024 18:49

Baharis force-pushed the memory_policy branch from dcb444f to 6d7a2d7 Compare May 20, 2024 22:24

Baharis approved these changes May 31, 2024

nksauter force-pushed the memory_policy branch 2 times, most recently from edb7ebb to 7e5ae72 Compare June 15, 2024 00:37

nksauter added 4 commits November 16, 2024 08:33

Debug case for non-working test.

79de5b7

Remove debugging output to conserve stdout size.

fa0295b

nksauter force-pushed the memory_policy branch from 7e5ae72 to fa0295b Compare November 16, 2024 16:35
