Trying to understand binary dumps when data sizes straddle 2^31 bytes #404

Open
jmstone opened this issue Dec 2, 2023 · 1 comment

Comments

jmstone commented Dec 2, 2023

In GitLab by @c-white on Dec 2, 2023, 10:57

I just want to check that there isn't a subtle possibility of a hanging bug. Consider the code for outputting full dumps:

https://gitlab.com/theias/hpc/jmstone/athena-parthenon/athenak/-/blob/master/src/outputs/binary.cpp#L226-261

If a rank has more than 2^31 bytes of data to write, it breaks up the write and does 1 MeshBlock at a time, but this decision is based on a local value. What if rank 0 has 3 MeshBlocks, pushing it over the 2^31 threshold, but rank 1 has 2 MeshBlocks, keeping it under? It seems rank 1 will take the first branch and issue a single MPI_File_write_at_all, while rank 0 will take the second branch and issue 2 MPI_File_write_at_all calls followed by a single MPI_File_write_at for its 3 MeshBlocks. Since MPI_File_write_at_all is collective, rank 0's second call would have no matching call on rank 1, so it seems this will hang with an unbalanced number of collective writes.
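To make the concern concrete, here is a minimal sketch that only counts collective calls per rank; it is a model of the scenario described above, not the code in binary.cpp. The function names, the per-MeshBlock size, and the assumption that the split branch issues one collective write per MeshBlock up to the minimum MeshBlock count across ranks (my reading of the "2 collective + 1 independent" pattern) are all hypothetical. The second function shows one possible fix: base the branch on the maximum byte count over all ranks so that every rank takes the same path.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model, not the AthenaK implementation.
constexpr std::int64_t kLimit = std::int64_t{1} << 31;  // 2^31 bytes

// Branch chosen from each rank's own byte count (the pattern in question).
// Returns the number of collective writes each rank would issue.
std::vector<int> CollectiveCallsLocal(const std::vector<int>& nmb,
                                      std::int64_t bytes_per_mb) {
  assert(!nmb.empty());
  const int nmb_min = *std::min_element(nmb.begin(), nmb.end());
  std::vector<int> calls;
  for (int n : nmb) {
    const bool split = n * bytes_per_mb > kLimit;
    // Unsplit: one collective write. Split (assumed): one per MeshBlock
    // up to nmb_min, with the remainder written independently.
    calls.push_back(split ? nmb_min : 1);
  }
  return calls;
}

// Branch chosen from the maximum byte count over all ranks, which a real
// code could obtain with MPI_Allreduce and MPI_MAX before branching.
std::vector<int> CollectiveCallsGlobal(const std::vector<int>& nmb,
                                       std::int64_t bytes_per_mb) {
  assert(!nmb.empty());
  const int nmb_min = *std::min_element(nmb.begin(), nmb.end());
  const int nmb_max = *std::max_element(nmb.begin(), nmb.end());
  const bool split = nmb_max * bytes_per_mb > kLimit;
  return std::vector<int>(nmb.size(), split ? nmb_min : 1);
}
```

With MeshBlock counts {3, 2} and an assumed 800,000,000 bytes per MeshBlock, the local decision gives mismatched collective counts on the two ranks, while the global decision gives the same count on both.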

jmstone commented Feb 25, 2024

In GitLab by @jmstone216 on Feb 25, 2024, 12:11

As an update to this issue, the binary files should be written as Reals, not bytes, so as to increase the maximum size that can be written and to get rid of what looks like an unnecessary memory copy. This should be done as part of an overall update of the IOWrapper class to read/write any_type, which was started when particle outputs were added.
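A rough sketch of why the element type matters, assuming the limit comes from the int count argument of calls such as MPI_File_write_at_all: the largest single write is the maximum int times the element size. The helper name is hypothetical, and the 4-byte and 8-byte sizes assume Real is built as float or double.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: largest payload one call can describe when the
// element count is passed as an int and each element has elem_size bytes.
inline std::int64_t MaxBytesPerCall(std::int64_t elem_size) {
  assert(elem_size > 0);
  return static_cast<std::int64_t>(std::numeric_limits<int>::max()) * elem_size;
}
```

For a 32-bit int, this gives just under 2 GiB per call when counting bytes, just under 8 GiB for a 4-byte Real, and just under 16 GiB for an 8-byte Real.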
