Ascent hanging with MPI #1384

Hello,
I have an issue where Ascent hangs my simulation when running with MPI on multiple cluster nodes.

I compile with:
env enable_mpi=ON ./build_ascent.sh

Has this ever happened before? Do you have any recommendations?

Best

Comments
@FrankFrank9 can you share what actions you are using? Are you passing the MPI comm handle id as an option when you open Ascent?
Hi @cyrush, thanks for reaching out. I'm using PyFR. In PyFR the MPI communicator is passed to Ascent along the lines of the sketch below. Any ideas? Also, does it matter which VTK-m backend I specify? Best regards
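A minimal sketch of handing an MPI communicator to Ascent from Python, assuming mpi4py and Ascent's `ascent.mpi` module; this is illustrative only, not PyFR's actual plugin code:

```python
import conduit
import ascent.mpi
from mpi4py import MPI

# Create an MPI-aware Ascent instance
a = ascent.mpi.Ascent()

opts = conduit.Node()
# Ascent expects the Fortran handle of the communicator, not the Python object
opts["mpi_comm"] = MPI.COMM_WORLD.py2f()
a.open(opts)
```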
We do provide and test Python modules for ascent and conduit, and there are some extra checks in there with respect to MPI vs non-MPI use; however, since you are calling into Ascent directly through PyFR, those checks don't apply here. The backend should not matter. It is possible you have an error on an MPI task and we aren't seeing it. Can you try the following:
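A sketch of the kind of option being suggested: Ascent's `exceptions` open option set to `forward` (shown with the same hypothetical open code as above; in PyFR it would go into whatever options node the plugin builds):

```python
import conduit
import ascent.mpi
from mpi4py import MPI

a = ascent.mpi.Ascent()

opts = conduit.Node()
opts["mpi_comm"] = MPI.COMM_WORLD.py2f()
# "forward" re-raises errors from Ascent instead of trapping them internally,
# so a failing rank should crash loudly rather than leave the job hanging
opts["exceptions"] = "forward"
a.open(opts)
```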
That will allow exceptions to flow up, and will likely crash instead of hang.
I tried it but I don't get exceptions. Is this the correct syntax to set it?
I did that but the hang still persists, and I also tried reinstalling with different build options.
An update on this: the hang comes from the execute function.
Sorry that did not help us. Can you share your actions? Also, can you try running a very simple action, along the lines of the sketch below:
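One candidate for such a minimal action is a `save_info`-style action that dumps Ascent's runtime info to a yaml file; the exact action name and the `file_name` parameter below are assumptions to verify against the Ascent actions documentation for your installed version:

```yaml
# minimal ascent_actions.yaml sketch: dump Ascent's runtime info to a yaml file
-
  action: "save_info"
  file_name: "out_ascent_info.yaml"
```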
This will create a yaml file (if successful) that might help us.
Thanks a lot for the help you're providing. I ran the simple action; hope the result can be useful. Even with exception forwarding active, nothing happens and no errors are thrown. (On 1 rank locally, exceptions are correctly forwarded.)
@cyrush
Sorry, this mystery continues. I see some NaNs in some of our camera info outputs, but I don't think that would be the source of a hang. When it hangs, do you get any of the three images you are trying to render? (Trying to narrow down where to look.) Can you share how many MPI tasks and job nodes you use? Can we coach you to write a set of HDF5 files out via an Ascent extract and see if we can reproduce?
It is 80 MPI tasks over 10 nodes, but this happens whenever I use more than 1 node.
To be honest it seems more or less random, but I noticed that it is less frequent for scenes where only one render is called. Is there any blocking MPI operation when multiple renders are triggered on a scene?
Yes, sure, let me know.
Here is an example ascent_actions.yaml for generating an extract of the data.
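A sketch of such an actions file, following the relay extract example in the Ascent documentation; the output path "out_export" is a placeholder:

```yaml
# ascent_actions.yaml sketch: export the published mesh to Blueprint HDF5 files
-
  action: "add_extracts"
  extracts:
    e1:
      type: "relay"
      params:
        path: "out_export"
        protocol: "blueprint/mesh/hdf5"
```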
This should generate a root file and a folder of hdf5 files (or just the root file if small enough). Then we can hopefully use this extract to replicate your error.