Zsgpu #14

Draft
wants to merge 24 commits into
base: main

@joshkamm (Member) commented Nov 5, 2024

Paul plans to work on this branch for some time and use it as the dependency for his replacement of FancyElectrons. It is being kept separate because Paul's changes will likely not remain compatible with FancyElectrons. Once the replacement is complete, the plan is to merge this PR and pin FancyElectrons to a commit from before the merge. Ultimately, Paul plans to migrate all FancyElectrons usage in the group to his replacement.

I'm creating this PR well before it's ready to merge, as a space for notes and to monitor progress.

Closes #16

@joshkamm self-assigned this Nov 5, 2024
Note that some uses of USE_ACC, especially those controlling the multi-GPU code, are necessary. Without them, a non-GPU build behaves quite badly, so those remain.
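For readers unfamiliar with why some USE_ACC checks have to stay, here is a minimal sketch of the pattern. It is not code from this branch. It assumes USE_ACC is a macro that is defined only for GPU builds, and the helper names (`worker_count`, `select_worker`, `split_range`, `sum_squares`) are hypothetical. The OpenACC runtime calls (`acc_get_num_devices`, `acc_set_device_num`) exist only when `openacc.h` is available, so the device-selection logic needs a guard, while the work-splitting logic can stay unconditional.

```cpp
#include <cassert>
#include <utility>
#include <vector>

#ifdef USE_ACC          // assumption: defined only for GPU builds
#include <openacc.h>
#endif

// Number of workers to split over: one per GPU in a USE_ACC build,
// otherwise (or if no GPU is found) a single host worker.
int worker_count() {
#ifdef USE_ACC
    int ngpu = acc_get_num_devices(acc_device_nvidia);
    return ngpu > 0 ? ngpu : 1;
#else
    return 1;
#endif
}

// Bind subsequent OpenACC work to device w; a no-op on non-GPU builds.
void select_worker(int w) {
#ifdef USE_ACC
    if (acc_get_num_devices(acc_device_nvidia) > 0) {
        acc_set_device_num(w, acc_device_nvidia);
    }
#else
    (void)w;  // nothing to select on the host
#endif
}

// Split [0, n) into contiguous [begin, end) ranges, one per worker.
std::vector<std::pair<int, int>> split_range(int n, int workers) {
    assert(workers > 0);
    std::vector<std::pair<int, int>> out;
    int base = n / workers, rem = n % workers, begin = 0;
    for (int w = 0; w < workers; ++w) {
        int len = base + (w < rem ? 1 : 0);
        out.emplace_back(begin, begin + len);
        begin += len;
    }
    return out;
}

// Sum of squares, with each worker handling its own slice in turn.
double sum_squares(const std::vector<double>& x) {
    const double* p = x.data();
    auto ranges = split_range(static_cast<int>(x.size()), worker_count());
    double total = 0.0;
    for (int w = 0; w < static_cast<int>(ranges.size()); ++w) {
        select_worker(w);
        int b = ranges[w].first, e = ranges[w].second;
        double part = 0.0;
#ifdef USE_ACC
        #pragma acc parallel loop reduction(+:part) copyin(p[b:e-b])
#endif
        for (int i = b; i < e; ++i) part += p[i] * p[i];
        total += part;
    }
    return total;
}
```

In a build without USE_ACC, this reduces to a single host loop over the whole array. The sketch visits the devices one after another and makes no attempt at the async behaviour discussed in this PR.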
This is a significant update that attempted to fix the multi-GPU parallel operations. It did not succeed, but the async functionality may still be worth using later on.

Other changes from the last month, made as I evolved the integral code for Jellium etc., are included here too.
@paulzim46 (Contributor)

It would be nice to fix the multi-GPU operation with the latest NVIDIA compilers, but I also suspect we just have a compiler bug on our hands. Versions 24.9 and 24.11 do not work; 20.7 is fine.

@joshkamm (Member, Author)

Hmm @paulzim46, do you think it's worth trying to isolate it well enough to give the compiler team a lead, if it is a bug?

While I've been working on the infrastructure, nvhpc has also been the most difficult dependency to manage. It seems like its licensing makes it difficult for a package manager to redistribute.

I've noticed that gcc also has OpenACC support, which may be easier to manage from an infrastructure angle, but I don't know whether it's as reliable and performant in general, or how many subtle differences there are between the two.

@joshkamm linked an issue Jan 6, 2025 that may be closed by this pull request

Successfully merging this pull request may close these issues.

Fix multiple gpu support
Remove USE_ACC