Bugfix in distributed GPU tests and Distributed set!
#3880
Merged
Commits (55 in total; the changes shown are from 39 of them):
5227a13 fix pipeline (simone-silvestri)
1225061 mpi test and gpu test (simone-silvestri)
1652c6b do we need to precompile it inside? (simone-silvestri)
9323203 precompile inside the node (simone-silvestri)
37b17ff try previous climacommon version (simone-silvestri)
2ac8cde go even more back (simone-silvestri)
0eb2720 use the ClimaOcean implementation (simone-silvestri)
50d0ec3 using the ClimaOcean implementation (simone-silvestri)
6e183bd see if this test passes (simone-silvestri)
bd84d38 Merge branch 'main' into ss/fix-gpu-tests (simone-silvestri)
c56b15b maybe precompiling before... (simone-silvestri)
371a45b Merge branch 'ss/fix-gpu-tests' of github.com:CliMA/Oceananigans.jl i… (simone-silvestri)
e30973f double O0 (simone-silvestri)
e4cb16e back to previous clima_common (simone-silvestri)
0c1f01c another quick test (simone-silvestri)
bec1cd1 change environment (simone-silvestri)
75546af correct the utils (simone-silvestri)
5f49ec0 Merge branch 'main' into ss/fix-gpu-tests (simone-silvestri)
9b334af this should load mpitrampoline (simone-silvestri)
f8c6401 Fix formatting (glwagner)
1dc42bb Go back to latest climacommon (glwagner)
5a870e7 try adding Manifest (simone-silvestri)
9e63f56 Manifest from julia 1.10 (simone-silvestri)
59548f8 we probably need to initialize on a GPU (simone-silvestri)
642cfd9 these options should not create problems (simone-silvestri)
4cee49a let's see if this differs (simone-silvestri)
a46b25d just version infos (simone-silvestri)
4dffbe5 fiddling with O0 (simone-silvestri)
9c3c6cd why are we using 8 threads? (simone-silvestri)
3b28ecb memory requirements are not this huge (simone-silvestri)
7126c7c speed up the precompilation a bit, to revert later (simone-silvestri)
733ab2b might this be the culprit? (simone-silvestri)
2dbf1a0 revert to 8 tasks to precompile (simone-silvestri)
a4b129a final version? (simone-silvestri)
29f7d69 return to previous state of affairs (simone-silvestri)
b174313 reinclude enzyme (simone-silvestri)
0283e6a set cuda runtime version (simone-silvestri)
b4c1f2a will this help in finding cuda? (simone-silvestri)
bc53a97 make sure we don't run OOM (simone-silvestri)
811bfdb bugfix in `set!` (simone-silvestri)
cd86a6a try precompile inside runtests (simone-silvestri)
4039299 revert back (simone-silvestri)
2c6ad90 recompile everywhere (simone-silvestri)
781992c try nuclear option (simone-silvestri)
08949b3 skip all these commands (simone-silvestri)
908b31a some failsafe option (simone-silvestri)
466ec0c increase a bit the memory (simone-silvestri)
a27b383 comment (simone-silvestri)
eec18c2 whoops unit tests are small (simone-silvestri)
62c5834 Merge branch 'main' into ss/fix-gpu-tests (simone-silvestri)
8011ef5 increase memory limits (simone-silvestri)
0965067 Merge branch 'ss/fix-gpu-tests' of github.com:CliMA/Oceananigans.jl i… (simone-silvestri)
cd00381 tests were running on the CPU on sverdrup (simone-silvestri)
eebfc04 Merge branch 'main' into ss/fix-gpu-tests (simone-silvestri)
8fc903e Merge branch 'main' into ss/fix-gpu-tests (navidcy)
Diff (Project.toml):
@@ -50,7 +50,7 @@ CubedSphere = "0.2, 0.3"
 Dates = "1.9"
 Distances = "0.10"
 DocStringExtensions = "0.8, 0.9"
-Enzyme = "0.13.3"
+Enzyme = "0.13.14"
 FFTW = "1"
 Glob = "1.3"
 IncompleteLU = "0.2"
@@ -77,10 +77,11 @@ julia = "1.9"

 [extras]
 DataDeps = "124859b0-ceae-595e-8997-d05f6a7a8dfe"
-Enzyme = "7da242da-08ed-463a-9acd-ee780be4f1d9"
 SafeTestsets = "1bc83da4-3b8d-516f-aca4-4fe02f6d838f"
 Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
 TimesDates = "bdfc003b-8df8-5c39-adcd-3a9087f5df4a"
+Enzyme = "7da242da-08ed-463a-9acd-ee780be4f1d9"
+MPIPreferences = "3da0fdf6-3ccc-4f1b-acd9-58baa6c99267"

 [targets]
-test = ["DataDeps", "Enzyme", "SafeTestsets", "Test", "TimesDates"]
+test = ["DataDeps", "SafeTestsets", "Test", "Enzyme", "MPIPreferences", "TimesDates"]
Review comment (on the [targets] change): Was this the crucial part?
why?
120G is much more than we need for those tests. After some frustration with tests that were extremely slow to start, I noticed that the agents started much faster when requesting less memory. So I am deducing that the tests run on shared nodes rather than exclusive ones, and that requesting fewer resources lets us squeeze in when the cluster is busy.
good reason. might warrant a comment