
[CI] oneAPI tests are hanging #815

Closed
giordano opened this issue Feb 2, 2024 · 1 comment · Fixed by #818

Comments

@giordano
Member

giordano commented Feb 2, 2024

CI jobs using Intel oneAPI MPI 2021.7.0 hang indefinitely after the test_shared_win.jl test set, causing the jobs to eventually time out.

To summarise

  • between the two runs, the latest Julia version changed. However, on 2024-02-02 I re-ran the jobs with Julia v1.9.3 in https://github.com/JuliaParallel/MPI.jl/actions/runs/7759323957 and the tests hung again, so the Julia version doesn't seem to be relevant
  • there have been no significant changes to this package's source code between the two runs, so I would rule out a problem in this package
  • more interestingly, I can't reproduce the issue locally in the containers we use for CI (I tried both ghcr.io/juliaparallel/github-actions-buildcache:intel-oneapi-mpi-jq and ghcr.io/juliaparallel/github-actions-buildcache:intel-oneapi-mpi-2021.7.0-gzc7es2p27ftwyk4sdplynlj6d54xzi6.spack): I installed both Julia v1.9.4 and v1.9.3 and couldn't reproduce the hang with either of them; the tests run fine every single time.

I suspect a change in the configuration of the GitHub-hosted runners is causing this, but this hypothesis is hard to test. One last option is to try a newer version of Intel oneAPI MPI, in case this was an old bug that has since been fixed, but I'll postpone that test for another day. In the meantime, I'm opening this ticket to keep track of the issue.

Edit: it does appear that simply upgrading to oneAPI 2021.11.0 magically fixes the hang without any other change on our side. This is implemented in #818.

@giordano
Member Author

giordano commented Feb 5, 2024

With #818 merged we're back to green CI on master after three months: https://github.com/JuliaParallel/MPI.jl/actions/workflows/UnitTests.yml?query=branch%3Amaster 🙂
