meeting 2024 06 06
Bob Dröge edited this page Jun 6, 2024
- date & time: Thu 6 June 2024 - 14:00 CEST (13:00 UTC)
- (every first Thursday of the month)
- venue: (online, see mail for meeting link, or ask in Slack)
- agenda:
- Quick introduction by new people
- EESSI-related meetings and events in last month
- Progress update per EESSI layer
- Update on build-and-deploy bot
- Update on EESSI production repository software.eessi.io
- Update on EESSI documentation
- Update on EESSI test suite
- Additional EESSI repositories: dev.eessi.io, riscv.eessi.io
- EESSI on macOS
- AWS/Azure sponsorship update
- Upcoming/recent events: ISC’24
- Q&A
(by Bob/Kenneth)
(see slides)
(see slides)
- Ansible playbook for Stratum-1 now uses our fork of the ansible-cvmfs role
- Bob is planning to kickstart discussion on setting up proper monitoring for our Stratum 1 servers (disk usage, network bandwidth usage, load, etc.)
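As a starting point for that monitoring discussion, here is a minimal sketch of how two of the metrics mentioned above (disk usage and load) could be collected with the Python standard library only. The function names, the default path and the 90% threshold are illustrative assumptions, not something agreed on in the meeting. Network bandwidth is left out because it would need an external tool or library.

```python
import os
import shutil


def collect_metrics(path="/"):
    """Return basic health metrics for the filesystem that holds `path`.

    `path` is a placeholder; on a real Stratum 1 it would point at the
    location of the CernVM-FS repository data (assumption).
    """
    usage = shutil.disk_usage(path)
    used_fraction = usage.used / usage.total if usage.total else 0.0
    metrics = {
        "disk_total_bytes": usage.total,
        "disk_used_fraction": used_fraction,
    }
    # os.getloadavg() only exists on Unix-like systems and may raise OSError
    try:
        metrics["load_1min"] = os.getloadavg()[0]
    except (AttributeError, OSError):
        metrics["load_1min"] = None
    return metrics


def over_threshold(metrics, max_used_fraction=0.9):
    """True if disk usage exceeds the (hypothetical) alert threshold."""
    return metrics["disk_used_fraction"] > max_used_fraction


if __name__ == "__main__":
    current = collect_metrics("/")
    print(current, "ALERT" if over_threshold(current) else "ok")
```

A script like this could be run periodically and its output fed into whatever monitoring system ends up being chosen.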
(see slides)
- starting to consider new version of compat layer, see https://gitlab.com/eessi/support/-/issues/56
  - combo of reasons:
    - new glibc, OpenSSL in compat layer
    - adopting EasyBuild 5.0
    - 2024a common toolchain in EasyBuild
(see slides)
- We still need to document the EESSI-extend module, but it works really well
- With just a few simple commands you can start building your own modules on top of EESSI
- Also see Åke's presentation at the EasyBuild user meeting: https://users.ugent.be/~kehoste/eum24/004_eum24_hpc2n.pdf
- Both Kenneth and Julian are working on Extrae, but tests are failing in the `make check` step
- Thomas is working on PyTorch-bundle, but it's failing due to librosa not being able to find a library, as Python's ctypes library doesn't return full paths to libraries. Related issues:
- The `--from-commit` feature doesn't fully work yet (it has trouble finding dependencies), is currently being fixed in EasyBuild
- installing GPU software
  - (Caspar) some software will require a newer CUDA CC than 6.0, which would be a problem for the fallback installations in `generic/accel`
    - maybe drop `cc60` part from `generic/accel/nvidia/cc60`?
    - also consider fat builds under `generic/accel`?
  - (Kurt) PyTorch may require newer CUDA CC or GPU drivers
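The ctypes behaviour mentioned above for the PyTorch-bundle problem can be illustrated with a small sketch. It assumes the relevant call is `ctypes.util.find_library` (the notes do not say which call is involved): on Linux this function returns only a file name such as `libc.so.6` rather than a full path, whereas on macOS it returns a full path. The helper name `describe_library` is made up for this example.

```python
import ctypes.util
import os


def describe_library(name):
    """Report what ctypes.util.find_library returns for `name`.

    The result is None when the library cannot be located, a bare file
    name on Linux, or a full path on some other platforms.
    """
    found = ctypes.util.find_library(name)
    return {
        "query": name,
        "result": found,
        "is_full_path": found is not None and os.path.isabs(found),
    }


if __name__ == "__main__":
    # "c" is the C standard library; the output depends on the platform
    print(describe_library("c"))
```

Software that expects a full path from this lookup can fail in an environment like EESSI, where libraries live outside the default system locations.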
(see slides)
- Automatic cleanup moves the job directories of merged PRs to some sort of trash bin, which can be purged later
- v0.5.0 of bot not used yet on EESSI build cluster
- waiting for merge on PRs that update bot configuration
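The cleanup approach described above (move first, purge later) can be sketched as follows. This is not the bot's actual implementation; the function names and directory layout are hypothetical and only illustrate the two-step idea.

```python
import shutil
from pathlib import Path


def move_to_trash(job_dir, trash_dir):
    """Move a job directory into the trash bin instead of deleting it."""
    job_dir = Path(job_dir)
    trash_dir = Path(trash_dir)
    trash_dir.mkdir(parents=True, exist_ok=True)
    target = trash_dir / job_dir.name
    if target.exists():
        # avoid silently nesting the job directory inside an existing one
        raise FileExistsError(f"{target} already exists in trash bin")
    shutil.move(str(job_dir), str(target))
    return target


def purge_trash(trash_dir):
    """Permanently remove everything in the trash bin; return entry count."""
    trash_dir = Path(trash_dir)
    if not trash_dir.is_dir():
        return 0
    removed = 0
    for entry in trash_dir.iterdir():
        if entry.is_dir():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed += 1
    return removed
```

Separating the two steps keeps the cleanup reversible until the purge is run, for example when a job directory turns out to be needed for debugging.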
(see slides)
(see slides)
- Lots of work was done on the documentation, partly thanks to the hackathon, which had a strong focus on merging documentation PRs
- The documentation now includes an automatically generated page with the available software: https://www.eessi.io/docs/available_software/overview/
- Sites can easily use the same tooling to make similar overviews for their stacks
- And we've also added a blog: https://www.eessi.io/docs/blog/
- We should mention the mailing list on the website
(see slides)
(see slides)
- Bob is looking into building PyTorch for RISC-V
- problems with dependencies seem to be solved, now looking into PyTorch itself
- no more workarounds will be required for installing `foss/2024a` for RISC-V
- Julian can help people get access to RISC-V Slurm cluster at BSC (which has CernVM-FS installed and EESSI readily available)
(see slides)
- Lara is working on documentation
- We should also document how to use EESSI on Windows with WSL2
- Pedro volunteered to do this, as he's using WSL a lot
- Kurt has some documentation on https://klust.github.io/windows-client-HPC/4_Cluster_Stack/4_01_EESSI/#example-setup-on-opensuse-in-wsl2
(see slides)
- (Alan) we should look into splitting up the way sponsored credits are consumed in both AWS & Azure
- to avoid that people get access to things they don't need, and make
(see slides)
- Next meeting: July 4