meeting 2024 09 05
Bob Dröge edited this page Sep 5, 2024 · 5 revisions
- date & time: Thu 5 Sept 2024 - 14:00 CEST (12:00 UTC)
- (every first Thursday of the month)
- venue: (online, see mail for meeting link, or ask in Slack)
- agenda:
- Quick introduction by new people
- EESSI-related meetings and events in last month(s)
- Progress update per EESSI layer
- Update on EESSI production repository software.eessi.io
- Modulefile for initializing the EESSI stack
- Update on EESSI documentation + test suite + build-and-deploy bot
- EESSI as backend in Ramble
- Status page and monitoring
- AWS/Azure sponsorship update
- Upcoming/recent events
- Frequency of EESSI update meeting
- Q&A
(by Pedro/Bob)
- James Simone (Fermilab)
- Leonardo Honfi Camilo (Wageningen University)
- Both are EB users and EUM attendees. Welcome!
(see slides)
- Thomas will present an EESSI update at the upcoming CernVM workshop.
- Hackathons (usually) every third week to focus on advancing specific topics
(see slides)
- Updated CernVM-FS to new minor release on all servers
- WIP: Grafana dashboard and alerting for the infrastructure (details later)
(see slides)
- Upcoming compat layer to include OpenSSL 3 and EasyBuild 5.0
(see slides)
- Lots of software packages have been added in the last two months
- For zen4 we have to skip one toolchain that is not supported
(see slides)
- Focus on catching up with the previously supported CPU targets. Contributions are very welcome, and we can help with submissions that run into issues
(see slides)
- The module file plays more nicely with existing software stacks on a site than the existing init bash script does. There are (possible) quirks, e.g., sticky modules. If a local Lmod stack is already present, its modules will be "mixed" with the EESSI ones, though this shouldn't be a problem in most cases. Please try it out, feedback is very welcome!
- Using a shell other than bash is likely not a problem.
(see slides)
- Possible next step for bot clean-up is to have a cron job that deletes these directories every month
- Jobs can have a unique name, which is necessary to run several bot instances by the same account.
- First step for accelerator build support. E.g., bot: build accelerator:nvidia/Y
- Community contribution to customize bot build jobs (more time, RAM, etc.)
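The monthly clean-up idea mentioned above could look roughly like the following. This is only a minimal sketch and not the bot's actual implementation: the function names, the 30-day threshold, and the assumption that each job lives in its own subdirectory of a single root directory are all hypothetical.

```python
import shutil
import time
from pathlib import Path


def find_stale_dirs(root, max_age_days, now=None):
    """Return subdirectories of root last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    return sorted(
        p for p in Path(root).iterdir()
        # Skip symlinks so we never follow a link out of the root directory.
        if p.is_dir() and not p.is_symlink() and p.stat().st_mtime < cutoff
    )


def clean_stale_dirs(root, max_age_days=30, dry_run=True, now=None):
    """Delete stale subdirectories; with dry_run=True they are only reported."""
    stale = find_stale_dirs(root, max_age_days, now=now)
    for path in stale:
        if not dry_run:
            shutil.rmtree(path)
    return stale
```

A script wrapping `clean_stale_dirs` could then be scheduled with a cron expression such as `0 3 1 * *` (03:00 on the first day of each month). Defaulting to a dry run makes it easy to check what would be removed before enabling deletion.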
(see slides)
- Documentation/tutorial for developers who already know how to write ReFrame tests (which we already document), showing how to write them in a portable way. Useful for test suite contributors.
- Blog post by Julián on installing Extrae to riscv.eessi.io, which was also expanded to include software.eessi.io. Interesting as an example and overview of a complex installation, including on emerging architectures.
- Improvements to the available software page. Application pages include a short description and loading instructions. zen4 is now included in the software availability pages.
- Starting point for a list of sites that already include EESSI. Maybe suggest in the set-up documentation that sysadmins add themselves.
- CI/CD component in EESSI (GitHub Actions and GitLab Components)
(see slides)
- mpi4py tutorial on writing a ReFrame test and making it portable
- Improvements to handling duplicate modules in the module path (users are now warned)
- New hook to handle per-node memory usage, queried from the master node in the allocation.
- Open MetalWalls PR from a contributor, under review.
- Open WIP PR to handle 'mandatory' hooks better. Hooks would run by default because they would be inherited.
- Progress on the dashboard (WIP, not publicly available yet). Test reports go to an Elasticsearch database. Useful to see the consistency of performance from day to day. Some data may not be made public.
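To illustrate what a duplicate-module warning could involve, here is a small sketch. It is not the test suite's code: it assumes a flat Lmod-style layout of `<name>/<version>.lua` files under each `MODULEPATH` entry, and both function names are hypothetical.

```python
import warnings
from collections import defaultdict
from pathlib import Path


def find_duplicate_modules(modulepath):
    """Map each module (e.g. 'GROMACS/2024.1') that is provided by more than
    one MODULEPATH entry to the list of entries providing it."""
    providers = defaultdict(list)
    # dict.fromkeys drops repeated entries while preserving their order.
    for entry in dict.fromkeys(modulepath.split(":")):
        root = Path(entry)
        if not entry or not root.is_dir():
            continue
        for modfile in root.glob("*/*.lua"):
            name = f"{modfile.parent.name}/{modfile.stem}"
            providers[name].append(entry)
    return {name: dirs for name, dirs in providers.items() if len(dirs) > 1}


def warn_on_duplicates(modulepath):
    """Emit one warning per duplicated module and return the duplicates."""
    duplicates = find_duplicate_modules(modulepath)
    for name, dirs in sorted(duplicates.items()):
        warnings.warn(f"module {name} is provided by multiple paths: {', '.join(dirs)}")
    return duplicates
```

Calling `warn_on_duplicates(os.environ.get("MODULEPATH", ""))` would report modules that exist in both a local stack and another stack on the same path, which is the situation where a test could silently pick up an unintended module.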
(see slides)
- Ramble is a Google Cloud tool for benchmarking that builds on existing tools. It now supports EESSI, and this is documented.
- It is used in another project called BenchPark https://github.com/LLNL/benchpark
(see slides)
- Revamped status page code (now in Rust). It now exports JSON files so the information can be easily scraped and accessed. Overall status depends on individual components (stratum-1s, etc.).
- Also exports statuses as Prometheus metrics to be picked up downstream.
- WIP: Grafana, Prometheus, alerts, and exporters for specific information (versions, sync status of stratum-1 with stratum-0, etc.). Monitoring server work is underway and exporters are being added to the cvmfs servers; installation of the exporters is almost done. Thresholds and tweaks are in progress, as is an alert channel in Slack.
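As an illustration of how an overall status can be derived from individual components and exported both as JSON and as Prometheus metrics, consider the sketch below. The real status page is written in Rust; this Python version only shows the idea. The status levels, the metric name `eessi_component_status`, and the component names are hypothetical.

```python
import json

# Hypothetical status levels, ordered from best to worst.
LEVELS = ["ok", "degraded", "failed"]


def overall_status(components):
    """The overall status is the worst status among all components."""
    if not components:
        return "failed"  # assumption: missing data is treated as a failure
    return max(components.values(), key=LEVELS.index)


def to_json(components):
    """Serialize the per-component and overall status for easy scraping."""
    return json.dumps(
        {"overall": overall_status(components), "components": components},
        indent=2,
        sort_keys=True,
    )


def to_prometheus(components, metric="eessi_component_status"):
    """Render statuses in the Prometheus text exposition format."""
    lines = [
        f"# HELP {metric} Component status (0=ok, 1=degraded, 2=failed).",
        f"# TYPE {metric} gauge",
    ]
    for name, status in sorted(components.items()):
        lines.append(f'{metric}{{component="{name}"}} {LEVELS.index(status)}')
    return "\n".join(lines) + "\n"
```

For example, `overall_status({"stratum0": "ok", "stratum1-a": "degraded"})` yields `"degraded"`. Encoding the status as a numeric gauge lets downstream alerting rules use simple thresholds.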
(see slides)
- No major news; a refresh of credits is likely to happen soon. Azure usage is ramping up due to zen4 builds.
(see slides)
- Thomas presents at CernVM-FS workshop 16-18 September
- EuroHPC User Day 2024 - EESSI presentation (paper, to be submitted tomorrow) and also presence at CoE "event".
- SC'24 Birds-of-a-Feather session accepted! HPCNow! will go.
- Proposal to have update meeting every two months. Approved. Next meeting: November 7th
- Suggestion to have a regular EESSI elevator pitch and demonstration of topics