meeting 2023 07 06

Bob Dröge edited this page Jul 6, 2023 · 2 revisions

Notes for 2023-07-06 meeting

  • date & time: Thu 6 July 2023 - 14:00 CEST (12:00 UTC)
    • (every first Thursday of the month)
  • venue: (online, see mail for meeting link, or ask in Slack)
  • agenda:
    • Quick introduction by new people
    • EESSI-related meetings in last month
    • Progress update per EESSI layer (incl. build-and-deploy bot + test suite)
    • EESSI pilot repository
    • AWS/Azure sponsorship update
    • Update on MultiXscale EuroHPC project
    • Upcoming events
    • Q&A

Slides

Meeting notes

(by Caspar)

Quick introduction by new people

  • No new people

EESSI-related meetings in last month

(see slides)

  • Discussed the EESSI support portal and which software to use for it. Moving in the direction of GitLab.

Progress update per EESSI layer

Filesystem layer

(see slides)

  • Init tarballs: tarballs that contain our bash init script. These were not correctly recognized by the automation on the Stratum 0, so they were not properly ingested. Fixed now.

  • The GitHub Action for building client packages is broken and has not been fixed yet. This is especially problematic if we need to release a new EESSI configuration.

  • The new Stratum 0 is installed and the OS is in place. TODO: RAID setup, network settings, set up YubiKeys.

    • Some discussion about how to give access, and to whom, when Bob is away.
    • YubiKeys are used for re-signing keys. Bob's plan for now is simply to leave the YubiKey plugged into the machine in the DC, since it is physically safe there. This is still up for discussion.
  • Bandwidth issues with RUG

    • Bob plans to move the Stratum 1 to a different VLAN to circumvent the central RUG firewall, since we suspect that it might be causing the slow performance. We might do the same with the Stratum 0.
  • Idea to use a CDN for the Stratum 1. The question is whether this works under the SURF contract through Azure, and whether the outbound traffic is covered. It is, but only up to a certain degree, so we can't be too excessive. If we really plan for heavy outbound traffic, we should discuss it with the SURF Research Cloud group.

Compatibility layer

(see slides)

  • Issues with OpenSSL 3 in the compat layer (2023.04). Tried to circumvent them, but that proved too difficult.
    • Built a new compat layer with OpenSSL 1.1.1 (2023.06)
    • Installed init scripts for 2023.06, so you can start using it
Software layer

(see slides)

  • All software for 2023.06 has been built and deployed by the bot.
  • We're using EB 4.7.2 and EasyStack files to do this, with one EasyStack file per toolchain. If you want to add software, you add it to the EasyStack file of the toolchain you want to use and then open a PR with that change. The procedure is now documented.
    • For examples, see some of the previous software layer PRs
Build-and-deploy bot

(see slides)

  • Most work done by Kenneth and Thomas, but they are not in the meeting
    • Added support for the bot to listen to commands in comments, instead of labels. It can now, for example, build for one specific architecture.
  • We have a script for each repo for which we want the bot to build (bot/check-build.sh). Each target repo implements its own, so it can be repo-specific. Then you can easily reuse the bot for e.g. local sites, and only need to implement the bot/check-build.sh.
  • Satish: two things that are still missing
    • Stop a build that the bot is already running
    • Tell the bot to rebuild something that is already deployed (e.g. because we found some issues after deployment)
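
To illustrate the idea of driving the bot through PR comments rather than labels, here is a minimal sketch of how command lines could be picked out of a comment. The `bot:` prefix, the `build` command, and the `arch:` filter are assumptions made for this example only; the notes do not specify the syntax that the bot actually accepts.

```python
def parse_bot_commands(comment, prefix="bot:"):
    """Return a list of (command, {key: value}) for lines that start with prefix.

    The prefix and the key:value filter format are hypothetical.
    """
    commands = []
    for line in comment.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue  # ordinary discussion text, not a command
        words = line[len(prefix):].split()
        if not words:
            continue  # bare prefix without a command
        command, args = words[0], {}
        for word in words[1:]:
            key, sep, value = word.partition(":")
            if sep:  # keep only words of the form key:value
                args[key] = value
        commands.append((command, args))
    return commands


if __name__ == "__main__":
    example = "Looks good to me.\nbot: build arch:aarch64/neoverse_n1\n"
    print(parse_bot_commands(example))
```

Treating only prefixed lines as commands means that normal review discussion in the same comment is ignored, which is one way a single comment could trigger a build for just one architecture.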
EESSI test suite
  • See slide
EESSI pilot repository

(see slides)

  • 2021.12 is 'frozen', but still the default. We want to change that when all the software that was in the old version is also in the new one
  • 2023.06: aarch64 and x86_64 compat layers in place
    • Some renaming of aarch64 microarchitectures (neoverse_n1, neoverse_v1)
  • TODO: fix the Lmod cache update. The challenge is what happens if multiple builds are happening in parallel. Potentially do this at ingest time on the Stratum 0 (i.e. for every transaction), or run it as a cron job.
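
One possible way to handle the parallel-builds concern is to serialize cache updates behind a lock, so that concurrent callers take turns. The sketch below is only an illustration under assumptions: it uses an advisory `flock` on a local lock file, and `update_cache` is a placeholder callable standing in for whatever actually regenerates the Lmod cache. The notes have not settled on any approach, and the lock path is hypothetical.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path for the duration of the block."""
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until other holders release
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def update_cache_serialized(lock_path, update_cache):
    """Run update_cache() while holding the lock, so parallel callers take turns."""
    with exclusive_lock(lock_path):
        return update_cache()


def is_locked(lock_path):
    """Return True if another open file description currently holds the lock."""
    with open(lock_path, "w") as probe:
        try:
            fcntl.flock(probe, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return True
        fcntl.flock(probe, fcntl.LOCK_UN)
        return False


if __name__ == "__main__":
    # Hypothetical lock location; a real setup would pick a fixed, shared path.
    lock = os.path.join(tempfile.mkdtemp(), "lmod_cache.lock")
    # Placeholder update step: a real one would regenerate the Lmod cache.
    update_cache_serialized(lock, lambda: print("updating cache"))
```

Running the update once per transaction at ingest time, or from a single cron job, would avoid the need for a lock altogether, since only one process would ever touch the cache.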
EESSI Contribution Policy

(see slides)

  • There is a PR to docs about this. Key points
    • Open source
    • Built by the bot
    • Supported by latest EasyBuild (for now), use --from-pr and --include-easyblocks-from-pr
    • Compiler toolchains: only if EB supports it
    • Ideally for all CPU targets (exceptions allowed in case of technical issues)
    • Recent toolchains preferred
    • There should be a way of testing installations. Ideally the EESSI test suite, but test development is probably too difficult to make this a hard requirement

Alan: we probably don't want to keep reinstalling old toolchains on newer compat layers, as it is likely to cause issues. It's essentially trying to install old toolchains on a new OS, which we know can cause problems. It also just gives us a lot of extra work.

Satish: what if someone requests a piece of software, e.g. OpenFOAM, which toolchain should we then use to build it? Bob: probably be pragmatic and check what is in EB. Satish: so it's OK to support multiple versions of the same software in the same toolchain? => Yes, so e.g. multiple versions of OpenFOAM for the same toolchain would be acceptable, as long as they are in the latest EB release as EasyConfigs.

EESSI Support Portal
  • Landed on GitLab as the final approach
  • Did testing on client side as well as support side
  • We are experimenting with a self-hosted instance, in case the free version becomes too limiting
  • TODO set up support portal at https://gitlab.com/eessi/support
  • TODO set up support rotation

AWS/Azure sponsored credits

(see slides)

  • No special remarks, see slides
  • Stratum 1 in Azure
  • Virtual SLURM cluster in Azure with CycleCloud, but may want to switch back to Magic Castle
  • Both AWS/Azure still have credits left for the coming months

MultiXscale EU project

(see slides)

  • Access to various EuroHPC JU systems, since we are supposed to do CI/CD there. This helps us to have a discussion with these sites to make EESSI available on their systems.
  • Support portal is a deliverable for MultiXscale, hence the big push on that.

Events

Q&A

  • Mail from Valentin Volkl asking for a speaker to give an EESSI talk for the HEP Software Foundation. The date is up for discussion with whoever wants to present.