
Sync meeting on EESSI software layer (2023 06 23)

Kenneth Hoste edited this page Jun 23, 2023 · 1 revision

EESSI bot tutorial meeting (2023-06-23)

Attending: Kenneth, Bob, Thomas, Alan, Caspar, Satish, Lara, Xin, Maxim

Goals

  • explain how the build-and-deploy bot works
  • answer questions
  • hands-on: opening your first PR to add software to EESSI pilot 2023.06

Documentation


Caveats

EasyBuild v4.7.2

Stick to easyconfigs included with EasyBuild v4.7.2 for now.

Build in small increments

Don't add a new easystack file for OpenFOAM + all dependencies - start slowly & build your way up.

Rule of thumb: build times should be a handful of hours at most, not days.

Required open EasyBuild PRs to fix build problems

For some installations, more will be needed than just adding a line to an easystack file.

Most can be found in the software install script used for 2021.12.

Where possible, we should avoid using --include-easyblocks-from-pr and --from-pr, and resort to implementing an EasyBuild hook instead.
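As a rough illustration of the hook approach, the sketch below shows a minimal EasyBuild hooks module (the kind of file passed via --hooks) that dispatches per-software tweaks when an easyconfig is parsed. The handler, the dispatch table, and the software name 'ExampleSoftware' are hypothetical placeholders, not taken from the EESSI software-layer repository.

```python
# Sketch of an EasyBuild hooks module, to be passed with --hooks=<path to this file>.
# The handler and the software name below are hypothetical placeholders.


def example_parse_handler(ec):
    """Hypothetical per-package tweak, applied right after the easyconfig is parsed."""
    # A real handler would adjust easyconfig parameters here,
    # instead of pulling in an unmerged PR with --from-pr.
    print("applying site-specific tweak to %s" % ec.name)


# Map software name -> handler; 'ExampleSoftware' is a made-up name.
PARSE_HANDLERS = {
    'ExampleSoftware': example_parse_handler,
}


def parse_hook(ec, *args, **kwargs):
    """Called by EasyBuild for every easyconfig file it parses."""
    handler = PARSE_HANDLERS.get(ec.name)
    if handler is not None:
        handler(ec)
```

Keeping one small handler per software name makes it easy to drop a tweak again once the corresponding fix lands in an EasyBuild release.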

GCC 9.3

The build is broken with recent glibc and/or kernel headers and will need extra patches (similar to what was done in easyconfigs PR #14453).

CMake

Requires easyblocks PR #2248, so:

easyconfigs:
  ...
  - CMake-3.21.1-GCCcore-11.2.0.eb:
      options:
        include-easyblocks-from-pr: 2248

Java

Requires easyblocks PR #2557

OpenBLAS

  • easyblocks PR #1946 is no longer needed for building OpenBLAS for */generic targets; it has been replaced by a hook (see PR #260);
  • the max_failing_lapack_tests_num_errors value of 150 seems a bit too tight on aarch64/*; should it be relaxed a bit via an EasyBuild hook for OpenBLAS?

TensorFlow

Requires easyblocks PR #2218
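
For the OpenBLAS item above, relaxing max_failing_lapack_tests_num_errors on aarch64 via a hook could look roughly like the sketch below. The value 300 is an arbitrary illustration (no number was agreed in the meeting), the function name is made up, and checking platform.machine() is only one assumed way of detecting the architecture.

```python
import platform

# Assumption: 300 is an arbitrary illustrative value, not a limit agreed in the meeting.
RELAXED_MAX_ERRORS = 300


def relax_openblas_lapack_limit(ec, machine=None):
    """Raise the allowed number of failing LAPACK tests for OpenBLAS on aarch64.

    Returns True if the easyconfig was changed, False otherwise.
    """
    machine = machine or platform.machine()
    if ec.name == 'OpenBLAS' and machine == 'aarch64':
        ec['max_failing_lapack_tests_num_errors'] = RELAXED_MAX_ERRORS
        return True
    return False


def parse_hook(ec, *args, **kwargs):
    """Called by EasyBuild for every easyconfig file it parses."""
    relax_openblas_lapack_limit(ec)
```

The optional machine argument is only there so the logic can be exercised on any host; EasyBuild itself would call parse_hook without it.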


Hands-on

Simple software installations to use during hands-on

  • [assigned to Lara] CMake-*-GCCcore-11.2.0.eb
  • [assigned to ???] bzip2-1.0.8-GCCcore-10.3.0.eb
  • [assigned to ???] time-1.9-GCCcore-10.3.0.eb
  • [assigned to ???] pigz-2.6-GCCcore-10.3.0.eb
  • [assigned to Caspar] libpng-1.6.37-GCCcore-10.3.0.eb
  • [assigned to Satish] libjpeg-turbo-2.0.6-GCCcore-10.3.0.eb

New easystack files to kickstart:

  • [assigned to ???] eessi-2023.06-eb-4.7.2-2022a.yml
    • start with GCC/11.3.0
  • [assigned to ???] eessi-2023.06-eb-4.7.2-2022b.yml
    • start with GCC/12.2.0
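
A small sketch of how such a kickstart file could be generated is shown below. The naming pattern is inferred from the two file names above, and both helper functions are hypothetical; nothing like this was discussed as an actual tool.

```python
def easystack_filename(eessi_version, eb_version, tc_gen):
    """Build a file name following the pattern seen in the notes (inferred, not official)."""
    return "eessi-%s-eb-%s-%s.yml" % (eessi_version, eb_version, tc_gen)


def initial_easystack(easyconfigs):
    """Return minimal easystack content listing the given easyconfig file names."""
    lines = ["easyconfigs:"]
    lines.extend("  - %s" % ec for ec in easyconfigs)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Kickstart the 2022a file with only the compiler, as suggested above.
    print(easystack_filename("2023.06", "4.7.2", "2022a"))
    print(initial_easystack(["GCC-11.3.0.eb"]))
```

Starting from a single compiler entry keeps the first build short, in line with the "build in small increments" caveat.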

Notes

  • what if a syntax error is introduced in an easystack file?
    • => let bot report error in a comment?
  • developer docs to show hierarchy of scripts
    • can be done in README of software-layer
    • bot/build.sh as starting point
  • only build easystack file that was touched
    • to help avoid hitting GitHub rate limits due to --from-pr
  • handling duplicate tarballs
    • bot can be configured to only upload latest tarball per CPU target
    • duplicate ingest requests can be rejected in EESSI/staging repo
  • Lmod update should be moved to Stratum-0 on ingest
    • can be integrated into the ingest procedure, since it only takes seconds
    • and removed from install script in software-layer repo
  • easystack files could be in a separate repo (EESSI/software-layer-easystacks)
    • to help prevent malicious actors tweaking bot/build.sh & co
    • can be handled in bot/build.sh of repository that only has easystack files?
      • script can pull in current 2023.06 version of software-layer repo to obtain scripts to run
  • contribution policy
    • don't deploy your own builds (can/should be enforced by the bot)
    • don't merge your own PRs (already enforced by using protected branch and requiring approved PR review)
  • do we really still need the staging repo?
    • yes, for now; useful for being able to skip certain ingestions
    • also useful for auditing (what happened when, who triggered it, etc.)
  • no way to instruct bot to cancel running build jobs
    • Satish will open an issue for this
  • no way yet to deal with software that doesn't work across all CPU targets
    • separate easystack file for x86_64?
    • way to label easystack entries and pre-process them before they are fed to EasyBuild?
  • need feedback from bot on deploy
    • currently silently ignores a request to deploy if the contributor has no permissions
    • no comment to indicate that request to deploy was received
  • allow SSH into worker nodes for running jobs on AWS Slurm cluster
    • useful for inspecting running build jobs
    • currently not allowed, can probably be changed
  • document ready-to-deploy label
    • can be used by contributors who don't have deploy permissions
  • bot should re-trigger CI + merge PR after deploy
  • standard for PR titles to add software: {2023.06}[<tc_gen>] <name> v<version>
  • bot command + build permission granted to Thomas, Bob, Kenneth, Caspar, Alan, Lara, Maxim, Satish, Xin, Richard
  • deploy permission granted to Thomas, Bob, Kenneth, Caspar, Alan
  • follow-up meeting to be planned for the 1st week of July (maybe Tue 4 July 2023)
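
The PR title convention noted above ({2023.06}[<tc_gen>] <name> v<version>) could be produced and checked with something like the sketch below. The regular expression is one assumed reading of that convention, and the function names are made up for illustration; this is not part of the bot.

```python
import re

# One assumed reading of the convention: {<EESSI version>}[<tc_gen>] <name> v<version>
PR_TITLE_PATTERN = re.compile(
    r"^\{(?P<eessi_version>\d{4}\.\d{2})\}"
    r"\[(?P<tc_gen>[^\]]+)\] "
    r"(?P<name>.+) v(?P<version>\S+)$"
)


def format_pr_title(eessi_version, tc_gen, name, version):
    """Compose a PR title following the convention."""
    return "{%s}[%s] %s v%s" % (eessi_version, tc_gen, name, version)


def parse_pr_title(title):
    """Return the title's components as a dict, or None if it does not match."""
    match = PR_TITLE_PATTERN.match(title)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Illustrative values only (libpng 1.6.37 with GCCcore 10.3.0 belongs to the 2021a generation).
    title = format_pr_title("2023.06", "2021a", "libpng", "1.6.37")
    print(title)
    print(parse_pr_title(title))
```

A check like this could run in CI to flag PR titles that do not follow the convention.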