
meeting Apr 1 2021

Kenneth Hoste edited this page Apr 1, 2021 · 6 revisions

Notes for 20210401 meeting

  • date & time: Thu Apr 1st 2021 - 2pm CEST (12:00 UTC)
    • (every first Thursday of the month)
  • venue: (online, see mail for meeting link, or ask in Slack)
  • agenda:
    • Quick introduction by new people
    • EESSI-related meetings in last month
    • Application for CZI grant “Essential Open Source Software for Science”
    • Progress update per EESSI layer
    • Pilot repository: changes & status
    • Usage of AWS resources
    • Discussion with NVIDIA w.r.t. CUDA
    • S4 NeIC project proposal + NESSI test lab
    • Next steps
    • Past & upcoming events
    • Q&A

Slides

Meeting notes

(by Bob, Kenneth)

Introduction by new people

  • New people on the call:
    • Jure Pečar from EMBL
      • experienced EasyBuild user, helping out with BLIS evaluation

CernVM-FS coordination meeting

  • ephemeral publish container is in Program of Work
  • discussed inclusion of EESSI in default cvmfs-config repo

Application for CZI grant

  • Proposal written/submitted by Alan in collaboration with the UMCG (University Medical Center Groningen), focussing on rare diseases and supporting biomedical workflows with EESSI
  • question by Victor: how will this impact EESSI in general?
    • Alan/Kenneth: the general goal should not be affected; the proposal mostly focuses on one particular use case, which probably means we should provide more bioinformatics software in our repo to support typical workflows used in rare disease studies

Progress update: filesystem layer

  • planning to create new master key for Stratum-0
    • only store on Yubikeys (+ physical backup like USB stick in safe storage)
    • then get EESSI configuration into cvmfs-config-default repository, so any client can get easy access to EESSI
  • Caspar: this would be a good time to document native installation of CernVM-FS for EESSI

Stratum 1 in AWS

  • by Jörg with help from Bob & Terje
  • up and running in AWS instance (tx.large) in eu.west region (?)
  • not included yet in latest EESSI configuration (0.3.0), but will be soon
  • "it was fairly easy"
  • currently using XFS, which was not recommended in older CVMFS versions -> to be checked with CVMFS developers
  • question by Jure if he should set up a Stratum 1 at EMBL
    • makes sense, eventually every "big" HPC site would want their own Stratum 1 anyway (to have a full copy of the repo, to protect themselves from network issues)
    • how many Stratum 1 servers do we want and how should they be distributed?
      • CVMFS devs warned us that we shouldn't have too many either...
      • but not all Stratum 1 servers need to be included in EESSI configuration
    • still looking for volunteers to go through the process to set up an additional Stratum 1 (in different AWS region)

Progress update: compatibility layer

  • we should reach out to the Gentoo developers to check if/how we can help to get the problems that we run into (Lmod, bootstrap on POWER) resolved
    • can we help with testing stuff, showing that it works, etc.?
  • check if we can use ReFrame instead of pytest for the compatibility layer validation
    • ask Victor whether these compat layer tests can be run through ReFrame
  • the installation of 2021.02 was broken and has been removed from the repository
  • ppc64le installation on hold for 2021.03, hoping that the upstream fix for the bootstrap script will get done soon

Progress update: software layer

  • some experiments with speeding up the install script by running multiple EasyBuild sessions in parallel, to overlap installations with different dependencies
    • not fully working yet
    • a better approach would be to farm out installations to a Slurm cluster (via CitC in AWS)
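
The idea of overlapping installations that do not depend on each other can be sketched as follows. This is only an illustration, not the actual EESSI install script: the `install` function is a placeholder that does not call EasyBuild, and the dependency graph is made up. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later) to release a package only after all of its dependencies have finished.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # Python 3.9+


def install(name):
    """Placeholder for one installation; a real script would start an EasyBuild session here."""
    return name


def install_in_parallel(deps, max_workers=4):
    """Install every package in `deps`, running independent ones concurrently.

    `deps` maps a package name to the set of packages it depends on.
    Returns the package names in the order their installations completed.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError on circular dependencies
    finished = []
    running = set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit everything whose dependencies are already installed.
            for name in sorter.get_ready():
                running.add(pool.submit(install, name))
            # Block until at least one running installation completes.
            done, running = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                name = future.result()
                finished.append(name)
                sorter.done(name)  # unlocks packages that depended on `name`
    return finished


# Hypothetical dependency graph, for illustration only.
example_deps = {
    "GCC": set(),
    "OpenMPI": {"GCC"},
    "OpenBLAS": {"GCC"},
    "GROMACS": {"OpenMPI", "OpenBLAS"},
}
order = install_in_parallel(example_deps)
```

In this example `OpenMPI` and `OpenBLAS` may run at the same time once `GCC` is done, while `GROMACS` waits for both. Farming installations out to a Slurm cluster would follow the same dependency ordering, with job submission replacing the thread pool.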

2021.03 pilot repo

  • ppc64le on hold, until the bootstrap issue has been fixed upstream
  • More or less the same software and hardware targets
  • GPU installations on hold until we get green light from NVIDIA to include CUDA in the repository

Experience report on building software layer for AMD Rome

  • by Jörg
  • the build script worked very well: the colors are very useful, the comments are nice, and it does some nice checks (e.g. if you are running it in a Prefix environment)
  • we need to write some documentation for doing the software installations
    • Jörg volunteers to do this by making a PR in the docs repository

Discussion with NVIDIA w.r.t. CUDA

  • question by Jörg: would it help to invite more vendors?
    • there are already quite a few very big companies joining the call, and we shouldn't make the group too large

S4 NeIC proposal + NESSI

  • ...

NESSI test lab

  • big deployment of EESSI on several systems with different hardware, including GPUs
  • limited access to a small group of users using permissions on /cvmfs

Sponsorship AWS/Azure

  • AWS credits are being spent on several things: Stratum 1, build nodes, test machines, etc.
  • Discussions with Azure are ongoing

AWS infrastructure

  • We need Packer-built images for OpenStack as well
  • Terje shows a demo of the infrastructure code / scripts for deploying dynamic infrastructure
    • create/remove nodes
    • grant access to users based on GitHub handles
    • support for different node types
    • available on the AWS login node
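
The notes do not say how access is derived from GitHub handles. One common approach, assumed here purely for illustration, is to use the public SSH keys that GitHub serves at `https://github.com/<handle>.keys` and add them to a node's `authorized_keys` file. The function names below are hypothetical and are not taken from the infrastructure scripts shown in the demo.

```python
from urllib.request import urlopen


def keys_url(handle):
    """URL at which GitHub publishes the public SSH keys of a user."""
    return f"https://github.com/{handle}.keys"


def fetch_keys(handle):
    """Download the public keys for `handle` (requires network access)."""
    with urlopen(keys_url(handle)) as response:
        return response.read().decode("utf-8")


def authorized_keys_lines(handle, keys_text):
    """Turn the downloaded key text into authorized_keys lines.

    Each key gets a trailing comment naming the GitHub handle, so that
    access can later be revoked per user.
    """
    lines = []
    for key in keys_text.splitlines():
        key = key.strip()
        if key:  # skip blank lines
            lines.append(f"{key} github:{handle}")
    return lines
```

A deployment script could call `authorized_keys_lines(handle, fetch_keys(handle))` for each handle and write the result to the account's `~/.ssh/authorized_keys`.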

Q&A

  • is there interest in setting up a separate meeting to discuss how the software stack on new clusters can be set up with a later transition to EESSI in mind?
    • Bob will set up a Doodle poll for this