
bot sync meeting 2024 06 14

Kenneth Hoste edited this page Jun 14, 2024 · 1 revision

bot support for building for GPUs

  • Date: 2024-06-14 09:00-10:00 CEST
  • Participants: Thomas, Bob, Kenneth, Lara, Alan
  • Topics:
    • recap:
      • goal: make it possible to build for specific CUDA compute capability
      • status: put such builds under subdirectories of CPU-only software directories
    • changes for building only:
      • arguments to bot: build command
        • we lose some information on how (with which parameters) the stack is built if it is not specified in the easystack file
          • store information about build settings in a file in the easybuild (or eessi?) subdirectory of the software installation directory, and/or put the information into the module file
        • we need to "verify" whether specified CUDA CC is "valid"
          • same goes for CPU targets
          • YAML file in software-layer that lists valid CPU targets + CPU/GPU combos
          • script run by bot/build.sh script that verifies specified CPU/GPU target
      • processing of arguments until they are handed over to actual build command eb --easystack ...
        • a bit unclear what the best way is to modify the build option; it should not be hard-coded in an easystack file
          • do not use a default in the easystack file
          • check in build script if CUDA is in the dependencies and throw an error if no CUDA CC is specified (don't guess/use a default)
        • might need a hook that checks for an environment variable
          • when? parse_hook?
          • installation prefix will need to be tweaked for every installation performed by EasyBuild
            • CPU-only dependencies may still be missing; those should not be installed in /accel/...
          • EasyBuild hook should pick up on environment variable like $GPU_TARGET, and:
            • update installation prefix ($EASYBUILD_INSTALLPATH)
            • set $EASYBUILD_CUDA_COMPUTE_CAPABILITIES
      • letting eb use the additional argument (to request building for/with a CUDA compute capability and to install a package under a subdirectory)
        • belongs in a hook
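The two checks discussed above (verifying the requested CPU/GPU target, and refusing to guess a CUDA compute capability) could be sketched roughly as follows. This is only an illustration: `VALID_TARGETS` stands in for the parsed content of the proposed YAML file, whose layout was not decided in the meeting, and the function names as well as the `x86_64/amd/zen2` entry are made up for the example.

```python
# Hypothetical stand-in for the parsed YAML file in software-layer:
# each valid CPU target maps to the accelerator targets allowed with it.
VALID_TARGETS = {
    "x86_64/amd/zen2": [],
    "x86_64/amd/zen3": ["nvidia/cc80"],
}


def check_target(cpu_target, accel_target=None, valid_targets=VALID_TARGETS):
    """Raise ValueError if the CPU target or the CPU/GPU combo is not listed."""
    if cpu_target not in valid_targets:
        raise ValueError(f"unknown CPU target: {cpu_target}")
    if accel_target is not None and accel_target not in valid_targets[cpu_target]:
        raise ValueError(f"{accel_target} is not valid in combination with {cpu_target}")


def require_cuda_cc(dependency_names, cuda_cc):
    """Raise ValueError if CUDA is a dependency but no compute capability was given."""
    if "CUDA" in dependency_names and not cuda_cc:
        raise ValueError("CUDA is a dependency, but no CUDA compute capability was specified")
```

A script called from bot/build.sh could run checks like these before starting EasyBuild, so that an invalid target fails early instead of halfway through a build.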
    • changes to bot: build commands
      • currently
        bot: build architecture:x86_64/amd/zen3
        
      • suggestion: minimal change to add compute capability, one of the following options
        bot: build architecture:x86_64/amd/zen3+cc80
        
        or
        bot: build architecture:x86_64/amd/zen3 gpu:cc80
        
      • or accel:nvidia/cc80 (and accel:amd/gfx90a)
        • long version: accelerator
        • corresponds to accel subdirectory
          • implies that short version like a: (short for architecture:) won't work anymore
          • can just require that short version is at least 3 chars
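A minimal sketch of how the `accel:` variant could be parsed, including the rule that a shortened argument name must be at least 3 characters long. It only models the two arguments discussed here (`architecture` and `accelerator`); the function names are hypothetical and this is not the bot's actual argument handling.

```python
# Long argument names considered in this sketch (assumption: only these two).
KNOWN_ARGS = ("architecture", "accelerator")
MIN_PREFIX_LEN = 3


def expand_arg_name(short):
    """Expand a shortened argument name such as 'arch' or 'accel'."""
    if len(short) < MIN_PREFIX_LEN:
        raise ValueError(f"argument name '{short}' is shorter than {MIN_PREFIX_LEN} characters")
    matches = [name for name in KNOWN_ARGS if name.startswith(short)]
    if len(matches) != 1:
        raise ValueError(f"unknown or ambiguous argument name '{short}'")
    return matches[0]


def parse_build_command(line):
    """Parse 'bot: build key:value ...' into a dict keyed by long argument name."""
    prefix = "bot: build"
    if not line.startswith(prefix):
        raise ValueError("not a 'bot: build' command")
    args = {}
    for token in line[len(prefix):].split():
        key, sep, value = token.partition(":")
        if not sep or not value:
            raise ValueError(f"expected key:value, got '{token}'")
        args[expand_arg_name(key)] = value
    return args
```

With this, `bot: build arch:x86_64/amd/zen3 accel:nvidia/cc80` yields both targets under their long names, while `a:x86_64/amd/zen3` is rejected as too short.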
    • changes in the processing of bot: build commands
      • initially only add support for "cross-compiling" GPU software on a CPU-only node
      • later also actually send GPU builds to GPU nodes
        • unclear how we'll make sure we can find a GPU node for each CPU+GPU combo
        • extra bot config option: accel_target_map, which maps GPU target to additional sbatch options
          • may not be enough, since that implies always sending GPU build to GPU nodes => we'll need a separate node:?
          • but maybe we want that anyway, so GPU builds always involve testing on GPUs
      • may be interesting to introduce another bot directive node:
      • we'll need to be careful with missing CPU-only dependencies of GPU builds
        • separate easystack file for GPU software is probably a good idea
        • don't enable --robot for GPU easystack files
        • check in EasyBuild startup hook that makes sure no CPU-only deps are missing when installing GPU software
    • changes to eb_hooks.py in the EESSI/software-layer GitHub repository
      • EESSI_SOFTWARE_SUBDIR_OVERRIDE=x86_64/amd/zen3 is the CPU-based part of the installation path
      • EESSI_ACCEL_TARGET=nvidia/cc80 -> EASYBUILD_CUDA_COMPUTE_CAPABILITIES='8.0'
      • and/or bot/build.sh?
      • at which stage/step/hook should we add parameters (compute capability) and adjust the installation path
        • a hook that is triggered for each easyconfig
        • parse_hook is only triggered once (and before actual parsing)
          • but could still work, we already check for CUDA dependency there in our current parse_hook
          • would imply changing active EasyBuild configuration via update_build_option
          • OK as long as we make sure we won't do any CPU-only builds in that EasyBuild session
          • (Alan) "Thinking of the hook, you could set an envvar when you encounter a CUDA build, and then in a parse hook if this variable is set but CUDA is not in the dependencies you bomb out"
        • pre_prepare_hook (or maybe pre_fetch_hook, that's earliest possible)
          • will have to change self.installdir
          • self.cfg['cuda_compute_capabilities'] = '8.0'
        • EASYBUILD_INSTALLPATH=/apps/zen3
          • /apps/zen3/modules/all => $MODULEPATH
          • self.installpath = '/apps/zen3/accel/nvidia/cc80'
          • => /apps/zen3/accel/nvidia/cc80/modules/all to $MODULEPATH
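The mapping from `$EESSI_ACCEL_TARGET` to a compute capability and an installation prefix, as described in the bullets above, could look roughly like this. The helpers are plain Python and deliberately do not call any EasyBuild API; the function names are hypothetical, and how a hook in eb_hooks.py would apply the returned values (for example to `self.installdir`) is left open, as in the discussion.

```python
import posixpath
import re


def cuda_cc_from_accel_target(accel_target):
    """Translate e.g. 'nvidia/cc80' into '8.0'; return None for other targets."""
    # Last digit is the minor version, the digits before it the major version.
    match = re.fullmatch(r"nvidia/cc(\d+)(\d)", accel_target)
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"


def accel_settings(environ, base_installpath):
    """Derive install/module paths and CUDA CC from $EESSI_ACCEL_TARGET.

    Returns None if no accelerator target is set (CPU-only build).
    """
    accel_target = environ.get("EESSI_ACCEL_TARGET")
    if not accel_target:
        return None
    installpath = posixpath.join(base_installpath, "accel", accel_target)
    return {
        "installpath": installpath,
        "modulepath": posixpath.join(installpath, "modules", "all"),
        "cuda_compute_capabilities": cuda_cc_from_accel_target(accel_target),
    }
```

For example, `accel_settings({"EESSI_ACCEL_TARGET": "nvidia/cc80"}, "/apps/zen3")` gives the `/apps/zen3/accel/nvidia/cc80` prefix and `'8.0'` from the notes above.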
    • (not needed for building) changes in the detection of CPU/GPU architecture
    • what is needed to try this via dev.eessi.io ?
      • eessi_container.sh needs to support multiple repositories (and overlays; maybe only one overlay, for dev.eessi.io)
        • --access=rw:REPO_ID
        • --repository=REPO_ID:access=rw
        • should revisit source .../eessi_defaults: why is it needed?
        • need to make multiple CVMFS repositories available with specific access mode (rw or ro) per repository
      • needs some settings for what to use as base (software.eessi.io + dev.eessi.io) and where to install to (only dev.eessi.io)
      • could it leverage EESSI-extend/2023.06-easybuild, e.g., specifying $EESSI_PROJECT_INSTALL?
  • Links: