bot sync meeting 2024 06 14
Kenneth Hoste edited this page Jun 14, 2024 · 1 revision
- Date: 2024-06-14 09:00-10:00 CEST
- Participants: Thomas, Bob, Kenneth, Lara, Alan
- Topics:
- recap:
- goal: make it possible to build for specific CUDA compute capability
- status: put such builds under subdirectories of CPU-only software directories
- bot is agnostic to this (`bot/build.sh` script will take CUDA compute capability as input from bot somehow)
- https://gitlab.com/eessi/support/-/issues/59
- changes for building only:
- arguments to `bot: build` command
- we lose some information on how (with which parameters) the stack is built if it is not specified in the easystack file
- store information about build settings in some file in `easybuild` (or `eessi`?) subdirectory of software installation directory (where software is installed) and/or put information into the module file
- we need to "verify" whether specified CUDA CC is "valid"
- same goes for CPU targets
- YAML file in `software-layer` that lists valid CPU targets + CPU/GPU combos
- script run by `bot/build.sh` script that verifies specified CPU/GPU target
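The verification step discussed above could look roughly like the following sketch. It is only an illustration: the table is written as a Python dict rather than read from YAML, and the name `verify_target`, the layout of the table, and the listed combinations are all assumptions, not the agreed format of the file in `software-layer`.

```python
# Hypothetical table of valid targets; in practice this would be loaded
# from the YAML file in software-layer (format still to be decided).
VALID_TARGETS = {
    "x86_64/amd/zen3": ["nvidia/cc80"],   # CPU target -> allowed GPU targets
    "x86_64/intel/haswell": [],           # CPU-only in this example
}


def verify_target(cpu_target, accel_target=None, valid_targets=VALID_TARGETS):
    """Return True if the CPU target (and CPU/GPU combo) is listed, else raise."""
    if cpu_target not in valid_targets:
        raise ValueError(f"unknown CPU target: {cpu_target}")
    if accel_target is not None and accel_target not in valid_targets[cpu_target]:
        raise ValueError(
            f"invalid CPU/GPU combination: {cpu_target} + {accel_target}"
        )
    return True
```

A script called from `bot/build.sh` could run such a check before any build starts, so that an invalid target fails early.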
- processing of arguments until they are handed over to actual build command `eb --easystack ...`
- a bit unclear what the best way is to modify the build option; should not be fixed in an easystack file
- do not use a default in the easystack file
- check in build script if CUDA is in the dependencies and throw an error if no CUDA CC is specified (don't guess/use a default)
- might need a hook that checks for an environment variable
- when? parse_hook?
- installation prefix will need to be tweaked for every installation performed by EasyBuild
- CPU-only dependencies may be missed still, those should not be installed in `/accel/...`
- EasyBuild hook should pick up on environment variable like `$GPU_TARGET`, and:
- update installation prefix (`$EASYBUILD_INSTALLPATH`)
- set `$EASYBUILD_CUDA_COMPUTE_CAPABILITIES`
- letting `eb` use the additional argument (to request building for/with a CUDA compute capability and to install a package under a subdirectory)
- belongs to a hook
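The "don't guess, throw an error" rule for builds that depend on CUDA can be sketched as a small standalone check. This is not EasyBuild hook code: the function name `required_cuda_cc` and the way dependency names are passed in are assumptions made for illustration; only the environment variable name comes from the notes.

```python
import os


def required_cuda_cc(dep_names, env=None):
    """Return the CUDA compute capabilities to build for, or None for CPU-only.

    dep_names: names of the dependencies of the software being built.
    env: mapping to read settings from (defaults to os.environ).
    """
    env = os.environ if env is None else env
    if "CUDA" not in dep_names:
        return None  # CPU-only build, nothing to check
    cuda_cc = env.get("EASYBUILD_CUDA_COMPUTE_CAPABILITIES")
    if not cuda_cc:
        # No default on purpose: a missing value is an error, not a guess.
        raise RuntimeError(
            "CUDA is a dependency, but no CUDA compute capability was specified"
        )
    return cuda_cc
```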
- changes to `bot: build` commands
- currently `bot: build architecture:x86_64/amd/zen3`
- suggestion: minimal change to add compute capability, one of the following options
- `bot: build architecture:x86_64/amd/zen3+cc80` or `bot: build architecture:x86_64/amd/zen3 gpu:cc80`
- or `accel:nvidia/cc80` (and `accel:amd/gfx90a`)
- long version: `accelerator`
- corresponds to `accel` subdirectory
- implies that short version like `a:` (short for `architecture:`) won't work anymore
- can just require that short version is at least 3 chars
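A minimal sketch of how the `accel:` variant with a 3-character minimum for short forms might be parsed is shown below. It is not the bot's actual parser: `parse_build_command`, `expand_key`, and the restriction to only the two keys discussed here are assumptions for illustration.

```python
KNOWN_KEYS = ("architecture", "accelerator")
MIN_PREFIX_LEN = 3  # short forms must be at least 3 chars, so "a:" is rejected


def expand_key(key):
    """Expand a (possibly shortened) key to its long version."""
    if len(key) < MIN_PREFIX_LEN:
        raise ValueError(f"key '{key}' is too short (min. {MIN_PREFIX_LEN} chars)")
    matches = [known for known in KNOWN_KEYS if known.startswith(key)]
    if len(matches) != 1:
        raise ValueError(f"unknown or ambiguous key: '{key}'")
    return matches[0]


def parse_build_command(command):
    """Parse 'bot: build key:value ...' into a dict keyed by long key names."""
    prefix = "bot: build"
    if not command.startswith(prefix):
        raise ValueError("not a 'bot: build' command")
    args = {}
    for token in command[len(prefix):].split():
        key, sep, value = token.partition(":")
        if not sep or not value:
            raise ValueError(f"expected key:value, got '{token}'")
        args[expand_key(key)] = value
    return args
```

For example, `parse_build_command("bot: build arch:x86_64/amd/zen3 accel:nvidia/cc80")` yields the long keys `architecture` and `accelerator`, while `a:x86_64/amd/zen3` is rejected as too short.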
- changes in the processing of `bot: build` commands
- initially only add support for "cross-compiling" GPU software on a CPU-only node
- later also actually send GPU builds to GPU nodes
- unclear how we'll make sure we can find a GPU node for each CPU+GPU combo
- extra bot config option: `accel_target_map`, which maps GPU target to additional `sbatch` options
- may not be enough, since that implies always sending GPU build to GPU nodes => we'll need a separate `node:`?
- but maybe we want that anyway, so GPU builds always involve testing on GPUs
- may be interesting to introduce another bot directive `node:`
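To make the `accel_target_map` idea concrete, here is a rough sketch of a map from GPU target to extra `sbatch` options. The option values, the function name `sbatch_command`, and the choice to fail on an unmapped target are assumptions; the real bot configuration format was not settled in this meeting.

```python
import shlex

# Hypothetical map: GPU target -> additional sbatch options (site-specific).
ACCEL_TARGET_MAP = {
    "nvidia/cc80": "--gres=gpu:1 --partition=gpu",
}


def sbatch_command(job_script, accel_target=None, accel_target_map=ACCEL_TARGET_MAP):
    """Build the sbatch command line, adding GPU options when a target is given."""
    cmd = ["sbatch"]
    if accel_target is not None:
        if accel_target not in accel_target_map:
            raise ValueError(f"no sbatch options configured for {accel_target}")
        cmd += shlex.split(accel_target_map[accel_target])
    cmd.append(job_script)
    return cmd
```

As noted above, a map like this always sends GPU builds to GPU nodes, so "cross-compiling" on a CPU-only node would need a separate switch such as a `node:` directive.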
- we'll need to be careful with missing CPU-only dependencies of GPU builds
- separate easystack file for GPU software is probably a good idea
- don't enable `--robot` for GPU easystack files
- check in EasyBuild startup hook that makes sure no CPU-only deps are missing when installing GPU software
- changes to `eb_hooks.py` in the EESSI/software-layer GitHub repository
- `EESSI_SOFTWARE_SUBDIR_OVERRIDE=x86_64/amd/zen3` is the CPU based installation path part
- `EESSI_ACCEL_TARGET=nvidia/cc80` -> `EASYBUILD_CUDA_COMPUTE_CAPABILITIES='8.0'`
- and/or `bot/build.sh`?
- at which stage/step/hook should we add parameters (compute capability) and adjust the installation path
- a hook that is triggered for each easyconfig
- `parse_hook` is only triggered once (and before actual parsing)
- but could still work, we already check for CUDA dependency there in our current `parse_hook`
- would imply changing active EasyBuild configuration via `update_build_option`
- OK as long as we make sure we won't do any CPU-only builds in that EasyBuild session
- (Alan) "Thinking of the hook, you could set an envvar when you encounter a CUDA build, and then in a parse hook if this variable is set but CUDA is not in the dependencies you bomb out"
- `pre_prepare_hook` (or maybe `pre_fetch_hook`, that's earliest possible)
- will have to change `self.installdir`
- `self.cfg['cuda_compute_capabilities'] = '8.0'`
- `EASYBUILD_INSTALLPATH=/apps/zen3`
- `/apps/zen3/modules/all` => `$MODULEPATH`
- `self.installpath = '/apps/zen3/accel/nvidia/cc80'`
- => `/apps/zen3/accel/nvidia/cc80/modules/all` to `$MODULEPATH`
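The mapping from an accelerator target to the adjusted installation prefix and compute capability can be sketched independently of EasyBuild. The helper name `accel_settings` and the returned dict layout are assumptions; the sketch only handles `nvidia/ccXY` targets and assumes the last digit of `ccXY` is the minor version (so `cc80` becomes `8.0`).

```python
import posixpath
import re


def accel_settings(install_prefix, accel_target):
    """Derive install path, module path and CUDA CC from e.g. 'nvidia/cc80'."""
    match = re.fullmatch(r"nvidia/cc(\d+)(\d)", accel_target)
    if match is None:
        raise ValueError(f"unsupported accelerator target: {accel_target}")
    major, minor = match.groups()
    # GPU builds go under an accel/ subdirectory of the CPU-only prefix.
    install_path = posixpath.join(install_prefix, "accel", accel_target)
    return {
        "installpath": install_path,
        "modulepath": posixpath.join(install_path, "modules", "all"),
        "cuda_compute_capabilities": f"{major}.{minor}",
    }
```

With the prefix `/apps/zen3` and target `nvidia/cc80` this reproduces the paths listed above; a hook would then apply these values to the EasyBuild session.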
- (not needed for building) changes in the detection of CPU/GPU architecture
- NVIDIA GPU detection in `archdetect`?
- see notes in https://gitlab.com/eessi/support/-/issues/59
- what is needed to try this via `dev.eessi.io`?
- `eessi_container.sh` needs to support multiple repositories (and overlays, maybe only one overlay for dev.eessi.io)
- `--access=rw:REPO_ID`
- `--repository=REPO_ID:access=rw`
- should revisit `source .../eessi_defaults`: why is it needed?
- need to make multiple CVMFS repositories available with specific access mode (`rw` or `ro`) per repository
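As an illustration of the `--repository=REPO_ID:access=rw` form, the sketch below parses a repeatable option into (repository, access mode) pairs. `eessi_container.sh` is a shell script, so this Python version only demonstrates the proposed syntax; the function names and the read-only default when no access mode is given are assumptions.

```python
import argparse


def parse_repository(spec):
    """Parse 'REPO_ID' or 'REPO_ID:access=rw|ro' into (repo_id, access)."""
    repo_id, _, options = spec.partition(":")
    if not repo_id:
        raise argparse.ArgumentTypeError(f"missing repository ID in '{spec}'")
    access = "ro"  # assumption: read-only unless stated otherwise
    if options:
        key, _, value = options.partition("=")
        if key != "access" or value not in ("ro", "rw"):
            raise argparse.ArgumentTypeError(f"invalid option '{options}'")
        access = value
    return repo_id, access


def parse_args(argv):
    """Parse command-line arguments; --repository may be given several times."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--repository", action="append",
                        type=parse_repository, default=[])
    return parser.parse_args(argv)
```

Passing `--repository=software.eessi.io --repository=dev.eessi.io:access=rw` would then make the first repository available read-only and the second writable.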
- needs some settings for what to use as base (`software.eessi.io` + `dev.eessi.io`) and where to install to (only `dev.eessi.io`)
- could it leverage `EESSI-extend/2023.06-easybuild`, e.g., specifying `$EESSI_PROJECT_INSTALL`?
- Links: