These instructions are for installing torchBragg
and the Computational Crystallography Toolbox cctbx
on the Perlmutter supercomputer at NERSC.
Start by opening a terminal on a login node on Perlmutter and run the following:
export USERNAME={your_nersc_username}
export PROJECT_ID={mXXX}
module load PrgEnv-gnu cpe-cuda cudatoolkit
mkdir -p /global/cfs/cdirs/${PROJECT_ID}/users/${USERNAME}/cctbx_install
git clone https://github.com/JBlaschke/alcc-recipes.git alcc-recipes-torchBragg
cd alcc-recipes-torchBragg/cctbx/
Open setup_perlmutter.sh
:
vi setup_perlmutter.sh
Apply the following patch to dissociate the existing version of the libssh
library:
a/cctbx/setup_perlmutter.sh b/cctbx/setup_perlmutter.sh
index 3600cf8..56b8d44 100755
--- a/cctbx/setup_perlmutter.sh
+++ b/cctbx/setup_perlmutter.sh@@ -31,6 +31,7 @@ if fix-sysversions
then
return 1
fi
+rm ./opt/mamba/envs/psana_env/lib/libssh.so.4
mk-cctbx cuda build hot update
patch-dispatcher nersc
The following takes about an hour to complete, recommended to run on a NoMachine instance:
./setup_perlmutter.sh > >(tee -a ../../alcc-recipes-torchBragg.log) 2> >(tee -a ../../alcc-recipes-torchBragg.err >&2)
Create a file in $HOME
to source the environment:
cd
vi env_torchBragg
Copy in the following:
export USERNAME={your_nersc_username}
export PROJECT_ID={mXXX}
export CFSW=$CFS/${PROJECT_ID}/users/${USERNAME}/cctbx_install
export WORK=$CFSW/evaluate
cd $WORK
source $CFSW/alcc-recipes-torchBragg/cctbx/utilities.sh
source $CFSW/alcc-recipes-torchBragg/cctbx/opt/site/nersc_perlmutter.sh
module load evp-patch
load-sysenv
activate
export MODULES=$CFSW/alcc-recipes-torchBragg/cctbx/modules
export BUILD=$CFSW/alcc-recipes-torchBragg/cctbx/build
export OMP_PLACES=threads
export OMP_PROC_BIND=spread
export KOKKOS_DEVICES="OpenMP;Cuda"
export KOKKOS_ARCH="Ampere80"
export CUDA_LAUNCH_BLOCKING=1
export SIT_DATA=${OVERWRITE_SIT_DATA:-$NERSC_SIT_DATA}:$SIT_DATA
export SIT_PSDM_DATA=${OVERWRITE_SIT_PSDM_DATA:-$NERSC_SIT_PSDM_DATA}
export MPI4PY_RC_RECV_MPROBE='False'
export SIT_ROOT=/reg/g/psdm
Source your new file:
source ~/env_torchBragg
cd $MODULES
Clone the following repos, including torchBragg
:
git clone https://github.com/nksauter/LS49.git
git clone https://gitlab.com/cctbx/ls49_big_data.git
git clone https://gitlab.com/cctbx/uc_metrics.git
git clone https://github.com/lanl/lunus.git
git clone https://github.com/dermen/sim_erice.git
git clone https://gitlab.com/cctbx/psii_spread.git
git clone https://gitlab.com/cctbx/xfel_regression.git
git clone https://github.com/ExaFEL/exafel_project.git
git clone https://github.com/dermen/cxid9114.git
git clone https://github.com/gigantocypris/torchBragg.git
Apply the following patch to diffBragg
:
--- a/simtbx/diffBragg/src/diffBragg.cpp
+++ b/simtbx/diffBragg/src/diffBragg.cpp
@@ -1847,6 +1847,7 @@ void diffBragg::add_diffBragg_spots(const af::shared<size_t>& panels_fasts_slows
Npix_to_model = panels_fasts_slows.size()/3;
SCITBX_ASSERT(Npix_to_model <= Npix_total);
+ raw_pixels_roi = af::flex_double(Npix_to_model); // NKS, only way to correctly size & zero array
double * floatimage_roi = raw_pixels_roi.begin();
diffBragg_rot_mats();
Run the following:
libtbx.configure LS49 ls49_big_data uc_metrics lunus sim_erice xfel_regression
cd $BUILD
mk-cctbx cuda build # not make!!
cd $MODULES
libtbx.configure LS49 ls49_big_data uc_metrics lunus sim_erice xfel_regression
libtbx.refresh
libtbx.python -m pip install natsort
mkdir -d $WORK/output_torchBragg
Test your PyTorch install:
libtbx.python
import torch
torch.cuda.is_available()
As of June 2024 the following branches are needed:
cd $MODULES/cctbx_project
git pull
git checkout memory_policy
cd $MODULES/dials # this only matters for indexing
git pull
git checkout dsp_oldstriping_mcd_stills
These packages may need to be pulled as well to update an old install:
cd $MODULES/exafel_project
git pull
cd $MODULES/LS49
git pull
cd $MODULES/ls49_big_data
git pull
cd $MODULES/psii_spread
git pull
git checkout kramkron #
cd $MODULES/uc_metrics
git pull
Anytime C++ code is re-pulled, run the following:
cd $MODULES
libtbx.configure LS49 ls49_big_data uc_metrics lunus sim_erice xfel_regression
cd $BUILD
mk-cctbx cuda build # not make!!
cd $MODULES
libtbx.configure LS49 ls49_big_data uc_metrics lunus sim_erice xfel_regression
libtbx.refresh
To run some code, the following patch is needed in cctbx_project
:
--- a/xfel/merging/command_line/merge.py
+++ b/xfel/merging/command_line/merge.py
@@ -44,6 +44,7 @@ class Script(object):
def __init__(self):
self.mpi_helper = mpi_helper()
self.mpi_logger = mpi_logger()
+ self.common_store = dict(foo="hello") # always volatile, no serialization, no particular dict keys guaranteed
def __del__(self):
self.mpi_helper.finalize()
@@ -163,6 +164,7 @@ class Script(object):
# Perform phil validation up front
for worker in workers:
worker.validate()
+ worker.__dict__["common_store"] = self.common_store
self.mpi_logger.log_step_time("CREATE_WORKERS", True)
# Do the work