
BasicUsage

Shaoguang Guo(Sky & Star) edited this page Oct 31, 2023 · 3 revisions

Introduction

ChinaSRC is a network of servers pooled together to provide computing capacity for computationally intensive work, such as data processing, simulation and modeling for SKA precursors and pathfinders.

The ChinaSRC O&M Team prepared this basic guide for new and first-time ChinaSRC users. Read it before you run any actual jobs. It shows how to:

  • Log in to your ChinaSRC account;
  • Transfer files and folders to (upload) and from (download) ChinaSRC;
  • Use environment modules;
  • Manage your Anaconda environments and packages;
  • Create SLURM job scripts; and
  • Run and manage your SLURM jobs.

Account application

Logging In

After your account application is approved, you can log in to ChinaSRC.

Web portal

🔗 http://chinasrc.shao.ac.cn:8882

Linux / Unix / MacOS / Windows PowerShell

Interactive Command

To log in to the ChinaSRC, use this command in your local machine's terminal:

$ ssh -p 20002 [email protected]

After you log in successfully, you will see a shell prompt like the following:

[username@workstation ~]$ 
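You can also set up a host alias on your local machine so you don't have to type the port each time. The sketch below only prints an OpenSSH client configuration block (~/.ssh/config). The alias "chinasrc" and the function name are hypothetical choices. The user name and login address are placeholders: replace them with your own account details.

```shell
#!/usr/bin/env bash
# Print an OpenSSH client config block for ChinaSRC's SSH port (20002).
# "chinasrc" is a hypothetical alias; pass your own account name and the
# login address that came with your account (both are placeholders here).
chinasrc_ssh_config() {
  local user="$1" host="$2"
  cat <<EOF
Host chinasrc
    HostName ${host}
    Port 20002
    User ${user}
EOF
}

# Usage on your LOCAL machine (replace both placeholders first):
#   chinasrc_ssh_config username <login_host> >> ~/.ssh/config
#   ssh chinasrc
```

With the alias in place, "ssh chinasrc" does the same as the full ssh command above. Tools that run over SSH, such as rsync, can also use "chinasrc:" as the remote host.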

ChinaSRC Layout

The ChinaSRC is composed of the following nodes (servers):

  • Login node
    • This is where users log in to the ChinaSRC. DO NOT run jobs or programs here.
  • Compute nodes
    • X86 nodes x 15. Every node has:
      • Intel(R) Xeon(R) Gold 5218 CPU @ 2.3GHz
      • 32 logical CPUs
      • 768 GB RAM
    • X86 nodes x 8. Every node has:
      • Intel(R) Xeon(R) Gold 6132 CPU @ 2.6GHz
      • 28 logical CPUs
      • 1 TB RAM
    • ARM nodes x 10. Every node has:
      • Kunpeng 920 CPU @ 2.6GHz
      • 96 logical CPUs
      • 1 TB RAM
    • GPU nodes
      • 1 Intel(R) Xeon(R) 2690 @ 2.6GHz with 4 NVIDIA Tesla V100, 256 GB RAM, 28 cores
      • 1 Intel(R) Xeon(R) 6152 @ 2.3GHz with 4 NVIDIA Tesla V100 (NVLINK), 1 TB RAM, 44 cores
      • 1 Intel(R) Xeon(R) 6140 @ 2.3GHz with 8 NVIDIA Tesla V100 (NVLINK, 32GB), 512 GB RAM, 36 cores
      • 1 Intel(R) Xeon(R) 5320 @ 2.2GHz with 4 NVIDIA Ampere A40 (40GB), 512 GB RAM, 36 cores

Storage Quotas

ChinaSRC currently has 5.1 PB of storage, and a further 6 PB extension was added this year.

Each user has the following default storage quotas:

  • Home (/home/username): 50 GB
  • Group folders (/groups/group_name/home/share/): no limit

ChinaSRC regularly undergoes maintenance and optimization, so these quotas may change. Users will be notified in advance.

The home folder is intended for long-term data storage, while data in group folders may be kept only for a limited time. You can run jobs from either your home folder or a group folder.
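To see how close your home folder is to the 50 GB default quota, you can total its size with du. The sketch below is a rough estimate under stated assumptions, not the official quota tool. It treats 1 GiB as 1 GB, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Estimate a directory's size with du and compare it with the default
# 50 GB home quota stated above. This is NOT an official quota query.
QUOTA_GB=50

check_home_usage() {   # usage: check_home_usage [dir]   (default: $HOME)
  local dir="${1:-$HOME}"
  du -sk "$dir" | awk -v q="$QUOTA_GB" -v d="$dir" '{
    gb = $1 / 1048576                      # KiB -> GiB, reported as "GB"
    printf "%s: %.1f GB used of %d GB\n", d, gb, q
    exit (gb > q)                          # non-zero exit status if over quota
  }'
}

# Usage on ChinaSRC:
#   check_home_usage || echo "home folder is over the default quota"
```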

Uploading and Downloading Files

Linux / Unix / MacOS / Windows PowerShell

Remote file transfers via the terminal can be done using scp or rsync.

rsync is recommended. Before transferring, it compares the source and destination and copies only the files that differ. If nothing has changed, it finishes without resending any data, and an interrupted transfer can simply be rerun. Run all of the commands below on your local computer, for both uploads and downloads.

Using rsync

On your local computer, upload files with rsync using the following command:

$ rsync --rsh='ssh -p 20002' -avzu local_files [email protected]:/home/username/

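Remember to pass the ChinaSRC SSH port (20002) through --rsh, as shown above. To download, swap the arguments: give the remote path first and the local destination last.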

For more information about rsync and its options, refer to its manual pages using man rsync.
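The upload command and its download counterpart can be wrapped in two small shell functions, so the port is set in only one place. This is a sketch. REMOTE is a hypothetical placeholder for your "username@<login host>" (the address shown in the command above), and the function names are our own.

```shell
#!/usr/bin/env bash
# Run on your LOCAL machine. REMOTE is a placeholder for
# "username@<login host>"; set it to your own account and login address.
REMOTE="${REMOTE:-username@login_host}"
SSH_PORT=20002

# Upload one or more local files/folders into a directory on ChinaSRC.
chinasrc_upload() {     # usage: chinasrc_upload <remote_dir> <local_path>...
  local dest="$1"; shift
  rsync --rsh="ssh -p ${SSH_PORT}" -avzu "$@" "${REMOTE}:${dest}"
}

# Download a file/folder from ChinaSRC into a local directory.
chinasrc_download() {   # usage: chinasrc_download <remote_path> <local_dir>
  rsync --rsh="ssh -p ${SSH_PORT}" -avzu "${REMOTE}:$1" "$2"
}

# Examples:
#   chinasrc_upload /home/username/myjob/ data/ job.sbatch
#   chinasrc_download /home/username/myjob/results ./
```

The trailing slash on the source path matters to rsync: "results" copies the folder itself, while "results/" copies only its contents.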

Modules and Environments

Modules let several versions of a program be installed side by side without interfering with each other. Each version is kept in its own sandboxed environment, which avoids incompatibilities and inconsistencies between programs. Note that the ChinaSRC Team is gradually replacing modules with Anaconda environments. Modules are still used for programs that are not available from the Anaconda repository (anaconda.org).

Before using the modules, add the ChinaSRC module path with:

$ module use /home/software/modulefiles/
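If you don't want to repeat this command in every login session, you can append it once to a shell startup file. The sketch below assumes your login shell is bash and uses ~/.bashrc. The function name is hypothetical. It adds the line only if it is not already there.

```shell
#!/usr/bin/env bash
# Append the "module use" line to a startup file exactly once.
# ~/.bashrc is an assumption; pass another file if you use another shell.
add_module_path() {   # usage: add_module_path [rc_file]
  local rc="${1:-$HOME/.bashrc}"
  local line='module use /home/software/modulefiles/'
  touch "$rc"
  # -x: match the whole line, -F: fixed string, -q: no output
  grep -qxF "$line" "$rc" || printf '%s\n' "$line" >> "$rc"
}
```

Batch jobs are non-interactive shells, so it is still safest to repeat the module use line inside your job scripts.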

Module Commands

Modules have the format <module_name>/<version>, for example: wsclean/cpu-2.9.

List Available Modules

Without any argument, this command will list all available versions of all installed modules. When one or more module names are provided, the available versions for the modules are listed:

$ module avail [<module1/version> <module2/version> ...]

For example, running module avail without arguments prints a list like the one below. The list is not exhaustive and changes as software is added:

$ module avail

----------------------------------------------------------- /home/software/modulefiles/ -----------------------------------------------------------
aocommon/arm-3.0                  duchamp/cpu-1.6.2                 lapack/arm-3.8.0                  pgplot/cpu-5.2
aocommon/cpu-3.0                  dysco/arm-1.1                     lapack/cpu-3.10.0-gcc-4.8.5       pgplot/cpu-5.2-gcc-4.8.5
aoflagger/arm-v2.12.1             dysco/cpu-1.1                     lapack/cpu-3.10.0-gcc-7.3.0       pgplot/cpu-5.2-gcc-7.3.0
aoflagger/cpu-3.0.0-gcc-4.8.5     dysco/cpu-1.2-gcc-4.8.5           lapack/cpu-3.8.0                  pgplot/gpu-5.2
aoflagger/cpu-3.0.0-gcc-7.3.0     erfa/cpu-1.5.0                    lapack/cpu-3.8.0-gcc-4.8.5        prefactor/arm-3.1-gcc-9.3.0
aoflagger/cpu-gui-v2.12.1         erfa/cpu-2.0.0-gcc-4.8.5          lapack/gpu-3.8.8                  prefactor/cpu-3.1-gcc-9.3.0
aoflagger/cpu-v2.12.1             erfa/cpu-2.0.0-gcc-7.3.0          libsla/arm-master                 python/arm-2.7.14
askapsoft/1.12.0                  EveryBeam/cpu-master-20210630     libsla/cpu-master                 python/cpu-2.7.14
boost/arm-1.65.1                  EveryBeam/cpu-master-gcc-7.3.0    lua/cpu-5.3.6-gcc-4.8.5           python/cpu-3.8.0-gcc-4.8.5-vast
boost/cpu-1.65.1                  factor/arm-1.4-gcc-9.3.0          lua/cpu-5.3.6-gcc-7.3.0           python/cpu-3.8.12-gcc-4.8.5
boost/cpu-1.76.0-gcc-4.8.5        factor/cpu-1.4-gcc-9.3.0          miriad/2007                       python/cpu-3.8.12-gcc-7.3.0
boost/cpu-1.76.0-gcc-7.3.0        fftw/arm-3.8.8                    miriad/cpu-2007                   python/cpu-3.9.2-gcc-9.3.0
casacore/arm-2.4.1                fftw/cpu-3.3.10-gcc-4.8.5         Montage/cpu-6.0                   python/gpu-2.7.14
casacore/cpu-2.4.1                fftw/cpu-3.3.10-gcc-7.3.0         mpich/cpu-2-1.5rc3                RTS/cpu-master
casacore/cpu-3.3.0-gcc-4.8.5      fftw/cpu-3.8.8                    mpich/cpu-3.2.1                   RTS/gpu-master
casacore/cpu-3.3.0-gcc-7.3.0      fftw/gpu-3.8.8                    mpich/cpu-3.2.1-gcc-4.8.5         sextractor/cpu-2.25.0
casacore/cpu-3.4.0-gcc-7.3.0      gcc/7.3.0                         mwa-reduce/arm-master             stilts/arm-3.1-4
cfitsio/arm-3450                  gcc/7.3.0-new                     mwa-reduce/cpu-master             stilts/cpu-3.1-4
cfitsio/cpu-3450                  gcc/9.3.0                         mwa-reduce/cpu-master-2021        swarp/arm-2.38.0
cfitsio/cpu-4.0.0-gcc-4.8.5       hdf5/arm-1.10.4                   mwa-reduce/cpu-master-2022        swarp/cpu-2.38.0
cfitsio/cpu-4.0.0-gcc-7.3.0       hdf5/cpu-1.10.4                   MWA_Tools/arm-mwa-sci             wcslib/arm-6.2
cfitsio/gpu-3450                  hdf5/cpu-1.10.4-gcc-7.3.0         MWA_Tools/cpu-mwa-sci             wcslib/cpu-6.2
chgcentre/arm-wsclean2.6          hdf5/cpu-1.12.1-gcc-7.3.0         MWA_Tools/cpu-mwa-sci-wsclean-2.9 wcslib/cpu-7.7-gcc-4.8.5
chgcentre/cpu-wsclean2.6          hdf5/cpu-1.13.1-gcc-4.8.5         MWA_Tools/mwa-sci                 wcslib/cpu-7.7-gcc-7.3.0
cmake/cpu-3.15.2-gcc-7.3.0        hdf5/cpu-1.13.1-gcc-7.3.0         MWA_Tools/mwa-sci.old             wcslib/gpu-6.2
cmake/cpu-3.15.2-gcc-7.3.0-new    hdf5/gpu-1.10.4                   openmpi/cpu-2.0.2                 wcstools/cpu-3.9.6
cmake/cpu-3.20.0                  Healpix/arm-heapy                 openmpi/cpu-4.0.1                 wsclean/arm-2.6
cmake/cpu-3.20.0-gcc-7.3.0        Healpix/cpu-f90                   openmpi/gpu-4.0.1                 wsclean/cpu-2.6
cotter/arm-master                 Healpix/cpu-f90-gcc-4.8.5         pal/cpu-0.9.8                     wsclean/cpu-2.9
cotter/cpu-4.6-gcc-4.8.5          Healpix/cpu-heapy                 pal/cpu-0.9.8-gcc-4.8.5           wsclean/cpu-2.9-gcc-7.3.0
cotter/cpu-4.6-gcc-7.3.0          Healpix/gpu-cxx                   pal/cpu-0.9.8-gcc-7.3.0           wsclean/cpu-3.0-gcc-7.3.0
cotter/cpu-master                 Healpix/gpu-f90                   pgplot/arm-5.2

-------------------------------------------- /opt/app/spack/share/spack/modules/linux-centos7-haswell ---------------------------------------------
autoconf-2.69-gcc-4.8.5-6k6kik7               isl-0.18-gcc-4.8.5-igs522o                    ncurses-6.2-gcc-4.8.5-tbpd5z4
autoconf-archive-2019.01.06-gcc-4.8.5-7rxz2yv isl-0.21-gcc-4.8.5-ikicpxe                    perl-5.30.2-gcc-4.8.5-uay4u7v
automake-1.16.2-gcc-4.8.5-ipyg4ha             libiconv-1.16-gcc-4.8.5-qazxaa4               pkgconf-1.6.3-gcc-4.8.5-2qrpgpd
binutils-2.34-gcc-4.8.5-2csi6vr               libsigsegv-2.12-gcc-4.8.5-ymriiur             pkgconf-1.7.3-gcc-4.8.5-z3r4unw
bzip2-1.0.8-gcc-4.8.5-ersrl36                 libtool-2.4.6-gcc-4.8.5-fzl2npj               readline-8.0-gcc-4.8.5-3jeiguw
diffutils-3.7-gcc-4.8.5-jknorwe               libxml2-2.9.10-gcc-4.8.5-3foymu4              tar-1.32-gcc-4.8.5-v3iynan
gcc-10.1.0-gcc-4.8.5-2new4ox                  m4-1.4.18-gcc-4.8.5-7x2wh2t                   texinfo-6.5-gcc-4.8.5-fjg3jyt
gcc-7.5.0-gcc-4.8.5-of6wn6o                   mpc-1.1.0-gcc-4.8.5-g6zd7ob                   xz-5.2.5-gcc-4.8.5-rcyjfkv
gdbm-1.18.1-gcc-4.8.5-7xh2soi                 mpc-1.1.0-gcc-4.8.5-kv3zuys                   zlib-1.2.11-gcc-4.8.5-pkmj6e7
gettext-0.20.2-gcc-4.8.5-kapb6qj              mpfr-3.1.6-gcc-4.8.5-nol4vkt                  zstd-1.4.5-gcc-4.8.5-3boiaus
gmp-6.1.2-gcc-4.8.5-zn55wh7                   mpfr-4.0.2-gcc-4.8.5-kluqbcj

------------------------------------------ /opt/app/spack/share/spack/modules/linux-centos7-cascadelake -------------------------------------------
autoconf-2.69-gcc-10.1.0-2c3fdjr               libpciaccess-0.13.5-gcc-10.1.0-lw54lde         openmpi-2.1.6-gcc-10.1.0-tvthe74
autoconf-archive-2019.01.06-gcc-10.1.0-z7nw2bb libsigsegv-2.12-gcc-10.1.0-cw7jinv             openmpi-3.1.6-gcc-10.1.0-2mcmstt
automake-1.16.2-gcc-10.1.0-nvemodk             libtool-2.4.6-gcc-10.1.0-w2dkpic               perl-5.30.2-gcc-10.1.0-vncpxeg
environment-modules-4.5.1-gcc-10.1.0-gfocl6n   libxml2-2.9.10-gcc-10.1.0-wfhddkv              pkgconf-1.7.3-gcc-10.1.0-uignu4o
gcc-10.1.0-gcc-10.1.0-3o3bvj2                  m4-1.4.18-gcc-10.1.0-twi7kfh                   readline-8.0-gcc-10.1.0-y5e4cch
gdbm-1.18.1-gcc-10.1.0-pdpplkc                 mpc-1.1.0-gcc-10.1.0-gc7nbvo                   tcl-8.6.8-gcc-10.1.0-bxxjpg6
gmp-6.1.2-gcc-10.1.0-lo2ohmr                   mpfr-4.0.2-gcc-10.1.0-ckcev3b                  util-macros-1.19.1-gcc-10.1.0-6ijs7w4
hwloc-1.11.11-gcc-10.1.0-ezt4eyv               ncurses-6.2-gcc-10.1.0-tsvqpzn                 xz-5.2.5-gcc-10.1.0-2zdmlkh
isl-0.21-gcc-10.1.0-g45gmhu                    numactl-2.0.12-gcc-10.1.0-ckp5im3              zlib-1.2.11-gcc-10.1.0-yor3u7m
libiconv-1.16-gcc-10.1.0-lpidrg4               openmpi-2.0.0-gcc-10.1.0-65nxxvq               zstd-1.4.5-gcc-10.1.0-rtzjnll

--------------------------------------------------------- /usr/share/Modules/modulefiles ----------------------------------------------------------
dot         module-git  module-info modules     null        use.own

---------------------------------------------------------------- /etc/modulefiles -----------------------------------------------------------------
mpi/mpich-3.0-x86_64 mpi/mpich-3.2-x86_64 mpi/mpich-x86_64

-------------------------------------------------------------- /opt/app/modulefiles ---------------------------------------------------------------
mpich/3.0.4         mpich/3.2           openmpi/4.0.4/gcc   openmpi/4.0.4/intel openmpi/4.1.4/gcc   singularity/3.8.7

Passing a module name lists only the available versions of that module. For example, module avail wsclean gives:

$ module avail wsclean

----------------------------------------------------------- /home/software/modulefiles/ -----------------------------------------------------------
wsclean/arm-2.6           wsclean/cpu-2.6           wsclean/cpu-2.9           wsclean/cpu-2.9-gcc-7.3.0 wsclean/cpu-3.0-gcc-7.3.0

Loading MWA software

$ module use /home/software/modulefiles
$ module load MWA_Tools/cpu-mwa-sci-wsclean-2.9
  • Provides cotter, mwa-reduce and wsclean

Loading ASKAP software

$ module use /home/software/modulefiles
$ module load askapsoft/1.12.0
  • Provides mslist, readms, smear, ...

Anaconda

Anaconda provides conda, a package and environment manager written primarily in Python. Its package repository is anaconda.org.

Initialize conda

$ source /opt/app/anaconda3/bin/activate
$ conda activate  # activate base env

Manage Environments

Create Environments

Caution

Creating environments can use significant computational resources, which is not allowed on the login node. Run this operation on a compute node instead by submitting the commands through SLURM. The SLURM section below explains how to submit a job.

Default Way

To create an Anaconda environment, simply use the following command template:

$ conda create --name magnetism python=3.7
# On ChinaSRC, run it on a compute node through srun instead
# (replace group_name with your SLURM group account):
$ srun -N 1 -p hw-32C768G --comment=group_name conda create --name magnetism python=3.7
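The same step can run as a batch job. In a batch job, nobody can answer conda's "Proceed ([y]/n)?" prompt, so the sketch below passes --yes. It prints a job script that you redirect to a file and submit with sbatch. The partition hw-32C768G is taken from the srun example. The group name, environment name, package list and function name are placeholders or assumptions.

```shell
#!/usr/bin/env bash
# Print a batch script that builds a conda environment on a compute node.
# --yes answers conda's confirmation prompt, which cannot be answered in
# a batch job. Group, env name and package specs are your own values.
write_conda_job() {   # usage: write_conda_job <group> <env_name> <spec>...
  local group="$1" env="$2"; shift 2
  cat <<EOF
#!/bin/bash
#SBATCH --comment=${group}
#SBATCH --partition=hw-32C768G
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --job-name=conda-${env}
#SBATCH --output=%x.%j.out

source /opt/app/anaconda3/bin/activate
conda create --yes --name ${env} $*
EOF
}

# Usage:
#   write_conda_job group_name magnetism python=3.7 > make_env.sbatch
#   sbatch make_env.sbatch
```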

Activate an Environment

To activate an environment, use the following command template:

$ conda activate <env_name|env_path>

SLURM

SLURM is the job and resource manager used in the ChinaSRC. Its online documentation is at https://slurm.schedmd.com/documentation.html.

Job Parameters

Required Parameters

The following job parameters are required for every job:

  • --comment: (string) group account where job quotas are set;
  • --partition: (string) which partition the job will be submitted to;
  • --nodes: (integer) number of nodes to request;
  • --ntasks: (integer) total number of tasks to run; with the default of one CPU per task, this is the total number of CPUs requested;
  • --output: (string) job log file.
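Before you submit a job, you can check its script for the required parameters listed above. The helper below is a sketch with a hypothetical name. It recognises only long-form "#SBATCH --option" lines, so short forms such as -N or -p are not detected. Commented-out "##SBATCH" lines are correctly ignored.

```shell
#!/usr/bin/env bash
# Report which required #SBATCH options (from the list above) are missing
# from a job script. Only long-form "--option" directives are recognised.
REQUIRED_OPTS=(comment partition nodes ntasks output)

check_job_script() {   # usage: check_job_script <job.sbatch>
  local script="$1" opt missing=0
  for opt in "${REQUIRED_OPTS[@]}"; do
    # "--ntasks" must be followed by "=" or a space, so that
    # "--ntasks-per-node" does not count as "--ntasks".
    if ! grep -Eq "^#SBATCH[[:space:]]+--${opt}(=|[[:space:]])" "$script"; then
      echo "missing: --${opt}"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_job_script job.sbatch && sbatch job.sbatch
```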

Optional Parameters

Some useful optional job parameters are:

  • --ntasks-per-node: (integer) number of tasks (CPUs) per node; must be consistent with --ntasks if both are specified;

  • --mem: (string) memory per node (e.g., 40G, 80G);

  • --job-name: (string) name for the job; it is shown by the job monitoring commands described below;

  • --error: (string) job error file; it is recommended not to set this parameter and to use only --output instead;

  • --requeue: (no argument) make the job eligible to be requeued.

For other parameters, or more details on the parameters above, see the sbatch manual page or the online manual:

$ man sbatch

Job Script

A job script is submitted to request resources for a job. It contains the job parameters discussed above as #SBATCH lines, followed by the commands that run the job.

Here is a sample job script named job.sbatch where comments have been included to describe what each block does:

#!/bin/bash
#SBATCH --comment=<group_name>
#SBATCH --partition=<partition>
#SBATCH --nodes=<num_nodes>
#SBATCH --ntasks=<num_cpus>
#SBATCH --job-name="<jobname>"
#SBATCH --output="%x.%j.out"      ## <jobname>.<jobid>.out
##SBATCH --ntasks-per-node=1      ## optional
##SBATCH --mem=24G                ## optional: mem per node
##SBATCH --error="%x.%j.err"      ## optional; better to use --output only

your_program_here

Job Management

Submit Job Script

Submit the job from the folder that contains the job script, and keep all input and output files in that same folder. By default, SLURM starts the job in the directory from which it was submitted. Keeping everything together therefore avoids changing working directories, which can cause confusion and file-access errors. For example, suppose the job folder is /home/username/myjob and contains all the necessary input files together with the job script job.sbatch:

$ cd /home/username/myjob/
$ sbatch job.sbatch
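When one job depends on another, you can submit both at once and let SLURM hold the second until the first succeeds. This sketch uses two real sbatch options: --parsable, which prints only the job ID (possibly followed by ";cluster"), and --dependency=afterok:<job_id>. The function name and script names are hypothetical.

```shell
#!/usr/bin/env bash
# Submit two job scripts; the second starts only if the first
# finishes successfully (exit code 0).
submit_chain() {   # usage: submit_chain <first.sbatch> <second.sbatch>
  local first_id
  first_id=$(sbatch --parsable "$1") || return 1
  first_id=${first_id%%;*}          # drop an optional ";cluster" suffix
  sbatch --dependency=afterok:"${first_id}" "$2"
}

# Usage (inside the job folder):
#   submit_chain preprocess.sbatch image.sbatch
```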

Show Job Queue

Without arguments, squeue displays all jobs in the queue. Use the options to filter by user, partition or node list:

$ squeue [-u <username>] [-p <partition>] [-w <nodelist>]
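To list only your own jobs with a compact set of columns, you can use squeue's -o format option. This sketch selects job ID (%i), partition (%P), name (%j), state (%T), elapsed time (%M), and node list or pending reason (%R). The function name and column widths are our own choices.

```shell
#!/usr/bin/env bash
# Show only your own jobs with a compact column layout.
# Any extra arguments (e.g. -p <partition>) are passed on to squeue.
myjobs() {
  squeue -u "$USER" -o "%.10i %.12P %.20j %.8T %.10M %R" "$@"
}

# Usage:
#   myjobs
#   myjobs -p hw-32C768G
```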

Show Job Parameters

$ scontrol show job <job_id> 
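The scontrol output is long. If you only need the job state (for example PENDING or RUNNING), you can extract the JobState=... field. This is a sketch with a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print only the state of a job, taken from the "JobState=..." field
# in the output of "scontrol show job".
job_state() {   # usage: job_state <job_id>
  scontrol show job "$1" | grep -o 'JobState=[A-Z_]*' | cut -d= -f2
}

# Usage:
#   job_state 12345
```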

Check Node and/or Partition Status

$ sinfo [-p <partition> | -n <nodelist>]
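To see how many CPUs in a partition are currently free, you can use sinfo's %C field. It reports CPU counts as "allocated/idle/other/total". The sketch below sums the idle column across output lines. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Print the number of idle CPUs in a partition.
# sinfo -h drops the header; "%C" prints allocated/idle/other/total.
idle_cpus() {   # usage: idle_cpus <partition>
  sinfo -h -p "$1" -o "%C" | awk -F/ '{ idle += $2 } END { print idle + 0 }'
}

# Usage:
#   idle_cpus hw-32C768G
```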

Cancel Job(s)

You may only cancel jobs created under your account.

$ scancel <job_id1> [<job_id2> ...]
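scancel can also select jobs by filter instead of by ID. The sketch wraps two standard filters: --state=PENDING, which cancels your jobs that have not started yet, and --name, which cancels your jobs with a given job name. Both are restricted to your own jobs with -u "$USER". The helper names are our own.

```shell
#!/usr/bin/env bash
# Cancel several of your own jobs at once, using scancel filters.
cancel_my_pending() {  # cancel all of your jobs that have not started yet
  scancel -u "$USER" --state=PENDING
}

cancel_by_name() {     # usage: cancel_by_name <job_name>
  scancel -u "$USER" --name="$1"
}
```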