Jarv's Guide to the HPC
Jarvist Moore Frost, 2012-2013 (& minor updates 2015)
I wrote a little script, `launch_coms.sh`, which generates a `.sh` job run file for each supplied `.com` file, choosing memory + CPUs + queue from command-line options, and dropping the results files back into the current directory. It copies a suitably named `.chk` out, reformats it in case it is from Gaussian 03, runs the job, then copies back both the `.log` and `.chk` files.

You might need to customise it to do special things (i.e. rename the `fort.7` output and copy it home); if so, be careful that you are running the correct version of it - you'll probably need to explicitly `./launch_coms.sh` for the local copy.
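To give a flavour of what it does, here is a minimal sketch of the sort of PBS job file such a launcher writes - the resource requests, the `gaussian` module name and the filenames are illustrative assumptions, not a copy of the real script:

```
#!/bin/sh
# Sketch of a generated Gaussian job file (assumes PBS and a 'gaussian' module;
# resources and filenames are placeholders).
#PBS -l select=1:ncpus=8:mem=12gb
#PBS -l walltime=24:00:00

module load gaussian

# Run in the node-local PBS scratch directory, then copy results home.
cd $TMPDIR
cp $PBS_O_WORKDIR/myjob.com .
cp $PBS_O_WORKDIR/myjob.chk . 2>/dev/null  # reuse a checkpoint if one exists

g09 < myjob.com > myjob.log

cp myjob.log myjob.chk $PBS_O_WORKDIR/
```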
I rather quickly hacked my Gaussian job runner to work with Gromacs `.tpr` files in Autumn 2011, now that Gromacs load balances with domain decomposition. These are often generated locally, and then uploaded to the HPC. See `launch_tpr.sh`.

Nb: the following describes Gromacs 4.x.x; the newer Gromacs 5 uses a different workflow, and has merged a lot of the analysis tools together!

Here's the general workflow chart: http://manual.gromacs.org/online/flow.html

You can now, with domain decomposition, get a working `.tpr` file (with a compatible, recent version of Gromacs) on your local machine + run it for a bit of MD to check that everything works, then copy it over to the HPC to run it under the batch scheduler.
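In Gromacs 4.x commands, the local half of that workflow is roughly the following (filenames are the Gromacs defaults; the HPC hostname is a placeholder):

```
# Build the portable run input (.tpr) from run parameters, coordinates and topology
grompp -f md.mdp -c conf.gro -p topol.top -o topol.tpr

# Short test run locally to check everything works
mdrun -s topol.tpr

# Then copy the .tpr over to the HPC scratch space
scp topol.tpr jmf02@<hpc-login-host>:/work/jmf02/
```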
Have a look at https://github.com/jarvist/filthy-dotfiles/blob/master/ssh-config for defaults that allow you to do passwordless login.
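The useful part is a per-host stanza along these lines - the alias, hostname and key file below are placeholders rather than a copy of that dotfile, and passwordless login also needs your public key in `~/.ssh/authorized_keys` on the HPC (e.g. via `ssh-copy-id`):

```
# ~/.ssh/config -- illustrative stanza, not a copy of the linked dotfile
Host hpc
    # Real login node address goes here
    HostName <hpc-login-host>
    User jmf02
    # Equivalent to passing -Y on the command line
    ForwardX11Trusted yes
    IdentityFile ~/.ssh/id_rsa
```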
HPC Home Directory Setup
On the HPC, my `~/.bash_profile` [nb: different name to the default on Debian/Ubuntu] looks like:
# Personal scripts (launch_coms.sh and friends) on the PATH
PATH=$PATH:/home/jmf02/bin
# Prompt of the form [time]user@host:working-directory/ >
PS1="[\t]\u@\h:\w/ > "
alias ls="ls --color "
#Kick vim to give full colour, please
export TERM=xterm-256color
This makes for a more colourful experience, and gives me a prompt that is very useful when I'm switching between virtual terminals and figuring out when I was last talking to this screen, as well as giving me a path I can use to scp files back and forwards, etc.
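For example, the path shown in the prompt can be pasted straight into an scp on the local machine (hostname is a placeholder):

```
scp jmf02@<hpc-login-host>:/work/jmf02/myjob/myjob.log .
```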
The queues are round-robin fair queues. This means that a person who has recently had a lot of time / jobs run is automatically demoted towards the end of the queue. Therefore, if someone has queued up hundreds of jobs, it doesn't matter: you will skip to the front of the queue and run as soon as a machine is available.
`/work/username` is your fast scratch directory, typically with about 239G of space. This isn't much if you're keeping checkpoints from Gaussian or large MD runs, and it is not backed up. As such, I have a TB drive on my local machine to which I rsync all my calculations, and then selectively delete `.chk` files on the HPC when I'm not using them immediately.
The locally run script which pulls down the files from home and work looks like:
rsync -av [email protected]:/work/jmf02/ hpc_work
rsync -av [email protected]:/home/jmf02/ hpc_home
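For the selective clearing of checkpoints, something like the following helps - the 30-day cutoff is purely illustrative, and check the listing before swapping `-print` for `-delete`:

```
# List .chk files under /work/jmf02 untouched for 30+ days
find /work/jmf02 -name "*.chk" -mtime +30 -print
```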
As well as the standard `qstat` and `qsub`, other q- programs I have found useful include (see the example usage after this list):

- `qorder`: rotates the order of your jobs... allows you to move a quick job to the front of your own queue (i.e. for when you have hundreds of jobs stacked up, but want to run something short today).
- `qmove`: allows you to bump jobs from queue to queue. Either from public queues to private when they become available / are running faster than expected, or from private to public to clear the space!
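For example (the job IDs below are made up; `pqexss` is our private queue, as mentioned below):

```
# Swap the queue positions of two of your own jobs
qorder 952652.cx1b 952653.cx1b

# Move a job onto the private queue (or back again, with the public queue name)
qmove pqexss 952653.cx1b
```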
Our private nodes (`pqexss`) are cx1-14-36-1 and cx1-14-36-2. These have 32GB RAM, 500GB /tmp local storage, and are 8-way Intel Xeon E5462 @ 2.80GHz (6MB cache).
If you want to do some 'heavy lifting' local calculations, it is probably best to chain your way onto one of them, rather than hammering login-0, i.e. `ssh cx1-14-36-1 -Y` (the -Y passes on your encrypted X-server credentials + tunnel so you can run graphics programs).
If you want to figure out who is running jobs / using the queue currently, you can ssh in and run 'top' to see what's going on.
> qstat -f | grep vnode
exec_vnode = (**cx1-14-36-1**:ncpus=8:mem=12083200kb)
> ssh cx1-14-36-1 -Y
Last login: Mon Mar 19 13:19:16 2012 from cx1-12-1-1.cx1.hpc.ic.ac.uk
[14:34:41]jmf02@cx1-14-36-1:~/ > **cd /tmp/pbs.[TAB]**
[14:37:06]jmf02@cx1-14-36-1:/tmp/pbs.952652.cx1b/ > **ls**
ener.edr md.log mpd2.console_jmf02_120327.143216 mpd2.logfile_jmf02_120327.143216 tmp.H3pNkn9kEA topol.tpr traj.trr
As you've logged in with X-forwarding, you can now load gromacs / gaussian to see the results of your calculation in real time (gaussview or ngmx).
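For instance, something along these lines lets you peek at the running trajectory with the Gromacs 4.x viewer (this is an illustration rather than a fixed recipe; module names are as discussed below):

```
module load gromacs intel-suite mpi
# View the trajectory being written in the job's scratch directory (X-forwarded)
ngmx -f traj.trr -s topol.tpr
```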
`module avail` will give you a list of all available programs and libraries.

Most programs that have been locally compiled will depend on `module load intel-suite` for the accelerated maths libraries and Intel Xeon optimisations. Many programs (notably Gromacs) will also require `module load mpi`, even if you're not running them multiprocessor. For instance, to use Babel, you will need `module load openbabel intel-suite mpi`.
Many programs (in particular Gromacs) have multiple versions installed. The default one loaded will be listed as '(default)' in `module avail`. Alternative versions can be loaded by e.g. `module load gromacs/4.5.4`.
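Put together, a typical interactive session might start something like this (version numbers are whatever `module avail` reports on the day):

```
module avail gromacs         # list the installed Gromacs versions
module load intel-suite mpi  # maths libraries + MPI, needed by most locally-built codes
module load gromacs/4.5.4    # pick a specific version rather than the default
module list                  # check what is currently loaded
```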