
Jarv's Guide to the HPC


Jarvist Moore Frost, 2012-2013 (& minor updates 2015)

Using Gaussian

Launching Gaussian Jobs

I wrote a little script launch_coms.sh, which generates a .sh job run file for each supplied .com file, choosing memory + CPUs + queue from command-line options, and dropping result files back to the current directory. It copies a suitably named .chk out, reformats it in case it is from Gaussian 03, runs the job, then copies back both the .log and .chk files.
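
The generated run file is just an ordinary PBS job script. A minimal sketch of the sort of thing it produces - the walltime, resource line, module and file names here are illustrative, not copied from launch_coms.sh:

#!/bin/sh
#PBS -l walltime=24:00:00
#PBS -l select=1:ncpus=8:mem=11gb

module load gaussian

cd $TMPDIR                              # node-local scratch directory set up by PBS
cp $PBS_O_WORKDIR/myjob.com .           # stage the input (launch_coms.sh also stages the matching .chk)
g09 < myjob.com > myjob.log             # run Gaussian
cp myjob.log myjob.chk $PBS_O_WORKDIR   # drop the results back in the submission directory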

You might need to customise it to do special things (e.g. rename the fort.7 output and copy it home); if so, be careful that you are running the correct version of it - you'll probably need to explicitly run ./launch_coms.sh to get the local copy.

Using Gromacs

Launching Gromacs Jobs

I rather quickly hacked my Gaussian job runner to work with Gromacs .tpr files in Autumn 2011, now that Gromacs load balances with domain decomposition. These are often generated locally and then uploaded to the HPC. See launch_tpr.sh

Gromacs Workflow

NB: The following describes Gromacs 4.x.x; the newer Gromacs 5 uses a different workflow, and has merged a lot of the analysis tools together!

Here's the general workflow chart: http://manual.gromacs.org/online/flow.html

You can now, with domain decomposition, build a working .tpr file (with a compatible, recent version) on your local machine, run it for a bit of MD to check that everything works, then copy it over to the HPC to run under the batch scheduler.
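
A minimal sketch of that local step, assuming the standard Gromacs 4.x tools; file names, the remote host and the directory are all illustrative:

# Build the run input locally from parameters, coordinates and topology
grompp -f md.mdp -c conf.gro -p topol.top -o topol.tpr

# Short local test run - set a small nsteps in md.mdp so it finishes quickly
mdrun -v -s topol.tpr -deffnm test

# Then copy the .tpr over to the HPC work directory
scp topol.tpr jmf02@hpc-login:/work/jmf02/myrun/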

Local Machine Setup

Have a look at https://github.com/jarvist/filthy-dotfiles/blob/master/ssh-config for defaults that allow you to do passwordless login.
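The gist is key-based login (ssh-keygen locally, then ssh-copy-id to put the public key on the HPC) plus a Host alias. A minimal ~/.ssh/config sketch - the hostname below is an illustrative placeholder, not necessarily the real login node:

# ~/.ssh/config on the local machine; substitute the real login node name
Host hpc
    HostName login.cx1.hpc.ic.ac.uk
    User jmf02
    ForwardX11Trusted yes

With that in place, ssh hpc and scp somefile hpc:/work/jmf02/ both work without retyping the full address.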

HPC Home Directory Setup

On the HPC, my ~/.bash_profile [nb: different name to the default on Debian/Ubuntu] looks like:

# Add my own scripts to the PATH
PATH=$PATH:/home/jmf02/bin
# Prompt showing the time, user@host and current directory
PS1="[\t]\u@\h:\w/ > "

alias ls="ls --color "

#Kick vim to give full colour, please
export TERM=xterm-256color

This makes for a more colourful experience, and gives me a prompt that is very useful when I'm switching between virtual terminals: it shows when I was last talking to this screen, and gives me a path for scp-ing files back and forth, etc.

CX1: Running / Checking / Manipulating Jobs

A Note on the Queues

The queues are round-robin fair queues: whoever has recently had a lot of CPU time or jobs run is automatically demoted towards the back of the queue. So if someone has queued up hundreds of jobs, it doesn't matter - you will skip ahead of them and run as soon as a machine is available.

A Note on Disk Space

/work/username is your fast scratch directory, typically with about 239G of space. This isn't much if you're keeping checkpoints from Gaussian or large MD runs. It is not backed up.

As such, I have a TB drive on my local machine to which I 'rsync' all my calculations, and then selectively delete .chks on the HPC when I'm not using them immediately.

The locally run script which pulls down the files from home and work looks like:

rsync -av [email protected]:/work/jmf02/ hpc_work

rsync -av [email protected]:/home/jmf02/ hpc_home
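
Once the local copy is up to date, something like the following finds checkpoint files on /work that haven't been touched for a month, so you can review and then delete them - the path and 30-day cutoff are just examples:

# List .chk files untouched for more than 30 days
find /work/jmf02 -name '*.chk' -mtime +30 -exec ls -lh {} \;
# Only once you are sure they are safely rsync'd home:
find /work/jmf02 -name '*.chk' -mtime +30 -delete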

qstat

As well as the standard qstat and qsub, other q- programs I have found useful include:

qorder: swap the order of two of your queued jobs... allows you to move a quick job to the front of your own queue (e.g. for when you have hundreds of jobs stacked up, but want to run something short today).

qmove: allows you to bump jobs from queue to queue - either from a public queue to a private one when space becomes available or jobs are running faster than expected, or from private to public to clear the space! (Examples of both below.)
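
Both take the obvious arguments; the job IDs and queue name below are illustrative:

# Swap the queue positions of two of your own jobs (put the quick one first)
qorder 952700.cx1b 952652.cx1b
# Move a queued job into the private pqexss queue (or back out again)
qmove pqexss 952700.cx1b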

PQEXSS

Our private nodes (pqexss) are cx1-14-36-1 and cx1-14-36-2.

These have 32 GB RAM, 500 GB /tmp local storage, and are 8-way Intel Xeon E5462 @ 2.80 GHz (6 MB cache).

If you want to do some 'heavy lifting' local calculations, it is probably best to chain your way into one of them, rather than hammering login-0.

e.g. ssh cx1-14-36-1 -Y

(the -Y enables trusted X11 forwarding over the encrypted tunnel, so you can run graphical programs)

If you want to figure out who is running jobs / using the queue currently, you can ssh in and run 'top' to see what's going on.

Live Viewing Jobs

> qstat -f | grep vnode
   exec_vnode = (cx1-14-36-1:ncpus=8:mem=12083200kb)
> ssh cx1-14-36-1 -Y
Last login: Mon Mar 19 13:19:16 2012 from cx1-12-1-1.cx1.hpc.ic.ac.uk
[14:34:41]jmf02@cx1-14-36-1:~/ > cd /tmp/pbs.[TAB]
[14:37:06]jmf02@cx1-14-36-1:/tmp/pbs.952652.cx1b/ > ls
ener.edr  md.log  mpd2.console_jmf02_120327.143216  mpd2.logfile_jmf02_120327.143216  tmp.H3pNkn9kEA  topol.tpr  traj.trr

As you've logged in with X-forwarding, you can now load gromacs / gaussian to see the results of your calculation in real time (gaussview or ngmx).
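
Even without the graphical tools, it is often enough to follow the text output from inside that scratch directory, e.g.:

tail -f md.log     # watch the Gromacs log (or the Gaussian .log) grow as the job runs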

'Module Load' and missing libraries

module avail will give you a list of all available programs and libraries.

Most programs that have been locally compiled will depend on module load intel-suite for the accelerated maths libraries and Intel Xeon optimisations. Many programs (notably Gromacs) will also require module load mpi even if you're not running them multiprocessor.

For instance, to use Babel, you will need to run module load openbabel intel-suite mpi.
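
A typical session looks like the following; the exact module names are whatever module avail reports on the day:

module avail                           # list everything installed
module load intel-suite mpi openbabel  # maths libraries, MPI, then the program itself
module list                            # confirm what is currently loaded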

Program Versions

Many programs (in particular Gromacs) have multiple versions installed. The default one loaded will be listed as '(default)' in module avail. Alternative versions can be loaded by e.g. module load gromacs/4.5.4.
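
module avail gromacs narrows the listing to just the Gromacs modules, and module swap exchanges one loaded version for another; the version string below is illustrative:

module avail gromacs                  # show just the installed gromacs versions
module load gromacs                   # picks up the (default) one
module swap gromacs gromacs/4.5.4     # exchange it for a specific version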