This repository has been archived by the owner on Jan 31, 2020. It is now read-only.

Beginner's Guide to Installation

Avi Ramu edited this page Nov 24, 2015 · 42 revisions

##Introduction

This tutorial is part of a Beginner's Guide series that walks a new user through the process of installing the GMS. For a brief demonstration of the GMS, please start with the Quick Tour in a Pre-configured Virtual Machine. The simplest path to installing the sGMS (besides the Pre-configured Virtual Machine) is on a fresh Ubuntu (Precise) box. Additional installation options are discussed on the Install page and the [Installation Types Overview](https://github.com/genome/gms/wiki/Installation-Types-Overview) page.

##System Requirements

Genome analysis is computationally intensive and involves very large data files. Even the demonstration analysis below, which uses down-sampled data, requires considerable CPU, memory, and storage. That said, we have run the demonstration analysis successfully on a 2013 MacBook Pro laptop (OS X 10.9.2) within a VirtualBox virtual machine that was allocated 3 CPUs, 12 GB of memory, and ~250 GB of disk space. We recommend testing with more resources than this if possible.
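
As a rough pre-flight check, the sketch below compares a Linux machine (or VM) against the 3 CPU / 12 GB / ~250 GB figures quoted above. The function names `report` and `check_resources` are made up for this example and are not part of the GMS; the script assumes `nproc`, `/proc/meminfo`, and POSIX `df` are available.

```shell
#!/bin/sh
# Thresholds below are the VM allocation quoted in this guide, not hard limits.

# report LABEL ACTUAL REQUIRED: print whether ACTUAL meets REQUIRED.
report() {
    if [ "$2" -ge "$3" ]; then
        echo "OK $1: $2 (need >= $3)"
    else
        echo "LOW $1: $2 (need >= $3)"
    fi
}

# check_resources [PATH]: inspect CPUs, memory, and free space on PATH (default /).
check_resources() {
    target=${1:-/}
    cpus=$(nproc)
    # MemTotal is in kB; round to the nearest GB because the kernel reserves
    # a little memory, so a 12 GB machine reports slightly less than 12 GB.
    mem_gb=$(awk '/^MemTotal:/ {print int($2 / 1048576 + 0.5)}' /proc/meminfo)
    # df -Pk prints available space in 1 kB blocks in column 4.
    disk_gb=$(df -Pk "$target" | awk 'NR == 2 {print int($4 / 1048576)}')
    report "CPUs" "$cpus" 3
    report "memory in GB" "$mem_gb" 12
    report "free disk in GB on $target" "$disk_gb" 250
}
```

Running `check_resources /` prints one `OK` or `LOW` line per resource. A `LOW` line does not mean the installation will fail, only that the machine is smaller than the one described above.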

##Installing Ubuntu

First, obtain an installation image for Ubuntu 12.04.4 LTS (Precise) from: http://releases.ubuntu.com/precise/
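
Before writing the image to installation media, it can be worth confirming that the download is intact. The helper below is a minimal sketch, assuming `sha256sum` is installed on the machine doing the download and that you have copied the expected digest from the checksum file published alongside the image. The name `verify_sha256` is invented for this example.

```shell
# verify_sha256 FILE EXPECTED: succeed only if FILE's SHA-256 digest equals EXPECTED.
verify_sha256() {
    # sha256sum prints "<digest>  <file>"; keep only the digest.
    actual=$(sha256sum "$1" | awk '{print $1}')
    if [ "$actual" = "$2" ]; then
        echo "checksum OK: $1"
    else
        echo "checksum MISMATCH: $1" >&2
        return 1
    fi
}
```

For example, `verify_sha256 ubuntu.iso <digest>` (where `ubuntu.iso` stands in for whatever file name you downloaded) prints `checksum OK` on a match and returns a non-zero status otherwise.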

You will need to burn this image to a disc or copy it to a USB device. You may also need to configure your computer's BIOS to allow booting from disc/USB.

A good graphical guide for installing Ubuntu exists here: https://help.ubuntu.com/community/GraphicalInstall

A graphical guide for setting up your own partitions (as we will be doing) can be found here: http://askubuntu.com/questions/343268/how-to-use-manual-partitioning-during-installation

###Pre-installation options

You will be given the following options by the Ubuntu installer:

  1. Download updates while installing - this is optional
  2. Install third-party software - this is optional
  3. Installation type - choose "Something else" (see "Setting up partitions")

###Setting up partitions

If installing for the first time, choose 'New Partition Table'. Otherwise change/edit/format the existing partitions as needed. A recommended setup might be configured as follows:

  • sda - three partitions:
    • sda1 - use as EFI boot (200MB)
    • sda2 - use as swap (64000MB)
    • sda3 - use as ext4; set mount point to "/"; choose this partition to install Ubuntu; choose format option
  • sdb - single partition:
    • sdb1 - use as ext4; set mount point to "/tmp"; choose format option
  • sdc - single partition:
    • sdc1 - use as ext4; set mount point to "/opt/gms"; choose format option UNLESS you are reinstalling the GMS and do not wish to re-download the data
  • Device for boot loader installation - choose sda
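
Once Ubuntu is installed and booted, the layout above can be checked from a terminal. The sketch below reads `/proc/mounts` (whitespace-separated fields: device, mount point, filesystem type, and so on) and reports the filesystem type at each of the three suggested mount points. The function names are invented for this example, and the optional file argument exists only so the logic can be tried against a sample file.

```shell
# fs_type_of MOUNTPOINT [MOUNTS_FILE]: print the filesystem type mounted at
# MOUNTPOINT, or nothing if no filesystem is mounted exactly there.
fs_type_of() {
    # If a mount point is listed more than once, the last entry wins.
    awk -v mp="$1" '$2 == mp {t = $3} END {if (t != "") print t}' \
        "${2:-/proc/mounts}"
}

# check_layout [MOUNTS_FILE]: report on /, /tmp, and /opt/gms.
check_layout() {
    for mp in / /tmp /opt/gms; do
        fstype=$(fs_type_of "$mp" "${1:-/proc/mounts}")
        if [ "$fstype" = "ext4" ]; then
            echo "$mp: ext4"
        else
            echo "$mp: expected ext4, found '${fstype:-not a separate mount}'"
        fi
    done
}
```

Calling `check_layout` with no arguments inspects the running system. A mount point reported as "not a separate mount" is simply a directory on another partition, which still works but does not give the isolation the recommended layout is aiming for.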

###Additional installation options

  1. Set name, computer name, username, and password as desired (take note of your password!)
  2. Do not encrypt home folder.

###First Ubuntu login

If the installation above succeeds, you will be prompted to eject the installation media and reboot your system. You should end up at an Ubuntu Linux login screen. Log in using the username and password you chose above.

Before installing the GMS, we will install a few missing/recommended packages:

  • git - The source code management system that we will use to download the GMS installer code
  • make - The make utility will be used for the actual installation of all necessary packages for GMS
  • ssh - Installing ssh will allow you to access this computer remotely (optional)
  • vim - A useful text editor (optional)
  • byobu - A useful utility to allow remote connection to terminal sessions (optional)

All of the above can be installed with the following single command (enter your password if prompted):

sudo apt-get install git ssh make vim byobu
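
To confirm that the packages landed on your `PATH`, a small check such as the following can help. It is a sketch only; `missing_commands` is a name made up for this example.

```shell
# missing_commands NAME...: print each NAME that cannot be found on PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}
```

Running `missing_commands git ssh make vim byobu` prints nothing when every tool is available, and otherwise lists the ones that still need installing.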

##Installing the Genome Model System (GMS)

###Downloading the GMS installer

First, we will download the GMS installer. Non-development users should do the following:

git clone https://github.com/genome/gms.git

If you wish to be able to commit to the GMS installer repo you will need to be given proper permissions and do the following:

git clone git@github.com:genome/gms.git

###Installing the GMS

First, start a byobu session. This will allow you to reattach to this session later if necessary. Then enter the new gms git repo and run the Makefile to install.

byobu
cd gms
make
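
The installation can take a long time, so you may want to keep a record of its output. The wrapper below is one possible way to do that; `run_logged` and the log file name in the usage note are invented for this example and are not part of the GMS installer.

```shell
# run_logged LOGFILE COMMAND [ARG...]: run COMMAND with stdout and stderr sent
# to LOGFILE, then print and return the command's exit status.
run_logged() {
    log=$1
    shift
    "$@" >"$log" 2>&1
    status=$?
    echo "exit status $status; full output in $log"
    return "$status"
}
```

From inside the gms directory you could run `run_logged "$HOME/gms-install.log" make` and follow progress with `tail -f "$HOME/gms-install.log"` in a second byobu window. If `make` prompts for input (for example a sudo password), run it directly instead, since the prompt would be hidden in the log.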

After make finishes successfully, reboot the machine. If you are working remotely, issue the reboot from the command line:

sudo reboot

###Initial Sanity Checks

The following checks can be made after logging into the GMS:

lsid                      # You should see the openlava cluster identification
lsload                    # You should see a report of available resources
bjobs                     # You should not have any unfinished jobs yet
bsub 'sleep 60'           # You should be able to submit a job to openlava (run bjobs again to see it)
bhosts                    # You should see one host
bqueues                   # You should see four queues
genome disk group list    # You should see four disk groups
genome disk volume list   # You should see at least one volume for your local drive
genome sys gateway list   # You should see two gateways, one for your new home system and one for the test data "GMS1"
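
Two of the checks above come with an expected count (one host, four queues), which can be verified mechanically. The sketch below assumes that `bhosts` and `bqueues` print a single header line followed by one row per host or queue, as LSF-style tools usually do. Confirm this against your own output before relying on it. The function names are invented for this example.

```shell
# count_rows: read command output on stdin and count the non-empty lines
# after the first line, which is assumed to be a column header.
count_rows() {
    awk 'NR > 1 && NF > 0 {n++} END {print n + 0}'
}

# expect_rows LABEL EXPECTED: compare the row count on stdin with EXPECTED.
expect_rows() {
    rows=$(count_rows)
    if [ "$rows" -eq "$2" ]; then
        echo "PASS $1: $rows"
    else
        echo "FAIL $1: expected $2, found $rows"
        return 1
    fi
}
```

With these defined, `bhosts | expect_rows hosts 1` and `bqueues | expect_rows queues 4` turn the two comments above into pass/fail lines. The `genome ... list` commands are left out because their output format is not described here.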

###Your New System

Each GMS has a unique ID:

cat /etc/genome/sysid
echo $GENOME_SYS_ID

The entire installation lives in a directory with the ID embedded:

echo $GENOME_HOME # /opt/gms/$GENOME_SYS_ID
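
The three values above should agree with one another. The following sketch checks that, assuming `/etc/genome/sysid` contains only the ID and that the installation directory follows the `/opt/gms/$GENOME_SYS_ID` pattern shown in the comment. The function names are invented for this example.

```shell
# expected_genome_home SYSID: print the install directory implied by SYSID.
expected_genome_home() {
    echo "/opt/gms/$1"
}

# check_genome_env [SYSID_FILE]: compare GENOME_SYS_ID and GENOME_HOME with
# the ID stored in SYSID_FILE (default /etc/genome/sysid).
check_genome_env() {
    file_id=$(cat "${1:-/etc/genome/sysid}") || return 1
    if [ "$file_id" != "$GENOME_SYS_ID" ]; then
        echo "GENOME_SYS_ID ($GENOME_SYS_ID) differs from sysid file ($file_id)"
        return 1
    fi
    if [ "$GENOME_HOME" != "$(expected_genome_home "$file_id")" ]; then
        echo "GENOME_HOME ($GENOME_HOME) is not $(expected_genome_home "$file_id")"
        return 1
    fi
    echo "environment consistent for system $file_id"
}
```

Running `check_genome_env` in a fresh login shell prints a single confirmation line when everything matches. A mismatch usually suggests that the shell did not pick up the GMS environment, for example because you have not logged out and back in since installing.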

The initial system has one node, and that node has only its local disk on which to perform analysis.
To expand the system to multiple nodes, add disks, or use network-attached storage, see the documentation on System Expansion.

###Usage

To install an example set of human cancer data, including reference sequences and annotation data sets, first log into the system, then move into the gms repo directory (e.g. cd /vagrant/gms or cd ~/gms), and finally run the following:

./setup/prime-system.pl --data=hcc1395_1tenth_percent --sync=tarball --low_resources --memory=Xgb

The example above will download a down-sampled demonstration data set. Valid values of --data are hcc1395, hcc1395_1percent, hcc1395_1tenth_percent, and none. The full hcc1395 data set, which consists of whole genome, exome, and transcriptome data for tumor and normal, is 313 GB. Downloading it may consume considerable bandwidth, and it will be very slow to install.

The example above assumes you are testing on a system with limited resources (--low_resources) and X GB of memory (--memory=Xgb). If you are on a large server you should drop these two options. Otherwise, specify the memory available on your system or within your VM if applicable. See ./setup/prime-system.pl --help for details.
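
If you would rather not work out X by hand, the value can be derived from `/proc/meminfo` on Linux. This is a sketch under two assumptions: that a whole number of gigabytes in the form `12gb` is an acceptable value (check `./setup/prime-system.pl --help`), and that rounding down is the safer choice. The function names are invented for this example.

```shell
# memory_flag TOTAL_KB: turn a memory size in kB into a --memory=Ngb option,
# rounding down to whole GB with a floor of 1.
memory_flag() {
    gb=$(( $1 / 1048576 ))
    [ "$gb" -ge 1 ] || gb=1
    echo "--memory=${gb}gb"
}

# detected_memory_kb: read MemTotal (reported in kB) from /proc/meminfo.
detected_memory_kb() {
    awk '/^MemTotal:/ {print $2}' /proc/meminfo
}
```

The result can be passed in place of the literal option, for example `./setup/prime-system.pl --data=hcc1395_1tenth_percent --sync=tarball --low_resources "$(memory_flag "$(detected_memory_kb)")"`. Because the kernel reports slightly less than the nominal amount, a 12 GB machine will typically yield `--memory=11gb`.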

# list the data you just imported
genome taxon list
genome individual list
genome sample list
genome library list
genome instrument-data list solexa
    
# list the pre-defined models (no results yet ... you will launch these and generate results)
genome model list
    
# list the processing profiles associated with those models
genome processing-profile list reference-alignment
genome processing-profile list somatic-variation
genome processing-profile list rna-seq
genome processing-profile list differential-expression    
genome processing-profile list clin-seq

###A note about re-installing the GMS

During initial testing of the system you may find it necessary to start from scratch and re-install. To avoid re-downloading the data, it is useful to keep '/opt/gms' on a separate partition and simply remount it (without formatting) during Ubuntu installation.
