- Overview
- Background
- Before Starting
- Getting Started
- Software Requirements
- Architecture Design
- Data
- Funding
- License for Data
This module will introduce you to (graphical) pangenomics and walk you through a pangenomics pipeline. Specifically, you will learn how to build a pangenome graph, map reads to the graph, call variants on the mapped reads, and visualize the graph. All analyses will be performed on the Google Cloud Platform. The estimated cost for the complete module is $?
A pangenome is a collection of genomes from the same species. Compared to a single reference genome, a pangenome is a less biased, more comprehensive representation of sequence conservation and variation within a population. Although a pangenome may provide greater insight into the genetic and genomic nature of a species, analyzing it requires bioinformatics tools that differ from those typically used with reference genomes. This module introduces pangenome graphs and the bioinformatics tools used to analyze them.
This module is designed to run on the Google Cloud Platform (GCP). Follow the instructions below to prepare to run the module on GCP.
Setting up GCP
See the Vertex AI Quickstart instructions for details on steps 1-5.
1. Create a Google Cloud account
2. Create a Google Cloud project
3. Enable billing for your Google Cloud project
4. Create a Vertex AI Workbench instance
5. Click "OPEN JUPYTERLAB" on your instance to open JupyterLab
6. Clone this repository into JupyterLab
Installing Software
All software for this module is installed via Conda. To set up the module's Conda environment and install all the software, open a Terminal in JupyterLab (File -> New Launcher -> Terminal) and run the following command:
bash -i ./NIGMS-Sandbox-Pangenomics-Module/scripts/0-setup.sh
After the command completes, close the terminal and refresh the JupyterLab window in your web browser. A new kernel called "conda-nigms-pangenomics" should now appear in the launcher. Use this kernel with every notebook in the module.
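If the new kernel does not show up in the launcher, it can help to ask Jupyter directly which kernels it knows about. The following is a minimal sketch, not part of the module itself. It assumes the kernel is registered under the same name the launcher displays ("conda-nigms-pangenomics"), which may not hold if the display name and the internal kernel name differ. The helper names `kernel_names` and `installed_kernel_names` are hypothetical.

```python
import json
import shutil
import subprocess

# Assumption: the registered kernel name matches the name shown in the launcher.
EXPECTED_KERNEL = "conda-nigms-pangenomics"


def kernel_names(kernelspec_json):
    """Extract kernel names from the JSON printed by `jupyter kernelspec list --json`."""
    return sorted(json.loads(kernelspec_json).get("kernelspecs", {}))


def installed_kernel_names():
    """Ask Jupyter for its registered kernels; return [] if that is not possible."""
    if shutil.which("jupyter") is None:
        return []  # the jupyter command is not on PATH
    result = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    try:
        return kernel_names(result.stdout)
    except json.JSONDecodeError:
        return []  # unexpected output; treat as "nothing found"
```

Evaluating `EXPECTED_KERNEL in installed_kernel_names()` in a notebook cell or Python session returns `True` when a kernel with that name is registered. A `False` result only means no kernel with exactly that name was found.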
To begin, we must understand how this repository is organized.
└── module_notebooks/
    ├── 00-environment-setup.ipynb
    ├── 01-intro-to-pangenomics.ipynb
    ├── 02-building-graphs-with-pggb.ipynb
    ├── 03-indexing-graphs-with-vg.ipynb
    ├── 04-read-mapping-with-vg.ipynb
    ├── 05-variant-calling-with-vg.ipynb
    ├── 06-searching-graphs-with-blast.ipynb
    └── 07-visualization.ipynb
module_notebooks/ contains Jupyter notebooks, one for each submodule.
To open a notebook, simply double-click on it in your Workbench instance.
To begin this module, open the 00-environment-setup.ipynb notebook.
This notebook will introduce you to Jupyter notebooks and instruct you on how to install the software for this module.
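The numeric prefixes on the notebook file names give the order in which the submodules are meant to be worked through. As a small illustration, the sketch below lists the notebooks in that order. It assumes the repository was cloned into the current working directory under the name used in the setup command (NIGMS-Sandbox-Pangenomics-Module). The function name `list_notebooks` is hypothetical.

```python
from pathlib import Path


def list_notebooks(notebook_dir):
    """Return notebook file names sorted by their numeric prefix (00, 01, ...)."""

    def prefix(path):
        # "02-building-graphs-with-pggb.ipynb" -> "02"
        return path.name.split("-", 1)[0]

    notebooks = [p for p in Path(notebook_dir).glob("*.ipynb") if prefix(p).isdigit()]
    return [p.name for p in sorted(notebooks, key=lambda p: int(prefix(p)))]


if __name__ == "__main__":
    # Assumption: the repository sits in the current working directory.
    for name in list_notebooks("NIGMS-Sandbox-Pangenomics-Module/module_notebooks"):
        print(name)
```

Files without a numeric prefix are skipped, and a missing directory simply yields an empty list.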
The following software is required for this module and will be installed as part of the 00-environment-setup.ipynb submodule: