This is a pipeline, written in Python and bash with Snakemake as workflow manager, that outputs kallisto files (transcript abundances) from RNA-Seq data. The samples can be in .fastq or compressed format.
The samples can be placed in any directory, but the path must be specified in the config/config.yaml file.
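For example, the samples path might be set with an entry like the following (the key name here is illustrative; the actual config/config.yaml in the repository is authoritative):

```yaml
# config/config.yaml -- illustrative sketch, the real key names may differ
samples_path: /path/to/my/fastq_files   # directory with .fastq or .fastq.gz samples
```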
- Installation
- Linux
- Windows
- macOS
- Snakemake Environment
- Clone the repository
- Snakemake Usage
- Run the pipeline on an HPC
- Configuration of the pipeline
- Cluster Configuration
Linux Installation
Open a Linux shell, then run these four commands to download the latest 64-bit Linux Miniconda 3 installer under a short file name, silently install it, and then delete the installer.
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm ~/miniconda3/miniconda.sh
After installing, initialize your newly-installed Miniconda. The following commands initialize for bash and zsh shells:
~/miniconda3/bin/conda init bash
~/miniconda3/bin/conda init zsh
You should see (base) in your command-line prompt. This tells you that you’re in your base conda environment. To learn more about conda environments, see Environments.
Check for a good installation with:
conda --version
# conda 24.X.X
conda list
# outputs a list of packages installed in the current environment (base)
Windows Installation
Most of the packages this pipeline needs are not available natively on Windows, so we need to install Linux on Windows, also known as WSL (Windows Subsystem for Linux). In Windows PowerShell:
wsl --install
# This command will install the Ubuntu distribution of Linux.
If you run into an issue during the installation process, please check the installation section of the troubleshooting guide.
Once you have installed WSL, you will need to create a user account and password for your newly installed Linux distribution. See the Best practices for setting up a WSL development environment guide to learn more.
Once you have a working shell in your WSL, run these four commands to download the latest 64-bit Linux Miniconda 3 installer under a short file name, silently install it, and then delete the installer.
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm ~/miniconda3/miniconda.sh
After installing, initialize your newly-installed Miniconda. The following commands initialize for bash and zsh shells:
~/miniconda3/bin/conda init bash
~/miniconda3/bin/conda init zsh
You should see (base) in your command-line prompt. This tells you that you’re in your base conda environment. To learn more about conda environments, see Environments.
Check for a good installation with:
conda --version
# conda 24.X.X
conda list
# outputs a list of packages installed in the current environment (base)
macOS Installation
These four commands download the latest Apple Silicon (arm64) version of the macOS installer under a short file name, silently install it, and then delete the installer:
mkdir -p ~/miniconda3
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh -o ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm ~/miniconda3/miniconda.sh
After installing, initialize your newly-installed Miniconda. The following commands initialize for bash and zsh shells:
~/miniconda3/bin/conda init bash
~/miniconda3/bin/conda init zsh
You should see (base) in your command-line prompt. This tells you that you’re in your base conda environment. To learn more about conda environments, see Environments.
Check for a good installation with:
conda --version
# conda 24.X.X
conda list
# outputs a list of packages installed in the current environment (base)
Now, with Miniconda installed on our machine, we can create a new environment containing Snakemake:
conda create -c conda-forge -c bioconda -n snakemake snakemake
In case you are downloading from the Clinic network, you may run into trouble with the SSL certificate. To solve any problems, do the following:
- Obtain a copy of the certificate. You can usually get one by clicking on the padlock icon in your browser when visiting any https site, then viewing the certificate and downloading it in PEM format.
- Then point conda to it:
conda config --set ssl_verify <pathToYourFile>.pem
Once created, we activate and move into the snakemake environment with:
conda activate snakemake
snakemake --version
# 8.16.X
If at any time we want to exit the environment, we can do so with conda deactivate, and get back in with conda activate snakemake.
To see the packages we have currently installed in the environment, we can use conda list.
- Above the list of files, click Code.
- Copy the URL for the repository. To clone the repository using HTTPS, copy the link provided under "HTTPS".
- Open a terminal.
- Change the current working directory to the location where you want the cloned directory, for example cd rna_seq_fnadeu. Make sure that the directory exists before you move into it.
- Type git clone [email protected]:lymphIDIBAPS/rnaseq_fnadeu.git.
- Press Enter to create your local clone.
git clone [email protected]:lymphIDIBAPS/rnaseq_fnadeu.git
> Cloning into `rna_seq_fnadeu`...
> remote: Counting objects: 10, done.
> remote: Compressing objects: 100% (8/8), done.
> remote: Total 10 (delta 1), reused 10 (delta 1)
> Unpacking objects: 100% (10/10), done.
When we have the cloned repository, we can proceed and add our sample data to the FASTQ directory. This is not mandatory, as we can edit the config/config.yaml file and set any path to our sample data.
In the same file we can set the number of threads available on our computer, so the pipeline adapts to the resources we currently have.
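As a sketch, the threads setting might look like the following (key name illustrative; check the actual config/config.yaml):

```yaml
# config/config.yaml -- illustrative sketch
threads: 8   # set to the number of cores available on your machine
```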
The rulegraph for our pipeline as of 25/10 is the following:
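If the image is out of date, the rulegraph can be regenerated with Snakemake's built-in --rulegraph flag, piped through Graphviz (this assumes the dot program is installed):

```shell
# From the repository root, with the snakemake environment active
snakemake --rulegraph | dot -Tpng > rulegraph.png
```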
# For a test run of the pipeline
snakemake --use-conda -np
# For a real run of the pipeline
snakemake --use-conda
If we have many samples and our computer does not have enough computational power, we can run the pipeline on a cluster. This pipeline has been prepared to run on the StarLife cluster at the BSC.
- Make a new directory named /slgpfs/ on your computer and mount it to the same directory in StarLife:
mkdir /home/user/slgpfs
sshfs -o allow_other [email protected]:/slgpfs/ /home/user/slgpfs/
This will allow you to see and work on the cluster directly from your own file system.
- On your computer, navigate to the directory /slgpfs/projects/group_folder
- Download and extract the following file into that directory; it contains a full conda environment ready to run Snakemake: Snakemake Conda Environment
- Clone this repository into the directory, following the steps from Clone the repository
- Now, connect to the cluster:
ssh [email protected] # or
ssh [email protected]
- In the cluster, navigate to the cloned repository: /slgpfs/projects/group_folder/rna_seq_fnadeu
- Now, activate the snakemake_bsc environment:
source ../snakemake_bsc/bin/activate
In your terminal, you should now see something like: (snakemake_bsc) your_username@sllogin1
- At this point, on your local machine, you can move your samples to the directory /slgpfs/projects/group_folder/rna_seq_fnadeu/FASTQ.
- Now, you can run the pipeline from the cluster with the command:
# For a test run of the pipeline
snakemake --profile config/slurm/ --use-envmodules -np
# For a real run of the pipeline
snakemake --profile config/slurm/ --use-envmodules
The command above will run the pipeline with the configuration from the file located in /slgpfs/projects/group_folder/rna_seq_fnadeu/config/config.yaml. Be sure to check and modify the configuration file to set your desired options.
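Since the run is launched with --profile config/slurm/, Snakemake reads its command-line options from that profile's config.yaml. A minimal sketch of what such a profile might contain (all values here are illustrative; the file shipped with the repository is authoritative):

```yaml
# config/slurm/config.yaml -- illustrative sketch only
executor: slurm          # in Snakemake 8 this needs the snakemake-executor-plugin-slurm package
jobs: 50                 # maximum number of jobs queued at once
default-resources:
  slurm_partition: main  # hypothetical partition name
  mem_mb: 8000
  runtime: 240           # minutes
```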
The cluster configuration file is located in config/slurm/config.yaml. Below you have all the options available to customize your run.
- Pep file: path to a .yaml file containing the PEP info and additional metadata about our project.
- Working directory: the directory where our analysis will be run.
- Path to data and pipelines folder: where our data and other resources are located on the cluster.
- Perform T-trimming: cut 1 base from the start of the read in Trimmomatic; yes or no, default = no.
- Adapters: adapters to be removed by Trimmomatic; illumina or bioskryb, default = illumina.
- Analysis name: the name you want your analysis to have.
- Remove fastqs: remove intermediate fastqs after QC; yes or no, default = no.
- Remove bams: remove BAM files after QC; yes or no, default = no.
- rRNA database for SortMeRNA: which of the rRNA databases to use; fast, sensitive or default, default = default.
- Index file for kallisto: whether to include only cDNA, only ncRNA, or both in the index; cDNA, ncRNA or both, default = cDNA.
- runQC: whether BAM files should be created and QC metrics computed; yes or no, default = yes.
- Number of CPUs per job: for some rules, the number of CPUs to use; default = 20.
- Transcription strand for rules kallisto and collectRNASeqMetrics: first, second or unstranded, default = first.
Remember to check the file config/slurm/config.yaml for the cluster configuration. Review all the items and, in case something is not clear, check the Snakemake documentation for what each term means in the configuration.
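As a sanity check before launching a run, the allowed values listed above can be validated programmatically. A minimal Python sketch (the key names here are illustrative, not necessarily those used in config/config.yaml):

```python
# Illustrative config sanity check; the key names are hypothetical,
# check config/config.yaml for the real ones.
ALLOWED = {
    "perform_t_trimming": {"yes", "no"},
    "adapters": {"illumina", "bioskryb"},
    "remove_fastqs": {"yes", "no"},
    "remove_bams": {"yes", "no"},
    "sortmerna_db": {"fast", "sensitive", "default"},
    "kallisto_index": {"cDNA", "ncRNA", "both"},
    "run_qc": {"yes", "no"},
    "strand": {"first", "second", "unstranded"},
}

def validate(config: dict) -> list:
    """Return error messages for options whose value is out of range."""
    errors = []
    for key, allowed in ALLOWED.items():
        value = config.get(key)
        if value is not None and value not in allowed:
            errors.append(f"{key}: {value!r} not in {sorted(allowed)}")
    return errors
```

Running such a check on the loaded configuration (e.g. via yaml.safe_load) before submitting can save a failed cluster job later.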
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Developed by @obaeza16, based on a pipeline written by Ferran Nadeu.
Maintained by the Lymphoid Neoplasms program at IDIBAPS for Ferran Nadeu.