mega-non-model-wgs-snakeflow

Quick install and run

To install the workflow and test it on a single node (deployment across multiple nodes with SLURM is covered later), clone this repository and then download the pseudo-genome used by the included test data set (in .test).

You must have Snakemake (version > 6.0) in the active environment.
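One way to confirm the version requirement before going further is sketched below. The `version_gt` helper is hypothetical (not part of this repository), and it assumes a `sort` that supports `-V` version ordering, as GNU coreutils does.

```shell
# Hypothetical helper: succeeds if version $1 is strictly greater than version $2.
# Assumes `sort -V` (version sort) is available.
version_gt() {
    [ "$1" != "$2" ] && [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Compare the Snakemake version in the active environment against 6.0.
if command -v snakemake >/dev/null 2>&1; then
    installed="$(snakemake --version)"
    if version_gt "$installed" "6.0"; then
        echo "Snakemake $installed is new enough"
    else
        echo "Snakemake $installed is too old; need a version above 6.0" >&2
    fi
else
    echo "snakemake not found in the active environment" >&2
fi
```

If the check reports that snakemake is missing, activate the conda environment that contains it and run the check again.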

In short, here are the steps to install the workflow and run the test data set in .test.

# clone the repo
git clone git@github.com:eriqande/mega-non-model-wgs-snakeflow.git

# download the tarball with the genome in it and then move that
# into resources/
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1LMK-DCkH1RKFAWTR2OKEJ_K9VOjJIZ1b' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1LMK-DCkH1RKFAWTR2OKEJ_K9VOjJIZ1b" -O non-model-wgs-example-data.tar && rm -rf /tmp/cookies.txt

# untar the tarball
tar -xvf non-model-wgs-example-data.tar

# copy the genome from the extracted tarball into mega-non-model-wgs-snakeflow/resources/
cp non-model-wgs-example-data/resources/genome.fasta mega-non-model-wgs-snakeflow/resources/
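Because the download goes through a web form, it can be worth checking that the copied genome really is a FASTA file and not an error page. The following is a minimal sketch of such a check; `looks_like_fasta` is a hypothetical helper, and the check assumes you are still in the directory from which you ran the `cp` command.

```shell
# Hypothetical helper: succeeds if the file exists, is non-empty,
# and begins with '>' as a FASTA header line should.
looks_like_fasta() {
    [ -s "$1" ] && [ "$(head -c 1 "$1")" = ">" ]
}

# Path matches the destination of the cp command in the steps above.
genome="mega-non-model-wgs-snakeflow/resources/genome.fasta"
if looks_like_fasta "$genome"; then
    # Count header lines to report how many sequences the file holds.
    echo "Found $genome with $(grep -c '^>' "$genome") sequence(s)"
else
    echo "$genome is missing or does not look like FASTA" >&2
fi
```

This only inspects the first character and the header count; it does not validate the sequence content.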

Once that is set up, you can do a dry run like:

conda activate snakemake
cd mega-non-model-wgs-snakeflow

# set the number of cores you have access to, for use in the
# following command.  Here I have 12.  Set yours to whatever
# is appropriate for your system
CORES=12
snakemake --cores $CORES --use-conda --conda-frontend mamba -np

If that gives you reasonable-looking output (165 total jobs, lots of conda environments to be installed, etc.), then remove the -np from the end of the command to actually run it:

snakemake --cores $CORES --use-conda --conda-frontend mamba

Installing all the conda packages could take a while (2–30 minutes, depending on your system). Once that is done, running all the steps in the workflow on this small data set took less than 4 minutes on 12 cores of a single node of UC Boulder’s SUMMIT supercomputer.

Condensed DAG for the workflow

Here is a DAG for the workflow on the test data in .test, condensed into an easier-to-look-at picture by the condense_dag() function in Eric’s SnakemakeDagR package.

What the user must do and values to be set, etc

Choose an Illuminaclip adapter fasta (in config)

Assumptions

Paired end

Things fixed or added relative to JK’s snakemake workflow

fastqc on both reads
don’t bother with single end
add adapters so illumina clip can work
benchmark each rule
use genomicsDBimport
allow for merging of lots of small scaffolds into genomicsDB

