From b02a5c2f81746b11e6877c564a6e6b4a18c19ee8 Mon Sep 17 00:00:00 2001
From: Cassidy Johnson <114778057+cassidy-a-johnson@users.noreply.github.com>
Date: Thu, 21 Sep 2023 23:15:02 -0400
Subject: [PATCH] Update kreeq.md

---
 docs/kreeq.md | 44 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 34 insertions(+), 10 deletions(-)

diff --git a/docs/kreeq.md b/docs/kreeq.md
index f27b375..79126ce 100644
--- a/docs/kreeq.md
+++ b/docs/kreeq.md
@@ -1,26 +1,32 @@
-Let's get some test files first:
-```
-mv testFiles-kreeq/* .
-```
+The standard notation for using kreeq is as follows:
 ```
 kreeq validate -f input.[fasta|fastq][.gz] -r reads1.fastq[.gz] reads2.fastq[.gz] [...] [-k 21]
 ```
-It accepts multiple read files as input, separated by space. To check out all options and flags use:
-
-`kreeq -h`
+It accepts multiple read files as input, separated by space. The two modes we will be using today are `validate` and `union`.
+To check out all options and flags use:
+```
+kreeq -h
+kreeq validate -h
+kreeq union -h
+```
-You can test some typical usage with the files in the `testFiles` folder, e.g.:
+Let's get some test files first:
+```
+mv gfastar/docs/testFiles-kreeq/* .
+```
+We will test some typical usage with the files moved from the `testFiles` folder, e.g.:
 ```
 kreeq validate -f random1.fasta -r random1.fastq
+kreeq validate -f random2.fasta -r random1.fastq random2.fastq
 ```
 Importantly, the kreeq database can only be computed once on the read set, and reused for multiple analyses to save runtime:
 ```
-kreeq validate -r random1.fastq -o db.kreeq
-kreeq validate -f random1.fasta -d db.kreeq
+kreeq validate -r random1.fastq -o random1.kreeq
+kreeq validate -f random1.fasta -d random1.kreeq
 ```
 Similarly, kreeq databases can be generated separately for multiple inputs and combined, with increased performance in HPC environments:
@@ -32,3 +38,21 @@ kreeq validate -r random2.fastq -o random2.kreeq
 kreeq union -d random1.kreeq random2.kreeq -o union.kreeq
 kreeq validate -f random1.fasta -d union.kreeq
 ```
+
+Now working with real sequencing data:
+
+Let's start by running `gfastats` to get a sense of what we are evaluating.
+```
+gfastats input.fa
+```
+
+Now we are ready to run kreeq:
+```
+kreeq validate -r filtered.fastq -o filtered.kreeq
+kreeq validate -r filtered2.fastq -o filtered2.kreeq
+
+kreeq union -d filtered.kreeq filtered2.kreeq -o filtered_union.kreeq
+
+kreeq validate -f input.fa -d filtered_union.kreeq
+```
+```