added test files and readme for gfastats
gf777 committed Sep 20, 2023
1 parent 4c91df5 commit f87daaf
Showing 13 changed files with 214 additions and 0 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@

.DS_Store
8 changes: 8 additions & 0 deletions README.md
@@ -10,6 +10,9 @@ Research Assistant Professor, The Rockefeller University
Cassidy Johnson
Graduate Fellow, The Rockefeller University

Jack Medico
Graduate Fellow, The Rockefeller University

## Description

By the end of this session you will be able to:
@@ -31,3 +34,8 @@ By the end of this session you will be able to:
Please also read the description carefully to see if this session is relevant to you.

If you don't meet the prerequisites, change your mind after reading the description, or are no longer available at the session time, please email tol-training at sanger.ac.uk to cancel your slot so that someone on the waitlist can attend.

## Training material

Gfastats examples can be found [here](https://github.com/BGAcademy23/gfastar/docs/gfastats.md).
Kreeq examples can be found [here](https://github.com/BGAcademy23/gfastar/docs/kreeq.md).
85 changes: 85 additions & 0 deletions docs/gfastats.md
@@ -0,0 +1,85 @@
Help:
`gfastats -h`
File:
`cat testFiles/random1.fasta`
Summary statistics:
`gfastats testFiles/random1.fasta`
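The summary report includes contiguity metrics such as N50. To make that number concrete, here is a minimal sketch of the textbook N50 definition applied to the five records in `testFiles/random1.fasta` (wrapped lines joined). It is an illustration only, not gfastats' own code, and `n50` is a hypothetical helper name.

```python
# Illustrative sketch: the textbook N50 definition, not gfastats' implementation.

def n50(lengths):
    """Return the length at which the longest sequences first cover half the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


# Records from testFiles/random1.fasta with their wrapped lines joined.
scaffolds = {
    "Header1": "CGacT",
    "Header2": "CGAXT",
    "Header3": "TGANATNCTN",
    "Header4": "NNNTTCCTcgCACtC",
    "Header5": "AACTCGATCACGNNN",
}

lengths = [len(seq) for seq in scaffolds.values()]
print("total length:", sum(lengths))
print("scaffold N50:", n50(lengths))
```

For these records the total length is 50 bp and the scaffold N50 is 15 bp, since the two 15 bp sequences together already cover more than half of it.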
Tabular output:
`gfastats testFiles/random1.fasta -t`
Change locale:
`gfastats large_input.fasta.gz --locale en_US.UTF-8`
Full output:
`gfastats testFiles/random1.fasta --nstar-report`
Report by sequence:
`gfastats testFiles/random1.fasta --seq-report`
Original file (FASTA output):
`gfastats testFiles/random1.fasta -ofa`
Line length:
`gfastats testFiles/random1.fasta -ofa --line-length 2`
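`--line-length` sets how many bases are written per sequence line in FASTA output. The wrapping amounts to slicing each sequence into fixed-width chunks. The following simplified sketch ignores header comments, and `format_fasta` is a hypothetical helper:

```python
def format_fasta(header, seq, line_length):
    """Format one FASTA record, wrapping the sequence every `line_length` characters."""
    if line_length <= 0:
        raise ValueError("line_length must be positive")
    lines = [">" + header]
    lines += [seq[i:i + line_length] for i in range(0, len(seq), line_length)]
    return "\n".join(lines)


print(format_fasta("Header1", "CGacT", 2))
```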
Subset:
`gfastats testFiles/random1.fasta Header2 -ofa`
Subset with bed:
`gfastats testFiles/random1.fasta -e <(echo Header2) -ofa`
`cat testFiles/random1.fasta.bed`
`gfastats testFiles/random1.fasta -ofa -e testFiles/random1.fasta.bed`
`gfastats testFiles/random1.fasta -ofa -i testFiles/random1.fasta.bed`
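`-i` and `-e` take BED input listing sequences or coordinates to include or exclude (see `gfastats -h`). BED coordinates are 0-based and half-open, which is easy to misread in `testFiles/random1.fasta.bed`. This plain-Python sketch (not gfastats code) shows how those lines map onto the sequences. It treats a line containing only a name, as produced by `echo Header2`, as the whole sequence; that treatment is an assumption. Note that `Header2 0 3` and `Header2 4 5` together skip the single `X` gap base.

```python
def read_bed(lines):
    """Parse BED lines into (name, start, end) tuples.

    BED coordinates are 0-based and half-open, so they map directly onto
    Python slices. A name-only line is treated as the whole sequence (end=None).
    """
    intervals = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        start = int(fields[1]) if len(fields) > 1 else 0
        end = int(fields[2]) if len(fields) > 2 else None
        intervals.append((fields[0], start, end))
    return intervals


sequences = {"Header1": "CGacT", "Header2": "CGAXT", "Header3": "TGANATNCTN"}
bed = ["Header1\t0\t5", "Header2\t0\t3", "Header2\t4\t5", "Header3\t0\t3"]

for name, start, end in read_bed(bed):
    print(name, start, end, sequences[name][start:end])
```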
Size of components:
`gfastats testFiles/random1.fasta -s s`
`gfastats testFiles/random1.fasta -s c`
`gfastats testFiles/random1.fasta -s g`
AGP:
`gfastats testFiles/random1.fasta -b a`
BED coordinates:
`gfastats testFiles/random1.fasta -b s`
`gfastats testFiles/random1.fasta -b c`
`gfastats testFiles/random1.fasta -b g`
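The `s`, `c` and `g` arguments above refer to scaffolds, contigs and gaps. A scaffold splits into contigs at runs of gap characters. This sketch assumes `N` and `X` (either case) count as gaps, based on the descriptions in `testFiles/random1.fasta`; gfastats' exact rules may differ. For Header3 it yields contigs at 0-3, 4-6 and 7-9 with 1 bp gaps between them and at the end. These intervals match the Header3 lines in `testFiles/random1.fasta.bed` and the segment lengths in `testFiles/random1.gfa2`.

```python
import re

# Assumption: runs of N or X (either case) are gaps, following the descriptions
# in testFiles/random1.fasta; gfastats' exact rules may differ.
GAP_RUN = re.compile(r"[NnXx]+")


def components(seq):
    """Split a scaffold into ("contig" | "gap", start, end) intervals, 0-based half-open."""
    parts = []
    pos = 0
    for match in GAP_RUN.finditer(seq):
        if match.start() > pos:
            parts.append(("contig", pos, match.start()))
        parts.append(("gap", match.start(), match.end()))
        pos = match.end()
    if pos < len(seq):
        parts.append(("contig", pos, len(seq)))
    return parts


for kind, start, end in components("TGANATNCTN"):  # Header3
    print("Header3", start, end, kind, end - start)
```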
Sorting:
`gfastats testFiles/random1.fasta -ofa --sort largest`
`gfastats testFiles/random1.fasta -ofa --sort descending`
`gfastats testFiles/random1.fasta -ofa --sort test.sort`
GFA2:
`gfastats testFiles/random1.gfa2 -o gfa2`
GFA2 to FASTA conversion:
`gfastats testFiles/random1.gfa2 -o fasta`
GFA2 to GFA1 conversion:
`gfastats testFiles/random1.gfa2 -o gfa`
GFA1:
`gfastats testFiles/random2.gfa -o gfa`
GFA1 to FASTA:
`gfastats testFiles/random2.gfa -o fasta`
GFA1 to GFA2:
`gfastats testFiles/random2.gfa -o gfa2`
GFA1 no sequence:
`gfastats testFiles/random2.noseq.gfa -o gfa`
GFA1 no sequence to FASTA:
`gfastats testFiles/random2.noseq.gfa -o fa`
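Converting GFA1 to FASTA essentially means writing out the segment (`S`) lines. Here is a minimal sketch of that step; `gfa1_segments` and `gfa1_to_fasta` are hypothetical helpers, not gfastats functions. Segments stored as `*` (as in `random2.noseq.gfa`) carry only a length, so the sketch skips them. What gfastats writes for such segments is not reproduced here.

```python
def gfa1_segments(lines):
    """Yield (name, sequence, length) for every GFA1 S line.

    `sequence` is None when the file stores "*" (no sequence); the length then
    comes from the optional LN:i: tag, if present.
    """
    for line in lines:
        fields = line.split()  # GFA is tab-separated; split() also copes with space-separated copies
        if not fields or fields[0] != "S":
            continue
        name, raw = fields[1], fields[2]
        tags = {}
        for tag in fields[3:]:
            key, _type, value = tag.split(":", 2)  # e.g. "LN:i:5" -> ("LN", "i", "5")
            tags[key] = value
        sequence = None if raw == "*" else raw
        if "LN" in tags:
            length = int(tags["LN"])
        else:
            length = len(sequence) if sequence is not None else None
        yield name, sequence, length


def gfa1_to_fasta(lines):
    """Write segments that carry a sequence as FASTA records (others are skipped)."""
    return "\n".join(
        f">{name}\n{seq}" for name, seq, _length in gfa1_segments(lines) if seq is not None
    )


gfa = [
    "H\tVN:Z:1.2",
    "S\t11\tACCTT\tLN:i:5\tQL:Z:?@97?",
    "S\t12\tTCAAGG\tLN:i:6\tQL:Z:@6?84@",
    "L\t11\t+\t12\t-\t4M",
]
print(gfa1_to_fasta(gfa))
```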
Homopolymer compression:
`gfastats testFiles/random1.fasta --homopolymer-compress 1 -ofa`
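Homopolymer compression collapses runs of the same base, so `TTCC` becomes `TC`. A minimal sketch of the collapsing step follows. How gfastats interprets the numeric argument (`1` above) is not covered here, so check `gfastats -h`. The sketch compares characters exactly, so `cC` counts as two different bases.

```python
from itertools import groupby


def homopolymer_compress(seq):
    """Collapse every run of identical characters to a single character.

    Characters are compared exactly, so "cC" counts as two different bases.
    """
    return "".join(base for base, _run in groupby(seq))


print(homopolymer_compress("NNNTTCCTcgCACtC"))  # Header4 from testFiles/random1.fasta -> NTCTcgCACtC
```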
Find terminal overlaps:
`gfastats testFiles/random5.findovl.gfa -ogfa`
`gfastats testFiles/random5.findovl.gfa --discover-terminal-overlaps 3 -ogfa`
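Terminal-overlap discovery looks for a segment whose end matches the start of another. This sketch assumes the `3` above is a minimum overlap length. It checks forward orientations only, whereas the test graph also joins reverse-complemented segments. The toy sequences are taken from the junction in `random5.findovl.gfa`, where segment 11 ends in `AAA` and segment 13 starts with `AAA`. `longest_overlap` and `find_overlaps` are hypothetical helpers. The all-pairs scan is quadratic, which is fine for a sketch but not for a large graph.

```python
def longest_overlap(a, b, min_len):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no such overlap is at least `min_len` long.
    """
    for length in range(min(len(a), len(b)), max(min_len, 1) - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def find_overlaps(segments, min_len):
    """Report (from, to, length) for every ordered pair of distinct segments that overlap."""
    hits = []
    for name_a, seq_a in segments.items():
        for name_b, seq_b in segments.items():
            if name_a != name_b:
                length = longest_overlap(seq_a, seq_b, min_len)
                if length:
                    hits.append((name_a, name_b, length))
    return hits


toy = {"s1": "GCCATAAA", "s2": "AAATCGC"}  # end of segment 11, start of segment 13
print(find_overlaps(toy, 3))
```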
Discover paths:
`gfastats testFiles/random1.fasta -ogfa | grep -v "^P" > test.gfa`
`gfastats test.gfa -ogfa`
`gfastats test.gfa -ogfa2 --discover-paths`
Superimpose AGP:
`gfastats testFiles/random1.fasta -a testFiles/random1.agp -ofa`
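An AGP file describes each new object as ordered parts. A part is either a slice of an existing sequence (`W`, 1-based inclusive coordinates plus orientation) or a gap (`N`, with a length). The following sketch rebuilds `newpath2` from `testFiles/random1.agp`. It is not gfastats' implementation: it ignores the object coordinates and concatenates parts in file order, so it assumes a sorted, consistent AGP. `build_from_agp` is a hypothetical helper.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse complement, preserving case; unknown characters are left unchanged."""
    return seq.translate(COMPLEMENT)[::-1]


def build_from_agp(agp_lines, sequences):
    """Assemble AGP objects from component sequences, in the order the lines appear.

    Columns: object, object_beg, object_end, part_number, component_type, then
    component_id, component_beg, component_end, orientation (W lines) or
    gap_length, gap_type, linkage (N/U gap lines).
    """
    objects = {}
    for line in agp_lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        obj, kind = fields[0], fields[4]
        if kind in ("N", "U"):
            piece = "N" * int(fields[5])
        else:
            comp, beg, end, strand = fields[5], int(fields[6]), int(fields[7]), fields[8]
            piece = sequences[comp][beg - 1:end]  # AGP coordinates are 1-based and inclusive
            if strand == "-":
                piece = revcomp(piece)
        objects[obj] = objects.get(obj, "") + piece
    return objects


sequences = {"Header4": "NNNTTCCTcgCACtC", "Header5": "AACTCGATCACGNNN"}
agp = [
    "newpath2\t1\t5\t1\tW\tHeader5\t3\t7\t-",
    "newpath2\t6\t10\t2\tN\t5\tscaffold\tyes",
    "newpath2\t11\t25\t3\tW\tHeader4\t1\t15\t+",
]
print(build_from_agp(agp, sequences))
```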
SAK reverse complement:
`cat testFiles/random1.rvcp.sak`
`gfastats testFiles/random1.fasta -ofa`
`gfastats testFiles/random1.fasta -k testFiles/random1.rvcp.sak -ofa`
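`testFiles/random1.rvcp.sak` holds two `RVCP` instructions that reverse-complement Header4 and Header3. This sketch applies `RVCP` lines to an in-memory dictionary of sequences. gfastats works on its own graph representation, so treat this as an illustration of the operation only. Preserving lowercase and leaving `X` untouched are choices made in this sketch. `apply_rvcp` is a hypothetical helper.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse complement, preserving case; other characters (e.g. X) are kept as-is."""
    return seq.translate(COMPLEMENT)[::-1]


def apply_rvcp(sequences, instructions):
    """Apply `RVCP <name>` lines in order; other instruction types are ignored here."""
    result = dict(sequences)  # leave the input untouched
    for line in instructions:
        fields = line.split()
        if len(fields) == 2 and fields[0] == "RVCP":
            result[fields[1]] = revcomp(result[fields[1]])
    return result


sequences = {"Header3": "TGANATNCTN", "Header4": "NNNTTCCTcgCACtC"}
print(apply_rvcp(sequences, ["RVCP Header4", "RVCP Header3"]))
```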
Other SAK instructions:
`cat testFiles/random1.instructions.sak`
`gfastats testFiles/random1.fasta -ofa`
`gfastats testFiles/random1.fasta -ofa -k <(head -1 testFiles/random1.instructions.sak)`
`gfastats testFiles/random1.fasta -ofa -k <(head -2 testFiles/random1.instructions.sak)`
`gfastats testFiles/random1.fasta -ofa -k <(head -3 testFiles/random1.instructions.sak)`
`gfastats testFiles/random1.fasta -ofa -k <(head -4 testFiles/random1.instructions.sak)`
`gfastats testFiles/random1.fasta -ogfa2 -k <(head -4 testFiles/random1.instructions.sak)`
`gfastats testFiles/random1.fasta -ofa -k <(head -5 testFiles/random1.instructions.sak)`
`gfastats testFiles/random1.fasta -ogfa2 -k <(head -5 testFiles/random1.instructions.sak)`
`gfastats testFiles/random1.fasta -ofa -k <(head -6 testFiles/random1.instructions.sak)`
`gfastats testFiles/random1.fasta -ogfa2 -k <(head -6 testFiles/random1.instructions.sak)`
`gfastats testFiles/random1.fasta -ogfa2 -k <(head -7 testFiles/random1.instructions.sak)`
`gfastats testFiles/random1.fasta -ofa -k <(head -8 testFiles/random1.instructions.sak)`
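The `JOIN` lines in `testFiles/random1.instructions.sak` appear to take two oriented paths, a gap size, a gap name and a new path name. That is our reading of the file; see the gfastats documentation for the authoritative syntax. This string-level sketch applies only the `JOIN` lines from the first three instructions. It assumes the joined inputs are consumed, and it ignores gap names. `oriented` and `apply_joins` are hypothetical helpers.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def oriented(sequences, token):
    """Return the sequence for a token like "Header1+" or "Header5-" in that orientation."""
    name, strand = token[:-1], token[-1]
    seq = sequences[name]
    return seq.translate(COMPLEMENT)[::-1] if strand == "-" else seq


def apply_joins(sequences, instructions):
    """Apply JOIN lines in order; every other instruction type is skipped by this sketch."""
    result = dict(sequences)
    for line in instructions:
        fields = line.split()
        if not fields or fields[0] != "JOIN":
            continue
        first, second, gap_size, _gap_name, new_name = fields[1:6]
        joined = oriented(result, first) + "N" * int(gap_size) + oriented(result, second)
        # Assumption: joined inputs are consumed and no longer reported separately.
        for token in (first, second):
            result.pop(token[:-1], None)
        result[new_name] = joined
    return result


sequences = {
    "Header1": "CGacT",
    "Header2": "CGAXT",
    "Header3": "TGANATNCTN",
    "Header4": "NNNTTCCTcgCACtC",
    "Header5": "AACTCGATCACGNNN",
}
instructions = [
    "JOIN Header1+ Header2+ 5 newGap1 Scaffold1",
    "JOIN Header4+ Header5+ 5 newGap2 Scaffold2",
    "JOIN Scaffold1+ Header3+ 10 newGap3 FinalScaffold",
]
for name, seq in apply_joins(sequences, instructions).items():
    print(name, len(seq))
```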
28 changes: 28 additions & 0 deletions docs/kreeq.md
@@ -0,0 +1,28 @@
```
kreeq validate -f input.[fasta|fastq][.gz] -r reads1.fastq[.gz] reads2.fastq[.gz] [...] [-k 21]
```

It accepts multiple read files as input, separated by spaces. To see all options and flags, run `kreeq -h`.

You can test some typical usage with the files in the `testFiles` folder, e.g.:

```
kreeq validate -f testFiles/random1.fasta -r testFiles/random1.fastq
```

Importantly, the kreeq database only needs to be computed once per read set; it can then be reused for multiple analyses to save runtime:

```
kreeq validate -r testFiles/random1.fastq -o db.kreeq
kreeq validate -f testFiles/random1.fasta -d db.kreeq
```
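The reason to build the database once: the read k-mers are counted a single time, and any number of assemblies can then be checked against that table. Here is a minimal sketch of this workflow using canonical k-mers, reporting assembly k-mers that never occur in the reads. kreeq's actual database format and metrics are not reproduced, and `count_kmers` and `missing_kmers` are hypothetical helpers.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID = set("ACGT")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def count_kmers(sequences, k):
    """Count canonical k-mers, skipping any that contain non-ACGT characters (e.g. N gaps)."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= VALID:
                counts[canonical(kmer)] += 1
    return counts


def missing_kmers(assembly, read_db, k):
    """Number of assembly k-mers absent from the read database (candidate assembly errors)."""
    asm_counts = count_kmers([assembly], k)
    return sum(n for kmer, n in asm_counts.items() if kmer not in read_db)


# Count the reads once ...
read_db = count_kmers(["ACGTACGT"], 3)
# ... then reuse the same table for several assemblies.
for assembly in ["ACGTACGT", "ACGTTCGT"]:
    print(assembly, missing_kmers(assembly, read_db, 3))
```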

Similarly, kreeq databases can be generated separately for multiple read sets and then combined with `kreeq union`, which can improve performance in HPC environments:

```
kreeq validate -r testFiles/random1.fastq -o random1.kreeq
kreeq validate -r testFiles/random2.fastq -o random2.kreeq
kreeq union -d random1.kreeq random2.kreeq -o union.kreeq
kreeq validate -f testFiles/random1.fasta -d union.kreeq
```
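K-mer counting is additive, so tables built independently for different read sets can be merged by summing counts. This is what makes counting each file on a separate node worthwhile. The sketch below assumes `kreeq union` behaves like such a sum, which the text above does not state. `union` and `count_kmers` are hypothetical helpers, not kreeq's API, and plain (non-canonical) k-mers keep the example short.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count plain (non-canonical) k-mers; a simplified stand-in for a real k-mer database."""
    return Counter(read[i:i + k] for read in reads for i in range(len(read) - k + 1))


def union(*databases):
    """Combine k-mer count tables by summing the counts for each k-mer."""
    merged = Counter()
    for db in databases:
        merged.update(db)  # Counter.update adds counts instead of replacing them
    return merged


reads1 = ["ACGTAC"]  # e.g. counted on one node
reads2 = ["GTACGG"]  # e.g. counted on another node
combined = union(count_kmers(reads1, 3), count_kmers(reads2, 3))
print(combined == count_kmers(reads1 + reads2, 3))
```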
8 changes: 8 additions & 0 deletions docs/testFiles/random1.agp
@@ -0,0 +1,8 @@
newpath1 1 5 1 W Header1 2 5 +
newpath1 6 10 2 N 5 scaffold yes
newpath1 11 13 3 W Header2 1 3 -
newpath1 14 18 4 N 5 scaffold yes
newpath1 19 24 5 W Header3 4 8 +
newpath2 1 5 1 W Header5 3 7 -
newpath2 6 10 2 N 5 scaffold yes
newpath2 11 25 3 W Header4 1 15 +
15 changes: 15 additions & 0 deletions docs/testFiles/random1.fasta
@@ -0,0 +1,15 @@
>Header1 5bp sequence with no gaps and 2 lowercase bases
CGa
cT
>Header2 5bp sequence with internal 1bp non-canonical gap
CG
AXT
>Header3 10bp sequence with internal 4bp and 1bp terminal canonical gap
TGANA
TNCTN
>Header4 15bp sequence with start 3bp canonical gap and 3 lowercase bases
NNNTTCC
TcgCACtC
>Header5 15bp sequence with terminal 3bp canonical gap
AACTCGAT
CACGNNN
8 changes: 8 additions & 0 deletions docs/testFiles/random1.fasta.bed
@@ -0,0 +1,8 @@
Header1 0 5
Header2 0 3
Header2 4 5
Header3 0 3
Header3 4 6
Header3 7 9
Header4 2 13
Header5 3 14
20 changes: 20 additions & 0 deletions docs/testFiles/random1.gfa2
@@ -0,0 +1,20 @@
H VN:Z:2.0
S Header1.1 5 CGacT
S Header2.1 3 CGA
S Header2.3 1 T
S Header3.1 3 TGA
S Header3.3 2 AT
S Header3.5 2 CT
S Header4.2 12 TTCCTcgCACtC
S Header5.1 12 AACTCGATCACG
G Header2.2 Header2.1+ Header2.3+ 1
G Header3.2 Header3.1+ Header3.3+ 1
G Header3.4 Header3.3+ Header3.5+ 1
G Header3.6 Header3.5+ Header3.5- 1
G Header4.1 Header4.2+ Header4.2+ 3
G Header5.2 Header5.1+ Header5.1- 3
O Header1 Header1.1+ 5bp sequence with no gaps and 2 lowercase bases
O Header2 Header2.1+ Header2.2 Header2.3+ 5bp sequence with internal 1bp non-canonical gap
O Header3 Header3.1+ Header3.2 Header3.3+ Header3.4 Header3.5+ Header3.6 10bp sequence with internal 4bp and 1bp terminal canonical gap
O Header4 Header4.1 Header4.2+ 15bp sequence with start 3bp canonical gap and 3 lowercase bases
O Header5 Header5.1+ Header5.2 15bp sequence with terminal 3bp canonical gap
8 changes: 8 additions & 0 deletions docs/testFiles/random1.instructions.sak
@@ -0,0 +1,8 @@
JOIN Header1+ Header2+ 5 newGap1 Scaffold1
JOIN Header4+ Header5+ 5 newGap2 Scaffold2
JOIN Scaffold1+ Header3+ 10 newGap3 FinalScaffold
SPLIT Header2.1 Header2.3 Scaffold3 Scaffold4
EXCISE Header3.3 3 newGap4
INVERT Header5.1
REMOVE Header1.1
RESIZE newGap2 10
2 changes: 2 additions & 0 deletions docs/testFiles/random1.rvcp.sak
@@ -0,0 +1,2 @@
RVCP Header4
RVCP Header3
11 changes: 11 additions & 0 deletions docs/testFiles/random2.gfa
@@ -0,0 +1,11 @@
H VN:Z:1.2
S 11 ACCTT LN:i:5 QL:Z:?@97?
S 12 TCAAGG LN:i:6 QL:Z:@6?84@
S 13 CTTgaTT LN:i:7 QL:Z:>=?@877
L 11 + 12 - 4M
L 12 - 13 + 5M
L 11 + 13 + 3M
J 11 + 13 - 5 SC:i:1
J 13 - 12 + 3 SC:i:1
P 14 11+;13-;12+ 5,3
P 15 11+,12-,13+ 4M,5M
11 changes: 11 additions & 0 deletions docs/testFiles/random2.noseq.gfa
@@ -0,0 +1,11 @@
H VN:Z:1.2
S 11 * LN:i:5 QL:Z:?@97?
S 12 * LN:i:6 QL:Z:@6?84@
S 13 * LN:i:7 QL:Z:>=?@877
L 11 + 12 - 4M
L 12 - 13 + 5M
L 11 + 13 + 3M
J 11 + 13 - 5 SC:i:1
J 13 - 12 + 3 SC:i:1
P 14 11+;13-;12+ 5,3
P 15 11+,12-,13+ 4M,5M
8 changes: 8 additions & 0 deletions docs/testFiles/random5.findovl.gfa
@@ -0,0 +1,8 @@
H VN:Z:1.2
S 11 CCGTTCCATGAAGGCCAGAGTTACTTACCGGCCCTTTCCATGCGCGCGCCATAAA LN:i:55
S 12 GATTTAAGAATATGTTAACGGAGGATTGCACGATCTTCTCTCCTCGTGAGAGAATTTATG LN:i:60
S 13 AAATCGCATAGCTATGTATTTTGCAGAGGTAGCGACATCTTGACGGGCACTTCACAGATAGTGGG LN:i:65
J 11 + 13 - 5 SC:i:1
J 13 - 12 + 3 SC:i:1
P 14 11+;13-;12+ 5,3
P 15 11+,12-,13+ 6M,5M
