add ChewBBACA #5899

Merged
merged 100 commits into from
Apr 13, 2024
100 commits
0d90128
add ChewBBACA
nilchia Mar 21, 2024
e31b5f9
update macros
nilchia Mar 21, 2024
9c403c7
add compare by sim_size for zip files
nilchia Mar 21, 2024
1c22e19
3rd tool PrepExternalSchema
nilchia Mar 22, 2024
b38e5c7
Change to min value of --l
nilchia Mar 25, 2024
f1b2d85
Add PrepExternalSchema tool from chewbbaca
nilchia Mar 25, 2024
d22ff0c
Some minor adjustment to xmls. create output of allelecall in test-data
nilchia Mar 25, 2024
87386a9
update test-data
nilchia Mar 25, 2024
22e2901
update test-data
nilchia Mar 25, 2024
698b1fd
add prepareexternalschema
nilchia Mar 25, 2024
cbc4c70
removing uploaded wrong files
nilchia Mar 25, 2024
5b94ea1
aading DownloadSchema tool
nilchia Mar 25, 2024
cef81ad
aading DownloadSchema tool
nilchia Mar 25, 2024
7468524
Add AlleleCallEvaluator
nilchia Mar 26, 2024
933f5fd
update alleleCallEvaluator
nilchia Mar 26, 2024
b7dc7c0
Add FastTree requirement to macros
nilchia Mar 26, 2024
67a7f4c
add AlleleCallEvaluator
nilchia Mar 26, 2024
ad1b55a
update AlleleCall.xml to write cds_coordinates as output
nilchia Mar 26, 2024
0256df7
Update AllelCallEvluator
nilchia Mar 26, 2024
0ca96db
update test.data
nilchia Mar 26, 2024
1092358
update test-data
nilchia Mar 26, 2024
0bd6e4f
add ExtractCgMLST.xml
nilchia Mar 27, 2024
58bd441
Merge branch 'galaxyproject:main' into chewbbaca
nilchia Mar 27, 2024
422a8a2
add NSStats.xml
nilchia Mar 27, 2024
9d1dc78
add JoinProfiles.xml
nilchia Mar 27, 2024
d867846
updata test-data
nilchia Mar 27, 2024
7fe39d5
final NSStats.xml
nilchia Mar 27, 2024
d67b1b7
Final NSStats.xml with help
nilchia Mar 27, 2024
2a34a6a
final DownloadSchema.xml
nilchia Mar 27, 2024
c97a0c0
final DownloadSchema with help and a little correction to NSStema.xml
nilchia Mar 27, 2024
15007cd
final CreatSchema.xml with help
nilchia Mar 27, 2024
2d5f915
final PrepExternalSchema.xml with help
nilchia Mar 27, 2024
49804ed
Final AlleleCall.xml with help
nilchia Mar 27, 2024
dec0a57
NSStats.xml reviewed
nilchia Mar 28, 2024
fcfc806
macros updated species_id
nilchia Mar 28, 2024
e5ed80e
revised CreateSchenma
nilchia Apr 2, 2024
d2fab38
revised CreateSchema with another test
nilchia Apr 2, 2024
4a280ba
revised AlleleCall.xml
nilchia Apr 2, 2024
c871034
revised AlleleCallEvaluator.xml
nilchia Apr 2, 2024
85b2a0d
revised DownloadSchema
nilchia Apr 2, 2024
cdfd121
revised ExtractCgMLST.xml
nilchia Apr 2, 2024
91a8efb
updated macros - INPUT as token
nilchia Apr 2, 2024
56a4bcb
Revised PrepExternalSchema.xml
nilchia Apr 2, 2024
44e8331
revised Joinprofile.xml
nilchia Apr 2, 2024
5c66c2c
Added --cds_input
nilchia Apr 2, 2024
08efc33
Added --cds_input
nilchia Apr 2, 2024
a7a4520
Added --cds-input
nilchia Apr 2, 2024
ca1f086
Added --cds-input
nilchia Apr 2, 2024
5b4a0b7
Added --genes-list
nilchia Apr 2, 2024
f39fec1
added --common
nilchia Apr 2, 2024
dd62428
update test-data
nilchia Apr 2, 2024
cf924b0
Merge branch 'galaxyproject:main' into chewbbaca
nilchia Apr 2, 2024
e06d66e
updated test of DownloadSchema.xml
nilchia Apr 3, 2024
b5d53ff
updated test-data
nilchia Apr 3, 2024
559ae10
updated test-data
nilchia Apr 3, 2024
3ff7677
update on test and output of ExtractCgMLST.xml
nilchia Apr 3, 2024
8008177
Update tools/chewbbaca/AlleleCall.xml
nilchia Apr 5, 2024
5f5f261
Update tools/chewbbaca/AlleleCall.xml
nilchia Apr 5, 2024
d7650b9
Update tools/chewbbaca/AlleleCall.xml
nilchia Apr 5, 2024
e1199b1
Update tools/chewbbaca/PrepExternalSchema.xml
nilchia Apr 5, 2024
533811b
Update tools/chewbbaca/AlleleCallEvaluator.xml
nilchia Apr 5, 2024
9c534c8
added test for 3 optional outputs of AllelCall.xml
nilchia Apr 5, 2024
9bb69d2
Update command line of JoinProfiles.xml
nilchia Apr 5, 2024
fd52c6e
corrected id of tools all lowercase with chewbbaca prefix
nilchia Apr 5, 2024
2173b85
corrected if statement for optional file
nilchia Apr 5, 2024
e50e9c6
change some params to multiple select type
nilchia Apr 5, 2024
4718a0d
added '' to all files and directories
nilchia Apr 5, 2024
c6acc9c
added '' to all files and directories
nilchia Apr 5, 2024
01170b2
better name for macros tokens
nilchia Apr 5, 2024
855121f
the result of AlleleCallEvluator is now a collection
nilchia Apr 5, 2024
eabe306
help is edited
nilchia Apr 5, 2024
edce444
the output is now an html file not a collection
nilchia Apr 8, 2024
fcca0f8
corrected output filtering in AlleleCall.xml
nilchia Apr 8, 2024
5dca753
update test-data
nilchia Apr 8, 2024
bcf0d83
--cds-input is boolean now
nilchia Apr 9, 2024
e011a57
added a test for --cds in CreateSchema.xml
nilchia Apr 9, 2024
93c7722
update test data
nilchia Apr 9, 2024
b7a76b9
--cds is boolean for AlleleCall.xml
nilchia Apr 9, 2024
6203ca7
added test for annotation file in AlleleCallEvaluator.xml
nilchia Apr 10, 2024
788e640
updateed help for AlleleCall.xml ExtractCgMLST.xml
nilchia Apr 10, 2024
1c51e77
update the name of test2 output
nilchia Apr 10, 2024
28219f5
update on 2nd test of AlleleCallEvaluator
nilchia Apr 10, 2024
4b45c8f
Update tools/chewbbaca/CreateSchema.xml
nilchia Apr 11, 2024
b669c99
changed test of CreateSchema to check the archive members
nilchia Apr 11, 2024
1772a31
changed test of DownloadSchema to check the archive members
nilchia Apr 11, 2024
4be4711
changed format=tsv to =tabular in CreateSchema
nilchia Apr 11, 2024
a0e8b31
changed format=tsv to =tabular in JoinProfile
nilchia Apr 11, 2024
0f0cc2d
compare diff in JoinProfile
nilchia Apr 11, 2024
c50ca26
compare diff in AlleleCallEvaluator
nilchia Apr 11, 2024
3a94dc7
compare diff and assert_contents in NSSTATS
nilchia Apr 11, 2024
72259d6
test checks archive members in PrepExternalSchema
nilchia Apr 11, 2024
52195a5
sim_size to diff in AlleleCall
nilchia Apr 11, 2024
9224590
deleted unnecessary test data
nilchia Apr 11, 2024
2a8c54c
update test check in NSSTAT
nilchia Apr 11, 2024
969e98b
update test data
nilchia Apr 11, 2024
bfd3a5d
updated test for NSSTATS. line_diff
nilchia Apr 12, 2024
dcda29d
Update line_diff
nilchia Apr 12, 2024
e306bfb
update line_diff
nilchia Apr 12, 2024
47ed2ac
update on NSSTATS test
nilchia Apr 12, 2024
3810705
Merge branch 'galaxyproject:main' into chewbbaca
nilchia Apr 12, 2024
15 changes: 15 additions & 0 deletions tools/chewbbaca/.shed.yml
@@ -0,0 +1,15 @@
categories:
- Variant Analysis
description: BSR-Based Allele Calling Algorithm
long_description: chewBBACA is a comprehensive pipeline including a set of functions for the creation and validation of whole genome and core genome MultiLocus Sequence Typing (wg/cgMLST) schemas, providing an allele calling algorithm based on Blast Score Ratio that can be run in multiprocessor settings and a set of functions to visualize and validate allele variation in the loci. chewBBACA performs the schema creation and allele calls on complete or draft genomes.
homepage_url: https://github.com/B-UMMI/chewBBACA/tree/master
name: chewbbaca
owner: iuc
remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/master/tools/chewbbaca
auto_tool_repositories:
  name_template: "{{ tool_id }}"
  description_template: "Wrapper for {{ tool_name }}."
suite:
  name: "suite_chewbbaca"
  description: "A suite of Galaxy tools designed to work with the chewbbaca-tools collection."
  type: repository_suite_definition
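
Planemo's `auto_tool_repositories` setting is meant to give each wrapper in the suite its own Tool Shed repository, with the repository name and description filled in from the two templates above. The sketch below illustrates that substitution for two of the tool ids and names in this PR, assuming simple `{{ placeholder }}` replacement; `render_template` is a hypothetical helper, not Planemo's actual renderer.

```python
import re

# Hypothetical helper: replace each "{{ key }}" placeholder with values[key].
# Illustration only; Planemo's own template handling may differ.
def render_template(template: str, values: dict) -> str:
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: values[m.group(1)], template)

# Templates copied from .shed.yml.
NAME_TEMPLATE = "{{ tool_id }}"
DESCRIPTION_TEMPLATE = "Wrapper for {{ tool_name }}."

# Tool ids and names taken from the wrappers in this PR.
tools = {
    "chewbbaca_allelecall": "ChewBBACA AlleleCall",
    "chewbbaca_allelecallevaluator": "chewBBACA AlleleCallEvaluator",
}

for tool_id, tool_name in tools.items():
    values = {"tool_id": tool_id, "tool_name": tool_name}
    print(render_template(NAME_TEMPLATE, values), "|",
          render_template(DESCRIPTION_TEMPLATE, values))
```

With these inputs the loop prints `chewbbaca_allelecall | Wrapper for ChewBBACA AlleleCall.`, which also shows that the tool `name` attribute (including its capitalization) flows straight into the repository description.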
178 changes: 178 additions & 0 deletions tools/chewbbaca/AlleleCall.xml
@@ -0,0 +1,178 @@
<tool id="chewbbaca_allelecall" name="ChewBBACA AlleleCall" version="@CHEW_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@">
<description>Determine the allelic profiles of a set of genomes</description>
<macros>
<import>macros.xml</import>
</macros>
<expand macro="requirements" />
<command detect_errors="exit_code"><![CDATA[
mkdir 'input' &&
mkdir 'schema' &&
#for $file in $input_file
ln -sf '$file' 'input/${file.element_identifier}' &&
#end for
unzip '$input_schema' -d 'schema' &&
chewBBACA.py AlleleCall
#if $training_file:
--ptf '$training_file'
#end if
$cds_input
#if $genes_list:
--gl '$genes_list'
#end if
#if str($blast_score_ratio) != ""
--bsr $blast_score_ratio
#end if
#if str($minimum_length) != ""
--l $minimum_length
#end if
#if str($translation_table) != ""
--t $translation_table
#end if
#if str($size_threshold) != ""
--st $size_threshold
#end if
$no_inferred
--pm $prodigal_mode
--mode $mode
--force-continue
#if 'output_unclassified' in $output_selector:
--output-unclassified
#end if
#if 'output_missing' in $output_selector:
--output-missing
#end if
#if 'output_novel' in $output_selector:
--output-novel
#end if
#if 'hash_profile' in $output_selector:
## It can use any hashing algorithm from hashlib but for simplicity we set it to md5
--hash-profile md5
#end if
-i 'input' -g 'schema/schema_seed/' -o 'output'
]]></command>
<inputs>
<param format="fasta" name="input_file" type="data" multiple="true" label="Genome assemblies in FASTA format"/>
<param format="zip" name="input_schema" type="data" label="Schema Files in zip format" help="The schema directory contains the loci FASTA files and a folder named 'short' that contains the FASTA files with the loci representative alleles."/>
<section name="advanced" title="Advanced options">
<param argument="--genes-list" type="data" format="txt" label="Gene list" optional="true" />
<param argument="--training-file" type="data" format="binary" label="Prodigal training file" optional="true" help="By default, the training file included in the schema is used."/>
<param argument="--cds-input" type="boolean" truevalue="--cds-input" falsevalue="" checked="false" label="CDS input" optional="true"/>
<param argument="--blast-score-ratio" type="float" min="0.0" max="1.0" value="" optional="true" label="BLAST Score Ratio value" />
<param argument="--minimum-length" type="integer" min="0" value="" optional="true" label="Minimum sequence length value"/>
<param argument="--translation-table" type="integer" min="0" value="" optional="true" help="Must match the genetic code used to create the training file (default: uses value defined in schema config)." label="Genetic code used to predict genes and to translate coding sequences"/>
<param argument="--size-threshold" type="float" min="0" value="" optional="true" label="CDS size variation threshold"/>
<param argument="--no-inferred" type="boolean" truevalue="--no-inferred" falsevalue="" checked="false" optional="true" label="Do not add the sequences of inferred alleles (INF) to the schema" help="Use this parameter if the schema is being accessed by multiple processes/users simultaneously." />
<param argument="--prodigal-mode" type="select" optional="true" label="Prodigal Mode" help="&quot;single&quot; for finished genomes, reasonable quality draft genomes and big viruses. &quot;meta&quot; for metagenomes, low quality draft genomes, small viruses, and small plasmids">
<option value="single" selected="true">
single
</option>
<option value="meta">
meta
</option>
</param>
<param argument="--mode" type="select" label="Execution mode" optional="true">
<option value="1">Only exact matches at DNA level</option>
<option value="2">Exact matches at DNA and Protein level</option>
<option value="3">Exact matches and minimizer-based clustering to find similar alleles based on BSR+0.1</option>
<option value="4" selected="true">Full process: exact matches and similar alleles based on the BSR value, including the determination of new representative alleles to add to the schema</option>
</param>
</section>
<section name="output" title="Output Options">
<param name="output_selector" type="select" multiple="true" optional="true" display="checkboxes" label="Additional outputs">
<option value="output_unclassified">Create a FASTA file with unclassified coding sequences. (--output-unclassified)</option>
<option value="output_missing">Create a FASTA file with coding sequences classified as NIPH, NIPHEM, ASM, ALM, PLOT3, PLOT5 and LOTSC. (--output-missing)</option>
<option value="output_novel">Create a FASTA file with the novel alleles inferred during allele calling. (--output-novel)</option>
<option value="hash_profile">Create a TSV file with hashed allelic profiles. (--hash-profile)</option>
</param>
</section>
</inputs>
<outputs>
<collection name="allelecall_results" type="list" label="${tool.name} on ${on_string}: AlleleCall Results">
<discover_datasets pattern="(?P&lt;name&gt;.+)\.tsv$" format="tabular" directory="output"/>
</collection>
<collection name="allelcall_log" type="list" label="${tool.name} on ${on_string}: AlleleCall Logs">
<discover_datasets pattern="(?P&lt;name&gt;.+)\.txt$" format="txt" directory="output"/>
</collection>
<data name="unclassified_fasta" format="fasta" from_work_dir="output/unclassified_sequences.fasta" label="${tool.name} on ${on_string}: Unclassified fasta">
<filter>output['output_selector'] and 'output_unclassified' in output['output_selector']</filter>
</data>
<data name="missing_fasta" format="fasta" from_work_dir="output/missing_classes.fasta" label="${tool.name} on ${on_string}: Missing fasta">
<filter>output['output_selector'] and 'output_missing' in output['output_selector']</filter>
</data>
<data name="novel_fasta" format="fasta" from_work_dir="output/novel_alleles.fasta" label="${tool.name} on ${on_string}: Novel fasta">
<filter>output['output_selector'] and 'output_novel' in output['output_selector']</filter>
</data>
</outputs>
<tests>
<test expect_num_outputs="4">
<param name="input_file" value="GCA_000007265.1_ASM726v1_genomic.fna"/>
<param name="input_schema" value="GCA_000007265.1_ASM726v1_schema_seed.zip"/>
<param name="output_selector" value="output_unclassified,output_missing,hash_profile" />
<output_collection name="allelecall_results" type="list">
<element name="cds_coordinates" file="cds_coordinates.tsv" compare="diff"/>
<element name="loci_summary_stats" file="loci_summary_stats.tsv" compare="diff"/>
<element name="paralogous_loci" ftype="tabular">
<assert_contents>
<has_text_matching expression="Genome.*Loci.*CDS"/>
</assert_contents>
</element>
<element name="results_alleles" file="results_alleles.tsv" compare="diff">
<assert_contents>
<has_text_matching expression="1.*1.*NIPHEM.*1.*1"/>
<has_text_matching expression="GCA_000007265.*1"/>
</assert_contents>
</element>
<element name="results_alleles_hashed" ftype="tabular">
<assert_contents>
<has_text_matching expression="FILE.*GCA-000007265-protein1.*GCA-000007265-protein10.*GCA-000007265-protein100"/>
<has_text_matching expression="GCA_000007265.*308e7666834338d0530d925b2737f2c6.*4aece26d201d59a90947e3400c7abf3f.*ebea148832aa2ae2704d37ebd5123169"/>
</assert_contents>
</element>
<element name="results_statistics" file="results_statistics.tsv" compare="diff"/>
</output_collection>
<output_collection name="allelcall_log" type="list">
<element name="logging_info" ftype="txt">
<assert_contents>
<has_text_matching expression="Used a BSR of: 0.6"/>
</assert_contents>
</element>
</output_collection>
<output name="unclassified_fasta">
<assert_contents>
<has_text_matching expression="GCA_000007265-protein15"/>
<has_text_matching expression="ATGCACCACCTGTCACTTCTGCTCCGAAGAGAAAGCCTATCTCTAGGCCGGTCAGAAGGATGTCAAGACCTGGTAAGGTTCTTCGCGTTGCTTCGAATTAAACCACATGCTCCACCGCTTGTGCGGGCCCCCGTCAATTCCTTTGAGTTTCAACCTTGCGGTCGTACTCCCCAGGCGGAGTGCTTAATGCGTTAG"/>
</assert_contents>
</output>
<output name="missing_fasta">
<assert_contents>
<has_text_matching expression="1|GCA_000007265|GCA-000007265-protein16&amp;NIPHEM|GCA_000007265-protein16&amp;EXC"/>
</assert_contents>
</output>
</test>
</tests>
<help>
chewBBACA is a software suite for the creation and evaluation of core genome and whole genome MultiLocus Sequence Typing (cg/wgMLST) schemas and results.

In chewBBACA, by default, an allele must be a CDS predicted by Prodigal_. To ensure reproducible CDS prediction, the same Prodigal training file should be used for all genomes of a given bacterial species and provided as input.

.. class:: infomark

**Important**

Although the use of a training file is optional, it is highly recommended to ensure consistent results.

If the schema was created with chewBBACA v2, use the PrepExternalSchema module to convert it to a format fully compatible with chewBBACA v3.

By default, the AlleleCall module uses the Prodigal training file included in the schema directory, so a training file does not need to be passed to the --ptf parameter.

.. class:: infomark

**Note**

If a text file listing loci IDs or full paths to loci FASTA files, one per line, is passed to the --genes-list parameter, allele calling is performed only for the loci in that list.
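
For example, a genes list restricting allele calling to three loci of the test schema used by this wrapper might contain the following loci IDs (a hypothetical selection; the IDs are taken from the column headers of the hashed profile in the test data)::

    GCA-000007265-protein1
    GCA-000007265-protein10
    GCA-000007265-protein100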

.. _Prodigal: https://github.com/hyattpd/Prodigal
</help>
<expand macro="citations" />
</tool>
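
AlleleCall.xml sorts the files chewBBACA writes to `output/` with two `discover_datasets` patterns: `.tsv` files become elements of the results collection and `.txt` files become elements of the log collection, while the FASTA files are collected separately through `from_work_dir`. The sketch below mimics that split with Python's `re` module, which is what Galaxy uses to apply these patterns, taking the `name` group as the element identifier. `classify` and the file listing are illustrative assumptions, not Galaxy code.

```python
import re

# Patterns copied from the <discover_datasets> tags of AlleleCall.xml,
# with the XML escapes &lt; and &gt; turned back into < and >.
RESULTS_PATTERN = re.compile(r"(?P<name>.+)\.tsv$")
LOGS_PATTERN = re.compile(r"(?P<name>.+)\.txt$")

def classify(filenames):
    """Hypothetical helper: split output/ files into the two collections."""
    results, logs = [], []
    for filename in filenames:
        if m := RESULTS_PATTERN.match(filename):
            results.append(m.group("name"))  # element identifier
        elif m := LOGS_PATTERN.match(filename):
            logs.append(m.group("name"))
    return results, logs

# Example listing; the FASTA file matches neither pattern.
files = ["results_alleles.tsv", "cds_coordinates.tsv",
         "logging_info.txt", "unclassified_sequences.fasta"]
print(classify(files))
# -> (['results_alleles', 'cds_coordinates'], ['logging_info'])
```

Because identifiers are just the file names without extension, the test's `<element name="...">` values must match chewBBACA's output file names exactly.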
100 changes: 100 additions & 0 deletions tools/chewbbaca/AlleleCallEvaluator.xml
@@ -0,0 +1,100 @@
<tool id="chewbbaca_allelecallevaluator" name="chewBBACA AlleleCallEvaluator" version="@CHEW_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@">
<description>Build an interactive report for allele calling results evaluation</description>
<macros>
<import>macros.xml</import>
</macros>
<expand macro="requirements" />
<command detect_errors="exit_code"><![CDATA[
mkdir 'input' &&
mkdir -p 'schema' &&
#for $file in $input_file
ln -sf '$file' 'input/${file.element_identifier}.tsv' &&
#end for
unzip '$input_schema' -d 'schema' &&
chewBBACA.py AlleleCallEvaluator
#if $annotations:
-a '$annotations'
#end if
#if 'light' in $computation:
--light
#end if
#if 'no-pa' in $computation:
--no-pa
#end if
#if 'no-dm' in $computation:
--no-dm
#end if
#if 'no-tree' in $computation:
--no-tree
#end if
#if 'cg-alignment' in $computation:
--cg-alignment
#end if
-i 'input' -g 'schema/schema_seed/' -o '${html_file.files_path}'
&& cp '${html_file.files_path}'/*\.html output.html

]]></command>
<inputs>
<param name="input_file" type="data_collection" collection_type="list" label="AlleleCall Results" format="tabular"/>
<param name="input_schema" format="zip" type="data" label="Schema Files in zip format" help="The schema directory contains the loci FASTA files and a folder named 'short' that contains the FASTA files with the loci representative alleles."/>
<section name="advanced" title="Advanced options">
<param argument="--annotations" type="data" format="tabular" label="Annotation file" help="File created by the UniprotFinder module" optional="true" />
<param name="computation" type="select" multiple="true" optional="true" display="checkboxes" label="Computation method">
<option value="light">Do not compute the presence-absence matrix, the distance matrix, and the Neighbor-Joining tree. (--light)</option>
<option value="no-pa">Do not compute the presence-absence matrix. (--no-pa)</option>
<option value="no-dm">Do not compute the distance matrix. (--no-dm)</option>
<option value="no-tree">Do not compute the Neighbor-Joining tree. (--no-tree)</option>
<option value="cg-alignment">Compute the MSA of the core genome loci, even if --no-tree is selected. (--cg-alignment)</option>
</param>
</section>
</inputs>
<outputs>
<data format="html" name="html_file" from_work_dir="output.html" label="${tool.name} on ${on_string}: Webpage" />
</outputs>
<tests>
<test expect_exit_code="0">
<param name="input_file">
<collection type="list">
<element name="cds_coordinates" value="cds_coordinates.tsv" ftype="tabular"/>
<element name="loci_summary_stats" value="loci_summary_stats.tsv" ftype="tabular"/>
<element name="results_alleles" value="results_alleles.tsv" ftype="tabular"/>
<element name="results_statistics" value="results_statistics.tsv" ftype="tabular"/>
</collection>
</param>
<param name="input_schema" value="GCA_000007265.1_ASM726v1_schema_seed.zip"/>
<output name="html_file" file="allelecallevaluator_report.html" ftype="html" compare="diff">
</output>
</test>
<test expect_exit_code="0">
<param name="input_file">
<collection type="list">
<element name="cds_coordinates" value="cds_coordinates.tsv" ftype="tabular"/>
<element name="loci_summary_stats" value="loci_summary_stats.tsv" ftype="tabular"/>
<element name="results_alleles" value="results_alleles.tsv" ftype="tabular"/>
<element name="results_statistics" value="results_statistics.tsv" ftype="tabular"/>
</collection>
</param>
<param name="input_schema" value="GCA_000007265.1_ASM726v1_schema_seed.zip"/>
<param name="annotations" value="schema_seed_annotations.tsv"/>
<output name="html_file" file="allelecallevaluator_report_annotate.html" ftype="html">
<assert_contents>
<has_text_matching expression="&quot;annotations&quot;"/>
<has_text_matching expression="&quot;Uniprot_Name&quot;"/>
</assert_contents>
</output>
</test>
</tests>
<help>
chewBBACA is a software suite for the creation and evaluation of core genome and whole genome MultiLocus Sequence Typing (cg/wgMLST) schemas and results.

The AlleleCallEvaluator module generates an interactive HTML report to evaluate allele calling results produced by the AlleleCall module. The report provides summary statistics per sample and per locus; a TSV file with loci annotations can optionally be provided to include them in a table. The report also includes a heatmap of the loci presence-absence matrix, a heatmap of the distance matrix based on allelic differences, and a Neighbor-Joining (NJ) tree built from the multiple sequence alignment (MSA) of the core genome loci.

.. class:: warningmark

**Warning**

The HTML report requires the accompanying JS bundle to render. Do not move or delete this file.

</help>
<expand macro="citations" />
</tool>
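
In the AlleleCallEvaluator command, each checked "Computation method" option adds exactly one chewBBACA flag, and the optional annotation file adds `-a`. The sketch below mirrors those `#if` blocks in plain Python to show the resulting argument order. `evaluator_args` is a hypothetical helper, and `report` is a placeholder for `${html_file.files_path}`; Galaxy itself renders the command through Cheetah.

```python
# Flags in the same order as the #if blocks of AlleleCallEvaluator.xml.
FLAG_ORDER = ["light", "no-pa", "no-dm", "no-tree", "cg-alignment"]

def evaluator_args(computation=None, annotations=None):
    """Hypothetical helper: build the AlleleCallEvaluator argument list."""
    args = ["chewBBACA.py", "AlleleCallEvaluator"]
    if annotations:
        args += ["-a", annotations]
    # Treat "nothing checked" as an empty selection.
    selected = computation or []
    args += [f"--{flag}" for flag in FLAG_ORDER if flag in selected]
    # "report" stands in for ${html_file.files_path} in the real wrapper.
    args += ["-i", "input", "-g", "schema/schema_seed/", "-o", "report"]
    return args

print(" ".join(evaluator_args(["no-tree", "light"])))
# -> chewBBACA.py AlleleCallEvaluator --light --no-tree -i input -g schema/schema_seed/ -o report
```

The flag order depends only on the template, not on the order in which options are checked, which keeps the generated command stable across runs.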