add ChewBBACA #5899

Merged: 100 commits, Apr 13, 2024
Commits
0d90128
add ChewBBACA
nilchia Mar 21, 2024
e31b5f9
update macros
nilchia Mar 21, 2024
9c403c7
add compare by sim_size for zip files
nilchia Mar 21, 2024
1c22e19
3rd tool PrepExternalSchema
nilchia Mar 22, 2024
b38e5c7
Change to min value of --l
nilchia Mar 25, 2024
f1b2d85
Add PrepExternalSchema tool from chewbbaca
nilchia Mar 25, 2024
d22ff0c
Some minor adjustment to xmls. create output of allelecall in test-data
nilchia Mar 25, 2024
87386a9
update test-data
nilchia Mar 25, 2024
22e2901
update test-data
nilchia Mar 25, 2024
698b1fd
add prepareexternalschema
nilchia Mar 25, 2024
cbc4c70
removing uploaded wrong files
nilchia Mar 25, 2024
5b94ea1
aading DownloadSchema tool
nilchia Mar 25, 2024
cef81ad
aading DownloadSchema tool
nilchia Mar 25, 2024
7468524
Add AlleleCallEvaluator
nilchia Mar 26, 2024
933f5fd
update alleleCallEvaluator
nilchia Mar 26, 2024
b7dc7c0
Add FastTree requirement to macros
nilchia Mar 26, 2024
67a7f4c
add AlleleCallEvaluator
nilchia Mar 26, 2024
ad1b55a
update AlleleCall.xml to write cds_coordinates as output
nilchia Mar 26, 2024
0256df7
Update AllelCallEvluator
nilchia Mar 26, 2024
0ca96db
update test.data
nilchia Mar 26, 2024
1092358
update test-data
nilchia Mar 26, 2024
0bd6e4f
add ExtractCgMLST.xml
nilchia Mar 27, 2024
58bd441
Merge branch 'galaxyproject:main' into chewbbaca
nilchia Mar 27, 2024
422a8a2
add NSStats.xml
nilchia Mar 27, 2024
9d1dc78
add JoinProfiles.xml
nilchia Mar 27, 2024
d867846
updata test-data
nilchia Mar 27, 2024
7fe39d5
final NSStats.xml
nilchia Mar 27, 2024
d67b1b7
Final NSStats.xml with help
nilchia Mar 27, 2024
2a34a6a
final DownloadSchema.xml
nilchia Mar 27, 2024
c97a0c0
final DownloadSchema with help and a little correction to NSStema.xml
nilchia Mar 27, 2024
15007cd
final CreatSchema.xml with help
nilchia Mar 27, 2024
2d5f915
final PrepExternalSchema.xml with help
nilchia Mar 27, 2024
49804ed
Final AlleleCall.xml with help
nilchia Mar 27, 2024
dec0a57
NSStats.xml reviewed
nilchia Mar 28, 2024
fcfc806
macros updated species_id
nilchia Mar 28, 2024
e5ed80e
revised CreateSchenma
nilchia Apr 2, 2024
d2fab38
revised CreateSchema with another test
nilchia Apr 2, 2024
4a280ba
revised AlleleCall.xml
nilchia Apr 2, 2024
c871034
revised AlleleCallEvaluator.xml
nilchia Apr 2, 2024
85b2a0d
revised DownloadSchema
nilchia Apr 2, 2024
cdfd121
revised ExtractCgMLST.xml
nilchia Apr 2, 2024
91a8efb
updated macros - INPUT as token
nilchia Apr 2, 2024
56a4bcb
Revised PrepExternalSchema.xml
nilchia Apr 2, 2024
44e8331
revised Joinprofile.xml
nilchia Apr 2, 2024
5c66c2c
Added --cds_input
nilchia Apr 2, 2024
08efc33
Added --cds_input
nilchia Apr 2, 2024
a7a4520
Added --cds-input
nilchia Apr 2, 2024
ca1f086
Added --cds-input
nilchia Apr 2, 2024
5b4a0b7
Added --genes-list
nilchia Apr 2, 2024
f39fec1
added --common
nilchia Apr 2, 2024
dd62428
update test-data
nilchia Apr 2, 2024
cf924b0
Merge branch 'galaxyproject:main' into chewbbaca
nilchia Apr 2, 2024
e06d66e
updated test of DownloadSchema.xml
nilchia Apr 3, 2024
b5d53ff
updated test-data
nilchia Apr 3, 2024
559ae10
updated test-data
nilchia Apr 3, 2024
3ff7677
update on test and output of ExtractCgMLST.xml
nilchia Apr 3, 2024
8008177
Update tools/chewbbaca/AlleleCall.xml
nilchia Apr 5, 2024
5f5f261
Update tools/chewbbaca/AlleleCall.xml
nilchia Apr 5, 2024
d7650b9
Update tools/chewbbaca/AlleleCall.xml
nilchia Apr 5, 2024
e1199b1
Update tools/chewbbaca/PrepExternalSchema.xml
nilchia Apr 5, 2024
533811b
Update tools/chewbbaca/AlleleCallEvaluator.xml
nilchia Apr 5, 2024
9c534c8
added test for 3 optional outputs of AllelCall.xml
nilchia Apr 5, 2024
9bb69d2
Update command line of JoinProfiles.xml
nilchia Apr 5, 2024
fd52c6e
corrected id of tools all lowercase with chewbbaca prefix
nilchia Apr 5, 2024
2173b85
corrected if statement for optional file
nilchia Apr 5, 2024
e50e9c6
change some params to multiple select type
nilchia Apr 5, 2024
4718a0d
added '' to all files and directories
nilchia Apr 5, 2024
c6acc9c
added '' to all files and directories
nilchia Apr 5, 2024
01170b2
better name for macros tokens
nilchia Apr 5, 2024
855121f
the result of AlleleCallEvluator is now a collection
nilchia Apr 5, 2024
eabe306
help is edited
nilchia Apr 5, 2024
edce444
the output is now an html file not a collection
nilchia Apr 8, 2024
fcca0f8
corrected output filtering in AlleleCall.xml
nilchia Apr 8, 2024
5dca753
update test-data
nilchia Apr 8, 2024
bcf0d83
--cds-input is boolean now
nilchia Apr 9, 2024
e011a57
added a test for --cds in CreateSchema.xml
nilchia Apr 9, 2024
93c7722
update test data
nilchia Apr 9, 2024
b7a76b9
--cds is boolean for AlleleCall.xml
nilchia Apr 9, 2024
6203ca7
added test for annotation file in AlleleCallEvaluator.xml
nilchia Apr 10, 2024
788e640
updateed help for AlleleCall.xml ExtractCgMLST.xml
nilchia Apr 10, 2024
1c51e77
update the name of test2 output
nilchia Apr 10, 2024
28219f5
update on 2nd test of AlleleCallEvaluator
nilchia Apr 10, 2024
4b45c8f
Update tools/chewbbaca/CreateSchema.xml
nilchia Apr 11, 2024
b669c99
changed test of CreateSchema to check the archive members
nilchia Apr 11, 2024
1772a31
changed test of DownloadSchema to check the archive members
nilchia Apr 11, 2024
4be4711
changed format=tsv to =tabular in CreateSchema
nilchia Apr 11, 2024
a0e8b31
changed format=tsv to =tabular in JoinProfile
nilchia Apr 11, 2024
0f0cc2d
compare diff in JoinProfile
nilchia Apr 11, 2024
c50ca26
compare diff in AlleleCallEvaluator
nilchia Apr 11, 2024
3a94dc7
compare diff and assert_contents in NSSTATS
nilchia Apr 11, 2024
72259d6
test checks archive members in PrepExternalSchema
nilchia Apr 11, 2024
52195a5
sim_size to diff in AlleleCall
nilchia Apr 11, 2024
9224590
deleted unnecessary test data
nilchia Apr 11, 2024
2a8c54c
update test check in NSSTAT
nilchia Apr 11, 2024
969e98b
update test data
nilchia Apr 11, 2024
bfd3a5d
updated test for NSSTATS. line_diff
nilchia Apr 12, 2024
dcda29d
Update line_diff
nilchia Apr 12, 2024
e306bfb
update line_diff
nilchia Apr 12, 2024
47ed2ac
update on NSSTATS test
nilchia Apr 12, 2024
3810705
Merge branch 'galaxyproject:main' into chewbbaca
nilchia Apr 12, 2024
15 changes: 15 additions & 0 deletions tools/chewbbaca/.shed.yml
@@ -0,0 +1,15 @@
categories:
- Variant Analysis
description: BSR-Based Allele Calling Algorithm
long_description: chewBBACA is a comprehensive pipeline including a set of functions for the creation and validation of whole genome and core genome MultiLocus Sequence Typing (wg/cgMLST) schemas, providing an allele calling algorithm based on Blast Score Ratio that can be run in multiprocessor settings and a set of functions to visualize and validate allele variation in the loci. chewBBACA performs the schema creation and allele calls on complete or draft genomes.
homepage_url: https://github.com/B-UMMI/chewBBACA/tree/master
name: chewbbaca
owner: iuc
remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/master/tools/chewbbaca
auto_tool_repositories:
  name_template: "{{ tool_id }}"
  description_template: "Wrapper for {{ tool_name }}."
suite:
  name: "suite_chewbbaca"
  description: "A suite of Galaxy tools designed to work with the chewbbaca-tools collection."
  type: repository_suite_definition
173 changes: 173 additions & 0 deletions tools/chewbbaca/AlleleCall.xml
@@ -0,0 +1,173 @@
<tool id="AlleleCall" name="ChewBBACA AlleleCall" version="@CHEW_VERSION@+galaxy0" python_template_version="3.5" profile="21.05">
Review comment (Member), suggested change:
- <tool id="AlleleCall" name="ChewBBACA AlleleCall" version="@CHEW_VERSION@+galaxy0" python_template_version="3.5" profile="21.05">
+ <tool id="AlleleCall" name="ChewBBACA AlleleCall" version="@CHEW_VERSION@+galaxy0" python_template_version="3.5" profile="@PROFILE@">

Review comment (Member): Please make the same change in the other tools too.
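
For illustration, a minimal sketch of how the `@PROFILE@` token from the suggested change could be defined once in `macros.xml` and shared by every tool. The contents of `macros.xml` are not shown in this diff, so the surrounding layout is an assumption; the value `21.05` is the one currently hard-coded in the tool files.

```xml
<!-- macros.xml (sketch, layout assumed): one shared definition of the profile -->
<macros>
    <token name="@PROFILE@">21.05</token>
</macros>
```

Each tool that imports `macros.xml` can then write `profile="@PROFILE@"`, the same way `@CHEW_VERSION@` is already used in the `version` attribute, so a later profile bump touches a single line.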

<description>Determine the allelic profiles of a set of genomes</description>
<macros>
<import>macros.xml</import>
</macros>
<expand macro="requirements" />
<command detect_errors="exit_code"><![CDATA[
mkdir './input' &&
mkdir './schema' &&
#for $file in $input_file
cp $file './input/${file.element_identifier}' &&
#end for
unzip $input_schema -d './schema' &&
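## These options are defined inside the 'advanced' section, so they are referenced with that prefix
chewBBACA.py AlleleCall
#if str($advanced.training_file) != 'None'
--ptf $advanced.training_file
#end if
--bsr $advanced.blast_score_ratio
--l $advanced.minimum_length
--t $advanced.translation_table
--st $advanced.size_threshold
$advanced.no_inferred
--pm $advanced.prodigal_mode
--mode $advanced.mode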
--force-continue
-i './input' -g './schema/schema_seed/' -o './output'
]]></command>
<inputs>
<param format="fasta" name="input_file" type="data" multiple="true" label="Genome assemblies in FASTA format"/>
<param format="zip" name="input_schema" type="data" multiple="true" label="Schema Files in zip format" help="The schema directory contains the loci FASTA files and a folder named 'short' that contains the FASTA files with the loci representative alleles."/>
<section name="advanced" title="Advanced options">
<param argument="--training-file" type="data" format="binary" label="Prodigal training file" optional="true" />
Review comment (Member): Please add help text from the documentation, something like "Default is to get the training file from the schema".
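
A sketch of the parameter with the requested help text, assuming the wording is taken from the `--ptf` entry in the tool's own help section further down; the other attributes are unchanged from the diff.

```xml
<param argument="--training-file" type="data" format="binary" optional="true"
       label="Prodigal training file"
       help="Default is to get the training file from the schema's directory."/>
```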

<param argument="--blast-score-ratio" type="float" min="0.0" max="1.0" value="0.6" label="BLAST Score Ratio value" />
Review comment (Member): This parameter is optional; please do not set a default value.

<param argument="--minimum-length" type="integer" min="0" value="201" label="Minimum sequence length value"/>
<param argument="--translation-table" type="integer" min="0" value="11" label="Genetic code used to predict genes and to translate coding sequences"/>
Review comment (Member): This is also an optional one. Can you also please double-check whether it is an integer type?
Also add help text from the documentation, like "Must match the genetic code used to create the training file (default: uses value defined in schema config).".

Reply (Contributor, Author): Yes, it is an integer.
From the CreateSchema documentation: (Optional) Genetic code used to predict genes and to translate coding sequences (default: 11).

<param argument="--size-threshold" type="float" min="0" value="0.2" label="CDS size variation threshold"/>
Review comment (Member): This is an optional param. Please do not set default values for optional params; use optional="true" and value="" instead.
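
A minimal sketch of what this request could look like for `--size-threshold`, assuming the parameter stays inside the `advanced` section (so it is referenced as `$advanced.size_threshold`) and that an empty value should simply omit the flag. The help wording is adapted from the `--st` entry in the tool's help section; none of this is taken from the final merged wrapper.

```xml
<!-- Sketch only: optional float with no default, as requested in the comment above. -->
<param argument="--size-threshold" type="float" min="0" value="" optional="true"
       label="CDS size variation threshold"
       help="At the default value of 0.2, alleles whose size deviates +-20 percent from the locus length mode are classified as ASM/ALM."/>

<!-- Matching command fragment: pass the option only when a value was entered. -->
<command detect_errors="exit_code"><![CDATA[
chewBBACA.py AlleleCall
    #if str($advanced.size_threshold) != ''
        --st $advanced.size_threshold
    #end if
    ## remaining options omitted in this sketch
    -i './input' -g './schema/schema_seed/' -o './output'
]]></command>
```

With this pattern chewBBACA falls back to its own default whenever the field is left empty, which is the behaviour the reviewer asks for. The same shape would apply to `--blast-score-ratio` and `--translation-table`.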

<param argument="--no-inferred" type="boolean" truevalue="--no-inferred" falsevalue="" checked="false" label="Do not add the sequences of inferred alleles (INF) to the schema" help="Allelic profiles will still include the allele identifiers attributed to the inferred alleles. Use this parameter if the schema is being accessed by multiple processes/users simultaneously." />
<param argument="--prodigal-mode" type="select" label="Prodigal Mode" help="single for finished genomes, reasonable quality draft genomes and big viruses. meta for metagenomes, low quality draft genomes, small viruses, and small plasmids">
<option value="single" selected="true">
single
</option>
<option value="meta">
meta
</option>
</param>
<param argument="--mode" type="select" label="Execution mode" >
<option value="1">Only exact matches at DNA level</option>
<option value="2">Exact matches at DNA and protein level</option>
<option value="3">Exact matches and minimizer-based clustering to find similar alleles based on BSR+0.1</option>
<option value="4" selected="true">Full process: exact matches and similar matches based on the BSR value, including the determination of new representative alleles to add to the schema</option>
</param>
</section>
</inputs>
<outputs>
<collection name="allelecall_results" type="list" label="${tool.name} on ${on_string}: AlleleCall Results">
<discover_datasets pattern="(?P&lt;name&gt;.+)\.tsv$" format="tabular" directory="./output"/>
</collection>
<collection name="allelcall_log" type="list" label="${tool.name} on ${on_string}: AlleleCall Logs">
<discover_datasets pattern="(?P&lt;name&gt;.+)\.txt$" format="txt" directory="./output"/>
</collection>
</outputs>
<tests>
<test>
<param name="input_file" value="GCA_000007265.1_ASM726v1_genomic.fna"/>
<param name="input_schema" value="GCA_000007265.1_ASM726v1_schema_seed.zip"/>
<output_collection name="allelecall_results" type="list">
<element name="cds_coordinates" file="cds_coordinates.tsv" compare="sim_size"/>
<element name="loci_summary_stats" file="loci_summary_stats.tsv" compare="sim_size"/>
<element name="paralogous_loci" ftype="tabular">
<assert_contents>
<has_text_matching expression="Genome.*Loci.*CDS"/>
</assert_contents>
</element>
<element name="results_alleles" ftype="tabular">
<assert_contents>
<has_text_matching expression="1.*1.*NIPHEM.*1.*1"/>
<has_text_matching expression="GCA_000007265.*1"/>
</assert_contents>
</element>
<element name="results_alleles" file="results_alleles.tsv" compare="sim_size"/>
<element name="results_statistics" file="results_statistics.tsv" compare="sim_size"/>
</output_collection>
<output_collection name="allelcall_log" type="list">
<element name="logging_info" ftype="txt">
<assert_contents>
<has_text_matching expression="Used a BSR of: 0.6"/>
</assert_contents>
</element>
</output_collection>
</test>
</tests>
<help><![CDATA[

chewBBACA version: 3.3.3
Authors: Rafael Mamede, Pedro Cerqueira, Mickael Silva, João Carriço, Mário Ramirez
Github: https://github.com/B-UMMI/chewBBACA
Documentation: https://chewbbaca.readthedocs.io/en/latest/index.html
Contacts: [email protected]

==========================
chewBBACA - AlleleCall
==========================
Performs allele calling to determine the allelic profiles of a set of samples in FASTA format. The
process identifies new alleles, assigns an integer identifier to those alleles and adds them to the
schema.

-i, --input-files [INPUT_FILES] Path to the directory with the genome FASTA files or
to a file that contains a list of full paths to the
FASTA files, one per line. (default: None)

-g, --schema-directory SCHEMA_DIRECTORY Path to the schema directory. The schema directory
contains the loci FASTA files and a folder named
"short" that contains the FASTA files with the loci
representative alleles. (default: None)

-o, --output-directory OUTPUT_DIRECTORY Output directory where the allele calling results
will be stored (will create a subdirectory named
"results_<TIMESTAMP>" if the path passed by the user
already exists). (default: None)

--ptf, --training-file PTF_PATH Path to the Prodigal training file. Default is to
get the training file from the schema's directory
(default: None)

--bsr, --blast-score-ratio BLAST_SCORE_RATIO BLAST Score Ratio value. Sequences with alignments
with a BSR value equal to or greater than this value
will be considered as sequences from the same gene.
(default: None)

--l, --minimum-length MINIMUM_LENGTH Minimum sequence length accepted for a coding
sequence to be included in the schema. (default:
None)

--t, --translation-table TRANSLATION_TABLE Genetic code used to predict genes and to translate
coding sequences. Must match the genetic code used
to create the training file. (default: None)

--st, --size-threshold SIZE_THRESHOLD CDS size variation threshold. At the default value
of 0.2, alleles with size that deviates +-20 percent
from the locus length mode will be classified as
ASM/ALM (default: None)

--pm, --prodigal-mode {single,meta} Prodigal running mode ("single" for finished
genomes, reasonable quality draft genomes and big
viruses. "meta" for metagenomes, low quality draft
genomes, small viruses, and small plasmids).
(default: single)

--no-inferred If provided, the process will not add the sequences
of inferred alleles (INF) to the schema. Allelic
profiles will still include the allele identifiers
attributed to the inferred alleles. Use this
parameter if the schema is being accessed by
multiple processes/users simultaneously. (default:
False)

--mode {1,2,3,4} Execution mode (1: only exact matches at DNA level;
2: exact matches at DNA and Protein level; 3: exact
matches and minimizer-based clustering to find
similar alleles based on BSR+0.1; 4: runs the full
process to find exact matches and similar matches
based on BSR value, including the determination of
new representative alleles to add to the schema).
(default: 4)


It is strongly advised to perform allele calling with the default schema parameters to ensure more
consistent results. Module documentation available at
https://chewbbaca.readthedocs.io/en/latest/user/modules/AlleleCall.html

]]></help>
<expand macro="citations" />
</tool>
93 changes: 93 additions & 0 deletions tools/chewbbaca/AlleleCallEvaluator.xml
@@ -0,0 +1,93 @@
<tool id="AlleleCallEvaluator" name="chewBBACA AlleleCallEvaluator" version="@CHEW_VERSION@+galaxy0" python_template_version="3.5" profile="21.05">
<description>Build an interactive report for allele calling results evaluation</description>
<macros>
<import>macros.xml</import>
</macros>
<expand macro="requirements" />
<command detect_errors="exit_code"><![CDATA[
mkdir './input' &&
mkdir './schema' &&
#for $file in $input_file
cp $file './input/${file.element_identifier}' &&
#end for
unzip $input_schema -d './schema' &&
chewBBACA.py AlleleCallEvaluator
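## These options are defined inside the 'advanced' section, so they are referenced with that prefix
#if str($advanced.annotations) != 'None'
-a $advanced.annotations
#end if
$advanced.light
$advanced.no_pa
$advanced.no_dm
$advanced.no_tree
$advanced.cg_alignment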
-i './input' -g './schema/schema_seed/' -o '${allelecall_report.files_path}' &&
cp '${allelecall_report.files_path}'/allelecall_report.html $allelecall_report
]]></command>
<inputs>
<param name="input_file" type="data_collection" collection_type="list" label="AlleleCall Results" format="tabular"/>
<param name="input_schema" format="zip" type="data" multiple="true" label="Schema Files in zip format" help="The schema directory contains the loci FASTA files and a folder named 'short' that contains the FASTA files with the loci representative alleles."/>
<section name="advanced" title="Advanced options">
<param argument="--annotations" type="data" format="tabular" label="Annotation file" help="File created by the UniprotFinder module" optional="true" />
<param argument="--light" type="boolean" truevalue="--light" falsevalue="" checked="false" label="Do not compute the presence-absence matrix, the distance matrix and the Neighbor-Joining tree." />
<param argument="--no-pa" type="boolean" truevalue="--no-pa " falsevalue="" checked="false" label="Do not compute the presence-absence matrix" />
<param argument="--no-dm" type="boolean" truevalue="--no-dm " falsevalue="" checked="false" label="Do not compute the distance matrix" />
<param argument="--no-tree" type="boolean" truevalue="--no-tree " falsevalue="" checked="false" label="Do not compute the Neighbor-Joining tree" />
<param argument="--cg-alignment" type="boolean" truevalue="--cg-alignment " falsevalue="" checked="false" label="Compute the MSA of the core genome loci, even if `--no-tree` is provided" />
</section>
</inputs>
<outputs>
<data format="html" name="allelecall_report" label="${tool.name} on ${on_string}: AlleleCall Report"/>
</outputs>
<tests>
<test>
<param name="input_file">
<collection type="list">
<element name="cds_coordinates.tsv" value="cds_coordinates.tsv" ftype="tabular"/>
<element name="loci_summary_stats.tsv" value="loci_summary_stats.tsv" ftype="tabular"/>
<element name="results_alleles.tsv" value="results_alleles.tsv" ftype="tabular"/>
<element name="results_statistics.tsv" value="results_statistics.tsv" ftype="tabular"/>
</collection>
</param>
<param name="input_schema" value="GCA_000007265.1_ASM726v1_schema_seed.zip"/>
<output name="allelecall_report" file="allelecall_report.html" compare="sim_size"/>
</test>
Review comment (Member): Please add a second test if you have an annotation file.
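
A sketch of what such a second test might look like, assuming an annotation table is available in `test-data`. The file names `annotations.tsv` and `allelecall_report_annotations.html` are hypothetical placeholders, not files from this PR; the collection inputs and the schema archive are the ones used by the first test.

```xml
<test>
    <param name="input_file">
        <collection type="list">
            <element name="cds_coordinates.tsv" value="cds_coordinates.tsv" ftype="tabular"/>
            <element name="loci_summary_stats.tsv" value="loci_summary_stats.tsv" ftype="tabular"/>
            <element name="results_alleles.tsv" value="results_alleles.tsv" ftype="tabular"/>
            <element name="results_statistics.tsv" value="results_statistics.tsv" ftype="tabular"/>
        </collection>
    </param>
    <param name="input_schema" value="GCA_000007265.1_ASM726v1_schema_seed.zip"/>
    <section name="advanced">
        <!-- hypothetical file: a TSV produced by the UniprotFinder module -->
        <param name="annotations" value="annotations.tsv" ftype="tabular"/>
    </section>
    <!-- hypothetical expected report generated with the annotation file -->
    <output name="allelecall_report" file="allelecall_report_annotations.html" compare="sim_size"/>
</test>
```

Because `annotations` lives in the `advanced` section, the test sets it inside a matching `<section name="advanced">` element.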

</tests>
<help><![CDATA[
chewBBACA version: 3.3.3
Authors: Rafael Mamede, Pedro Cerqueira, Mickael Silva, João Carriço, Mário Ramirez
Github: https://github.com/B-UMMI/chewBBACA
Documentation: https://chewbbaca.readthedocs.io/en/latest/index.html
Contacts: [email protected]

===================================
chewBBACA - AlleleCallEvaluator
===================================

-i, --input-files INPUT_FILES Path to the directory that contains the allele calling
results. (default: None)

-g, --schema-directory SCHEMA_DIRECTORY Path to the schema directory. (default: None)

-o, --output-directory OUTPUT_DIRECTORY Path to the output directory where the report HTML files
will be generated. (default: None)

-a, --annotations ANNOTATIONS Path to the TSV file created by the UniprotFinder module.
(default: None)

--light Do not compute the presence-absence matrix, the distance
matrix and the Neighbor-Joining tree. (default: False)

--no-pa Do not compute the presence-absence matrix. (default:
False)

--no-dm Do not compute the distance matrix. (default: False)

--no-tree Do not compute the Neighbor-Joining tree. (default:
False)

--cg-alignment Compute the MSA of the core genome loci, even if --no-
tree is provided. (default: False)

]]></help>
<expand macro="citations" />
</tool>