Add beacon2 tools (vcf2bff.pl, pxf2bff, csv2xlsx and bff-validator) into galaxy #5442

Merged
merged 38 commits into from
Oct 1, 2023
Changes from 20 commits
Commits
67006eb
Add files via upload
khaled196 Aug 11, 2023
87b7cfe
Create .shed.yml
khaled196 Aug 11, 2023
9ebbb48
remove input from macros
khaled196 Aug 11, 2023
4a7a0e3
Delete defaultSchema.json
khaled196 Aug 11, 2023
a995a34
Update vcf2bff.xml
khaled196 Aug 11, 2023
17f507e
Update vcf2bff.xml
khaled196 Aug 11, 2023
9a80802
lets try the current working dir
bgruening Aug 12, 2023
6698293
Merge branch 'galaxyproject:main' into beacon2
khaled196 Aug 14, 2023
716f10e
add more tools/ fix test data
khaled196 Aug 17, 2023
a362403
fix missing files
khaled196 Aug 17, 2023
e0db592
change the order of the input files, the file type of the schema_dir …
khaled196 Sep 6, 2023
7fee9d1
add documentation to the tools arguments and help part
khaled196 Sep 7, 2023
711f905
add documentation to the tools arguments and help part
khaled196 Sep 7, 2023
6eb6db5
remove bff_validator
khaled196 Sep 15, 2023
c6b0b06
remove bff_validator
khaled196 Sep 15, 2023
385eedc
Update csv2xlsx.xml
bgruening Sep 16, 2023
68c9293
Update pxf2bff.xml
bgruening Sep 16, 2023
8095cc5
Update vcf2bff.xml
bgruening Sep 16, 2023
319ffbf
change ftype to VCF_bgzip
khaled196 Sep 16, 2023
f81b242
change ftype to VCF_bgzip
khaled196 Sep 16, 2023
6fc72b8
fix the gzip input file into bgzf format
khaled196 Sep 16, 2023
b44588d
fix test file size
khaled196 Sep 16, 2023
e6f8885
sort compressed json
khaled196 Sep 16, 2023
1e8af22
uncompredd genomicvariation file
khaled196 Sep 16, 2023
5467161
fix file size
khaled196 Sep 16, 2023
8401482
fix json file compression
khaled196 Sep 18, 2023
6594a32
gwnweral fix
khaled196 Sep 18, 2023
ab5222f
general fix
khaled196 Sep 18, 2023
a43e375
general fix
khaled196 Sep 18, 2023
9ddfc43
change ftype from vcf_bgzip to tabular.gz
khaled196 Sep 19, 2023
a97ddac
alternative option decompress the gz file change the type into vcf an…
khaled196 Sep 19, 2023
8644695
fix
khaled196 Sep 19, 2023
7a5eba4
fix file size
khaled196 Sep 19, 2023
b380110
return to tabular.gz
khaled196 Sep 19, 2023
db40bca
change tool ownership
khaled196 Sep 28, 2023
5358bac
return ownership to ius
khaled196 Sep 28, 2023
027149d
RETURN OWNDER TO iuc and add creators
khaled196 Sep 29, 2023
fcbf219
Merge branch 'main' into beacon2
bgruening Sep 30, 2023
16 changes: 16 additions & 0 deletions tools/beacon2/.shed.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
categories:
  - Variant Analysis
description: beacon2-ri-tools are part of the ELIXIR-CRG Beacon v2 Reference Implementation (B2RI).
long_description: beacon2-ri-tools are a collection of Perl programmes with the aim of
  transforming genomic variations data (VCF) to queryable data (MongoDB).
homepage_url: https://github.com/EGA-archive/beacon2-ri-tools/tree/main
name: beacon2
owner: iuc
remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/master/tools/beacon2
auto_tool_repositories:
  name_template: "{{ tool_id }}"
  description_template: "Wrapper for {{ tool_name }}."
suite:
  name: "suite_beacon2"
  description: "A suite of Galaxy tools designed to work with the beacon2-ri-tools collection."
  type: repository_suite_definition
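A rough illustration of what the `auto_tool_repositories` templates above produce, assuming each template is filled in once per tool with that tool's `id` and `name`. The real expansion is done by the Tool Shed tooling; the `render` helper here is a hypothetical regex-based stand-in, and the tool ids and names are taken from the two wrappers in this diff.

```python
import re


def render(template, **values):
    """Replace each {{ key }} placeholder with values[key] (hypothetical helper)."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: values[m.group(1)], template)


# Templates copied from .shed.yml.
NAME_TEMPLATE = "{{ tool_id }}"
DESCRIPTION_TEMPLATE = "Wrapper for {{ tool_name }}."

# (id, name) pairs as declared in csv2xlsx.xml and pxf2bff.xml.
tools = [("beacon2_csv2xlsx", "Beacon"), ("beacon2_pxf2bff", "PXF2BFF")]

repositories = [
    (render(NAME_TEMPLATE, tool_id=tool_id),
     render(DESCRIPTION_TEMPLATE, tool_name=tool_name))
    for tool_id, tool_name in tools
]

for repo_name, repo_description in repositories:
    print(repo_name, "->", repo_description)
```

Under this assumption the csv2xlsx wrapper would get the description "Wrapper for Beacon.", because its `name` attribute is `Beacon` rather than `csv2xlsx`.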
46 changes: 46 additions & 0 deletions tools/beacon2/csv2xlsx.xml
@@ -0,0 +1,46 @@
<tool id="beacon2_csv2xlsx" name="Beacon" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="21.05">
    <description>v2 CSV Models to XLSX</description>
    <macros>
        <import>macros.xml</import>
    </macros>
    <expand macro="xrefs"/>
    <expand macro="requirements"/>
    <command detect_errors="exit_code"><![CDATA[
        #import re
        #set $names = []
        #set $x=1
        #for $x, $csv in enumerate($csvs):
            #set $name_base = re.sub('[^\w\-_\.]', '_', $csv.element_identifier)
            #set $name = $name_base
            #silent $names.append( $name )
            ln -s '$csv' ${name} &&
        #end for
        csv2xlsx
        #for $name in $names:
            ${name}
        #end for
        -o Beacon-v2-Models_template.xlsx
    ]]></command>
    <inputs>
        <param name="csvs" type="data" multiple="true" format="csv" label="CSV files" help="" />
    </inputs>
    <outputs>
        <data name="Beacon_v2_Models_template" format="xlsx" label="${tool.name} on ${on_string}: Beacon-v2-Models_template file" from_work_dir="Beacon-v2-Models_template.xlsx" />
    </outputs>
    <tests>
        <test expect_num_outputs="1">
            <param name="csvs" ftype="csv" value="analyses.csv,genomicVariations.csv,runs.csv,datasets.csv,biosamples.csv,individuals.csv,cohorts.csv" />
            <output name="Beacon_v2_Models_template" file="Beacon-v2-Models_template.xlsx" compare="sim_size">
                <assert_contents><has_size value="12000" delta="1000" /></assert_contents>
            </output>
        </test>
    </tests>
    <help><![CDATA[
This tool converts the data from multiple CSV files into the hierarchical structure of the Beacon v2 Models and creates an Excel file covering seven entities.
The Models entities are analyses, biosamples, cohorts, datasets, genomicVariations, individuals and runs. The Excel file contains all Models properties in 'flattened-out' form
and is separated into seven sheets (one per entity). The user is responsible for filling out the Excel file according to the entities and terms they want to share.
Once the sheets are filled out, the Beacon v2 Reference Implementation provides a utility that validates the Excel file against the Models JSON Schemas and, if validation succeeds, creates a set of JSON text
files (JSON arrays) as output that will later be loaded into the database.
    ]]></help>
    <expand macro="citations" />
</tool>
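For readers less familiar with Cheetah, the `<command>` block of csv2xlsx.xml can be paraphrased in plain Python roughly as follows. This is a sketch of the logic only, not what Galaxy executes: the dataset paths and element identifiers are made up for illustration, and `safe_name` and `build_command` are hypothetical helper names.

```python
import re


def safe_name(element_identifier):
    """Replace every character that is not a word character, '-', '_' or '.'
    with an underscore, using the same pattern as the template's re.sub."""
    return re.sub(r"[^\w\-_\.]", "_", element_identifier)


def build_command(datasets):
    """datasets: list of (path on disk, element identifier) pairs."""
    names = [safe_name(identifier) for _, identifier in datasets]
    # One symlink per input, so csv2xlsx sees the sanitized names.
    links = [f"ln -s '{path}' {name}" for (path, _), name in zip(datasets, names)]
    convert = " ".join(["csv2xlsx", *names, "-o", "Beacon-v2-Models_template.xlsx"])
    return " && ".join([*links, convert])


# Hypothetical inputs; real paths are chosen by Galaxy.
datasets = [
    ("/galaxy/files/dataset_1.dat", "my runs (v2).csv"),
    ("/galaxy/files/dataset_2.dat", "individuals.csv"),
]
command = build_command(datasets)
print(command)
```

One consequence of this scheme is that two inputs whose identifiers sanitize to the same name would collide at the `ln -s` step, so distinct element identifiers are worth keeping distinct after sanitization.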
19 changes: 19 additions & 0 deletions tools/beacon2/macros.xml
@@ -0,0 +1,19 @@
<macros>
    <token name="@VERSION_SUFFIX@">0</token>
    <token name="@TOOL_VERSION@">2.0.0</token>
    <xml name="requirements">
        <requirements>
            <requirement type="package" version="@TOOL_VERSION@">beacon2-ri-tools</requirement>
        </requirements>
    </xml>
    <xml name="xrefs">
        <xrefs>
            <xref type="bio.tools">GA4GH Beacon</xref>
        </xrefs>
    </xml>
    <xml name="citations">
        <citations>
            <citation type="doi">10.1093/bioinformatics/btac568</citation>
        </citations>
    </xml>
</macros>
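The two `<token>` definitions above are substituted textually wherever they appear in the tool files, which is how the `version` attribute of each wrapper is assembled. A minimal sketch of that substitution, with `expand_tokens` as a hypothetical helper name:

```python
# Token values copied from macros.xml.
TOKENS = {"@TOOL_VERSION@": "2.0.0", "@VERSION_SUFFIX@": "0"}


def expand_tokens(text, tokens=TOKENS):
    """Replace each macro token in text with its value."""
    for token, value in tokens.items():
        text = text.replace(token, value)
    return text


# The version attribute used by both wrappers in this PR.
version = expand_tokens("@TOOL_VERSION@+galaxy@VERSION_SUFFIX@")
print(version)  # 2.0.0+galaxy0
```

Bumping `@VERSION_SUFFIX@` changes the wrapper version without touching the underlying beacon2-ri-tools package version.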
41 changes: 41 additions & 0 deletions tools/beacon2/pxf2bff.xml
@@ -0,0 +1,41 @@
<tool id="beacon2_pxf2bff" name="PXF2BFF" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="21.05">
    <description>converts Phenopacket PXF (JSON) to BFF (JSON)</description>
    <macros>
        <import>macros.xml</import>
    </macros>
    <expand macro="xrefs"/>
    <expand macro="requirements"/>
    <command detect_errors="exit_code"><![CDATA[
        #import re
        #set $names = []
        #set $x=1
        #for $x, $i in enumerate($input):
            #set $name_base = re.sub('[^\w\-_\.]', '_', $i.element_identifier)
            #set $name = $name_base
            #silent $names.append( $name )
            ln -s '$i' ${name} &&
        #end for
        pxf2bff
        #for $name in $names:
            -i ${name}
        #end for
        -o ./
    ]]></command>
    <inputs>
        <param argument="--input" type="data" multiple="true" format="json" label="Phenopacket JSON files" help="" />
    </inputs>
    <outputs>
        <data name="BFF_JSON_File" format="json" label="${tool.name} on ${on_string}: BFF_JSON_File" from_work_dir="individuals.json" />
    </outputs>
    <tests>
        <test expect_num_outputs="1">
            <param name="input" ftype="json" value="EGAF00005572750.json,EGAF00005572753.json,EGAF00005572884.json,EGAF00005572893.json,EGAF00005572727.json,EGAF00005572756.json,EGAF00005572721.json,EGAF00005572902.json,EGAF00005572759.json,EGAF00005572881.json,EGAF00005572896.json,EGAF00005572890.json,EGAF00005572861.json,EGAF00005572899.json,EGAF00005572762.json,EGAF00005572887.json,EGAF00005572724.json,EGAF00005572747.json" />
            <output name="BFF_JSON_File" file="individuals.json" />
        </test>
    </tests>
    <help><![CDATA[
This tool combines multiple Phenopacket (PXF) JSON files into a single BFF JSON file (individuals.json). The Phenopacket Schema is an open standard for sharing disease and phenotype information to
improve our ability to understand, diagnose, and treat both rare and common diseases. The generated file is ready to be stored in a MongoDB instance, as MongoDB works directly with JSON files.
    ]]></help>
    <expand macro="citations" />
</tool>
Binary file not shown.
106 changes: 106 additions & 0 deletions tools/beacon2/test-data/EGAF00005572721.json
@@ -0,0 +1,106 @@
{
"phenopacket": {
"id": "P0007500",
"subject": {
"id": "P0007500",
"dateOfBirth": "unknown-01-01T00:00:00Z",
"sex": "FEMALE"
},
"phenotypicFeatures": [],
"diseases": [],
"genes": [],
"variants": [],
"meta_data": {
"created": "2021-04-21T09:37:19.994Z",
"resources": [
{
"id": "hp",
"name": "Human Phenotype Ontology",
"url": "http://purl.obolibrary.org/obo/hp.owl",
"version": "2020-12-07",
"namespacePrefix": "HP",
"iriPrefix": "http://purl.obolibrary.org/obo/HP_"
},
{
"id": "orphanet",
"name": "Orphanet Rare Disease Ontology",
"url": "http://orpha.net/ontology/ORDO_en_3.1.owl",
"version": "3.1",
"namespacePrefix": "Orphanet",
"iriPrefix": "http://www.orpha.net/ORDO/Orphanet_"
},
{
"id": "hgnc",
"name": "HUGO Gene Nomenclature Committee",
"url": "https://www.genenames.org",
"version": "2021-01-13",
"namespacePrefix": "HGNC",
"iriPrefix": "https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/"
},
{
"id": "mim",
"name": "Online Mendelian Inheritance in Man",
"url": "https://omim.org/",
"version": "2021-01-21",
"namespacePrefix": "OMIM",
"iriPrefix": "https://omim.org/entry/"
}
]
}
},
"interpretation": {
"id": "P0007500",
"resolutionStatus": "UNSOLVED",
"phenopacket": {
"id": "P0007500",
"subject": {
"id": "P0007500",
"dateOfBirth": "unknown-01-01T00:00:00Z",
"sex": "FEMALE"
},
"phenotypicFeatures": [],
"diseases": [],
"genes": [],
"variants": [],
"meta_data": {
"created": "2021-04-21T09:37:19.994Z",
"resources": [
{
"id": "hp",
"name": "Human Phenotype Ontology",
"url": "http://purl.obolibrary.org/obo/hp.owl",
"version": "2020-12-07",
"namespacePrefix": "HP",
"iriPrefix": "http://purl.obolibrary.org/obo/HP_"
},
{
"id": "orphanet",
"name": "Orphanet Rare Disease Ontology",
"url": "http://orpha.net/ontology/ORDO_en_3.1.owl",
"version": "3.1",
"namespacePrefix": "Orphanet",
"iriPrefix": "http://www.orpha.net/ORDO/Orphanet_"
},
{
"id": "hgnc",
"name": "HUGO Gene Nomenclature Committee",
"url": "https://www.genenames.org",
"version": "2021-01-13",
"namespacePrefix": "HGNC",
"iriPrefix": "https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/"
},
{
"id": "mim",
"name": "Online Mendelian Inheritance in Man",
"url": "https://omim.org/",
"version": "2021-01-21",
"namespacePrefix": "OMIM",
"iriPrefix": "https://omim.org/entry/"
}
]
}
},
"diagnosis": [],
"meta_data": {}
}
}
106 changes: 106 additions & 0 deletions tools/beacon2/test-data/EGAF00005572724.json
@@ -0,0 +1,106 @@
{
"phenopacket": {
"id": "P0007499",
"subject": {
"id": "P0007499",
"dateOfBirth": "unknown-01-01T00:00:00Z",
"sex": "MALE"
},
"phenotypicFeatures": [],
"diseases": [],
"genes": [],
"variants": [],
"meta_data": {
"created": "2021-04-21T09:33:50.417Z",
"resources": [
{
"id": "hp",
"name": "Human Phenotype Ontology",
"url": "http://purl.obolibrary.org/obo/hp.owl",
"version": "2020-12-07",
"namespacePrefix": "HP",
"iriPrefix": "http://purl.obolibrary.org/obo/HP_"
},
{
"id": "orphanet",
"name": "Orphanet Rare Disease Ontology",
"url": "http://orpha.net/ontology/ORDO_en_3.1.owl",
"version": "3.1",
"namespacePrefix": "Orphanet",
"iriPrefix": "http://www.orpha.net/ORDO/Orphanet_"
},
{
"id": "hgnc",
"name": "HUGO Gene Nomenclature Committee",
"url": "https://www.genenames.org",
"version": "2021-01-13",
"namespacePrefix": "HGNC",
"iriPrefix": "https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/"
},
{
"id": "mim",
"name": "Online Mendelian Inheritance in Man",
"url": "https://omim.org/",
"version": "2021-01-21",
"namespacePrefix": "OMIM",
"iriPrefix": "https://omim.org/entry/"
}
]
}
},
"interpretation": {
"id": "P0007499",
"resolutionStatus": "UNSOLVED",
"phenopacket": {
"id": "P0007499",
"subject": {
"id": "P0007499",
"dateOfBirth": "unknown-01-01T00:00:00Z",
"sex": "MALE"
},
"phenotypicFeatures": [],
"diseases": [],
"genes": [],
"variants": [],
"meta_data": {
"created": "2021-04-21T09:33:50.417Z",
"resources": [
{
"id": "hp",
"name": "Human Phenotype Ontology",
"url": "http://purl.obolibrary.org/obo/hp.owl",
"version": "2020-12-07",
"namespacePrefix": "HP",
"iriPrefix": "http://purl.obolibrary.org/obo/HP_"
},
{
"id": "orphanet",
"name": "Orphanet Rare Disease Ontology",
"url": "http://orpha.net/ontology/ORDO_en_3.1.owl",
"version": "3.1",
"namespacePrefix": "Orphanet",
"iriPrefix": "http://www.orpha.net/ORDO/Orphanet_"
},
{
"id": "hgnc",
"name": "HUGO Gene Nomenclature Committee",
"url": "https://www.genenames.org",
"version": "2021-01-13",
"namespacePrefix": "HGNC",
"iriPrefix": "https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/"
},
{
"id": "mim",
"name": "Online Mendelian Inheritance in Man",
"url": "https://omim.org/",
"version": "2021-01-21",
"namespacePrefix": "OMIM",
"iriPrefix": "https://omim.org/entry/"
}
]
}
},
"diagnosis": [],
"meta_data": {}
}
}
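To make the shape of these Phenopacket test files easier to see, here is a small sketch that reads the subject block from abridged copies of the two files above. Only the fields used are kept; `summarise` and `birth_date_is_known` are hypothetical helper names, and nothing here reflects how pxf2bff itself processes the input.

```python
import json
from datetime import datetime

# Abridged copies of the two test files shown in this diff.
FILES = {
    "EGAF00005572721.json": '{"phenopacket": {"id": "P0007500", "subject": '
                            '{"id": "P0007500", "dateOfBirth": "unknown-01-01T00:00:00Z", "sex": "FEMALE"}}}',
    "EGAF00005572724.json": '{"phenopacket": {"id": "P0007499", "subject": '
                            '{"id": "P0007499", "dateOfBirth": "unknown-01-01T00:00:00Z", "sex": "MALE"}}}',
}


def birth_date_is_known(value):
    """Return True only if the value parses with datetime.fromisoformat."""
    try:
        datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return False
    return True


def summarise(text):
    """Extract the subject id, sex and whether the birth date is a real timestamp."""
    subject = json.loads(text)["phenopacket"]["subject"]
    return {
        "id": subject["id"],
        "sex": subject["sex"],
        "birth_date_known": birth_date_is_known(subject["dateOfBirth"]),
    }


summaries = {name: summarise(text) for name, text in FILES.items()}
for name, summary in summaries.items():
    print(name, summary)
```

Both files carry the placeholder `unknown-01-01T00:00:00Z` as `dateOfBirth`, which is not a parseable timestamp, so any downstream consumer that expects a date needs to tolerate that value.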