Add test scripts #70

Merged · 17 commits · Oct 8, 2024
28 changes: 28 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,28 @@
name: Run test script

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  run-test:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.x'

      - name: Install cwltool
        run: pip install cwltool

      - name: Run tests
        run: ./test.sh
1 change: 1 addition & 0 deletions .gitignore
@@ -161,3 +161,4 @@ domains/proteomics/UseCases-Demo/*
# Exclude generated workflow outputs
examples/cwl-workflows/example*/workflow_*_output/
examples/cwl-workflows/example*/benchmarks.json
**/test/output/
12 changes: 9 additions & 3 deletions README.md
@@ -3,12 +3,18 @@

This repository contains the domain and tool descriptions needed to generate and execute workflows in the [Workflomics](https://github.com/Workflomics/workflomics-frontend) interface.

To add new domains or tools to the `Workflomics` environment, CWL files and other files needed to run these tools should be added to this repository. See the [domain annotation guide](https://workflomics.readthedocs.io/en/domain-creation/developer-guide/domain-development.html) for more information.
To add new domains or tools to the `Workflomics` environment, CWL files and other files needed to run these tools should be added to this repository. For a detailed description of the required steps, see the [Workflomics documentation](https://workflomics.readthedocs.io/en/latest/index.html), under 'Domain Expert Guide'.

The repository is organized in the following way:

- `domains/`: Contains the domain descriptions. The descriptions are used by the Workflomics environment to generate workflows. Each domain comprises a set of tools (e.g., described in a `tools.json` file) and a configuration file (e.g., `config.json`) that specifies the domain-specific parameters. See the [documentation of the APE engine](https://ape-framework.readthedocs.io/en/latest/docs/specifications/setup.html#configuration-file) to learn more about the configuration file.
- `cwl-tools/`: Contains the CWL CommandLineTool descriptions of the tools used in the workflows (similar to the [bio-cwl-tools](https://github.com/common-workflow-library/bio-cwl-tools) repo). The CWL files are used by the Workflomics environment to execute each step of the workflow. Within the Workflomics ecosystem these workflows are executed using the [Workflomics Benchmarker](https://github.com/Workflomics/workflomics-benchmarker) which utilizes [cwltool](https://github.com/common-workflow-language/cwltool).
- `domains/`: Contains the domain descriptions. The descriptions are used by the Workflomics environment to generate workflows. Each domain comprises a set of tools (e.g., described in a `tools.json` file) and a configuration file (e.g., `config.json`) that specifies the domain-specific parameters. See the [domain annotation guide](https://workflomics.readthedocs.io/en/latest/domain-expert-guide/domain-development.html) to learn more about these files.

- `cwl-tools/`: Contains the CWL CommandLineTool descriptions of the tools used in the workflows (similar to the [bio-cwl-tools](https://github.com/common-workflow-library/bio-cwl-tools) repo). The CWL files are used by the Workflomics environment to execute each step of the workflow. Within the Workflomics ecosystem these workflows are executed using the [Workflomics Benchmarker](https://github.com/Workflomics/workflomics-benchmarker) which utilizes [cwltool](https://github.com/common-workflow-language/cwltool). For more information about adding new tools, see the [adding tools section](https://workflomics.readthedocs.io/en/latest/domain-expert-guide/adding-tools.html) of the documentation.

- `examples/`: Contains example workflows that can be executed using the [Workflomics Benchmarker](https://github.com/Workflomics/workflomics-benchmarker). The workflows, generated by the Workflomics platform, are written in the [Common Workflow Language (CWL)](https://www.commonwl.org/).

When using the Workflomics web interface, workflows reference this repository directly. The files are downloaded during workflow execution, so you don't need to clone this repository for normal usage in the Workflomics environment.

## Testing

To test the CWL annotations, run `test_cwl_annotations.sh` from the repository root. This script runs the test scripts in the `test` directory of each tool, testing whether the CWL annotations pass as stand-alone workflow steps. This requires `cwltool` and `docker` to be installed.
42 changes: 42 additions & 0 deletions cwl-tools/Sage-proteomics/Sage-proteomics.cwl
@@ -0,0 +1,42 @@
cwlVersion: v1.0
label: Sage
class: CommandLineTool
baseCommand: ["/bin/bash", "-c"]
arguments:
  - valueFrom: >
      "sage -o /data/output -f $(inputs.Sage_in_2.path) \
      $(inputs.Configuration.path) $(inputs.Sage_in_1.path) && \
      /data/sage_TSV_to_mzIdentML.sh /data/output/results.sage.tsv"
    shellQuote: false
requirements:
  ShellCommandRequirement: {}
  DockerRequirement:
    dockerPull: ghcr.io/lazear/sage:v0.14.7
    dockerOutputDirectory: /data
  InitialWorkDirRequirement:
    listing:
      - class: File
        location: sage_TSV_to_mzIdentML.sh
        basename: sage_TSV_to_mzIdentML.sh

inputs:
  Sage_in_1:
    type: File
    format: "http://edamontology.org/format_3244"  # mzML
  Sage_in_2:
    type: File
    format: "http://edamontology.org/format_1929"  # FASTA
  Configuration:
    type: File
    format: "http://edamontology.org/format_3464"  # JSON
    default:
      class: File
      format: "http://edamontology.org/format_3464"  # JSON
      location: https://raw.githubusercontent.com/Workflomics/tools-and-domains/main/cwl-tools/Sage-proteomics/config.json

outputs:
  Sage_out:
    type: File
    format: "http://edamontology.org/format_3247"  # mzIdentML
    outputBinding:
      glob: /data/output/results.sage.mzid
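The `Configuration` input above falls back to the repository-hosted `config.json` via its `default` block, so a job file only needs to mention it when overriding the search parameters. A hypothetical override entry in a job file (the local `my-config.json` path is illustrative, not part of this PR) could look like:

```yaml
Configuration:
  class: File
  format: http://edamontology.org/format_3464
  path: ./my-config.json  # hypothetical local Sage configuration
```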
24 changes: 24 additions & 0 deletions cwl-tools/Sage-proteomics/Sage-proteomics.json
@@ -0,0 +1,24 @@
{"functions": [{
  "outputs": [{
    "format_1915": ["http://edamontology.org/format_3247"],
    "data_0006": ["http://edamontology.org/data_0945"]
  }],
  "biotoolsID": "Sage-proteomics",
  "inputs": [
    {
      "format_1915": ["http://edamontology.org/format_3244"],
      "data_0006": ["http://edamontology.org/data_0943"]
    },
    {
      "format_1915": ["http://edamontology.org/format_1929"],
      "data_0006": ["http://edamontology.org/data_2976"]
    }
  ],
  "taxonomyOperations": [
    "http://edamontology.org/operation_3631",
    "http://edamontology.org/operation_3633",
    "http://edamontology.org/operation_2428"
  ],
  "label": "Sage",
  "id": "Sage-proteomics"
}]}
61 changes: 61 additions & 0 deletions cwl-tools/Sage-proteomics/config.json
@@ -0,0 +1,61 @@
{
  "database": {
    "bucket_size": 8192,
    "enzyme": {
      "missed_cleavages": 2,
      "min_len": 7,
      "max_len": 50,
      "cleave_at": "KR",
      "restrict": "P"
    },
    "fragment_min_mz": 150.0,
    "fragment_max_mz": 2000.0,
    "peptide_min_mass": 500.0,
    "peptide_max_mass": 5000.0,
    "ion_kinds": ["b", "y"],
    "min_ion_index": 2,
    "max_variable_mods": 3,
    "static_mods": {
      "C": 57.0215
    },
    "variable_mods": {
      "M": 15.994
    },
    "decoy_tag": "rev_",
    "generate_decoys": true
  },
  "quant": {
    "lfq": true,
    "lfq_settings": {
      "peak_scoring": "Hybrid",
      "integration": "Sum",
      "spectral_angle": 0.6,
      "ppm_tolerance": 5.0
    }
  },
  "precursor_tol": {
    "ppm": [-20.0, 20.0]
  },
  "fragment_tol": {
    "ppm": [-20.0, 20.0]
  },
  "isotope_errors": [0, 2],
  "deisotope": true,
  "min_peaks": 15,
  "max_peaks": 150,
  "max_fragment_charge": 1,
  "min_matched_peaks": 4,
  "predict_rt": true
}
96 changes: 96 additions & 0 deletions cwl-tools/Sage-proteomics/sage_TSV_to_mzIdentML.sh
@@ -0,0 +1,96 @@
#!/bin/bash
#
# This is a quick-and-dirty converter of Sage TSV output to mzIdentML, with a separate Peptide id entry for each PSM, not each unique peptide.
# This may break some third-party software, and should be fixed in future versions. The software version is assumed to be the latest version
# of Sage (currently 0.14.5). There are currently (February 2024) no CV terms for Sage. Sage outputs "hyperscore". Assume that this is X!Tandem
# hyperscore for now.
#

# Check for input file
if [ "$#" -ne 1 ]; then
    echo "Usage: $0 results.sage.tsv (or name of Sage results TSV file, if changed)"
    exit 1
fi

# Get the current date and time (of the conversion to mzIdentML)
creationDate=`date -I'ns' | tr ',' '.' | awk 'sub("00\+.+","Z")'`

echo "creationDate" "$creationDate"

# Check Sage version (assumes sage is on the PATH)
if [ "$(which sage)" ]; then
    version=`sage --version | cut -f2 -d ' '`
else
    version="UNKNOWN"
fi

echo "Sage version" "$version"

inputfile=$1
outputfile="${inputfile%.tsv}.mzid"

# Start and sequence collection of the mzIdentML file
awk -v version="$version" -v creationDate="$creationDate" '
BEGIN {
    print("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
    printf("<MzIdentML xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\" id=\"\" xsi:schemaLocation=\"http://psidev.info/psi/pi/mzIdentML/1.1 http://www.psidev.info/files/mzIdentML1.2.0.xsd\" creationDate=\"%s\" version=\"1.2.0\" xmlns=\"http://psidev.info/psi/pi/mzIdentML/1.2\">", creationDate);
    print(" <cvList>");
    print(" <cv fullName=\"Proteomics Standards Initiative Mass Spectrometry Vocabularies\" version=\"4.1.99\" uri=\"https://raw.githubusercontent.com/HUPO-PSI/psi-ms-CV/master/psi-ms.obo\" id=\"PSI-MS\" />");
    print(" <cv fullName=\"UNIMOD\" uri=\"https://raw.githubusercontent.com/HUPO-PSI/psi-ms-CV/master/psi-ms.obo\" id=\"UNIMOD\" />");
    print(" <cv fullName=\"UNIT-ONTOLOGY\" uri=\"http://obo.cvs.sourceforge.net/viewvc/obo/obo/ontology/phenotype/unit.obo\" id=\"UO\" />");
    print(" </cvList>");
    print(" <AnalysisSoftwareList>");
    printf(" <AnalysisSoftware id=\"Sage\" version=\"%s\" uri=\"https://github.com/lazear/sage\">\n", version);
    print(" <SoftwareName>");
    print(" <cvParam name=\"X!Tandem\" cvRef=\"PSI-MS\" accession=\"MS:1001456\" />"); # Pretend Sage hyperscores are X!Tandem hyperscores
    print(" </SoftwareName>");
    print(" </AnalysisSoftware>");
    print(" </AnalysisSoftwareList>");
    print(" <SequenceCollection>");
}
(NR>1) {
    printf(" <Peptide id=\"%s_%09d\">\n", $2, NR-1);
    printf(" <PeptideSequence>%s</PeptideSequence>\n", $2);
    printf(" </Peptide>\n");
    printf(" <PeptideEvidence id=\"%s_%09d_%s\" dBSequence_ref=\"DBSeq_%s\" peptide_ref=\"%s_%09d\" />\n", $2, NR-1, $3, $3, $2, NR-1);
    printf(" <DBSequence id=\"DBSeq_%s\" searchDatabase_ref=\"SearchDB_1\" accession=\"%s\">\n", $3, $3);
    printf(" <Seq>UNKNOWN</Seq>\n");
    printf(" <cvParam name=\"protein description\" value=\"\" cvRef=\"PSI-MS\" accession=\"MS:1001088\" />\n");
    print(" </DBSequence>");
}
END {
    print(" </SequenceCollection>");
}
' "$inputfile" > "$outputfile"

# Analysis data and end of the mzIdentML file
awk '
BEGIN {
    print(" <DataCollection>");
    print(" <AnalysisData>");
    print(" <SpectrumIdentificationList id=\"SIL_1\">");
}
(NR>1) {
    sub("scan=", "", $8)
    printf(" <SpectrumIdentificationResult id=\"SIR_%i\" spectrumID=\"index=%i\" spectraData_ref=\"SD_%i\" name=\"%s\">\n", NR-2, NR-2, NR-1, $8);
    printf(" <SpectrumIdentificationItem id=\"SII_%i_%i\" chargeState=\"%i\" experimentalMassToCharge=\"%f\" calculatedMassToCharge=\"%f\" peptide_ref=\"%s_%09d\" rank=\"%i\">\n", NR-2, 1, $13, $11, $12, $2, NR-1, $9);
    printf(" <PeptideEvidenceRef peptideEvidence_ref=\"%s_%09d_%s\" />\n", $2, NR-1, $3);
    printf(" <cvParam name=\"X!Tandem:hyperscore\" value=\"%f\" cvRef=\"PSI-MS\" accession=\"MS:1001331\" />\n", $20); # Pretend to be X!Tandem hyperscore
    print(" </SpectrumIdentificationItem>");
    print(" </SpectrumIdentificationResult>");
}
END {
    print(" </SpectrumIdentificationList>");
    print(" </AnalysisData>");
    print(" </DataCollection>");
    print("</MzIdentML>");
}
' "$inputfile" >> "$outputfile"

echo "Conversion completed: $outputfile"
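The `.mzid` name in the completion message is derived purely with Bash parameter expansion; a minimal standalone sketch of that step:

```shell
# Strip a trailing .tsv suffix and append .mzid, as the converter does.
inputfile="results.sage.tsv"
outputfile="${inputfile%.tsv}.mzid"
echo "$outputfile"  # prints: results.sage.mzid
```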
10 changes: 10 additions & 0 deletions cwl-tools/Sage-proteomics/test/debug-in-docker.sh
@@ -0,0 +1,10 @@
#!/bin/bash

docker run -it --rm \
    --entrypoint /bin/bash \
    --mount=type=bind,source=/Users/peter/repos/bakeoff/containers/cwl-tools/Sage/test/data,target=/data/ \
    ghcr.io/lazear/sage:v0.14.7

# sage -o /data/output -f /data/small.fasta /data/config.json /data/small.mzML

# docker run -it --rm -v ${PWD}:/data sage:latest /app/sage -o /data /data/config.json
8 changes: 8 additions & 0 deletions cwl-tools/Sage-proteomics/test/input.yml
@@ -0,0 +1,8 @@
Sage_in_1:
  class: File
  format: http://edamontology.org/format_3244
  path: https://raw.githubusercontent.com/Workflomics/DemoKit/refs/heads/main/data/inputs/small.mzML
Sage_in_2:
  class: File
  format: http://edamontology.org/format_1929
  path: https://raw.githubusercontent.com/Workflomics/DemoKit/refs/heads/main/data/inputs/small.fasta
3 changes: 3 additions & 0 deletions cwl-tools/Sage-proteomics/test/run-cwl.sh
@@ -0,0 +1,3 @@
#!/bin/bash

cwltool --outdir output ../Sage-proteomics.cwl ./input.yml
54 changes: 54 additions & 0 deletions test_cwl_annotations.sh
@@ -0,0 +1,54 @@
#!/bin/bash

# This script runs the test scripts in the 'test' directory of each tool,
# testing whether the CWL annotations pass as stand-alone workflow steps.
# This requires `cwltool` and `docker` to be installed.

# Set the base directory
base_dir="cwl-tools"

# Create an array and counters for passed and failed tests and missing scripts
declare -a failed_tests
passed_tests_count=0
failed_tests_count=0
missing_script_count=0

# Iterate over all directories (tool names) in the base directory
for tool_dir in "$base_dir"/*/; do
    # Define the path to the 'test' directory for the current tool
    script_dir="${tool_dir}test"

    # Check if the 'run-cwl.sh' script exists
    if [ -f "$script_dir/run-cwl.sh" ]; then
        echo "Testing $script_dir..."

        (cd "$script_dir" && bash "./run-cwl.sh")
        if [ $? -eq 0 ]; then
            ((passed_tests_count++))  # Increment the passed tests counter
        else
            failed_tests+=("$script_dir/run-cwl.sh")
            ((failed_tests_count++))  # Increment the failed tests counter
        fi
    else
        echo "❗ Script not found in directory: $script_dir"
        ((missing_script_count++))  # Increment the missing script counter
    fi
done

# Print the missing test script count
if [ $missing_script_count -gt 0 ]; then
    echo "❗ $missing_script_count tools did not have a test script"
fi

# Print a summary of the test results
if [ $failed_tests_count -eq 0 ]; then
    echo "✅ $passed_tests_count tests passed successfully"
    exit 0
else
    echo "The following tests failed:"
    for test in "${failed_tests[@]}"; do
        echo "🚨 $test"
    done
    exit 1
fi
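The pass/fail bookkeeping in the script boils down to a Bash counter-and-array pattern; here is a self-contained sketch with hard-coded exit codes standing in for three test runs (the `run-` names are illustrative):

```shell
# Tally fake exit codes the same way the test runner tallies results.
declare -a failed_tests
passed=0
failed=0
for rc in 0 1 0; do
    if [ "$rc" -eq 0 ]; then
        passed=$((passed + 1))          # count a passing run
    else
        failed_tests+=("run-$failed")   # remember which run failed
        failed=$((failed + 1))
    fi
done
echo "$passed passed, $failed failed"   # prints: 2 passed, 1 failed
```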