Update seqtk_mergefa.xml
"Tool merges FASTA/Q files into a FASTA output and considers the quality threshold for FASTQ files when merging."

1. Clarified the -m option to handle ambiguous bases and conflicts (e.g., N and other IUPAC codes).
2. Improved help documentation with clearer examples and explanations.
3. Refined input parameter labels for better clarity and consistency.
dianichj authored Sep 26, 2024
1 parent 63c2240 commit bdd1286
Showing 1 changed file with 16 additions and 11 deletions.
27 changes: 16 additions & 11 deletions tools/seqtk/seqtk_mergefa.xml
@@ -1,6 +1,6 @@
<?xml version="1.0"?>
<tool id="seqtk_mergefa" name="seqtk_mergefa" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="22.05">
-<description>Merge two FASTA files</description>
+<description>Merge two FASTA/Q files into a FASTA file output</description>
<macros>
<import>macros.xml</import>
</macros>
@@ -16,14 +16,14 @@ $r
$h
'$in_fa1'
'$in_fa2'
-#echo "| pigz -p ${GALAXY_SLOTS:-1} --no-name --no-time" if $in_fa1.is_of_type('fasta.gz') else "" # > '$default'
+#echo "| pigz -p ${GALAXY_SLOTS:-1} --no-name --no-time" if $in_fa1.is_of_type('fasta.gz', 'fastq.gz') else "" # > '$default'
]]></command>
<inputs>
-<param name="in_fa1" type="data" format="fasta,fasta.gz" label="Input FASTA file #1"/>
-<param name="in_fa2" type="data" format="fasta,fasta.gz" label="Input FASTA file #2"/>
-<param argument="-q" type="integer" value="0" label="Quality threshold"/>
+<param name="in_fa1" type="data" format="fasta,fastq,fasta.gz,fastq.gz" label="Input FASTA/Q file #1"/>
+<param name="in_fa2" type="data" format="fasta,fastq,fasta.gz,fastq.gz" label="Input FASTA/Q file #2"/>
+<param argument="-q" type="integer" value="0" label="Quality threshold (for FASTQ)"/>
<param argument="-i" type="boolean" truevalue="-i" falsevalue="" checked="false" label="Take intersection" />
-<param argument="-m" type="boolean" truevalue="-m" falsevalue="" checked="false" label="Convert to lowercase when one of the input bases is N" />
+<param argument="-m" type="boolean" truevalue="-m" falsevalue="" checked="false" label="Convert to lowercase for ambiguous bases or conflicts (e.g., N)" />
<param argument="-r" type="boolean" truevalue="-r" falsevalue="" checked="false" label="Pick a random allele from het" />
<param argument="-h" type="boolean" truevalue="-h" falsevalue="" checked="false" label="Suppress hets in the input" />
</inputs>
@@ -52,9 +52,11 @@ $h
<help><![CDATA[
**What it does**
-Merges two FASTA files using ambiguity codes.
+This tool merges two FASTA or FASTQ files into a single FASTA file using IUPAC ambiguity codes where appropriate.
+When differences occur between the sequences, ambiguity codes are used to represent possible variations.
+Additionally, if the `-m` option is set, the tool highlights conflicts by converting nucleotides to lowercase when one of the sequences contains an ambiguity code (e.g., `N`).
-::
+### Example:
# seq1.fa
>test0
@@ -64,13 +66,16 @@ Merges two FASTA files using ambiguity codes.
>test0
ACTGAMTGCGN
-In the following the `-m` option has been set to highlight seqtk-mergefa's features.
-::
+With the `-m` option enabled, the tool merges the sequences and handles ambiguities or conflicts as follows:
>test0
ACTGACTGxxa
+Explanation:
+- Positions with exact matches remain unchanged.
+- Positions where no IUPAC code can represent the conflict are marked with placeholders (e.g., `x`).
+- If one sequence contains an ambiguous base (e.g., `N`), the corresponding nucleotide in the other sequence is converted to lowercase to indicate uncertainty.
@ATTRIBUTION@
]]></help>
<expand macro="citation" />
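
Reviewer note: the merging behaviour described in the help text above can be modelled with a small Python sketch. This is an illustration only, not seqtk's implementation. It assumes each IUPAC code is treated as a set of the bases A, C, G and T, that the default mode takes the union of the two sets, and that the `-m`-like mode takes the intersection, writing `x` when the intersection is empty and lowercase when either input is `N`. The function names and the first input sequence are hypothetical (the diff elides the contents of `seq1.fa`). Quality thresholds and the `-i`, `-r` and `-h` options are not modelled.

```python
# Bit-set encoding of IUPAC nucleotide codes: A=1, C=2, G=4, T=8.
IUPAC_TO_BITS = {
    "A": 1, "C": 2, "G": 4, "T": 8,
    "M": 3, "R": 5, "S": 6, "V": 7,
    "W": 9, "Y": 10, "H": 11, "K": 12,
    "D": 13, "B": 14, "N": 15,
}
BITS_TO_IUPAC = {bits: code for code, bits in IUPAC_TO_BITS.items()}
BITS_TO_IUPAC[0] = "X"  # assumed placeholder: no base is compatible with both inputs


def merge_bases(a, b, mask=False):
    """Merge two bases; mask=True is the assumed model of the -m option."""
    # Unknown characters are treated as N (assumption made for this sketch).
    x = IUPAC_TO_BITS.get(a.upper(), 15)
    y = IUPAC_TO_BITS.get(b.upper(), 15)
    if mask:
        merged = x & y  # keep only bases that both inputs allow
        lowercase = x == 15 or y == 15 or merged == 0
    else:
        merged = x | y  # ambiguity code covering both inputs
        lowercase = False
    code = BITS_TO_IUPAC[merged]
    return code.lower() if lowercase else code


def merge_sequences(s1, s2, mask=False):
    """Merge two sequences position by position (stops at the shorter one)."""
    return "".join(merge_bases(a, b, mask) for a, b in zip(s1, s2))


if __name__ == "__main__":
    seq1 = "ACTGACTGAAA"  # hypothetical stand-in for the elided seq1.fa record
    seq2 = "ACTGAMTGCGN"  # second record as shown in the help text
    print(merge_sequences(seq1, seq2, mask=True))
    print(merge_sequences(seq1, seq2))
```

Under these assumptions, the masked merge of the two sequences reproduces the `ACTGACTGxxa` output quoted in the help text, which suggests the model matches the documented example, though it has not been checked against seqtk itself.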