Replace cat with seqkit rmdup
This better reflects the duplicate handling used by augur merge. When a
sequence appears more than once, augur merge keeps the last one from the
input list, whereas seqkit rmdup keeps the first. The input order is
therefore reversed so that the USVI sequences still take precedence.

There are some behavior changes, none of which should have any impact on
downstream usage:

1. The order of sequences in the output is reversed at the file level:
   records from the USVI file now come before those from the main file.
2. seqkit wraps sequence lines at 60 characters by default.
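
The equivalence this change relies on can be sketched in Python. This is a
simplified model under stated assumptions, not the code of either tool:
records are (ID, sequence) pairs, duplicates are matched on ID, and the
function names and sample records are hypothetical.

```python
def dedup_keep_first(records):
    """Keep the first record seen for each ID (models seqkit rmdup as described above)."""
    seen = set()
    kept = []
    for seq_id, seq in records:
        if seq_id not in seen:
            seen.add(seq_id)
            kept.append((seq_id, seq))
    return kept


def dedup_keep_last(records):
    """Keep the last record seen for each ID (models augur merge as described above)."""
    latest = {}
    for seq_id, seq in records:
        latest[seq_id] = seq  # later records overwrite earlier ones
    return list(latest.items())


# Hypothetical inputs: "B" appears in both files with different sequences.
sequences = [("A", "ACGT"), ("B", "GGGG")]
usvi_sequences = [("B", "TTTT"), ("C", "CCCC")]

# Keep-last over (sequences, usvi) and keep-first over (usvi, sequences)
# both retain the USVI copy of "B"; only the record order differs.
last_wins = dedup_keep_last(sequences + usvi_sequences)
first_wins = dedup_keep_first(usvi_sequences + sequences)
```

Both calls keep the same set of records, which is why only the file-level
order changes.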
victorlin committed Dec 7, 2024
1 parent 14b8963 commit b92c80c
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion phylogenetic/rules/merge_sequences_usvi.smk
@@ -57,7 +57,7 @@ rule append_usvi:
metadata = "data/metadata_all.tsv"
shell:
"""
- cat {input.sequences} {input.usvi_sequences} > {output.sequences}
+ seqkit rmdup {input.usvi_sequences} {input.sequences} > {output.sequences}
augur merge \
--metadata ingest={input.metadata} usvi={input.usvi_metadata} \
