fixup replace: more efficient joining
This change bypasses the APPEND_FIELDS variable and the intermediate nextclade_renamed.tsv file by piping the renamed Nextclade output straight into tsv-join.

Co-authored-by: Jover Lee <[email protected]>
j23414 and joverlee521 committed Oct 11, 2023
1 parent eda5000 commit 39504b7
Showing 1 changed file with 4 additions and 7 deletions.
11 changes: 4 additions & 7 deletions ingest/workflow/snakemake_rules/nextclade.smk
@@ -73,16 +73,13 @@ rule join_metadata_clades:
     -p '(.+)' \
     -r '{{kv}}' \
     -k {input.nextclade_field_map} \
-    > data/nextclade_renamed.tsv
-    export APPEND_FIELDS=`awk 'NR>1 {{print $2}}' {input.nextclade_field_map} | grep -v "seqName" | tr '\n' ',' | sed 's/,$//g'`
-    tsv-join -H \
-    --filter-file data/nextclade_renamed.tsv \
+    | tsv-join -H \
+    --filter-file - \
     --key-fields seqName \
     --data-fields {params.id_field} \
-    --append-fields $APPEND_FIELDS \
+    --append-fields '*' \
     --write-all ? \
     {input.metadata} \
+    | tsv-select -H --exclude seqName \
     > {output.metadata}
     """
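
The join performed by the updated rule can be sketched in plain Python. This is only an illustration, assuming tsv-join behaves as its options suggest: `--filter-file -` reads the renamed Nextclade table from stdin, `--append-fields '*'` appends every one of its columns, `--write-all ?` keeps unmatched metadata rows and fills them with `?`, and the final `tsv-select` drops the duplicated `seqName` column. The helper names `read_tsv` and `join_metadata`, the `accession` id column, and the sample rows are hypothetical and not taken from the workflow; duplicate keys and column-name collisions are not handled.

```python
def read_tsv(text):
    """Parse TSV text into (header, list of row dicts)."""
    lines = text.strip("\n").split("\n")
    header = lines[0].split("\t")
    rows = [dict(zip(header, line.split("\t"))) for line in lines[1:]]
    return header, rows


def join_metadata(metadata_tsv, nextclade_tsv, id_field, key_field="seqName", fill="?"):
    """Left-join Nextclade columns onto metadata (hypothetical helper)."""
    meta_header, meta_rows = read_tsv(metadata_tsv)
    clade_header, clade_rows = read_tsv(nextclade_tsv)

    # '*' takes every Nextclade column; dropping key_field here stands in
    # for the trailing `tsv-select -H --exclude seqName`.
    append_fields = [f for f in clade_header if f != key_field]
    lookup = {row[key_field]: row for row in clade_rows}

    out = ["\t".join(meta_header + append_fields)]
    for row in meta_rows:
        match = lookup.get(row[id_field])
        # Unmatched metadata rows are kept and padded with the fill value.
        extra = [match[f] if match else fill for f in append_fields]
        out.append("\t".join([row[f] for f in meta_header] + extra))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Made-up sample data: B2 has no Nextclade result.
    metadata = "accession\tcountry\nA1\tUSA\nB2\tPeru\n"
    nextclade = "seqName\tclade\tQC_overall\nA1\tIIb\tgood\n"
    print(join_metadata(metadata, nextclade, id_field="accession"), end="")
```

Because the appended columns are derived from the Nextclade header itself, no separate field list (the old `APPEND_FIELDS`) has to be computed from the field map.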
