Replace join metadata and clades script with csvtk and tsv append
As part of centralizing ingest scripts, replace the join-metadata-and-clades.py
script with csvtk (to rename columns) and tsv-join (to append fields) in cases
where no customized calculations are needed.

nextstrain/ingest#23
j23414 committed Oct 4, 2023
1 parent 1f169d2 commit 3b2a1ec
Showing 3 changed files with 39 additions and 83 deletions.
77 changes: 0 additions & 77 deletions ingest/bin/join-metadata-and-clades.py

This file was deleted.

17 changes: 17 additions & 0 deletions ingest/source-data/nextclade-field-map.tsv
@@ -0,0 +1,17 @@
+key	value
+index	index
+seqName	seqName
+clade	clade
+outbreak	outbreak
+lineage	lineage
+coverage	coverage
+totalMissing	missing_data
+totalSubstitutions	divergence
+totalNonACGTNs	nonACGTN
+qc.missingData.status	QC_missing_data
+qc.mixedSites.status	QC_mixed_sites
+qc.privateMutations.status	QC_rare_mutations
+qc.frameShifts.status	QC_frame_shifts
+qc.stopCodons.status	QC_stop_codons
+frameShifts	frame_shifts
+isReverseComplement	is_reverse_complement
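
The field map above pairs each Nextclade column name (key) with the metadata column name it should become (value). As a rough illustration of that renaming step, here is a minimal Python sketch using only the standard library. It is not the csvtk implementation: the helper names `load_field_map` and `rename_header` are hypothetical, the sample header is made up, and the choice to leave unmapped columns (such as `alignmentScore` here) unchanged is an assumption of the sketch. How csvtk treats keys missing from the map depends on its own flags.

```python
import csv
import io


def load_field_map(text):
    """Parse a two-column key/value TSV into a dict, skipping the header row."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader)  # skip the "key<TAB>value" header line
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def rename_header(header, field_map):
    """Rename columns found in the map; leave other columns unchanged (assumption)."""
    return [field_map.get(name, name) for name in header]


# A three-row excerpt of the field map, inlined so the sketch is self-contained.
FIELD_MAP_TSV = (
    "key\tvalue\n"
    "seqName\tseqName\n"
    "totalMissing\tmissing_data\n"
    "qc.mixedSites.status\tQC_mixed_sites\n"
)

field_map = load_field_map(FIELD_MAP_TSV)
renamed = rename_header(
    ["seqName", "totalMissing", "qc.mixedSites.status", "alignmentScore"],
    field_map,
)
print(renamed)
```

Keeping the mapping in a TSV file rather than in code means a column can be added or renamed without touching the workflow rule.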
28 changes: 22 additions & 6 deletions ingest/workflow/snakemake_rules/nextclade.smk
@@ -56,15 +56,31 @@ rule join_metadata_clades:
     input:
         nextclade="data/nextclade.tsv",
         metadata="data/metadata_raw.tsv",
+        nextclade_field_map="source-data/nextclade-field-map.tsv",
     output:
-        "data/metadata.tsv",
+        metadata="data/metadata.tsv",
     params:
         id_field=config["transform"]["id_field"],
     shell:
         """
-        python3 bin/join-metadata-and-clades.py \
-            --id-field {params.id_field} \
-            --metadata {input.metadata} \
-            --nextclade {input.nextclade} \
-            -o {output}
+        csvtk -tl rename2 \
+            -F \
+            -f '*' \
+            -p '(.+)' \
+            -r '{{kv}}' \
+            -k {input.nextclade_field_map} \
+            {input.nextclade} \
+        > results/nextclade_renamed.tsv
+        export APPEND_FIELDS=`awk 'NR>1 {{print $2}}' {input.nextclade_field_map} | grep -v -e "index" -e "seqName" | tr '\n' ',' | sed 's/,\$//g'`
+        tsv-join -H \
+            --filter-file results/nextclade_renamed.tsv \
+            --key-fields seqName \
+            --data-fields {params.id_field} \
+            --append-fields $APPEND_FIELDS \
+            --allow-duplicate-keys \
+            --write-all ? \
+            {input.metadata} \
+        > {output.metadata}
         """

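The rule above derives the list of fields to append from the second column of the field map (dropping `index` and `seqName`), then joins the renamed Nextclade table onto the metadata, writing `?` for metadata records with no Nextclade match. The following is a minimal Python sketch of that intended behaviour, not of tsv-join itself. The function names, the `accession` ID field, and the sample records are all hypothetical, and the "last duplicate wins" handling is an assumption about what `--allow-duplicate-keys` does.

```python
def append_fields_from_map(field_map, exclude=("index", "seqName")):
    """Collect renamed column names to append, skipping the join/index columns."""
    return [value for value in field_map.values() if value not in exclude]


def join_metadata(metadata, nextclade, id_field, key_field, append_fields, fill="?"):
    """Keep every metadata record and append Nextclade fields where the IDs match.

    Unmatched records get `fill` in each appended field. If a key occurs more
    than once in `nextclade`, the last row wins (assumed behaviour).
    """
    by_key = {row[key_field]: row for row in nextclade}
    joined = []
    for record in metadata:
        match = by_key.get(record[id_field])
        extra = {f: (match[f] if match else fill) for f in append_fields}
        joined.append({**record, **extra})
    return joined


# Hypothetical sample data; "accession" stands in for config["transform"]["id_field"].
field_map = {
    "index": "index",
    "seqName": "seqName",
    "clade": "clade",
    "totalMissing": "missing_data",
}
metadata = [
    {"accession": "A1", "country": "X"},
    {"accession": "B2", "country": "Y"},
]
nextclade = [{"seqName": "A1", "clade": "cladeA", "missing_data": "12"}]

fields = append_fields_from_map(field_map)
joined = join_metadata(metadata, nextclade, "accession", "seqName", fields)
for row in joined:
    print(row)
```

Note that the sketch excludes `index` and `seqName` by exact match, whereas the `grep -v` in the rule drops any line containing those substrings. The two agree for the field map in this commit.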