read_flat_vidrl: add column map to script
The column map will be more complicated with the need to ingest two
slightly different flat files (_flat_file.csv and _reference_panel.csv)
as discussed in #161 (comment).
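The message anticipates two flat-file variants that share most columns. One way to keep a single hard-coded map while allowing per-file differences is to combine a shared base map with per-variant overrides, chosen by file stem. This is a hypothetical sketch, not code from this commit. `BASE_COLUMN_MAP`, `VARIANT_OVERRIDES` and `column_map_for` are illustrative names. The override dicts are left empty because this commit does not show the actual column differences between `_flat_file.csv` and `_reference_panel.csv`. The sketch also assumes that `fstem` ends in `_flat_file` or `_reference_panel`.

```python
# Hypothetical sketch: one shared column map plus per-variant overrides.
# BASE_COLUMN_MAP mirrors the map hard-coded in read_flat_vidrl by this commit.
BASE_COLUMN_MAP = {
    "virus": "virus_strain",
    "virus.passage": "virus_passage",
    "antisera.passage": "serum_passage",
    "ferret": "serum_id",
    "value": "titer",
    "antisera.name": "serum_strain",
}

# Assumed file-stem suffixes for the two VIDRL flat files. The override dicts
# are placeholders; the real column differences are not part of this commit.
VARIANT_OVERRIDES = {
    "_flat_file": {},
    "_reference_panel": {},
}


def column_map_for(fstem):
    """Return the column map for a file stem such as '<name>_flat_file'."""
    for suffix, overrides in VARIANT_OVERRIDES.items():
        if fstem.endswith(suffix):
            # Later unpacking wins, so an override replaces the base entry
            # with the same key. A new dict keeps the base map unmodified.
            return {**BASE_COLUMN_MAP, **overrides}
    raise ValueError(f"Unrecognized VIDRL file stem: {fstem!r}")
```

With empty overrides, `column_map_for("example_flat_file")` simply returns a copy of the base map. An unrecognized stem raises instead of silently falling back to the wrong map.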

I also found myself constantly toggling back and forth between the
separate column_map.tsv and the upload script to figure out how the
columns are being used, so it makes more sense to just hard-code the
column map in the script.
joverlee521 committed Nov 19, 2024
1 parent d632af0 commit d4eb286
Showing 2 changed files with 10 additions and 16 deletions.
6 changes: 0 additions & 6 deletions source-data/vidrl_flat_file_column_map.tsv

This file was deleted.

20 changes: 10 additions & 10 deletions tdb/vidrl_upload.py
@@ -54,15 +54,6 @@
     }
 }
 
-def parse_tsv_mapping_to_dict(tsv_file):
-    map_dict = {}
-    with open(tsv_file, 'r') as f:
-        for line in f:
-            (key, value) = line.split('\t')
-            key = key.lower()
-            map_dict[key] = value.rstrip('\n')
-    return map_dict
-
 
 def parse_human_serum_references(human_serum_data, subtype):
     """
@@ -320,7 +311,16 @@ def read_flat_vidrl(path, fstem, assay_type):
     Read the flat CSV file with *fstem* in the provided *path* and convert
     to the expected TSV file at `data/tmp/<fstem>.tsv` for tdb/elife_upload.
     """
-    column_map = parse_tsv_mapping_to_dict("source-data/vidrl_flat_file_column_map.tsv")
+    # The new column names need to be one of the ELIFE_COLUMNS in order to be
+    # included in the temporary output file that's then passed to elife_upload.py
+    column_map = {
+        "virus": "virus_strain",
+        "virus.passage": "virus_passage",
+        "antisera.passage": "serum_passage",
+        "ferret": "serum_id",
+        "value": "titer",
+        "antisera.name": "serum_strain"
+    }
     filepath = path + fstem + ".csv"
 
     titer_measurements = pd.read_csv(filepath, usecols=column_map.keys()) \
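The diff is truncated after the `pd.read_csv(...)` call, so it does not show how `column_map` is applied afterwards. The following is a minimal sketch that assumes the columns are then renamed with `DataFrame.rename` and filtered to the allowed eLife names. `ELIFE_COLUMNS` below is a hypothetical stand-in, and the real constant may contain more or different names. `validate_column_map` and `read_and_rename` are illustrative helpers, not functions from tdb/vidrl_upload.py.

```python
import io

import pandas as pd

# Map hard-coded by this commit: flat-file column -> eLife column name.
column_map = {
    "virus": "virus_strain",
    "virus.passage": "virus_passage",
    "antisera.passage": "serum_passage",
    "ferret": "serum_id",
    "value": "titer",
    "antisera.name": "serum_strain",
}

# Hypothetical stand-in for ELIFE_COLUMNS; the real list may differ.
ELIFE_COLUMNS = [
    "virus_strain",
    "serum_strain",
    "serum_id",
    "titer",
    "virus_passage",
    "serum_passage",
]


def validate_column_map(column_map, allowed):
    """Raise if a mapped name would be dropped from the eLife output."""
    unknown = set(column_map.values()) - set(allowed)
    if unknown:
        raise ValueError(f"Not in ELIFE_COLUMNS: {sorted(unknown)}")


def read_and_rename(csv_source, column_map, allowed):
    """Read only the mapped columns, rename them, and keep allowed names."""
    validate_column_map(column_map, allowed)
    # usecols drops every CSV column that is not a key of the map.
    df = pd.read_csv(csv_source, usecols=list(column_map))
    df = df.rename(columns=column_map)
    # Reorder the columns to follow the order of the allowed list.
    return df[[col for col in allowed if col in df.columns]]


# Usage with an in-memory CSV; the "extra" column is ignored by usecols.
sample = io.StringIO(
    "virus,virus.passage,antisera.name,antisera.passage,ferret,value,extra\n"
    "A/Example/1/2024,E5,A/Example/2/2023,E4,F01,640,ignored\n"
)
titers = read_and_rename(sample, column_map, ELIFE_COLUMNS)
print(titers.columns.tolist())
```

The diff's comment suggests that names outside `ELIFE_COLUMNS` never reach the temporary TSV. Validating the map up front makes a typo in the hard-coded dict fail loudly instead of silently dropping a column.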
