read_flat_vidrl: Clean up virus_strain that includes "pool" suffix
Based on a meeting with VIDRL, we should only keep homologous titers
for records whose `virus_strain` includes the "pool" suffix. These act as
a proxy homologous titer for the human serum references. All other
records whose `virus_strain` includes the "pool" suffix are ignored
because they are duplicate data.
joverlee521 committed Nov 6, 2024
1 parent 467cc58 commit 21cbcd8
Showing 1 changed file with 12 additions and 2 deletions.
14 changes: 12 additions & 2 deletions tdb/vidrl_upload.py
@@ -389,6 +389,8 @@ def curate_reference_panel_records(
         "antisera passage": "serum_passage",
         "ferret": "serum_id",
         "titre": "titer",
+        # Used for cleaning up `virus_strain` that includes "pool" suffix
+        "homologous": "homologous"
     }

     for record in records:
@@ -408,8 +410,16 @@ def curate_reference_panel_records(

         new_record = standardize_human_serum(new_record, "virus_passage")

-        # TODO: Clean up `virus_strain` that includes "pool" suffix
-        # Should these be dropped completely because they are not "real" measurements?
+        # Clean up `virus_strain` that includes "pool" suffix
+        # Strip "pool" suffix and keep as proxy of homologous titer
+        # for the human serum pool reference. Skip measurements that are not
+        # marked as homologous since they are just duplicates of the proxy measurements
+        # -Jover, 04 November 2024
+        if re.match(r".*pool$", new_record["virus_strain"]):
+            if new_record["homologous"] == "TRUE":
+                new_record["virus_strain"] = re.sub(r"pool$", "", new_record["virus_strain"])
+            else:
+                continue

         yield new_record
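
The rule this hunk adds can be tried in isolation with a minimal sketch like the one below. It is not the code in `tdb/vidrl_upload.py`: the helper name `clean_pool_records`, the strain names, and the titer values are hypothetical. Only the two regular expressions and the `"TRUE"` comparison come from the diff. The diff does not show whether real strain names have whitespace before "pool", so the sketch strips only the suffix itself, as the commit does.

```python
import re


def clean_pool_records(records):
    """Hypothetical stand-alone version of the "pool" suffix cleanup.

    Records whose `virus_strain` ends in "pool" are kept only when marked
    homologous, with the suffix stripped; all other records pass through.
    """
    for record in records:
        new_record = dict(record)  # copy so the input is not mutated
        if re.match(r".*pool$", new_record["virus_strain"]):
            if new_record["homologous"] == "TRUE":
                # Keep as the proxy homologous titer, minus the suffix.
                new_record["virus_strain"] = re.sub(
                    r"pool$", "", new_record["virus_strain"]
                )
            else:
                # Non-homologous "pool" rows are treated as duplicates.
                continue
        yield new_record


# Made-up records for illustration only.
sample = [
    {"virus_strain": "A/Example/1/2024", "homologous": "FALSE", "titer": "160"},
    {"virus_strain": "A/Example/2/2024pool", "homologous": "TRUE", "titer": "320"},
    {"virus_strain": "A/Example/3/2024pool", "homologous": "FALSE", "titer": "320"},
]
cleaned = list(clean_pool_records(sample))
```

With this input, the first record passes through unchanged, the second is kept with the suffix removed, and the third is dropped.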
