Revisit ingestion of VIDRL flat files #161
Thanks @j23414 for investigating the latest flat files 🙏 Jotting down notes for updating the flat file ingest:
Oh, there's a separate file for the reference panel results. Each flat file has a matching reference panel file.
@joverlee521 I think we originally asked for the reference panel file and Sheena made it for us. Then later Sheena modified her script that produces the flat files to pull in the relevant information from the reference panel file, so we didn't have to parse that reference information separately. Is there anything in the reference panel file that we can't get from the flat file by parsing the unique homologous titers like you mentioned above? We could jump on a huddle tomorrow to chat, if that's helpful. It's been a little while since I looked at these files, too...
Yeah, looking at the …
Got it. I can't see the latest files any more (curse OneDrive!), but in the last view I had of those files, they included columns for … To get those homologous reference values into our standard format, we would need to make new records for each unique combination of antigen, passage, and titer with the …
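For illustration, here's a minimal sketch of that "new record per unique combination" step. The reference panel column names used here ("Antigen", "Passage", "Homologous Titre") are assumptions, since the actual headers aren't visible in this thread:

```python
# Hypothetical sketch: expand reference panel rows into standard titer records,
# one per unique (antigen, passage, titer) combination.
# Column names are assumed, not the real VIDRL headers.
import csv

def reference_panel_records(reference_panel_csv):
    """Yield one standard-format record per unique (antigen, passage, titer)."""
    seen = set()
    with open(reference_panel_csv, newline="") as handle:
        for row in csv.DictReader(handle):
            key = (row["Antigen"], row["Passage"], row["Homologous Titre"])
            if key in seen:
                continue
            seen.add(key)
            yield {
                "virus_strain": row["Antigen"],   # homologous: antigen doubles as serum strain
                "serum_strain": row["Antigen"],
                "serum_passage": row["Passage"],
                "titer": row["Homologous Titre"],
            }
```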
We chatted about this today and decided that we do need to ingest the additional reference panel files. I'll update tdb/vidrl_upload.py to work with the new flat files and test on a couple of Excel/flat file pairs to get a diff of the two paths.
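One lightweight way to get that diff, as a sketch rather than the actual plan for tdb/vidrl_upload.py: dump the parsed records from each path to sorted TSVs and compare them with plain `diff`.

```python
# Sketch: write parsed records from one ingest path to a sorted TSV so the
# Excel and flat-file outputs can be compared with `diff`.
# The record-producing functions are placeholders, not existing tdb functions.
import csv

def dump_records(records, path, fields):
    """Write records to a TSV, sorted by the given fields for a stable diff."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields, delimiter="\t")
        writer.writeheader()
        for record in sorted(records, key=lambda r: tuple(r.get(f, "") for f in fields)):
            writer.writerow({f: record.get(f, "") for f in fields})

# e.g. dump_records(excel_records, "excel.tsv", FIELDS)
#      dump_records(flat_file_records, "flat_file.tsv", FIELDS)
# then: diff excel.tsv flat_file.tsv
```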
The column map will be more complicated with the need to ingest two slightly different flat files (_flat_file.csv and _reference_panel.csv) as discussed in #161 (comment). I also found myself constantly toggling back and forth between the separate column_map.tsv and the upload script to figure out how the columns are being used, so it makes more sense to just hard-code the column map in the script.
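A sketch of what a hard-coded column map could look like, with one mapping per file type. The source column names here are assumptions for illustration, not the real VIDRL headers:

```python
# Hypothetical hard-coded column maps for the two file types, kept next to the
# code that uses them instead of in a separate column_map.tsv.
FLAT_FILE_COLUMNS = {
    "Antigen": "virus_strain",
    "Serum": "serum_strain",
    "Passage": "serum_passage",
    "Titre": "titer",
}

REFERENCE_PANEL_COLUMNS = {
    "Antigen": "virus_strain",
    "Passage": "serum_passage",
    "Homologous Titre": "titer",
}

def map_columns(row, column_map):
    """Rename a raw CSV row's columns to the internal field names."""
    return {new: row[old] for old, new in column_map.items() if old in row}
```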
@huddlej brought up on Slack that OneDrive includes flat files. Ingesting the flat files should make the rest of #158 easier?
Revisit the changes made in #103 and update them to work with the latest version of the flat files.