Part of the data curation happens during vdb/upload and tdb/upload, which makes curation issues difficult to debug and curation steps hard to share with external groups.
Potential solutions
1. Detangle data curation and data upload within fauna.
2. Start brand new ingest workflows for curation, where the results are then optionally uploaded to fauna.
Having worked in fauna for the first time in a few years, I'd very much welcome this decoupling. For the work I was doing in avian flu (no titers!), I'd propose (3):
Use fauna to mirror GISAID (indexing on isolate_id and accession), i.e. fauna contains no curation at all. We then have ingest pipelines which start by downloading from fauna, curate the data, and then either use it directly or upload to S3.
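To make the proposed split concrete, here is a minimal sketch of how it might look, not a description of fauna's actual code. It assumes records are plain dicts; the `strain` field, the whitespace-stripping rule, and every function name (`mirror_key`, `download_from_mirror`, `curate`, `run_ingest`) are hypothetical and only stand in for the real download, curation, and upload steps.

```python
from typing import Callable, Optional

Record = dict[str, str]


def mirror_key(record: Record) -> tuple[str, str]:
    # Index the mirror on isolate_id and accession, as proposed above.
    return (record["isolate_id"], record["accession"])


def download_from_mirror(mirror: dict[tuple[str, str], Record]) -> list[Record]:
    # Hypothetical stand-in for a fauna download: return copies of the raw,
    # uncurated records so the mirror itself is never modified.
    return [dict(record) for record in mirror.values()]


def curate(records: list[Record]) -> list[Record]:
    # All curation lives in the ingest pipeline, outside the mirror.
    # The single rule here (strip whitespace from an assumed "strain" field)
    # is a placeholder for real curation logic.
    curated = []
    for record in records:
        cleaned = dict(record)
        cleaned["strain"] = cleaned["strain"].strip()
        curated.append(cleaned)
    return curated


def run_ingest(
    mirror: dict[tuple[str, str], Record],
    upload: Optional[Callable[[list[Record]], None]] = None,
) -> list[Record]:
    # Download, curate, then either use the result directly or hand it to
    # an optional upload step (for example, a function that writes to S3).
    curated = curate(download_from_mirror(mirror))
    if upload is not None:
        upload(curated)
    return curated
```

Because curation is an ordinary function that never touches the mirror, it could be debugged and shared with external groups independently of any upload step.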
Additional context in Slack