Metaissue: referential integrity checks via runtime API are needed #318
Comments
Let's stay in touch with @cmungall or @pkalita-lbl because there is some enhanced LinkML validation tooling in the works.
Sounds like this ticket and #401 are the top two priorities coming out of Thursday's db sync. My concerns are:
@eecavanna raised the possibility of organizing a validation & referential integrity squad. Would that be the best way forward? |
From Slack:
@dwinston to draft a PR to add stronger |
Validation and referential integrity design doc: https://excalidraw.com/#json=3-hD1MOoBuabBr9eOJjOU,IrBWUbkD3ZN6Eksj7F3zHA
Roadmap: https://docs.google.com/spreadsheets/d/10UreFu0tnXxjNBaZA0pXvjQ9ZpLhQwp5AhogEnrLVTU/edit?usp=sharing
@eecavanna @dwinston Notebooks have been added in #521 |
These are the relevant entrypoints that need referential integrity added: Sprint 1
Sprint 2
Sprint 3
@dwinston what is the num of seconds |
@aclum this is our current working timeline, with Sprint 1 being now through the team retreat, and Sprint 2 being the two weeks following the retreat |
I would like confirmation that there are no more false positives with this implementation before this check is added to existing endpoints, especially because there has been drift from what was done in the notebooks. Would it be possible to run the code that will be implemented on the existing records in mongo? The current expected result is that 33 records mention an id that does not exist.
As part of the re-iding squad, we discovered over 7,000 documents in mongo with referential integrity issues; that is, an id referenced in has_input, has_output, was_informed_by, or part_of did not exist. We've had issues with records not being completely deleted: we discovered some bioscales reads-based analysis records that were not deleted as part of the re-iding done for GSP. We've also had study identifiers referenced in biosample records before the study_set record existed. Based on that, we need referential integrity checks on all action types (insertion, deletion, update).
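To make the request concrete, here is a minimal sketch of the kind of check I mean. It is not the runtime API's implementation: it assumes documents are plain dicts grouped by collection name in memory, that every document has an `id` field, and that the four slots named above hold either a single id or a list of ids. The function names (`find_dangling_references`, `blocked_deletions`) are hypothetical and used only for illustration.

```python
from typing import Dict, Iterable, Iterator, List, Set, Tuple

# Slots named in this issue as holding references to other documents.
REFERENCE_SLOTS = ("has_input", "has_output", "was_informed_by", "part_of")

# Assumption: collection name -> list of documents (plain dicts).
Collections = Dict[str, List[dict]]
# (collection name, referencing doc id, slot, referenced id)
Finding = Tuple[str, str, str, str]


def collect_ids(collections: Collections) -> Set[str]:
    """Gather every `id` that exists in any collection."""
    return {doc["id"] for docs in collections.values() for doc in docs if "id" in doc}


def referenced_ids(doc: dict) -> Iterator[Tuple[str, str]]:
    """Yield (slot, referenced id) pairs for the reference slots of one document."""
    for slot in REFERENCE_SLOTS:
        value = doc.get(slot)
        if value is None:
            continue
        # A slot may hold one id or a list of ids.
        values = value if isinstance(value, list) else [value]
        for target in values:
            yield slot, target


def find_dangling_references(collections: Collections) -> List[Finding]:
    """Report every reference whose target id does not exist anywhere."""
    known = collect_ids(collections)
    findings = []
    for name, docs in collections.items():
        for doc in docs:
            for slot, target in referenced_ids(doc):
                if target not in known:
                    findings.append((name, doc.get("id"), slot, target))
    return findings


def blocked_deletions(collections: Collections, ids_to_delete: Iterable[str]) -> List[Finding]:
    """Report references that would dangle if `ids_to_delete` were removed."""
    doomed = set(ids_to_delete)
    findings = []
    for name, docs in collections.items():
        for doc in docs:
            if doc.get("id") in doomed:
                continue  # the referencing document is being deleted too
            for slot, target in referenced_ids(doc):
                if target in doomed:
                    findings.append((name, doc.get("id"), slot, target))
    return findings
```

The same `find_dangling_references` pass could in principle be run over the state that would result from an insertion or update, while `blocked_deletions` covers the deletion case; running the scan over the existing records would also give a count to compare against the expected results.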
cc @shreddd @mbthornton-lbl @dwinston @eecavanna