Allow a study to specify a schema (#252)
* Allow a study to specify a schema

* Added MAINTAINER.md

* Typo/walrus cleanup

* centralized substitution, test improvements
dogversioning authored Jun 21, 2024
1 parent 23f6bc9 commit 67ef484
Showing 22 changed files with 464 additions and 124 deletions.
1 change: 1 addition & 0 deletions .github/workflows/ci.yaml
@@ -37,6 +37,7 @@ jobs:
run: |
python -m pytest --cov-report xml --cov=cumulus_library tests
- name: Generate coverage report
if: github.ref != 'refs/heads/main'
uses: orgoro/[email protected]
with:
coverageFile: coverage.xml
39 changes: 39 additions & 0 deletions MAINTAINER.md
@@ -0,0 +1,39 @@
# Maintainer notes

This document is intended for users contributing to or maintaining this repository.
It is not comprehensive, but aims to capture items relevant to architecture
that are not covered in another document.

## Intended usage and database schemas

Since these terms are used somewhat haphazardly in different database implementations,
we'll quickly define them for the purposes of this document:

- database - a single instance of a database product, local or in the cloud. It can
contain several schemas.
- schema - a namespaced collection of tables inside a database

Cumulus, as a holistic system, is designed to allow querying against the entire history
of a medical institution. You do not need to preselect a cohort - that can be done
by the author of a given study. We generally recommend using this approach, and it
is the one that we are trying to use in house.

However, for technical and philosophical reasons, users may wish instead to select
a cohort at their EHR, and upload that data to a specific named schema in their
database, and work against that. It's important that we remember this use case
as we roll out new features.

From the perspective of this repository, and studies which run on top of it, it's
important to remember these dual use cases - we should never make assumptions
about which database schema will be used, and it may change from one run to the next.
But all data associated with a single schema (source data and Cumulus studies) should
exist inside that schema.
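
As a rough illustration of the point above, the schema can be passed in at run
time rather than hard-coded into a query. This is only a sketch: `qualified_table`,
`count_patients_query`, and the `patient` table name are hypothetical and are not
taken from this repository's code.

```python
import re

# Hypothetical helper, not part of cumulus_library: only plain SQL
# identifiers are accepted, since the schema name arrives from the caller.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def qualified_table(schema: str, table: str) -> str:
    """Return "schema"."table", rejecting names that are not plain identifiers."""
    for name in (schema, table):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"invalid identifier: {name!r}")
    return f'"{schema}"."{table}"'


def count_patients_query(schema: str) -> str:
    """Build a query against whichever schema this run targets."""
    return f"SELECT count(*) FROM {qualified_table(schema, 'patient')}"
```

Calling `count_patients_query` with a different schema name on each run produces a
query scoped to that schema, so nothing in the query text assumes a fixed location.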

As of this writing, the sole exception to this is third party vocabulary systems.
For these, the CLI will automatically create the tables in a unique schema. They are
effectively read only tables (though this is not enforced) that other studies can
reference via cross-database joining. Users should not create additional tables in
these schemas.
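
The join pattern can be sketched with Python's built-in `sqlite3` module, used here
purely as a stand-in for whatever database a deployment actually uses. An attached
database plays the role of the vocabulary schema; the `vocab.codes` and
`study_condition` tables and their contents are made up for illustration.

```python
import sqlite3

# Stand-in setup: the main connection plays the role of the study schema,
# and an attached in-memory database plays the vocabulary schema.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS vocab")

# Hypothetical vocabulary table, treated as read only by convention.
conn.execute("CREATE TABLE vocab.codes (code TEXT PRIMARY KEY, display TEXT)")
conn.execute("INSERT INTO vocab.codes VALUES ('A1', 'Example term')")

# Hypothetical study table living in the study's own schema.
conn.execute("CREATE TABLE main.study_condition (patient_id TEXT, code TEXT)")
conn.execute("INSERT INTO main.study_condition VALUES ('p1', 'A1')")

# The study reads from the vocabulary schema but writes nothing to it.
rows = conn.execute(
    """
    SELECT s.patient_id, v.display
    FROM main.study_condition AS s
    JOIN vocab.codes AS v ON v.code = s.code
    """
).fetchall()
```

The study's own table stays in its schema, and the vocabulary table is only ever
referenced on the read side of the join.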

A user could elect to use these vocabulary builders and skip the entire rest of the
Cumulus ecosystem, if they wanted to.