Allow a study to specify a schema (#252)

* Allow a study to specify a schema
* Added MAINTAINER.md
* Typo/walrus cleanup
* centralized substitution, test improvements
smart-on-fhir · Jun 21, 2024 · 67ef484
1 parent 23f6bc9
commit 67ef484

Showing 22 changed files with 464 additions and 124 deletions.
diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml
@@ -37,6 +37,7 @@ jobs:
         run: |
           python -m pytest --cov-report xml --cov=cumulus_library tests
       - name: Generate coverage report
+        if: github.ref != 'refs/heads/main'
         uses: orgoro/[email protected]
         with:
             coverageFile: coverage.xml

diff --git a/MAINTAINER.md b/MAINTAINER.md
@@ -0,0 +1,39 @@
+# Maintainer notes
+
+This document is intended for users contributing to or maintaining this repository.
+It is not comprehensive, but aims to capture items relevant to architecture
+that are not covered in another document.
+
+## Intended usage and database schemas
+
+Since these terms are used somewhat haphazardly in different database implementations, 
+we'll quickly define them for the purposes of this document:
+
+- database - a single instance of a database product, local or in the cloud. It can
+contain several schemas.
+- schema - a namespaced collection of tables inside a database
+
+Cumulus, as a holistic system, is designed to allow querying against the entire history
+of a medical institution. You do not need to preselect a cohort - that can be done
+by the author of a given study. We generally recommend using this approach, and it
+is the one that we are trying to use in-house.
+
+However, for technical and philosophical reasons, users may wish instead to select
+a cohort at their EHR, and upload that data to a specific named schema in their
+database, and work against that. It's important that we remember this use case
+as we roll out new features.
+
+From the perspective of this repository, and studies which run on top of it, it's 
+important to remember these dual use cases - we should never make assumptions 
+about which database schema will be used, and it may change from one run to the next.
+But all data associated with a single schema (source data and Cumulus studies) should
+exist inside that schema.
+
+As of this writing, the sole exception to this is third party vocabulary systems.
+For these, the CLI automatically creates the tables in a unique schema. They are
+effectively read-only (though this is not enforced) and can be referenced by other
+studies via cross-database joining. Users should not create additional tables in
+these schemas.
+
+A user could elect to use these vocabulary builders and skip the rest of the
+Cumulus ecosystem entirely, if they wanted to.
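
The schema convention described in MAINTAINER.md above can be illustrated with a minimal sketch. This is not code from this commit or from cumulus_library: the helper names (`qualify_table`, `build_lookup_query`) and the table names are hypothetical, chosen only to show a study taking its schema as a parameter instead of hardcoding it, while referencing vocabulary tables in a separate schema.

```python
import re

# Hypothetical helpers and table names, for illustration only;
# none of them come from cumulus_library.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def qualify_table(schema: str, table: str) -> str:
    """Return a quoted, schema-qualified table name."""
    for name in (schema, table):
        # Reject anything that is not a plain identifier, since the
        # names are interpolated directly into SQL text.
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"invalid identifier: {name!r}")
    return f'"{schema}"."{table}"'


def build_lookup_query(data_schema: str, vocab_schema: str) -> str:
    """Join a study table in the data schema to a vocabulary table.

    The data schema is supplied per run, so the same study works against
    a whole-institution schema or a preselected cohort schema.
    """
    codes = qualify_table(data_schema, "example_study__codes")
    vocab = qualify_table(vocab_schema, "example_vocab__definitions")
    return (
        "SELECT c.code, v.display\n"
        f"FROM {codes} AS c\n"
        f"JOIN {vocab} AS v ON c.code = v.code"
    )


if __name__ == "__main__":
    print(build_lookup_query("cohort_a", "vocab"))
```

Passing the schema in on every call keeps the assumption out of the study itself, so it can change from one run to the next, as the notes above require.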