Updates for i2b2 export support #126

dogversioning · 2023-09-29T15:05:09Z

This PR makes the following changes:

- Updates the clean arg group (really only used by us, and infrequently) to adapt to prefixes in manifests, and to clean up dangling old tables and user-created tables.
- Updates supporting i2b2 ingestion:
  - Moved encounter coding output to link to denormalized tables
  - Added static table for encounter class display fields
  - Fixed namespace class vs display distinction

Checklist

Consider if documentation (like in docs/) needs to be updated
Consider if tests should be added
Run pylint if you're making changes beyond adding studies
Update template repo if there are changes to study configuration

cumulus_library/study_parser.py

mikix · 2023-10-11T19:07:56Z

cumulus_library/cli.py

@@ -89,7 +89,7 @@ def reset_data_path(self, study: PosixPath) -> None:

    ### Creating studies

-    def clean_study(self, targets: List[str]) -> None:
+    def clean_study(self, targets: List[str], explicit_prefix=False) -> None:


Can you expand a bit on what the explicit prefix stuff is all about? It makes me lightly nervous to allow cleaning on any old prefix - just seems very "shoot yourself in the foot" territory.

If it's to clean up for oddball historical tables... Maybe instead of an explicit prefix, we could do explicit full-name matching. I.e. Don't allow cleaning on ^m* but rather just ^mistake$ -- or is that not a real issue / not a good idea?

if we're doing explicit full name matching, we may as well just manually drop the tables in athena directly. if you mean word matching, then maybe?

There are three use cases:

- As you say, historical data created by the library without the dunder syntax
- User-created tables: things like study ideation, etc.
- Removing a study when you don't have the manifest on hand

I'm more comfortable with this being an option here as opposed to the etl letting you do wonky stuff like this, because by definition this is not a gold source, and should be treated as being ephemeral by its nature.
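To make the trade-off concrete, here is a minimal sketch of how an explicit prefix could widen what a clean matches. It is not the library's actual implementation: `match_tables` and the table names are hypothetical, and it assumes that without `explicit_prefix` only tables following the dunder convention (`<study>__<table>`) are selected.

```python
from typing import List


def match_tables(
    tables: List[str], target: str, explicit_prefix: bool = False
) -> List[str]:
    """Hypothetical helper: select the tables a clean would drop."""
    # Assumption: the default mode only matches the study dunder
    # convention, i.e. names starting with "<target>__".
    # With explicit_prefix, the target is used verbatim as the prefix.
    prefix = target if explicit_prefix else f"{target}__"
    return [name for name in tables if name.startswith(prefix)]


# Illustrative table names only; none of these come from a real database.
tables = ["covid__count_month", "covid_symptoms__dx", "covid_junk", "core__patient"]

dunder_only = match_tables(tables, "covid")
explicit = match_tables(tables, "covid", explicit_prefix=True)
```

Under these assumptions, the default mode selects only `covid__count_month`, while the explicit prefix also picks up `covid_symptoms__dx` and `covid_junk`, which is the accidental-deletion scenario being discussed.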

My concern was just avoiding accidental deletion.

Yeah, you could manually drop the tables in Athena, but that's more technical - I had assumed this was exactly that, but for casuals who aren't 100% sure how to do that.

Whether this "full name" idea makes sense depends on how many tables we're talking about in those use cases. Like 10 might be annoying to do individually. But 3? 🤷

It might be nice to have like, a guard against destructive actions, since this can affect tables the user might not be thinking about (like, they forget that covid_symptoms__ exists when they're trying to delete some old covid_junk)

Something like "This command will delete these tables: [...] Continue? [y/N]"

Doesn't have to be this PR, but it feels like a nice guard, now that we're moving past single study actions.

Regardless, this is fine - I just get real nervous about wildcard deletes with no ability to undo it 😄

(I know that's ironic coming from someone who recently had the NdjsonFormat class uncritically delete a dir. But that did get fixed for a reason.)

i like the list & confirm idea - i'll either squeeze it in now or make a ticket for it.
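A list-and-confirm guard along these lines might look like the sketch below. It is illustrative only and not what was merged: `confirm_drop` is a hypothetical name, and the `ask` parameter is an assumption added so the prompt can be swapped out in tests.

```python
from typing import Callable, List


def confirm_drop(tables: List[str], ask: Callable[[str], str] = input) -> bool:
    """Hypothetical guard: list the tables and require an explicit yes."""
    if not tables:
        # Nothing matched, so there is nothing to confirm or delete.
        return False
    listing = ", ".join(sorted(tables))
    answer = ask(
        f"This command will delete these tables: [{listing}] Continue? [y/N] "
    )
    # Anything other than y/yes (including an empty answer) means "no".
    return answer.strip().lower() in ("y", "yes")
```

Defaulting to "no" on an empty answer matches the `[y/N]` convention in the suggested prompt, so pressing Enter by accident never deletes anything.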

cumulus_library/cli_parser.py

dogversioning · 2023-10-12T15:29:55Z

cumulus_library/study_parser.py

+            dataframe = dataframe.sort_values(
+                by=list(dataframe.columns), ascending=False, na_position="first"
+            )


After seeing the churn in the reference data files, I threw this in, which gets us back to the original format for exports before we removed the ORDER BY clauses for performance reasons.
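As a rough illustration of why sorting on every column stabilizes the exported files, the sketch below applies the same `sort_values` call to two frames that hold identical rows in different orders. The `sort_for_export` wrapper, the `reset_index` call, and the sample data are assumptions for the example, not part of the PR.

```python
import pandas as pd


def sort_for_export(dataframe: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical wrapper around the sort added in this PR."""
    # Sorting by every column gives each row a deterministic position,
    # no matter what order the query returned the rows in.
    ordered = dataframe.sort_values(
        by=list(dataframe.columns), ascending=False, na_position="first"
    )
    # Assumption: drop the old index so the two results compare cleanly.
    return ordered.reset_index(drop=True)


# Illustrative rows only; the second frame has the same rows reversed.
rows = [("AMB", 10.0), ("EMER", float("nan")), ("IMP", 3.0)]
first = pd.DataFrame(rows, columns=["enc_class", "cnt"])
second = pd.DataFrame(list(reversed(rows)), columns=["enc_class", "cnt"])

same = sort_for_export(first).equals(sort_for_export(second))
```

Because both inputs end up in the same row order, re-running an export no longer produces spurious diffs in the reference data files.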

dogversioning force-pushed the mg/i2b2_update branch 2 times, most recently from 72bc8eb to 78ed1b2 Compare October 11, 2023 15:38

Updates for i2b2 export support

e84967e

dogversioning force-pushed the mg/i2b2_update branch from 78ed1b2 to e84967e Compare October 11, 2023 18:40

dogversioning marked this pull request as ready for review October 11, 2023 18:41

mikix approved these changes Oct 11, 2023

dogversioning added 2 commits October 12, 2023 09:22

reference data update

acfe560

sort on export

3e0bb23

dogversioning commented Oct 12, 2023

dogversioning added 2 commits October 12, 2023 13:12

testing cleanup

33c842d

prefix cleanup

2ea1ca3

dogversioning merged commit eae81c1 into main Oct 12, 2023
3 checks passed

dogversioning deleted the mg/i2b2_update branch October 12, 2023 17:58
