Added archival mode #183

dogversioning · 2024-02-09T16:21:54Z

This PR makes the following changes:

- Adds an --archive flag to the export mode, with a warning that the output is potentially sensitive
- Removes the create argument, since the docs are a 'good enough' source (Tech Debt: Study creation cleanup #122)
- Removes a nested field from encounter, which we only caught because sorting dataframes complains about dictionaries
- Regenerated reference SQL after yesterday's commits
  - tech debt: reference sql validation #182 to remember to automate checks for this
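
As a rough illustration of why a nested field surfaces during sorting: Python dictionaries define no ordering, so any sort that has to compare them fails. The snippet below is a plain-Python analogue, not the dataframe code from this PR; `rows` and `try_sort` are hypothetical names.

```python
# Hypothetical values standing in for a nested (dictionary) field.
rows = [{"code": "b"}, {"code": "a"}]


def try_sort(values):
    """Return the sorted values, or the error text if they cannot be ordered."""
    try:
        return sorted(values)
    except TypeError as exc:
        # dicts do not support '<', so comparison-based sorting raises here
        return f"TypeError: {exc}"


nested_result = try_sort(rows)      # an error string, because dicts are unorderable
flat_result = try_sort(["b", "a"])  # flat values sort without trouble
```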

Checklist

Consider if documentation (like in docs/) needs to be updated - it does, will do in one go
Consider if tests should be added
Update template repo if there are changes to study configuration

mikix · 2024-02-09T20:37:17Z

cumulus_library/cli.py

            elif args["action"] == "export":
+                if args["archive"]:


I initially had a thought that maybe this should be a toplevel action instead of a flag on export but after thinking about it more, decided you had the right of it. Just talking out loud - in case you had thoughts on that too. But I think this makes sense.

i started off that way too, but in digging in, since it's 90% the same, it was A) faster to get it in here, B) a little DRYer, and C) i kind of like keeping the number of base level cli verbs low.
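
A minimal sketch of the "flag on export" shape discussed above, assuming an argparse-based CLI. Only the `action` and `archive` keys come from the diff; the parser layout, the `example-cli` name, the help text, and the returned strings are illustrative assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one 'export' verb that carries an --archive flag."""
    parser = argparse.ArgumentParser(prog="example-cli")  # hypothetical name
    actions = parser.add_subparsers(dest="action", required=True)
    export = actions.add_parser("export", help="export study tables")
    export.add_argument(
        "--archive",
        action="store_true",
        help="archive the data (potentially sensitive)",
    )
    return parser


def dispatch(argv):
    """Mirror the args['action'] / args['archive'] checks from the diff."""
    args = vars(build_parser().parse_args(argv))
    if args["action"] == "export":
        if args["archive"]:
            return "archive export"
        return "standard export"
    return "unknown action"
```

Keeping archive as a flag means the two paths share one verb and one set of arguments, which is the DRY argument made in the reply.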

mikix · 2024-02-09T20:40:06Z

cumulus_library/studies/core/reference_sql/builder_condition.sql

-            u.codeable_concept.system = 'http://hl7.org/fhir/sid/icd-10-cm'
+            u.codeable_concept.system LIKE 'http://hl7.org/fhir/sid/icd-10-cm'


I'm lightly curious about the performance cost of this. Correctness is more important, obviously, but if this is noticeably slower, I imagine the jinja could switch to LIKE only if it detects a wildcard.

there is a half baked idea in my head of moving the system extraction logic into here in some way, or in some other way doing db introspection to help build these queries out in a more nuanced fashion.

but - only doing LIKE when required seems like low hanging fruit in the interim.
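
One way the "only LIKE when required" idea could look, sketched in plain Python rather than in the jinja template. `system_predicate` is a hypothetical helper, and treating `%` as the only wildcard marker is an assumption (`_` is also a LIKE wildcard, but is ignored here).

```python
def system_predicate(column: str, system: str) -> str:
    """Build a SQL comparison, using LIKE only when the value has a '%' wildcard."""
    # Assumption: '%' is the only wildcard marker a study author would use.
    operator = "LIKE" if "%" in system else "="
    # Double any single quotes so the value stays a valid SQL string literal.
    escaped = system.replace("'", "''")
    return f"{column} {operator} '{escaped}'"


exact = system_predicate(
    "u.codeable_concept.system", "http://hl7.org/fhir/sid/icd-10-cm"
)
# Hypothetical wildcard pattern, for illustration only.
fuzzy = system_predicate("u.codeable_concept.system", "http://hl7.org/fhir/sid/icd-10%")
```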

mikix · 2024-02-09T20:42:54Z

cumulus_library/study_parser.py

-            dataframe.to_parquet(f"{path}/{table}.parquet", index=False)
-            queries.append(query)
+            if not archive:
+                dataframe.to_parquet(f"{path}/{table}.parquet", index=False)


Why not parquet too in this case? Just space reasons?

This method is starting to feel a little awkward. You start with forked logic (which tables) and end with forked logic (how to handle the results). Only the middle is shared. Should the middle be moved to a helper method and you have two different methods here, like:

    if archive:
        do_archive()
    else:
        do_export()

    def do_archive():
        get tables
        do_inner_export()
        zip results

    def do_export():
        get tables
        do_inner_export()
        parquet results

Or maybe instead move some of the "gross" specialized code (like "getting all tables" or "zipping up all tables") into helpers, so that this method can focus just on the if/else-ing.

I dunno. No action needed per se, just started feeling like a lot of very different code based on some if conditions.
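
A runnable sketch of the split suggested above, under stated assumptions: tables are plain lists of rows written as CSV (the real code uses dataframes and parquet), and every function name, the `export_names` parameter, and the `archive.zip` filename are hypothetical.

```python
import csv
import pathlib
import zipfile


def _export_tables(tables, target_dir):
    """Shared middle: write each table to disk and return the written paths."""
    target_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, rows in tables.items():
        path = target_dir / f"{name}.csv"
        with path.open("w", newline="") as handle:
            csv.writer(handle).writerows(rows)
        written.append(path)
    return written


def do_export(all_tables, export_names, target_dir):
    """Standard path: only the tables named for export."""
    selected = {name: all_tables[name] for name in export_names}
    return _export_tables(selected, target_dir)


def do_archive(all_tables, target_dir):
    """Archive path: every table, then zip the results."""
    written = _export_tables(all_tables, target_dir)
    archive_path = target_dir / "archive.zip"
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in written:
            archive.write(path, arcname=path.name)
    return archive_path


def export_study(all_tables, export_names, target_dir, archive=False):
    """The top-level method only decides which branch to take."""
    if archive:
        return do_archive(all_tables, target_dir)
    return do_export(all_tables, export_names, target_dir)
```

With this layout the forked logic (which tables, how to package results) lives in the two small wrappers, and the shared writing step is not interleaved with `if` checks.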

re: parquet, my original thought was 'this is for parking data ahead of a paper', since that's where this request originated, and since that's all csvs in our current use case, i deferred to that. but there's no reason :not: to parquet, and i guess that enables you to upload an older set of data to the aggregator, so maybe it's worth doing - and that would decrease some of the branching logic. so lemme do that and then i'll see how that feels w.r.t. breaking things out.

I have a nagging doubt about some of the infrastructure in this file - specifically, it's starting to get large, and I think that biases me towards trying to get everything in one place rather than making more functions.

This might be a bad idea. I feel like at :some: point this might need to get broken out into a set of functions per arg inheriting from a base, but i'm reluctant to go that far at the moment, which might be driving some other less sustainable architecture choices.

moved zipping out to utils - that and the parquet cleanup help.

dogversioning added 2 commits February 9, 2024 11:14

Added archival mode

bc25632

test tweaks

abd2ba5

mikix approved these changes Feb 9, 2024

dogversioning added 2 commits February 13, 2024 09:01

PR feedback

f586e0a

moved zip to utils

df7335d

dogversioning merged commit a36f705 into main Feb 13, 2024
3 checks passed

dogversioning deleted the mg/export-archive branch February 13, 2024 14:41

dogversioning added a commit that referenced this pull request Feb 27, 2024

Added archival mode (#183)

91a5436

* Added archival mode * test tweaks * PR feedback * moved zip to utils

dogversioning added a commit that referenced this pull request Feb 27, 2024

Added archival mode (#183)

f9511bd

* Added archival mode * test tweaks * PR feedback * moved zip to utils

dogversioning mentioned this pull request Feb 27, 2024

Add 'archive' cli option #151

Closed
