Added archival mode #183
elif args["action"] == "export":
    if args["archive"]:
I initially had a thought that maybe this should be a top-level action instead of a flag on export, but after thinking about it more, decided you had the right of it. Just talking out loud, in case you had thoughts on that too. But I think this makes sense.
i started off that way too, but in digging in, since it's 90% the same, it was A) faster to get it in here, B) a little DRYer, and C) i kind of like keeping the number of base level cli verbs low.
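For illustration, the flag-on-export approach might look roughly like the sketch below. Only the "action" and "archive" keys come from the diff above; the parser layout, the program name, and the help text are assumptions and not the project's actual CLI definition.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser layout; only "action" and "archive" appear in the diff.
    parser = argparse.ArgumentParser(prog="example-cli")
    actions = parser.add_subparsers(dest="action", required=True)
    export = actions.add_parser("export", help="export study tables")
    # Archival stays a flag on the existing verb instead of becoming a new verb.
    export.add_argument(
        "--archive",
        action="store_true",
        help="archive the study's tables (potentially sensitive)",
    )
    return parser


def dispatch(argv: list[str]) -> str:
    # Mirrors the args["action"] / args["archive"] branching from the diff.
    args = vars(build_parser().parse_args(argv))
    if args["action"] == "export":
        if args["archive"]:
            return "archive"
        return "export"
    return "unknown"
```

Keeping archival behind a flag means both paths share the same argument parsing, which matches the "90% the same" reasoning in the reply above.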
u.codeable_concept.system = 'http://hl7.org/fhir/sid/icd-10-cm'
u.codeable_concept.system LIKE 'http://hl7.org/fhir/sid/icd-10-cm'
I'm lightly curious about the performance cost of this. Correctness is more important obvi, but if this is noticeably slower, I imagine the jinja could switch to LIKE only if it detects a wildcard.
there is a half baked idea in my head of moving the system extraction logic into here in some way, or in some other way doing db introspection to help build these queries out in a more nuanced fashion.
but - only doing LIKE when required seems like low hanging fruit in the interim.
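A minimal sketch of the "LIKE only when required" idea, written as a plain Python helper instead of a jinja template. The function name is hypothetical, and treating only `%` as the wildcard marker is an assumption; the project's templates may detect wildcards differently.

```python
def system_predicate(column: str, system: str) -> str:
    """Build a SQL predicate comparing `column` against a code system.

    Assumption: a value counts as a wildcard pattern only if it contains "%".
    """
    # Double any single quotes so the value is a valid SQL string literal.
    escaped = system.replace("'", "''")
    operator = "LIKE" if "%" in system else "="
    return f"{column} {operator} '{escaped}'"
```

With this shape, an exact system URL keeps the cheaper equality comparison and only patterns pay for LIKE.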
cumulus_library/study_parser.py
dataframe.to_parquet(f"{path}/{table}.parquet", index=False)
queries.append(query)
if not archive:
    dataframe.to_parquet(f"{path}/{table}.parquet", index=False)
Why not parquet too in this case? Just space reasons?
This method is starting to feel a little awkward. You start with forked logic (which tables) and end with forked logic (how to handle the results). Only the middle is shared. Should the middle be moved to a helper method and you have two different methods here, like:
    if archive:
        do_archive()
    else:
        do_export()

    def do_archive():
        get tables
        do_inner_export()
        zip results

    def do_export():
        get tables
        do_inner_export()
        parquet results
Or maybe instead move some of the "gross" specialized code (like "getting all tables" or "zipping up all tables") into helpers, so that this method can focus just on the if/else-ing.
I dunno. No action needed per se, just started feeling like a lot of very different code based on some if conditions.
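As a rough, runnable version of the split suggested above: the shared middle becomes a helper, and the table selection and result handling are passed in as callables. All names and signatures here are hypothetical stand-ins, not the functions in study_parser.py.

```python
from typing import Callable, Iterable


def inner_export(tables: Iterable[str], fetch: Callable[[str], list]) -> dict:
    """Shared middle step: fetch the rows for each table."""
    return {table: fetch(table) for table in tables}


def do_export(tables, fetch, write_table) -> list:
    """Write one file per table from the study's export list."""
    results = inner_export(tables, fetch)
    return [write_table(name, rows) for name, rows in results.items()]


def do_archive(tables, fetch, write_table, zip_paths) -> str:
    """Write one file per table, then bundle the written files."""
    results = inner_export(tables, fetch)
    paths = [write_table(name, rows) for name, rows in results.items()]
    return zip_paths(paths)


def run(archive, export_tables, all_tables, fetch, write_table, zip_paths):
    # The entry point only decides which path to take.
    if archive:
        return do_archive(all_tables, fetch, write_table, zip_paths)
    return do_export(export_tables, fetch, write_table)
```

Injecting `fetch`, `write_table`, and `zip_paths` keeps the sketch free of database and filesystem details, so the branching can be exercised with simple fakes.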
re: parquet, my original thought was 'this is for parking data ahead of a paper', since that's where this request originated, and since that's all csvs in our current use case, i deferred to that. but there's no reason :not: to parquet, and i guess that enables you to upload an older set of data to the aggregator, so maybe it's worth doing, and that would decrease some of the branching logic. so lemme do that and then i'll see how that feels w.r.t. breaking things out.
I have a nagging doubt about some of the infrastructure in this file. Specifically, since it's starting to get large, I think that biases me towards trying to get everything in one place rather than making more functions.
This might be a bad idea. I feel like at :some: point this might need to get broken out into a set of functions per arg inheriting from a base, but i'm reluctant to go that far at the moment, which might be driving some other less sustainable architecture choices.
moved zipping out to utils - that and the parquet cleanup help.
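A zip helper of the kind described could look something like the following. The function name and signature are assumptions for illustration; the actual utility added in this PR may differ.

```python
import pathlib
import zipfile


def zip_dir(source_dir, archive_path) -> pathlib.Path:
    """Zip every file under source_dir into archive_path (hypothetical helper)."""
    source = pathlib.Path(source_dir)
    target = pathlib.Path(archive_path)
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for file in sorted(source.rglob("*")):
            # Skip directories, and skip the archive itself if it lives in source.
            if file.is_file() and file.resolve() != target.resolve():
                archive.write(file, file.relative_to(source))
    return target
```

Storing paths relative to `source` keeps the archive free of machine-specific directory prefixes.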
* Added archival mode
* test tweaks
* PR feedback
* moved zip to utils
This PR makes the following changes:
* Adds an --archive flag to the export mode, with a warning about how this is potentially sensitive
Checklist
* Documentation (docs/) needs to be updated - They do, will do in one go