PSM CLI, table persistence #160
b799f08 to ff92be6
@@ -21,15 +22,14 @@ def __init__(self):
         self.queries = []

     @abstractmethod
-    def prepare_queries(self, cursor: object, schema: str):
+    def prepare_queries(self, cursor: object, schema: str, *args, **kwargs):
I think I mentioned this in passing, but I wanted to call attention to 'you can now include arbitrary args when extending a tablebuilder' as maybe relevant for metrics work.
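As a rough illustration of what the new signature allows, here is a minimal sketch. The class names `BaseTableBuilder` and `CountBuilder` and the `table_suffix` keyword are hypothetical stand-ins, not names taken from this PR; only the `prepare_queries(self, cursor, schema, *args, **kwargs)` signature comes from the diff.

```python
from abc import ABC, abstractmethod


class BaseTableBuilder(ABC):
    """Hypothetical stand-in for the abstract builder changed in this diff."""

    def __init__(self):
        self.queries = []

    @abstractmethod
    def prepare_queries(self, cursor: object, schema: str, *args, **kwargs):
        """Populate self.queries; extra arguments are accepted but not required."""


class CountBuilder(BaseTableBuilder):
    """Hypothetical subclass that consumes one extra keyword argument."""

    def prepare_queries(
        self, cursor: object, schema: str, *args, table_suffix: str = "", **kwargs
    ):
        # table_suffix is an assumed, illustrative parameter.
        name = f"{schema}.counts{table_suffix}"
        self.queries.append(f"CREATE TABLE {name} AS SELECT 1 AS n")


builder = CountBuilder()
builder.prepare_queries(None, "main", table_suffix="_2024")
print(builder.queries)
```

Because the base signature already accepts `*args, **kwargs`, a caller can pass extra arguments to every builder uniformly, and builders that do not need them can ignore them.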
cumulus_library/databases.py
Outdated
warnings.warn(
    "Loading an ndjson dir is not supported with --db-type=athena."
I made this change for my convenience, since I'm toggling between athena/duckdb fairly frequently. I can be talked out of it.
I personally liked how it worked before (big surprise) but I'm not passionate about it.
My reasoning stemmed from the fact that athena is the default. So a lazy user might accidentally be in athena mode (forgetting to pass `--db-type`) and if we don't stop and let them know (and how to correct their mistake -- hence my `(try duckdb)`, because maybe they've never used that flag before), their clear user intent will be just dropped on the floor and they might think things are working when they are really not. A warning helps, but a straight error ensures they will notice that we didn't do what they asked.
But 🤷 maybe that user flow is not so important.
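To make the trade-off concrete, here is a small sketch of the two behaviors under discussion. The function `check_ndjson_support` and its `strict` parameter are hypothetical and are not part of `cumulus_library/databases.py`; the message text follows the strings quoted in this thread.

```python
import warnings


def check_ndjson_support(db_type: str, load_ndjson_dir, strict: bool = True) -> None:
    """Hypothetical guard: reject or warn about an ndjson dir in athena mode."""
    if db_type != "athena" or not load_ndjson_dir:
        return
    msg = (
        "Loading an ndjson dir is not supported with --db-type=athena "
        "(try duckdb)"
    )
    if strict:
        # Hard stop: the user cannot miss that the request was not honored.
        raise ValueError(msg)
    # Soft path: the run continues and the ndjson dir is silently unused.
    warnings.warn(msg)
```

With `strict=True` the forgotten `--db-type` case fails immediately; with `strict=False` it only emits a warning, which is the convenience behavior for someone toggling between backends.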
I'm changing this back for now for this PR, but I reserve the right to add a workaround of some kind in the future if it becomes a pain, perhaps a dev-only flag or something like that.
@mikix Known issues I will address that are not blocking review:
docs/statistics.md
Outdated
@comorbidity If you could - can you look at this file, and the linked PSM markdown file, on the branch just to check how parsable the overall documentation of this would be for a user?
I am going to mostly ignore the test file changes - my brain can't take it right now, and I trust they are good duckdb nonsense.
Approving - this is good! Feel free to land while I'm away
if not stats_build:
    return
nit: this is an odd method construction. I understand why you might end up here reasonably, I'm just going to prod you to see if there's a better way to get at the flow you want. If not, this is fine.
It especially triggered me because `stats_build=False` is the default.
Here's what I believe the stats generation lifecycle looks like:
- stats, if they exist in a study, are always run if they haven't been run before
- after this, stats should :not: be run
- a researcher may say 'ok, I've changed [x,y,z] and now I'd like to run a new sampling experiment, let me explicitly invoke it'
So some sort of 'usually false' workflow belongs someplace. It might be worth nattering a bit about the right layer of separation between the Builder and the Parser (and should the Builder be named something different?), but it could move to the builder.
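The lifecycle above can be sketched as a single decision function. This is only an illustration of the described rules: `should_run_stats`, `has_stats`, and `already_run` are hypothetical names, and only `stats_build` (default `False`) comes from the code under review.

```python
def should_run_stats(
    has_stats: bool, already_run: bool, stats_build: bool = False
) -> bool:
    """Decide whether to run a study's stats step (illustrative sketch)."""
    if not has_stats:
        # Nothing to run for studies without a stats step.
        return False
    # Run on the first build, or whenever the researcher explicitly asks.
    return stats_build or not already_run


print(should_run_stats(has_stats=True, already_run=False))
print(should_run_stats(has_stats=True, already_run=True))
print(should_run_stats(has_stats=True, already_run=True, stats_build=True))
```

Keeping the decision in one place would let the early `return` in the reviewed method become a single call, wherever that check ends up living between the Builder and the Parser.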
335be15 to d6345d2
This PR makes the following changes:
`data_path` as a class var to StudyBuilder/StudyParser to get a write dir to PSMChecklist
`docs/`) needs to be updated