-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Quote local parquet CTAS queries #315
Conversation
☂️ Python Coverage
Overall Coverage
New FilesNo new covered files... Modified FilesNo covered modified files...
|
{%- if db_type == 'athena' %} {{ col }} {{ remote_table_cols_types[loop.index0] }} | ||
{%- elif db_type == 'duckdb' %} "{{ col }}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why only quote for duckdb and not all the time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Athena didn't like that, irritatingly - the serde style queries follow a slightly different set of syntax rules, it seems?
"""CREATE TABLE IF NOT EXISTS local_table AS SELECT | ||
a, | ||
b | ||
"""CREATE TABLE IF NOT EXISTS "local_table" AS SELECT "a", "b" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what is the risk you are guarding against in this PR? Like, that a table name wouldn't have a prefix and also be a reserved word? Cause I assume no table with a study prefix could be a problem. Should we instead/additionally enforce that there is a prefix (look for a dunder)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically, the LOINC groups dataset has a file just named 'Group', and columns named 'group'. I think you're right that, if a dunder was present, this would largely not be an issue. But since this is one of these static dataset situations that you want to use across multiple studies/datasources, I'm inclined to just live with the quoting here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah and the "dundering" there is done via a different namespace. Sure - quoting is fine, was just curious if we should go even further.
b238d99
to
0f49187
Compare
This PR quote-escapes local parquet create table statements, in case those statements contain reserved sql words.
Checklist
docs/
needs to be updatedgenerate-md
core
study fields that not in US Core, update our list of those incore-study-details.md
manifest.toml