Quote local parquet CTAS queries #315

Merged: 1 commit, Nov 13, 2024


dogversioning (Contributor)

This PR quotes the table and column identifiers in local Parquet CREATE TABLE statements, in case those names are reserved SQL words.

Checklist

  • Consider if documentation in docs/ needs to be updated
    • If you've changed the structure of a table, you may need to run generate-md
    • If you've added/removed core study fields that are not in US Core, update our list of those in core-study-details.md
  • Consider if tests should be added
  • Update template repo if there are changes to study configuration in manifest.toml


github-actions bot commented Nov 11, 2024

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines: 2272 | Covered: 2264 | Coverage: 100% | Threshold: 90% | Status: 🟢

New Files

No new covered files...

Modified Files

No covered modified files...

updated for commit: 0f49187 by action🐍

Comment on lines +8 to +9
{%- if db_type == 'athena' %} {{ col }} {{ remote_table_cols_types[loop.index0] }}
{%- elif db_type == 'duckdb' %} "{{ col }}"
Contributor


Why only quote for duckdb and not all the time?

Contributor Author


Athena didn't like that, irritatingly: the SerDe-style queries seem to follow a slightly different set of syntax rules.
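The template excerpt on lines +8 to +9 branches on `db_type`. As a rough sketch, the same branch could be rendered in Python like this; `render_column_list` is a hypothetical name, and the `, ` separator is an assumption because the excerpt doesn't show how columns are joined. AWS's Athena documentation says reserved keywords in DDL statements are escaped with backticks rather than double quotes, which may explain the behavior described above, though the thread doesn't confirm it.

```python
from __future__ import annotations


def render_column_list(db_type: str, cols: list[str], col_types: list[str] | None = None) -> str:
    """Hypothetical Python mirror of the template's column branch (not project code)."""
    if db_type == "athena":
        # Athena branch: unquoted `name type` pairs, with types looked up by position
        # (the template uses remote_table_cols_types[loop.index0]).
        if col_types is None or len(col_types) != len(cols):
            raise ValueError("athena rendering needs one type per column")
        return ", ".join(f"{col} {col_type}" for col, col_type in zip(cols, col_types))
    if db_type == "duckdb":
        # DuckDB branch: double-quote each name so reserved words like "group" still parse.
        return ", ".join(f'"{col}"' for col in cols)
    raise ValueError(f"unsupported db_type: {db_type!r}")


print(render_column_list("duckdb", ["group", "code"]))
print(render_column_list("athena", ["code", "display"], ["string", "string"]))
```

The DuckDB branch needs no types because a CTAS takes them from the query; the Athena branch emits a type per column, matching the template.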

Comment on lines -562 to +560
- """CREATE TABLE IF NOT EXISTS local_table AS SELECT
- a,
- b
+ """CREATE TABLE IF NOT EXISTS "local_table" AS SELECT "a", "b"
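The new expected string quotes both the table name and every column. A minimal sketch of a builder that produces that first line is below; `build_local_ctas`, `quote_ident`, and `quote_literal` are hypothetical helpers, and the `FROM read_parquet(...)` clause is an assumption because the test excerpt is cut off before it (`read_parquet` is DuckDB's Parquet reader function). Quoting follows the standard SQL rule: wrap in double quotes and double any embedded double quote.

```python
from __future__ import annotations


def quote_ident(name: str) -> str:
    """Quote a SQL identifier: wrap in double quotes, doubling any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote a SQL string literal: wrap in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def build_local_ctas(table: str, cols: list[str], parquet_path: str) -> str:
    """Hypothetical builder for a DuckDB CTAS over a local Parquet file."""
    col_sql = ", ".join(quote_ident(col) for col in cols)
    return (
        f"CREATE TABLE IF NOT EXISTS {quote_ident(table)} AS SELECT {col_sql}\n"
        f"FROM read_parquet({quote_literal(parquet_path)})"
    )


print(build_local_ctas("local_table", ["a", "b"], "data/local_table.parquet"))
```

Escaping embedded quotes, rather than just wrapping names, keeps an odd file or column name from breaking out of the identifier.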
Contributor


So what is the risk you are guarding against in this PR? That a table name wouldn't have a prefix and would also be a reserved word? I assume no table with a study prefix could be a problem. Should we instead (or additionally) enforce that there is a prefix, i.e. look for a dunder?

Contributor Author


Specifically, the LOINC groups dataset has a file named just 'Group', and columns named 'group'. I think you're right that if a dunder were present, this would largely not be an issue. But since this is one of those static-dataset situations that you want to use across multiple studies/data sources, I'm inclined to just live with the quoting here.
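To see why names like 'Group' and 'group' need quoting, the sketch below uses SQLite from Python's standard library as a stand-in for DuckDB; both treat GROUP as a reserved keyword, but this is an illustration, not the project's setup. `executes_cleanly` is a hypothetical helper and the inserted value is placeholder data.

```python
import sqlite3


def executes_cleanly(conn: sqlite3.Connection, sql: str) -> bool:
    """Return True if the statement runs, False if SQLite rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.OperationalError:
        return False


conn = sqlite3.connect(":memory:")

# Unquoted: GROUP is a keyword, so SQLite reports a syntax error.
unquoted_ok = executes_cleanly(conn, "CREATE TABLE Group (group TEXT)")

# Quoted: the same names are accepted as plain identifiers.
quoted_ok = executes_cleanly(conn, 'CREATE TABLE "Group" ("group" TEXT)')
conn.execute('INSERT INTO "Group" ("group") VALUES (?)', ("example-group",))
rows = conn.execute('SELECT "group" FROM "Group"').fetchall()

print(unquoted_ok, quoted_ok, rows)
conn.close()
```

Once the table exists, every later query must quote the names the same way, which is why quoting at creation time alone isn't enough.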

Contributor


Ah yeah, and the "dundering" there is done via a different namespace. Sure, quoting is fine; I was just curious whether we should go even further.
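Going further, as the reviewer suggested, could mean a check like the hypothetical `has_study_prefix` below. It assumes a `<prefix>__<table>` naming convention with both parts non-empty. As the thread concludes, a static dataset such as the LOINC groups one is namespaced differently, so a check like this would complement the quoting rather than replace it.

```python
def has_study_prefix(table_name: str) -> bool:
    """True if the name looks like `<prefix>__<table>` with both parts non-empty."""
    prefix, sep, rest = table_name.partition("__")
    return bool(prefix) and bool(sep) and bool(rest)


for name in ["mystudy__patient", "Group", "__orphan", "mystudy__"]:
    print(name, has_study_prefix(name))
```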

dogversioning force-pushed the mg/quote_duckdb_parquet branch from b238d99 to 0f49187 on November 13, 2024 15:30
dogversioning merged commit 2528c51 into main on Nov 13, 2024
3 checks passed
dogversioning deleted the mg/quote_duckdb_parquet branch on November 13, 2024 15:42
2 participants