Teradata SQL Grammar #4330

Draft. Wants to merge 4 commits into base: master

DmitryMikhailovich

A grammar for the Teradata SQL dialect, based on version 17.10 of Teradata Vantage.

By my estimate, this grammar is about 90% complete. We used it to parse a codebase of tens of thousands of objects for enterprise data lineage.

I started adding examples only at a late stage of development, so the examples directory lacks many essential statements, such as CRUD.

Q: Why is "Teradata SQL" in the name, and not just "Teradata"?
A: I did this deliberately, because this grammar covers only SQL and not the BTEQ scripting language (I think BTEQ scripts should have a separate grammar).

@@ -0,0 +1,31 @@
# Teradata SQL Grammar

An [ANTLR4](https://www.antlr.org/) grammar for Teradata SQL. Based on a grammar of Teradata Database version 17.10.

Please include a permalink to the actual grammar for version 17.10. Is it a bison grammar? Is there a lex file?

numeric_data_type
: BYTEINT
| SMALLINT
| (INTEGER|INT)

Parentheses are used in places that don't need them. Why? For example, `(INTEGER | INT)` instead of `INTEGER | INT`.

cursor_declaration*
condition_handler*
procedure_stat*
END (label_name)?

Ditto on `(label_name)?`. Why this instead of `label_name?`?
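
One way to clean up this pattern mechanically is sketched below. It is a rough illustration, not part of the PR: the helper name `strip_single_element_parens` is hypothetical, and the regex assumes the grammar text contains no embedded actions, comments, or string literals that happen to look like a parenthesized identifier. It only touches groups that wrap exactly one rule or token name, since those parentheses never change what the rule matches.

```python
import re

# A parenthesized group that wraps exactly one rule or token name,
# e.g. "(label_name)". Groups with several elements or with '|' do not match.
SINGLE_ELEMENT_GROUP = re.compile(r"\(\s*([A-Za-z_][A-Za-z0-9_]*)\s*\)")


def strip_single_element_parens(rule_text: str) -> str:
    """Drop parentheses around single-element groups (hypothetical helper).

    Assumption: rule_text is plain grammar text without actions or comments.
    """
    return SINGLE_ELEMENT_GROUP.sub(r"\1", rule_text)


if __name__ == "__main__":
    # Fragments quoted in the review comments above.
    print(strip_single_element_parens("END (label_name)?"))
    print(strip_single_element_parens("(ON logging_item (',' logging_item)* )?"))
```

Groups such as `(',' logging_item)*` are left alone, because there the parentheses are needed to apply the suffix to the whole sequence.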

(ON logging_item (',' logging_item)* )?
;

operation

Please sort and deduplicate this list. It contains duplicates, which cause an ambiguity to be detected when parsing `examples/ddl/logging/begin_logging.sql`. For example, there are two INSERT, two SELECT, and two UPDATE alternatives.
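
A quick way to spot such duplicates is sketched below. It is only an illustration under stated assumptions: the function names are hypothetical, the sample rule body is made up and is not the actual `operation` rule from this PR, and the split on `|` assumes a flat list of alternatives with no nested groups or quoted `'|'` literals.

```python
from collections import Counter
from typing import List


def _alternatives(rule_body: str) -> List[str]:
    """Split a flat rule body on '|' and normalize whitespace in each alternative."""
    parts = (" ".join(alt.split()) for alt in rule_body.split("|"))
    return [alt for alt in parts if alt]


def duplicate_alternatives(rule_body: str) -> List[str]:
    """Return the alternatives that occur more than once, sorted."""
    counts = Counter(_alternatives(rule_body))
    return sorted(alt for alt, n in counts.items() if n > 1)


def sorted_unique_alternatives(rule_body: str) -> List[str]:
    """Return the alternatives sorted and with duplicates removed."""
    return sorted(set(_alternatives(rule_body)))


if __name__ == "__main__":
    # Hypothetical rule body for illustration only.
    sample = """
          INSERT
        | SELECT
        | UPDATE
        | DELETE
        | INSERT
        | SELECT
    """
    print(duplicate_alternatives(sample))
    print(" | ".join(sorted_unique_alternatives(sample)))
```

Two identical alternatives in one rule give the parser two equally valid ways to match the same input, which is why the tool reports an ambiguity.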

kaby76 commented Nov 16, 2024

(On the build: the macOS runners on GitHub are terrible and hang during testing for Java. I'm not sure when they will finish, but the build is stalled in parsing for the Java target, even though that takes only about 9s on my machine. My PR #4327 removes the parsing tests for macOS to minimize hangs like this.)
