Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[plsql] Fix for #2589 -- performance improvements #3653

Merged
merged 6 commits into from
Aug 30, 2023
Merged

[plsql] Fix for #2589 -- performance improvements #3653

merged 6 commits into from
Aug 30, 2023

Conversation

kaby76
Copy link
Contributor

@kaby76 kaby76 commented Aug 8, 2023

This is a fix for #2589.

In this PR, I made a few changes to remove some ambiguity in the grammar. The changes improve the performance modestly, so there is more work to be done.

  • atom rule. This change involved removing the table_element outer_join_sign and quoted_string alts, which is covered with general_element outer_join_sign? and constant alts, respectively. This change improved some of the examples in long-running/ to be ~3 times faster for the TypeScript/JavaScript ports.
  • quoted_string rule. I removed the alt for variable_name from quoted_string and added an alt for DELIMITED_ID--which in itself is very weirdly named, as a double-quoted string should must be called DOUBLE_QUOTED_STRING. It makes no sense that qouted_string would derive a variable name.
  • general_element_part. I removed the ('.' id_expression)* syntax because the syntax is used with general_element_part throughout the grammar, creating ambiguity.

In addition, I changed the readme.md file to be standardized, containing links to all the relevant Oracle docs, with the latest version. It's not strictly "performance-related" but I found myself constantly referring back to the Oracle docs for this PR. Note, the docs are just horrible for finding anything but necessary reference material. There are no good searching tools for html or pdf, e.g. "column_alias" or "column alias". The HTML doc would be okay with a text editor, but the damn doc is spread over many HTML files. The webpages for the Oracle docs have UI for searching, but it knows nothing about searching for strings that contain spaces.

Performance

Using Ryzen 7 2700, 16GB, SSD, Windows 11, NET SDK 7. Grouped parsing of the 11 files hw-examples/*.sql, repeated 10 times. Calculations performed using ChatGPT4, Mean ± Margin of Error at 95% confidence level.

Target Before Change After Change
CSharp 3.41 ± 0.03 s 1.74 ± 0.02 s
Java 2.19 ± 0.06 s 1.82 ± 0.05 s
JavaScript 42.7 ± 1.1 s 19.2 ± 0.1 s
TypeScript 43.1 ± 0.6 s 19.4 ± 0.2 s

raw-data.txt

@teverett teverett added the plsql label Aug 9, 2023
@kaby76
Copy link
Contributor Author

kaby76 commented Aug 16, 2023

$ trperf long-running/aggregate01.sql | tail -n +2 | sort -k4 -n -r | column -t | head -22
Time to parse: 00:00:01.6186283
4     sql_script                                    1    6.4E-05   1    1   0   0   0  1
2692  general_element                               158  2.237276  667  6   78  0   0  276
2144  select_list_elements                          41   1.371319  133  5   29  0   0  40
2338  unary_expression                              107  1.083607  523  26  5   5   0  105
2319  concatenation                                 107  0.946815  293  15  0   0   0  36
2606  column_alias                                  29   0.916014  79   5   8   0   0  23
2638  tableview_name                                22   0.88526   277  28  16  0   0  58
2145  select_list_elements                          42   0.500934  194  5   0   0   0  16
2150  table_ref_aux                                 21   0.42423   84   4   20  0   0  29
2147  table_ref                                     21   0.374873  121  9   17  0   0  28
2322  concatenation                                 108  0.328362  110  2   1   0   0  18
2308  relational_expression                         119  0.286036  156  3   17  0   0  32
2608  where_clause                                  2    0.24779   26   15  0   0   0  14
2369  standard_function                             5    0.241239  63   15  0   0   0  21
2695  general_element_part                          158  0.207995  158  1   0   0   0  11
2694  general_element_part                          158  0.207801  158  1   0   0   0  11
2557  other_function                                2    0.198099  52   26  0   0   0  20
2640  tableview_name                                22   0.196869  22   1   0   0   0  4
2298  logical_expression                            86   0.170201  87   1   1   0   0  13
2138  query_block                                   4    0.148154  9    5   1   0   0  6
2136  query_block                                   4    0.147636  10   3   2   0   0  6
2160  join_clause                                   34   0.131541  85   3   17  0   0  24
08/15-17:59:35 ~/issues/g4-2589/sql/plsql/Generated-CSharp-full

general_element accounts for the majority of the time for long-running/aggregate01.sql. Not sure why yet. Working on getting more information from DiagnosticErrorListener.

@kaby76
Copy link
Contributor Author

kaby76 commented Aug 16, 2023

What I really need is to parse some input from general_element and see the perf stats from that. I also need a more human-readable string corresponding to the derivations found for a config set. A bunch of state numbers does not work because it has to be related back to the grammar. Using an IDE (aka "interp mode") does not work because the grammar contains actions.

@kaby76
Copy link
Contributor Author

kaby76 commented Aug 29, 2023

What I really need is to parse some input from general_element and see the perf stats from that. I also need a more human-readable string corresponding to the derivations found for a config set. A bunch of state numbers does not work because it has to be related back to the grammar. Using an IDE (aka "interp mode") does not work because the grammar contains actions.

general_element is still a source of problems, but the problem isn't ambiguity, but that the parser falls back to full context parsing in many situations with this symbol.

@kaby76 kaby76 marked this pull request as ready for review August 29, 2023 10:28
@teverett
Copy link
Member

@kaby76 thanks!

@teverett teverett merged commit 81e5ebe into antlr:master Aug 30, 2023
38 checks passed
//| CHAR_STRING_PERL
| NATIONAL_CHAR_STRING_LIT
| DELIMITED_ID
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a question. The string uses single quotes. Why can it match double quotes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for example, SELECT "C1" FROM "T1", "C1" will be matched into a quoted_string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DELIMITED_ID was added to quoted_string in order to get examples/create_table.sql to pass. But, the problem is lob_item.

Please open an Issue for tracking. I will adjust the grammar asap.

The fix will move the DELIMITED_ID alt to the lob_item rule.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kaby76 ok , I've already opened an issue #3700

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants