Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: use SLL mode #269

Merged
merged 6 commits into from
Mar 19, 2024
Merged

feat: use SLL mode #269

merged 6 commits into from
Mar 19, 2024

Conversation

HaydenOrz
Copy link
Collaborator

@HaydenOrz HaydenOrz commented Feb 29, 2024

简介

#247

指定 parser 解析模式为 PredictionMode.SLL,以获得更好的性能。

SLL 拥有更好的性能的同时,也对语法文件有更高的要求,它要求

  1. 语法文件的规则尽量简洁(减少前缀相同的备选分支)
  2. 语法文件内部规则没有很高的模糊性(二义性)
  3. 尽量少的使用左递归

主要变更

  1. basicParser 中为 parser 设置解析模式为 PredictionMode.SLL
  2. 优化 mysql、spark 以及 postgre 的语法文件,以适应 SLL 模式

性能变化

目前在 Flink 中有简单的基准测试,它并不能完整的体现出性能变化,但是作为参考已经足够

环境信息

npx envinfo --system --binaries

  System:
    OS: macOS 14.2.1
    CPU: (8) arm64 Apple M1
    Memory: 574.95 MB / 16.00 GB
    Shell: 5.9 - /bin/zsh
  Binaries:
    Node: 16.20.0 - ~/.nvm/versions/node/v16.20.0/bin/node
    Yarn: 1.22.13 - ~/.nvm/versions/node/v16.20.0/bin/yarn
    npm: 8.19.4 - ~/.nvm/versions/node/v16.20.0/bin/npm
    pnpm: 6.35.0 - ~/.nvm/versions/node/v16.20.0/bin/pnpm

antlr4ts 运行时

Name Rows Times Total Time(ms) Average Time(ms)
CreateTable 100 1 165.46 165.46
CreateTable 1000 1 202.15 202.15
CreateTable 5000 1 881.41 881.41
SelectTable 100 1 237.26 237.26
SelectTable 1000 1 578.70 578.70
SelectTable 5000 1 2764.02 2764.02
InsertTable 100 1 7.59 7.59
InsertTable 1000 1 29.82 29.82
InsertTable 5000 1 151.89 151.89

迁移到 antlr4ng 运行时,并使用默认模式解析

首先尝试使用 SLL 策略,如果解析失败,则自动转换为 LL 策略,以此来平衡速度和准确性

Name Rows Times Total Time(ms) Average Time(ms)
CreateTable 100 1 211.13 211.13
CreateTable 1000 1 1035.98 1035.98
CreateTable 5000 1 5973.12 5973.12
SelectTable 100 1 704.98 704.98
SelectTable 1000 1 4760.73 4760.73
SelectTable 5000 1 23670.30 23670.30
InsertTable 100 1 9.37 9.37
InsertTable 1000 1 27.92 27.92
InsertTable 5000 1 151.75 151.75

antlr4ng 指定解析模式为 LL 模式

Name Rows Times Total Time(ms) Average Time(ms)
CreateTable 100 1 215.38 215.38
CreateTable 1000 1 1042.00 1042.00
CreateTable 5000 1 5997.30 5997.30
SelectTable 100 1 711.21 711.21
SelectTable 1000 1 4809.67 4809.67
SelectTable 5000 1 23898.11 23898.11
InsertTable 100 1 9.98 9.98
InsertTable 1000 1 28.63 28.63
InsertTable 5000 1 156.41 156.41

antlr4ng 指定解析模式为 SLL 模式

Name Rows Times Total Time(ms) Average Time(ms)
CreateTable 100 1 104.93 104.93
CreateTable 1000 1 41.71 41.71
CreateTable 5000 1 119.47 119.47
SelectTable 100 1 225.62 225.62
SelectTable 1000 1 16.76 16.76
SelectTable 5000 1 56.55 56.55
InsertTable 100 1 9.73 9.73
InsertTable 1000 1 31.01 31.01
InsertTable 5000 1 162.13 162.13

语法文件优化纪要

SLL 模式与 LL 模式相比,拥有更快的解析速度,但是相应的 SLL模式在遇到某些情况时,可能无法正确的处理,这主要集中在

左递归

即一个规则直接或间接地引用自身作为其可能的第一个元素,此问题可能通过重写语法解决

具有相似前缀的选择

stmt: 
      SELECT column_list FROM table_name groupby_clause orderby_clasue
      SELECT column_list FROM table_name groupby_clause limit_clause

这种情况可以通过合并前缀相似的备选分支来解决,但是某些情况下,合并备选分支会降低准确性,这种情况很难处理。

高度模糊的语法

implicit_row:  '(' expr_list COMMA a_expr ')'

expr_list
    : a_expr (COMMA a_expr)*
    ;

此时 a_expr.a_expr.a_expr 可能全部被匹配为 expr_list 从而导致报错。此问题可以通过重写语法解决

src/grammar/pgsql/PostgreSQLParser.g4 Outdated Show resolved Hide resolved
src/grammar/pgsql/PostgreSQLParser.g4 Outdated Show resolved Hide resolved
@HaydenOrz HaydenOrz marked this pull request as draft February 29, 2024 09:13
@HaydenOrz HaydenOrz marked this pull request as ready for review March 1, 2024 08:57
@HaydenOrz HaydenOrz merged commit 3f62ad0 into DTStack:main Mar 19, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants