Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom grammar with 'StringLiteral' much faster when using RegExps. #20

Closed
jsm174 opened this issue Dec 27, 2019 · 5 comments
Closed

Custom grammar with 'StringLiteral' much faster when using RegExps. #20

jsm174 opened this issue Dec 27, 2019 · 5 comments

Comments

@jsm174
Copy link
Contributor

jsm174 commented Dec 27, 2019

In #17, we had a question about case insensitive keywords.

We noticed significant performance enhancements when using RegExps for all string literals.

So for example:

LogicalOperatorExpression ::= RelationalOperatorExpression (WS ([A][n][d] | [O][r] | [X][o][r] | [E][q][v]) WS? RelationalOperatorExpression)*

is much faster than

LogicalOperatorExpression ::= RelationalOperatorExpression (WS ('And' | 'Or' | 'Xor' | 'Eqv') WS? RelationalOperatorExpression)*

As a test, we modified Grammars/Custom.ts:

       case 'NCName':
       case 'StringLiteral':
          bnfSeq.push(preDecoration + x.text + decoration);
          break;

to

        case 'NCName':
          bnfSeq.push(preDecoration + x.text + decoration);
          break;
        case 'StringLiteral':
          if (decoration || preDecoration) {
             bnfSeq.push(preDecoration + x.text + decoration);
          } else {
             for (const c of x.text.slice(1, -1)) {
                bnfSeq.push(new RegExp(c.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&')));
             }
          }
          break;

The above would allow you to use 'And' in the grammar for readability, but generate a bnfseq like /A/, /n/, /d/ .

In a ~3200 line vbscript, applying this to an entire grammar, shaves off nearly 400-600ms.

We also tried, /And/ which proved worse then /A/, /n/, /d/. (And back to case insensitive, /And/i has worse performance than [A][a][N][n][D][d])

I was going to submit a PR with the above change, however, running the test cases results in two errors.

  1) New lang
       Grammars.Custom parses JSON grammar
         'var a = x match { else } map 1' must resolve into (FIRST RULE):
     Error: expect(received).toEqual(expected) // deep equality

Expected: "else } map 1"
Received: "{ else } map 1"
      at /Users/jmillard/pppp/node-ebnf/test/NewLang.spec.js:288:46
      at Context.<anonymous> (test/TestHelpers.js:28:17)
      at processImmediate (internal/timers.js:439:21)

  2) New lang
       Grammars.Custom parses JSON grammar
         'var a = x match { else -> } map 1' must resolve into (FIRST RULE):
     Error: expect(received).toEqual(expected) // deep equality

Expected: "} map 1"
Received: "{ else -> } map 1"
      at /Users/jmillard/pppp/node-ebnf/test/NewLang.spec.js:292:46
      at Context.<anonymous> (test/TestHelpers.js:28:17)
      at processImmediate (internal/timers.js:439:21)

Our grammar doesn't use any advanced features, so it was working great.

I must be missing something about how StringLiterals work?

FWIW, here are some benchmarks, using regexps instead of string literals:

  The scripting grammar - transpile
    ✓ should transpile controller.vbs successfully (186ms)
    ✓ should transpile core.vbs successfully (1736ms)
    ✓ should transpile core.vbs successfully (1729ms)
    ✓ should transpile core.vbs successfully (1685ms)
    ✓ should transpile core.vbs successfully (1662ms)
    ✓ should transpile core.vbs successfully (1674ms)
    ✓ should transpile core.vbs successfully (1701ms)

vs:

    ✓ should transpile controller.vbs successfully (218ms)
    ✓ should transpile core.vbs successfully (2349ms)
    ✓ should transpile core.vbs successfully (2287ms)
    ✓ should transpile core.vbs successfully (2352ms)
    ✓ should transpile core.vbs successfully (2244ms)
    ✓ should transpile core.vbs successfully (2277ms)
    ✓ should transpile core.vbs successfully (2313ms)
@menduz
Copy link
Member

menduz commented Dec 27, 2019

It may be related to how regular expressions are parsed, creating a regular expression with the string [x] is not the same as matching a text [x], the regex searches for an x instead of [x], it may be related to the matcher with the curly braces of the test.

You could simply add an if to your patch to optimize StringLiterals without special characters using the regex path.

This should do the trick. If it works feel free to send a PR.

if (decoration || preDecoration || !/^[a-zA-Z0-9_-\s]+$/.test(x.text)) {

@jsm174
Copy link
Contributor Author

jsm174 commented Dec 28, 2019

Thank you.

I wanted a few more characters to be supported, so I was able to get this working: (borrowed escapeRegExp from the parser)

   case 'StringLiteral':
      if (decoration || preDecoration || !/^['/\-\^()a-zA-Z0-9\\"&_.:=,]+$/.test(x.text)) {
         bnfSeq.push(preDecoration + x.text + decoration);
      } else {
         for (const c of x.text.slice(1, -1)) {
            bnfSeq.push(new RegExp(escapeRegExp(c)));
         }
      }
      break;

I wanted < and > as well, but whenever I include those characters, thats when the test cases from above start failing.

@menduz
Copy link
Member

menduz commented Dec 28, 2019

[email protected]

@menduz menduz closed this as completed Dec 28, 2019
@menduz
Copy link
Member

menduz commented Dec 28, 2019

Are you bringing back the good old On Error Resume Next?

@jsm174
Copy link
Contributor Author

jsm174 commented Dec 28, 2019

Ha. Yeh we are going to try: vpdb/vpx-js#141

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants