Custom grammar with 'StringLiteral' much faster when using RegExps. #20

jsm174 · 2019-12-27T23:16:41Z

In #17, we had a question about case-insensitive keywords.

We noticed a significant performance improvement when using RegExps instead of string literals throughout the grammar.

For example:

LogicalOperatorExpression ::= RelationalOperatorExpression (WS ([A][n][d] | [O][r] | [X][o][r] | [E][q][v]) WS? RelationalOperatorExpression)*

is much faster than

LogicalOperatorExpression ::= RelationalOperatorExpression (WS ('And' | 'Or' | 'Xor' | 'Eqv') WS? RelationalOperatorExpression)*

As a test, we modified Grammars/Custom.ts:

    case 'NCName':
    case 'StringLiteral':
      bnfSeq.push(preDecoration + x.text + decoration);
      break;

to

    case 'NCName':
      bnfSeq.push(preDecoration + x.text + decoration);
      break;
    case 'StringLiteral':
      if (decoration || preDecoration) {
        // Decorated literals keep the original string path.
        bnfSeq.push(preDecoration + x.text + decoration);
      } else {
        // Strip the surrounding quotes, then push one escaped RegExp per character.
        for (const c of x.text.slice(1, -1)) {
          bnfSeq.push(new RegExp(c.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&')));
        }
      }
      break;

The above lets you write 'And' in the grammar for readability while generating a bnfSeq like /A/, /n/, /d/.
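To make the expansion concrete, here is a minimal standalone sketch of what the patched branch does to a single literal. The helper names `escapeRegExpChar` and `literalToRegExps` are hypothetical and not part of node-ebnf; only the escaping pattern is taken from the patch above.

```typescript
// Hypothetical helpers for illustration only; not part of node-ebnf.
function escapeRegExpChar(c: string): string {
  // Same escaping pattern as in the patch above.
  return c.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&");
}

function literalToRegExps(quoted: string): RegExp[] {
  const out: RegExp[] = [];
  // slice(1, -1) drops the surrounding quote characters of the literal.
  for (const c of quoted.slice(1, -1)) {
    out.push(new RegExp(escapeRegExpChar(c)));
  }
  return out;
}

const seq = literalToRegExps("'And'");
console.log(seq.map((r) => r.source).join(", ")); // A, n, d
```

Each element matches exactly one character, so metacharacters such as `.` have to be escaped before they are handed to `new RegExp`.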

On a ~3200-line VBScript file, applying this to the entire grammar shaves off roughly 400-600 ms.

We also tried /And/, which proved worse than /A/, /n/, /d/. (And back to case insensitivity: /And/i performs worse than [Aa][Nn][Dd].)
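For the case-insensitive variant, the per-letter character classes can be generated rather than written by hand. This is only a sketch: `caseInsensitiveClasses` is a hypothetical helper, and it assumes the keyword is purely alphanumeric.

```typescript
// Hypothetical helper for illustration; assumes an alphanumeric keyword.
function caseInsensitiveClasses(keyword: string): string {
  let out = "";
  for (const c of keyword) {
    const upper = c.toUpperCase();
    const lower = c.toLowerCase();
    // Characters without a case distinction (e.g. digits) get a one-element class.
    out += upper === lower ? "[" + c + "]" : "[" + upper + lower + "]";
  }
  return out;
}

console.log(caseInsensitiveClasses("And")); // [Aa][Nn][Dd]
```

The resulting string can be pasted into a grammar rule in place of the keyword; whether it beats /And/i for a given grammar is something to measure rather than assume.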

I was going to submit a PR with the above change; however, running the test cases results in two failures:

  1) New lang
       Grammars.Custom parses JSON grammar
         'var a = x match { else } map 1' must resolve into (FIRST RULE):
     Error: expect(received).toEqual(expected) // deep equality

Expected: "else } map 1"
Received: "{ else } map 1"
      at /Users/jmillard/pppp/node-ebnf/test/NewLang.spec.js:288:46
      at Context.<anonymous> (test/TestHelpers.js:28:17)
      at processImmediate (internal/timers.js:439:21)

  2) New lang
       Grammars.Custom parses JSON grammar
         'var a = x match { else -> } map 1' must resolve into (FIRST RULE):
     Error: expect(received).toEqual(expected) // deep equality

Expected: "} map 1"
Received: "{ else -> } map 1"
      at /Users/jmillard/pppp/node-ebnf/test/NewLang.spec.js:292:46
      at Context.<anonymous> (test/TestHelpers.js:28:17)
      at processImmediate (internal/timers.js:439:21)

Our grammar doesn't use any advanced features, so the change was working great for us.

Am I missing something about how StringLiterals work?

FWIW, here are some benchmarks. Using RegExps instead of string literals:

  The scripting grammar - transpile
    ✓ should transpile controller.vbs successfully (186ms)
    ✓ should transpile core.vbs successfully (1736ms)
    ✓ should transpile core.vbs successfully (1729ms)
    ✓ should transpile core.vbs successfully (1685ms)
    ✓ should transpile core.vbs successfully (1662ms)
    ✓ should transpile core.vbs successfully (1674ms)
    ✓ should transpile core.vbs successfully (1701ms)

vs. string literals:

    ✓ should transpile controller.vbs successfully (218ms)
    ✓ should transpile core.vbs successfully (2349ms)
    ✓ should transpile core.vbs successfully (2287ms)
    ✓ should transpile core.vbs successfully (2352ms)
    ✓ should transpile core.vbs successfully (2244ms)
    ✓ should transpile core.vbs successfully (2277ms)
    ✓ should transpile core.vbs successfully (2313ms)
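As a quick sanity check on the numbers above, the six core.vbs runs of each variant can be averaged. The timings are copied from the two listings; the variable names are only for illustration.

```typescript
// core.vbs timings in milliseconds, copied from the two benchmark listings.
const regexRuns = [1736, 1729, 1685, 1662, 1674, 1701];
const literalRuns = [2349, 2287, 2352, 2244, 2277, 2313];

function mean(xs: number[]): number {
  let sum = 0;
  for (const x of xs) sum += x;
  return sum / xs.length;
}

const saved = mean(literalRuns) - mean(regexRuns);
const percent = (saved / mean(literalRuns)) * 100;
console.log(Math.round(mean(regexRuns)));   // 1698
console.log(Math.round(mean(literalRuns))); // 2304
console.log(Math.round(saved));             // 606
console.log(Math.round(percent));           // 26
```

On these runs the RegExp variant is about 606 ms (roughly 26%) faster on average.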


menduz · 2019-12-27T23:33:05Z

It may be related to how regular expressions are parsed. Creating a regular expression from the string [x] is not the same as matching the text [x]: the regex matches an x rather than [x]. It may also be related to the matcher for the curly braces in the test.

You could simply add a condition to your patch so that only StringLiterals without special characters take the regex path.
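The difference is easy to see in isolation. The following sketch uses only the built-in RegExp constructor; the variable names are illustrative.

```typescript
// "[x]" as a pattern is a character class containing x, not the three characters [, x, ].
const asClass = new RegExp("^[x]$");
// Escaping the brackets makes the pattern match the literal text "[x]".
const asLiteral = new RegExp("^\\[x\\]$");

console.log(asClass.test("x"));     // true
console.log(asClass.test("[x]"));   // false
console.log(asLiteral.test("x"));   // false
console.log(asLiteral.test("[x]")); // true
```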

This should do the trick. If it works feel free to send a PR.

if (decoration || preDecoration || !/^[a-zA-Z0-9_\-\s]+$/.test(x.text.slice(1, -1))) {

jsm174 · 2019-12-28T06:08:34Z

Thank you.

I wanted a few more characters to be supported, and I was able to get this working (borrowing escapeRegExp from the parser):

   case 'StringLiteral':
      if (decoration || preDecoration || !/^['/\-\^()a-zA-Z0-9\\"&_.:=,]+$/.test(x.text)) {
         bnfSeq.push(preDecoration + x.text + decoration);
      } else {
         for (const c of x.text.slice(1, -1)) {
            bnfSeq.push(new RegExp(escapeRegExp(c)));
         }
      }
      break;
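The guard in that branch can be read as a predicate deciding which path a literal takes. Below is a standalone sketch of it; `canUseRegExpPath` is a hypothetical name, and the allow-list is the one from the snippet above (with the slash escaped), applied to the literal text including its quotes.

```typescript
// Hypothetical predicate for illustration; mirrors the condition in the snippet above.
// The allow-list includes both quote characters because the text still carries its quotes.
const REGEX_SAFE = /^['\/\-\^()a-zA-Z0-9\\"&_.:=,]+$/;

function canUseRegExpPath(text: string, preDecoration: string, decoration: string): boolean {
  // Decorated literals and literals with characters outside the allow-list
  // stay on the original string path.
  if (preDecoration || decoration) return false;
  return REGEX_SAFE.test(text);
}

console.log(canUseRegExpPath("'And'", "", ""));  // true
console.log(canUseRegExpPath("'And'", "", "?")); // false
console.log(canUseRegExpPath("'{'", "", ""));    // false
console.log(canUseRegExpPath("'->'", "", ""));   // false, since > is not in the allow-list
```

Keeping the check as an allow-list means any character not explicitly vetted, such as the curly braces used in the failing tests, falls back to the string path.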

I wanted < and > as well, but whenever I include those characters, the test cases from above start failing.

menduz · 2019-12-28T15:46:48Z

[email protected]

menduz · 2019-12-28T15:48:57Z

Are you bringing back the good old On Error Resume Next?

jsm174 · 2019-12-28T17:15:25Z

Ha. Yeah, we are going to try: vpdb/vpx-js#141

jsm174 mentioned this issue Dec 27, 2019

Performance tips for grammars with case insensitive keywords? #17

Open

jsm174 mentioned this issue Dec 28, 2019

Updated StringLiteral in custom grammar to use RegExps for faster performance #21

Merged

menduz closed this as completed Dec 28, 2019

jsm174 mentioned this issue Dec 31, 2019

Add support for ignoreCase attribute for StringLiterals #22

Merged
