Failed parsing #44

GordonSmith · 2020-03-09T10:32:03Z

(\b(?i:(integer|unsigned))(EX)?\b)

Specifically the case insensitive part ?i:
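For context, `(?i:...)` is an inline case-insensitivity modifier familiar from Oniguruma-style engines such as the one behind VS Code's TextMate grammars, and JavaScript engines did not accept it at the time of this thread. A minimal sketch for checking what the current runtime does is shown here; `supportsInlineModifiers` is a hypothetical helper name, not part of any library.

```typescript
// Hypothetical helper: reports whether this JavaScript engine accepts
// the inline modifier group syntax "(?i:...)".
function supportsInlineModifiers(): boolean {
  try {
    // Constructing the RegExp throws a SyntaxError on engines
    // that do not know the "(?i:" group opener.
    new RegExp("(?i:a)");
    return true;
  } catch (e) {
    return false;
  }
}

const inlineModifiersSupported: boolean = supportsInlineModifiers();
console.log("inline modifiers supported:", inlineModifiersSupported);
```

The result depends on the engine version, so no particular output is assumed.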

bd82 · 2020-03-11T22:41:58Z

Thanks for reporting this @GordonSmith
Do you know which version of ECMAScript added this syntax?

This makes me wonder again whether I could transition Chevrotain to use another RegExp parser (one which I do not have to maintain), as over time more and more RegExp features are added and I built this project mainly for use in Chevrotain.

Investigate using regexpp Instead of regexp-to-ast Chevrotain/chevrotain#777

GordonSmith · 2020-03-12T05:20:59Z

I don't know offhand, but I do know I have been using it for several years now.

GordonSmith · 2020-03-12T05:23:20Z

Out of curiosity why does Chevrotain need to parse the RegEx? I would have thought using them "black box" would have been sufficient?

GordonSmith · 2020-03-12T08:08:10Z

I didn't mention why this is important! I want to unify my VS Code "syntaxes" regexes with the ones I use in the Chevrotain lexer (with plans to automate the syncing).

In VS Code the declaration looks like this (from a JSON file):

        {
            "name": "entity.name.type.ecl",
            "match": "\\b(?i:(integer|unsigned))[1-8]?\\b"
        },

While in Chevrotain:

const IntegerType = createToken({ name: "IntegerType", pattern: /(\b(integer|unsigned)[1-8]?\b)/i });

My current plan was to standardize the two to look like this:

(?i:\\b((integer|unsigned))[1-8]?\\b)
/(\b(integer|unsigned)[1-8]?\b)/i

At which point I would have some hope of auto syncing...
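One way the syncing could work is to keep the VS Code form as the source of truth and derive the JavaScript form from it. The sketch below is an assumption about how that might be done, not something from Chevrotain or VS Code: `textMateToJsRegExp` is a hypothetical helper that rewrites each `(?i:` opener to a plain non-capturing group and applies the `i` flag to the whole pattern. That widens case-insensitivity to the entire expression, which is only equivalent when the rest of the pattern has no letters, as in the example here.

```typescript
// Hypothetical helper: converts a TextMate-style pattern that uses the
// inline "(?i:...)" modifier into a JavaScript RegExp with the global "i" flag.
// Limitation: an escaped "\(" followed by "?i:" is not handled specially.
function textMateToJsRegExp(source: string): RegExp {
  const hasInlineFlag = source.indexOf("(?i:") !== -1;
  // Turn every "(?i:" into a plain non-capturing group "(?:".
  const stripped = source.replace(/\(\?i:/g, "(?:");
  return new RegExp(stripped, hasInlineFlag ? "i" : "");
}

// The "match" value from the JSON snippet above, after JSON decoding.
const vsCodePattern = "\\b(?i:(integer|unsigned))[1-8]?\\b";
const integerTypePattern = textMateToJsRegExp(vsCodePattern);

console.log(integerTypePattern.source, integerTypePattern.flags);
console.log(integerTypePattern.test("UNSIGNED8"));
```

The resulting `RegExp` could then be passed as the `pattern` of a token definition, so only the JSON file would need to be edited by hand.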

bd82 · 2020-03-12T20:17:19Z

> Out of curiosity why does Chevrotain need to parse the RegEx? I would have thought using them "black box" would have been sufficient?

It is not mandatory; it is done only for optimization purposes. By understanding which characters can match each token pattern, Chevrotain can save quite a lot of time during the lexing phase.

See: https://sap.github.io/chevrotain/docs/guide/performance.html#ensuring-lexer-optimizations

So I am uncertain this issue should be a blocker for you.
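The idea behind that optimization can be illustrated with a toy lexer. This is a simplified sketch, not Chevrotain's implementation: here the start characters are declared by hand in a hypothetical `startChars` field, whereas Chevrotain derives that information by parsing each token's regular expression. Knowing the possible first characters lets the lexer try only the patterns that can match at the current offset.

```typescript
// Toy token definition; "startChars" is a hand-written assumption for this sketch.
interface TokenDef {
  name: string;
  pattern: RegExp; // must use the sticky "y" flag
  startChars: string[];
}

// Group token definitions by the first character they can match.
function buildIndex(defs: TokenDef[]): { [ch: string]: TokenDef[] } {
  const index: { [ch: string]: TokenDef[] } = {};
  for (const def of defs) {
    for (const ch of def.startChars) {
      (index[ch] = index[ch] || []).push(def);
    }
  }
  return index;
}

// Tokenize by consulting only the bucket for the current character.
function tokenize(text: string, index: { [ch: string]: TokenDef[] }): string[] {
  const names: string[] = [];
  let offset = 0;
  while (offset < text.length) {
    const candidates = index[text[offset]] || [];
    let matched = false;
    for (const def of candidates) {
      def.pattern.lastIndex = offset; // sticky match exactly at offset
      const m = def.pattern.exec(text);
      if (m !== null && m[0].length > 0) {
        names.push(def.name);
        offset += m[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) throw new Error("Unexpected character at offset " + offset);
  }
  return names;
}

const toyIndex = buildIndex([
  { name: "Integer", pattern: new RegExp("\\d+", "y"), startChars: "0123456789".split("") },
  { name: "Identifier", pattern: new RegExp("[a-z]+", "y"), startChars: "abcdefghijklmnopqrstuvwxyz".split("") },
  { name: "Whitespace", pattern: new RegExp(" +", "y"), startChars: [" "] },
]);

console.log(tokenize("abc 42", toyIndex)); // [ 'Identifier', 'Whitespace', 'Integer' ]
```

When a pattern cannot be analyzed, a lexer of this kind has to fall back to trying that pattern at every offset, which is slower but still correct.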

GordonSmith · 2020-03-13T09:49:58Z

Interesting - as a potential side project I could see a "chevrotain grammar -> VSCode Language Extension" utility being able to get a huge % of the grunt work automated.

My gut says there is a disconnect between the grammar and the CST (the loss of some of the semantic logic from the parser definition) that, if it were preserved in the CstNode as information, would simplify the Visitor pattern somewhat. At the moment it feels like I have to write everything twice (but slightly differently), but if I knew that certain children were "OR" alternatives and what the sequential order of the children was, then I could simply walk the CST with a simpler visit pattern.

(sorry for nattering off topic).
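On the ordering point, the sequential order of children can be recovered from token offsets even though the CST groups children by name. The sketch below uses local, simplified stand-ins (`SketchToken`, `SketchNode`) that mirror the general shape of a CST node with a `children` dictionary; they are assumptions for illustration, not imports from Chevrotain.

```typescript
// Simplified stand-ins for a token and a CST node (assumed shapes).
interface SketchToken {
  image: string;
  startOffset: number;
}
interface SketchNode {
  name: string;
  children: { [key: string]: Array<SketchNode | SketchToken> };
}

function isToken(el: SketchNode | SketchToken): el is SketchToken {
  return (el as SketchToken).image !== undefined;
}

// Recursively gather every token below a node.
function collectTokens(node: SketchNode, out: SketchToken[] = []): SketchToken[] {
  for (const key of Object.keys(node.children)) {
    for (const el of node.children[key]) {
      if (isToken(el)) {
        out.push(el);
      } else {
        collectTokens(el, out);
      }
    }
  }
  return out;
}

// Restore source order by sorting on the start offset.
function tokensInSourceOrder(node: SketchNode): SketchToken[] {
  return collectTokens(node).sort((a, b) => a.startOffset - b.startOffset);
}

const sampleNode: SketchNode = {
  name: "list",
  children: {
    Comma: [{ image: ",", startOffset: 1 }],
    Identifier: [
      { image: "a", startOffset: 0 },
      { image: "b", startOffset: 2 },
    ],
  },
};

console.log(tokensInSourceOrder(sampleNode).map((t) => t.image).join("")); // a,b
```

This recovers order but not which alternative of an "OR" was taken; that information would still have to be inferred from which child keys are present.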

bd82 · 2020-03-13T15:42:35Z

> as a potential side project I could see a "chevrotain grammar -> VSCode Language Extension" utility being able to get a huge % of the grunt work automated.

I think you may be describing something like Xtext

https://www.eclipse.org/Xtext/

bd82 · 2020-03-13T15:52:32Z

I have created some editor logic utils specifically for the XML language.
The most complex case is the content assist logic, which is responsible for understanding the content assist **syntactic context** and executing a relevant callback to provide the **semantic** suggestions.

https://github.com/SAP/xml-tools/tree/master/packages/content-assist

I find it hard to imagine how such logic would be generalized into a library.
The grammar information by itself is not sufficient; perhaps some additional "annotations" could provide the required extra info.

EDIT: you may want to look here: Chevrotain/chevrotain#921

bd82 · 2020-03-13T16:03:19Z

Regarding the CST structure: it is intentionally very simple to allow fast construction and traversal.

You may be able to override methods from the tree builder trait to change the CST structure being built.

https://github.com/SAP/chevrotain/blob/master/packages/chevrotain/src/parse/parser/traits/tree_builder.ts

Feel free to share your results.

GordonSmith · 2020-03-15T07:06:30Z

Re XText - yes, but 100% within JS (I actually had some experience with XText about 5 years ago, while writing a language extension for Eclipse - the same language I am partially implementing in Chevrotain now...).
FWIW I already have an LSP implementation, which hooks into our native compiler (C++ based), and while its semantic output is "ok" for a lot of things, it is only accurate at the time of the syntax check. My primary goal is to see if I can get enough grammar together for auto-formatting the language.

bd82 · 2020-03-15T22:46:22Z

Perhaps a prettier plugin is relevant for you, here are a couple of examples using Chevrotain:

Although, as you mentioned, your main issue is understanding enough of a not-well-defined language to properly parse it.

The Java Parser (in prettier-java) above uses a lot of backtracking as lookahead to stay close to the Java Spec, which is very well defined, just not LL(K)... perhaps such backtracking would be of use to you when handling your difficult grammar.
