fix(lexer): allow unicode sequences in tokens #1621

Closed
wants to merge 1 commit into from

Zyclotrop-j

Allow tokens to use patterns like '/\u{10334}/u'
Change addStartOfInput and addStickyFlag to keep the 'u' flag

Fixes #1620
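
To illustrate the change described above, here is a minimal sketch of flag-preserving helpers. It is not Chevrotain's actual source: the function names `addStickyFlagKeepingUnicode` and `addStartOfInputKeepingUnicode` are hypothetical, and the sketch assumes the lexer rebuilds each token pattern from `pattern.source` plus a set of flags. The point is only that a pattern such as `/\u{10334}/u` changes meaning if the `u` flag is lost when the RegExp is rebuilt.

```typescript
// Hypothetical helpers for illustration only (not Chevrotain's real code).
// Both rebuild a RegExp from its source while keeping every original flag,
// including `u`.

function addStickyFlagKeepingUnicode(pattern: RegExp): RegExp {
  // Remove an existing `y` first so the flag is never duplicated.
  const flags = pattern.flags.replace("y", "") + "y";
  return new RegExp(pattern.source, flags);
}

function addStartOfInputKeepingUnicode(pattern: RegExp): RegExp {
  // Wrap the source in a non-capturing group anchored at the start of input.
  return new RegExp(`^(?:${pattern.source})`, pattern.flags);
}

// A token pattern that uses a Unicode code point escape.
const gothicLetter = new RegExp("\\u{10334}", "u");

// With `u` preserved, the rebuilt patterns still match U+10334.
const sticky = addStickyFlagKeepingUnicode(gothicLetter);
const anchored = addStartOfInputKeepingUnicode(gothicLetter);

// Without `u`, the same source is read as the letter "u" with a
// {10334} quantifier, so it no longer matches U+10334.
const withoutUnicodeFlag = new RegExp(gothicLetter.source, "y");
```

In this sketch, `sticky.test("\u{10334}")` and `anchored.test("\u{10334}")` succeed, while `withoutUnicodeFlag.test("\u{10334}")` does not, which is the kind of silent change in meaning that keeping the `u` flag avoids.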

Member

bd82 commented Oct 29, 2021

As mentioned in the linked issue, I am closing this because it should be resolved as part of a much larger feature in #1670.

bd82 closed this on Oct 29, 2021
Member

bd82 commented Oct 29, 2021

Thanks for the effort, @Zyclotrop-j 👍. Unfortunately, I am concerned that merging this partial fix would cause new bugs and strange behaviors, so we will have to wait for a full resolution.

@Zyclotrop-j
Author

Hi @bd82,
Thanks for the heads-up.
What is the timeline for the full resolution, and how can I help get it done so that Unicode sequence support is introduced?

Member

bd82 commented Oct 30, 2021

Hello @Zyclotrop-j

There is no timeline, as this is a free-time side project. It depends on how much free time and energy I have, and on which item I happen to pick up next.

If your workaround is good enough for you, you can try applying it to your repo via "patch-package".

That will likely be the fastest way for you to integrate it.

Regarding contributions: normally I would be very happy to accept them.
With this issue it could be more complicated, as I am not sure what the best way to approach the problem is
(see #1670).

At the moment I believe that my approach would be to:

  1. Map missing capabilities / bugs in regexp-to-ast.
  2. Implement most/all of the missing capabilities.
  3. Move regexp-to-ast into this repo.
  4. Update Chevrotain source code to use the new capabilities (this is part of what you have implemented here).

So this seems quite a bit more complicated than a simple feature/fix contribution PR.
The plan may also change: e.g., if I discover that step (2) is too complex, I may choose to make another attempt with
the "regexpp" library.

Successfully merging this pull request may close these issues.

Unicode in pattern: "Range out of order in character class" Error