Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unicode in pattern: "Range out of order in character class" Error #1620

Closed
Zyclotrop-j opened this issue Sep 2, 2021 · 2 comments
Closed

Comments

@Zyclotrop-j
Copy link

Zyclotrop-j commented Sep 2, 2021

Summary: Unicode regex (eg /\u{10000}-\u{10FFFF}/u) do not work.

More specifically, regex depending on the unicode flag u u don't work.

Example:

import { createToken, Lexer } from "chevrotain";
const foo = createToken({ name: "foo", pattern: /([\u0001-\u0031\u{10000}-\u{10FFFF}]+)/u });
new Lexer([foo]) 
// throws
// SyntaxError: Invalid regular expression: /([\u0001-\u0031\u{10000}-\u{10FFFF}]+)/: Range out of order in character class
// at new RegExp (<anonymous>)
//    at addStickyFlag (chevrotain\lib\src\scan\lexer.js:635:12)
//    at chevrotain\lib\src\scan\lexer.js:100:27
// .....

Root-cause and possible solution:

In packages/chevrotain/src/scan/lexer.ts, methods like addStickyFlag or addStartOfInput strip the unicode-flag u of the source regex. The flag should be kept using pattern.flags.

(Very basic) example for addStickyFlag:

export function addStickyFlag(pattern: RegExp): RegExp {
  let flags = pattern.ignoreCase ? "iy" : "y"
  if(pattern.flags.includes("u")) flags += 'u'; // not beautiful but does the job
  return new RegExp(`${pattern.source}`, flags)
}

EDIT: Created PR in 1621

@Zyclotrop-j Zyclotrop-j changed the title Unicode in pattern Unicode in pattern: "Range out of order in character class" Error Sep 2, 2021
@bd82
Copy link
Member

bd82 commented Oct 9, 2021

Hello @Zyclotrop-j

Generally the unicode flag is not currently support by the library Chevrotain uses to parse the regular expressions

While this regexp parsing flow is not mandatory and is only needed for optimizations...
I am not sure if your suggested PR should be merged or if this issue should wait for a more through fix (e.g as part of #777 )

@bd82
Copy link
Member

bd82 commented Oct 29, 2021

I am closing this as it should be resolved as part of a much larger fix in #1670

@bd82 bd82 closed this as completed Oct 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants