Summary: Unicode regexes (e.g. /\u{10000}-\u{10FFFF}/u) do not work.
More specifically, regexes that depend on the unicode flag u don't work.
Example:
import { createToken, Lexer } from "chevrotain";
const foo = createToken({ name: "foo", pattern: /([\u0001-\u0031\u{10000}-\u{10FFFF}]+)/u });
new Lexer([foo])
// throws
// SyntaxError: Invalid regular expression: /([\u0001-\u0031\u{10000}-\u{10FFFF}]+)/: Range out of order in character class
// at new RegExp (<anonymous>)
// at addStickyFlag (chevrotain\lib\src\scan\lexer.js:635:12)
// at chevrotain\lib\src\scan\lexer.js:100:27
// .....
Root-cause and possible solution:
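In packages/chevrotain/src/scan/lexer.ts, methods like addStickyFlag or addStartOfInput rebuild the regex from its source and strip the unicode flag u in the process. Without that flag, \u{10000} is no longer read as a code point escape, which is why the rebuilt character class fails with "Range out of order". The flag should be kept, for example by reading pattern.flags.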
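The failure can be reproduced without chevrotain. The following is only a minimal sketch of the suspected mechanism: the helper name rebuild is hypothetical and is not part of the library. It rebuilds the pattern from its source, once without and once with the u flag, roughly the way a flag-dropping helper would.

```typescript
// The pattern from the report; it is only valid with the unicode flag.
const source = /([\u0001-\u0031\u{10000}-\u{10FFFF}]+)/u;

// Hypothetical helper: rebuild the regex from its source with the given
// flags and return the error instead of throwing, so both cases can be compared.
function rebuild(flags: string): RegExp | Error {
  try {
    return new RegExp(source.source, flags);
  } catch (e) {
    return e as Error;
  }
}

// Without "u", \u{10000} is not a code point escape, so the class ends up
// containing the literal range "}-u", which is out of order.
const withoutU = rebuild("y");

// With "u" kept, the same source compiles and the sticky flag is added.
const withU = rebuild("uy");

console.log(withoutU instanceof SyntaxError);
console.log(withU instanceof RegExp);
```

If this matches what happens inside the lexer, any pattern that is only valid in unicode mode will fail in the same way as soon as it is rebuilt without u.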
(Very basic) example for addStickyFlag:
export function addStickyFlag(pattern: RegExp): RegExp {
  let flags = pattern.ignoreCase ? "iy" : "y"
  if (pattern.flags.includes("u")) flags += "u" // not beautiful but does the job
  return new RegExp(`${pattern.source}`, flags)
}
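A slightly more general variant is sketched here. It assumes that carrying over all of the original flags is acceptable, which may not hold for chevrotain (the existing helper keeps only i, possibly on purpose). The name withStickyFlag is hypothetical and chosen only for this illustration.

```typescript
// Hypothetical variant: keep every flag of the original pattern and
// add the sticky flag "y" only if it is not already present.
// Assumption: the lexer does not rely on other flags (such as "g") being dropped.
function withStickyFlag(pattern: RegExp): RegExp {
  const flags = pattern.flags.includes("y")
    ? pattern.flags
    : pattern.flags + "y";
  return new RegExp(pattern.source, flags);
}

// Usage: the unicode flag survives, so the astral range stays valid.
const sticky = withStickyFlag(/([\u0001-\u0031\u{10000}-\u{10FFFF}]+)/u);
console.log(sticky.flags);
console.log(sticky.test("\u{10000}"));
```

Reading pattern.flags directly avoids one check per flag, at the cost of also carrying over flags the lexer may not expect.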
While this regexp parsing flow is not mandatory and is only needed for optimizations...
I am not sure if your suggested PR should be merged or if this issue should wait for a more thorough fix (e.g. as part of #777).
EDIT: Created PR in 1621