Add support for ES6 Unicode code point escapes #11

mathiasbynens · 2013-11-15T20:09:19Z

Add support for ES6 Unicode code point escape sequences.

Diff without whitespace changes: https://github.com/jviereck/regjsparser/pull/11/files?w=1 (Some indentation was incorrect)

jviereck · 2013-11-16T00:56:36Z

parser.js

      return createEscaped('unicode', res[1], 1);
+    } else if (res = matchReg(/^u\{([0-9a-fA-F]{1,6})\}/)) {


Wondering: should there be an option flag to enable ES6 features in the parser? I would be happy with a global flag that is set on the parser object directly rather than on every call to the parse function, since I assume most people will use the same mode (ES5/ES6) across all calls to parse.

Does that sound reasonable to you?

Ideally it should just work. If you’re authoring in ES5, there won’t be any \u{xxxxxx}-style escape sequences anyway. If you’re authoring in ES6 there might be, but I don’t see why we’d need a flag to parse them. For opt-in Unicode regex features, we already have the u flag in the input anyway.
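
For illustration, here is a minimal sketch of what the regular expression in the diff accepts. The helper name `matchCodePointEscape` and its return shape are hypothetical and not part of regjsparser. The upper-bound check against 0x10FFFF is also an assumption added here, since six hex digits can express values beyond the Unicode range.

```javascript
// Hypothetical helper, not regjsparser's API. It expects the input to start
// right after the backslash, as the matchReg call in the diff does.
function matchCodePointEscape(input) {
  const res = /^u\{([0-9a-fA-F]{1,6})\}/.exec(input);
  if (!res) return null;
  const codePoint = parseInt(res[1], 16);
  // Assumption: reject values above the last Unicode code point.
  if (codePoint > 0x10FFFF) return null;
  return { hex: res[1], codePoint: codePoint, length: res[0].length };
}

const hit = matchCodePointEscape('u{1D306}');
const miss = matchCodePointEscape('u1D306'); // no braces, so no match
```

Input without braces simply fails to match here, which is in line with the point that no flag is needed to tell the two escape styles apart.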

jviereck · 2013-11-16T01:14:31Z

Thanks a lot, Mathias. I need to take a closer look at this tomorrow and read up on the ES6 Unicode code point/grammar changes, but I have the feeling you know what you're doing ;)

termi · 2014-03-21T07:21:30Z

Hi folks! First of all, thanks for the great work!
But I have to ask: what is the status of this branch?
Recently I needed an ES6 RegExp parser, and the only one I have found is the one in this branch. Unfortunately, it's incomplete, so I had to make my own version of this branch. The reasons:

1. Lack of Unicode surrogate support: change1 and change2. I am not sure I did it the right way, but it works for me. Example:

parse('[\uD83D\uDCA9-\uD83D\uDCAB]', 'u').terms[0].classRanges[0].min
Object {type: "escape", name: "codePoint", value: "1F4A9", from: 1, to: 3, raw: "uD83DuDCA9"}

As you can see, I combined the first (high) surrogate with the second (low) surrogate value into a single code point, but only for RegExps with the 'u' flag.
2. Incorrect 'from' value as well as 'raw' value for 'characterClassRange': change3. I am sure this is a bug.
3. Because #3 is still not closed, I had to add my own option: change4.
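
The surrogate combination described in item 1 can be sketched as follows. This is the standard UTF-16 decoding arithmetic, not necessarily the code in change1 and change2. The function name `combineSurrogates` is hypothetical.

```javascript
// Hypothetical helper: combine a UTF-16 high/low surrogate pair into one
// code point. Returns null if either value is outside its surrogate range.
function combineSurrogates(high, low) {
  const isHigh = high >= 0xD800 && high <= 0xDBFF;
  const isLow = low >= 0xDC00 && low <= 0xDFFF;
  if (!isHigh || !isLow) return null;
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

// The pair from the example above: \uD83D\uDCA9
const combined = combineSurrogates(0xD83D, 0xDCA9);
const combinedHex = combined.toString(16).toUpperCase(); // "1F4A9"
```

The result matches the `value: "1F4A9"` shown in the parse output above.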

jviereck · 2014-03-22T22:53:10Z

@termi, thanks for your comment! Would you mind opening a new PR with changes 1-4 you mentioned so that I can pull them in? I would really be happy to have them in the core library :)

termi · 2014-03-25T10:12:34Z

@jviereck I'll do it one of these days.

termi · 2014-03-27T17:06:34Z

@jviereck #13

termi · 2014-03-30T15:44:18Z

parser.js

      return createEscaped('unicode', res[1], 1);
+    } else if (res = matchReg(/^u\{([0-9a-fA-F]{1,6})\}/)) {
+      // RegExpUnicodeEscapeSequence (ES6 Unicode code point escape)
+      return createEscaped('codePoint', res[1], 3);


I guess it should be

return createEscaped('codePoint', res[1], 4);

to calculate the correct values of 'from' and 'raw' like in #12
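
A rough sketch of why the offset would change from 3 to 4. It assumes the third argument counts the characters surrounding the hex digits and that 'from' is derived by subtracting that count plus the digit count from the end position. The real createEscaped may compute this differently, and `escapeSpan` is a hypothetical stand-in.

```javascript
// Hypothetical model of the offset arithmetic, not the actual createEscaped.
// `end` is the parser position just after the escape sequence.
function escapeSpan(source, end, value, fromOffset) {
  const from = end - (value.length + fromOffset);
  return { from: from, to: end, raw: source.slice(from, end) };
}

const source = 'a\\u{1D306}'; // the pattern text a\u{1D306}

// Offset 3 covers "u", "{" and "}" but misses the backslash.
const withThree = escapeSpan(source, source.length, '1D306', 3);
// Offset 4 also covers the backslash, so 'raw' starts at the escape itself.
const withFour = escapeSpan(source, source.length, '1D306', 4);
```

Under these assumptions, an offset of 4 yields a 'raw' value that includes the backslash, which is the behavior #12 introduces for the other escapes.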

Unicode surrogate pair support | ClassRange.from fix

jviereck reviewed Nov 16, 2013

This was referenced Nov 16, 2013

Fix 'raw' property for escaped characters to include baskslash. #12

Merged

The parse() API #3

Open

termi reviewed Mar 30, 2014
View reviewed changes

Merge pull request #13 from termi/es6-unicode-surrogates

0ff535f

Unicode surrogate pair support | ClassRange.from fix

jviereck merged commit 0ff535f into master Apr 6, 2014

mathiasbynens mentioned this pull request May 27, 2014

Add support for Unicode code point escape sequences \u{1D306} #10

Closed
