bugfix: char class casefold for certain chars #20

haozhun · 2015-03-20T19:13:51Z

When a character's code point fits in a single byte (at most 0xff),
yet the character takes more than 1 byte in the current encoding, the
case folding code incorrectly puts it in the bitset instead of the code
range. As a result, with utf8 encoding, casefold in a char class works
incorrectly on characters in the range \u0080 to \u00ff (Latin-1 Supplement).

Before fix:

* `"\u00c2"` `[\u00e0-\u00e5]` returns false
* `"\u00c2"` `[\u00e2]` returns false
* `"\u00c2"` `\u00e2` returns true
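
To illustrate the condition described above, here is a minimal Python sketch of the decision between bitset and code range. It is not the patched code: the function names (`encoded_length`, `goes_in_bitset_buggy`, `goes_in_bitset_fixed`) and the constant `SINGLE_BYTE_LIMIT` are hypothetical, and the "fixed" check is only an assumption about what a correct test could look like, based on the description.

```python
# Hypothetical sketch: names and logic are illustrative assumptions,
# not taken from the actual patch.

SINGLE_BYTE_LIMIT = 0x100  # a byte-indexed bitset covers values 0x00..0xff


def encoded_length(code_point: int, encoding: str = "utf-8") -> int:
    """Number of bytes the code point occupies in the given encoding."""
    return len(chr(code_point).encode(encoding))


def goes_in_bitset_buggy(code_point: int, encoding: str = "utf-8") -> bool:
    # Looks only at the code point value, so \u00e2 is treated as a
    # single-byte character even though utf8 encodes it in 2 bytes.
    return code_point < SINGLE_BYTE_LIMIT


def goes_in_bitset_fixed(code_point: int, encoding: str = "utf-8") -> bool:
    # Also requires the character to be exactly 1 byte long in the
    # current encoding; otherwise it belongs in the code range.
    return (code_point < SINGLE_BYTE_LIMIT
            and encoded_length(code_point, encoding) == 1)


if __name__ == "__main__":
    folded = ord(chr(0xC2).lower())  # case fold of \u00c2 is \u00e2
    print(hex(folded), encoded_length(folded))
    print(goes_in_bitset_buggy(folded), goes_in_bitset_fixed(folded))
```

Under these assumptions, `\u00e2` passes the buggy check but not the fixed one in utf8, while in a single-byte encoding such as latin-1 both checks agree that it fits in the bitset.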


haozhun force-pushed the ic branch from 13fe106 to ad6a090 · April 9, 2015 03:26

haozhun force-pushed the ic branch from ad6a090 to 5c804e4 · April 20, 2015 20:21

haozhun force-pushed the ic branch from 5c804e4 to c703f2a · April 20, 2015 20:23
