DelleVelleD changed the title from "MD_DecodeCodepointFromUtf16 incorrectly calculates codepoints greater than 0xFFFF" to "Unicode decoding and encoding bugs for codepoints greater than 0xFFFF" on Apr 17, 2022.
`MD_DecodeCodepointFromUtf16` incorrectly calculates codepoints greater than 0xFFFF because it does not offset by 0x10000. Adding 0x10000 to the end of the codepoint calculation should fix the issue:
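For illustration, here is a minimal sketch of the surrogate-pair arithmetic being described, with the 0x10000 offset applied after the two surrogates are combined. The helper name and types are placeholders, not metadesk's actual `MD_DecodeCodepointFromUtf16` signature:

```c
#include <stdint.h>

/* Sketch only: decode one codepoint from a UTF-16 surrogate pair.
   `high` is the high surrogate (0xD800-0xDBFF), `low` is the low surrogate
   (0xDC00-0xDFFF). Without the trailing "+ 0x10000", every codepoint above
   0xFFFF comes out 0x10000 too small. */
static uint32_t decode_surrogate_pair(uint16_t high, uint16_t low)
{
    return (((uint32_t)(high - 0xD800) << 10)
            | (uint32_t)(low - 0xDC00)) + 0x10000;
}
```

For example, U+1F600 is encoded as the pair 0xD83D 0xDE00; combining the surrogates gives 0xF600, and adding 0x10000 restores 0x1F600.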
Reference: Step 5 for Decoding UTF-16
`MD_Utf8FromCodepoint` sets the first byte incorrectly when the codepoint requires four bytes because it left-shifts `MD_bitmask4` by 3 rather than 4. `MD_bitmask4` is the value 0x0F (binary 1111), and the first byte of the UTF-8 encoding of a codepoint greater than 0xFFFF must start with the bit pattern 11110; shifting the 1111 mask left by 4 produces that prefix and leaves the low 3 bits free for codepoint data, whereas shifting by 3 does not. Shifting by 4 instead of 3 should fix the issue:
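As an illustration of the four-byte case, here is a sketch of the leading-byte construction under the stated assumption that `MD_bitmask4` is 0x0F. The other constants and the helper name below are generic UTF-8 values, not metadesk names:

```c
#include <stdint.h>

enum { MD_bitmask4 = 0x0F };  /* binary 1111, per the report */

/* Sketch only: encode a codepoint > 0xFFFF as four UTF-8 bytes.
   (MD_bitmask4 << 4) == 0xF0 gives the required 11110xxx leading byte;
   (MD_bitmask4 << 3) == 0x78 does not mark a four-byte sequence at all. */
static void utf8_encode_4byte(uint8_t out[4], uint32_t codepoint)
{
    out[0] = (uint8_t)((MD_bitmask4 << 4) | (codepoint >> 18));  /* 11110xxx */
    out[1] = (uint8_t)(0x80 | ((codepoint >> 12) & 0x3F));       /* 10xxxxxx */
    out[2] = (uint8_t)(0x80 | ((codepoint >> 6) & 0x3F));        /* 10xxxxxx */
    out[3] = (uint8_t)(0x80 | (codepoint & 0x3F));               /* 10xxxxxx */
}
```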