Variant selector inconsistencies #177

Open
chop-suey opened this issue Nov 5, 2024 · 2 comments

Comments

@chop-suey

There seem to be some inconsistencies in the generated metadata (e.g. packages/data/en/data.raw.json).

In some cases the hexcode is missing variation selector-16 (FE0F) compared to the Unicode data.

Examples:

  • Entry for "person in suit levitating"
    • Should be 1F574-FE0F according to unicode
    • hexcode is 1F574
    • emoji contains the sequence 1F574-FE0F
  • Entry for "umbrella with rain drops"
    • Should be 2614 according to unicode
    • hexcode is 2614
    • emoji contains the sequence 2614-FE0F
  • There are many more examples.

As it stands, I never know which property is the source of truth. Am I missing something, or is this an error in the data?
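For anyone who wants to reproduce the comparison, here is a minimal Python sketch that derives a hyphen-separated hex sequence from the emoji string and compares it with the hexcode field. The hexcode and emoji field names come from the examples above; the label field name, the helper names, and the sample entries are assumptions for illustration, not the generator's actual code.

```python
def to_hexcode(emoji: str) -> str:
    """Join the code points of a string as uppercase hex, separated by hyphens."""
    return "-".join(f"{ord(ch):04X}" for ch in emoji)


def find_mismatches(entries):
    """Return (label, hexcode, actual) for entries whose hexcode field
    differs from the code points actually present in the emoji field."""
    mismatches = []
    for entry in entries:
        actual = to_hexcode(entry["emoji"])
        if actual != entry["hexcode"]:
            mismatches.append((entry["label"], entry["hexcode"], actual))
    return mismatches


# Hypothetical sample entries mirroring the two cases described above,
# plus one entry where hexcode and emoji agree.
sample = [
    {"label": "person in suit levitating", "hexcode": "1F574", "emoji": "\U0001F574\uFE0F"},
    {"label": "umbrella with rain drops", "hexcode": "2614", "emoji": "\u2614\uFE0F"},
    {"label": "grinning face", "hexcode": "1F600", "emoji": "\U0001F600"},
]

for label, hexcode, actual in find_mismatches(sample):
    print(f"{label}: hexcode={hexcode}, emoji contains {actual}")
```

To run it against the real file, the sample list could be replaced with the parsed contents of data.raw.json, assuming that file is a JSON array of such entries.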

@milesj
Owner

milesj commented Nov 5, 2024

It's been a while since I've worked on this, but the emoji and text fields are the source of truth, while hexcode is either the unqualified, qualified, or default variant, I think. It's the value parsed from the left column of these data files: https://github.com/milesj/emojibase/blob/master/packages/generator/src/parsers/parseData.ts#L38

@chop-suey
Author

chop-suey commented Nov 6, 2024

But still, the emoji and text fields do not always contain the correct sequence; see my example for "umbrella with rain drops".
I just realized there are also other representations of emoji in https://github.com/milesj/emojibase/blob/master/packages/data/meta/hexcodes.json. Is the hexcode in data.raw.json supposed to be used as the key to get the matching mapping in hexcodes.json?

The entry for "umbrella with rain drops" in hexcodes.json looks like this:

"2614": {
  "2614": 0,
  "2614-FE0F": 0,
  "2614-FE0E": 0
}

According to this, all three entries are fully qualified, but in https://www.unicode.org/Public/emoji/15.1/emoji-test.txt it looks like only 2614 should be treated as fully qualified.
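One way to check which sequences Unicode marks as fully qualified is to read the status column of emoji-test.txt directly. The following is a rough Python sketch, assuming the file's usual line layout of code points, a semicolon, the status, and a trailing # comment. The function name and the sample lines are illustrative and should be checked against the real file.

```python
def parse_emoji_test(lines):
    """Map hyphen-joined hex sequences to their status string,
    e.g. 'fully-qualified' or 'unqualified'."""
    statuses = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and pure comment lines (group headers etc.).
        if not line or line.startswith("#"):
            continue
        fields, _, _comment = line.partition("#")
        codepoints, sep, status = fields.partition(";")
        if not sep:
            continue  # not a data line
        statuses["-".join(codepoints.split())] = status.strip()
    return statuses


# Illustrative lines written in the layout of emoji-test.txt.
sample_lines = [
    "# subgroup: sky & weather",
    "2614                                                   ; fully-qualified     # ☔ E0.6 umbrella with rain drops",
    "1F574 FE0F                                             ; fully-qualified     # 🕴️ E0.7 person in suit levitating",
    "1F574                                                  ; unqualified         # 🕴 E0.7 person in suit levitating",
]

statuses = parse_emoji_test(sample_lines)
for hexcode in ("2614", "2614-FE0F", "1F574", "1F574-FE0F"):
    print(hexcode, "->", statuses.get(hexcode, "not listed"))
```

Sequences that are absent from the parsed result, such as 2614-FE0F in this sample, are simply not listed in the input, which is a different situation from being listed as unqualified.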
