Mysterious boundary for numeral recognition #1485

Open
SaltfishAmi opened this issue Dec 3, 2023 · 3 comments

Comments

@SaltfishAmi
Contributor

SaltfishAmi commented Dec 3, 2023

image

I don't even know how to describe this issue; it's kind of hilarious...

I put my cursor on the first character, and apparently 10ten recognized the two kanji 八s and the first digit 2 of the date as the number 882. For some reason it neither stopped at the emoticon image to give me 88, nor continued on to give me 882023.

Webpage: https://www.bilibili.com/video/av236733736/ , but I think it may be difficult to navigate since it's in Chinese, so if you need any help, please let me know.

@birtles
Member

birtles commented Dec 4, 2023

Are you on version 1.16? When I go there (on Nightly) I get:

image

@SaltfishAmi
Contributor Author

SaltfishAmi commented Dec 4, 2023

@birtles OK, I confirmed the same behavior as in your picture on 1.16.

Still, recognizing characters from different numeral systems (with an image in between) as one number seems like a mystery to me.

@birtles
Member

birtles commented Dec 4, 2023

Yeah, I agree. I'm not sure if there's a reason we allow mixing numerals. We need to at least allow mixing Arabic numerals with kanji powers of ten so we can recognize 200万 etc. but I don't know if there's ever a case for mixing kanji 一~九 with Arabic numerals.

I'm also not sure if we need to be able to skip over images or whether they should always be treated as word boundaries.
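
To make the two open questions concrete, here is a minimal sketch of one possible rule set. It is not 10ten's actual parser: `InlineNode` and `numeralSpan` are hypothetical names, and the sketch assumes that (a) kanji powers of ten may follow either script, (b) kanji digits and Arabic digits never join in one run, and (c) an image always ends the run.

```typescript
// Hypothetical model of inline content: text runs and images.
type InlineNode = { type: 'text'; text: string } | { type: 'image' };

const ARABIC_DIGIT = /[0-9０-９]/;
const KANJI_DIGIT = /[〇一二三四五六七八九]/;
const KANJI_POWER = /[十百千万億兆]/;

// Returns the numeral-like prefix found at the start of `nodes`.
function numeralSpan(nodes: InlineNode[]): string {
  let span = '';
  // Script of the digits seen so far; null until the first digit.
  let script: 'arabic' | 'kanji' | null = null;

  for (const node of nodes) {
    // Assumption (c): an image is always a word boundary.
    if (node.type === 'image') {
      break;
    }

    for (let i = 0; i < node.text.length; i++) {
      const ch = node.text.charAt(i);

      // Assumption (a): powers of ten may follow either script (200万).
      if (KANJI_POWER.test(ch)) {
        span += ch;
        continue;
      }

      const chScript = ARABIC_DIGIT.test(ch)
        ? 'arabic'
        : KANJI_DIGIT.test(ch)
          ? 'kanji'
          : null;

      // Stop at the first character that is not part of a numeral.
      if (chScript === null) {
        return span;
      }

      // Assumption (b): never mix 一~九 with Arabic digits.
      if (script !== null && script !== chScript) {
        return span;
      }

      script = chScript;
      span += ch;
    }
  }

  return span;
}
```

Under these assumptions, the case from this issue (`八八`, then an image, then `2023…`) yields `八八`; so does the plain string `八八2023`, while `200万円` still yields `200万`.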
