Add support for partially written numbers and numbers in text2num #27

Open
mxdev88 opened this issue Apr 27, 2020 · 4 comments
Labels
enhancement New feature or request question Further information is requested

Comments

mxdev88 commented Apr 27, 2020

Currently, partially written numbers (digits followed by a spelled-out multiplier) raise ValueError. It would be useful to handle such cases.

>>> text2num('10 millions', 'fr')
ValueError: invalid literal for text2num: '10 millions'

Expected result:
10000000

Similarly, numbers given as digit strings also raise ValueError instead of being converted to int.

>>> text2num('10', 'fr')
ValueError: invalid literal for text2num: '10'

Expected result:
10
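For illustration, the first case could be handled outside the library with something like the sketch below. It is not part of text2num: the `mixed_to_int` helper and the `FR_SCALES` table are hypothetical names, and the sketch only covers the "digits followed by one scale word" shape.

```python
import re

# Hypothetical scale-word table for French; not part of text2num.
FR_SCALES = {
    "mille": 10**3,
    "million": 10**6,
    "millions": 10**6,
    "milliard": 10**9,
    "milliards": 10**9,
}

# Digits, whitespace, then a single word.
_MIXED = re.compile(r"^\s*(\d+)\s+(\w+)\s*$")


def mixed_to_int(text, scales=FR_SCALES):
    """Convert '<digits> <scale word>' to an int, e.g. '10 millions'."""
    match = _MIXED.match(text)
    if match is None or match.group(2).lower() not in scales:
        raise ValueError(f"unsupported mixed number: {text!r}")
    return int(match.group(1)) * scales[match.group(2).lower()]
```

With this, `mixed_to_int('10 millions')` gives 10000000, and anything that does not fit the pattern still raises ValueError.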

@rtxm rtxm added the enhancement New feature or request label May 4, 2020
@rtxm rtxm added question Further information is requested and removed enhancement New feature or request labels May 12, 2020
rtxm (Collaborator) commented May 12, 2020

That's on purpose. Converting digit strings to integers is already supported by Python's built-in int function, so text2num specifically converts spelled-out numbers to integers.

As for the mix of styles, what use cases are you thinking about?
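Given that split, a caller who wants both behaviours could wrap the two steps. The sketch below is only an illustration: `to_int` is a hypothetical helper, and `spelled_parser` is a stand-in for text2num, assumed to take `(text, lang)` as in the examples above. It is passed in as a parameter so the sketch does not depend on the library being installed.

```python
def to_int(text, lang, spelled_parser):
    """Try the built-in int first, then fall back to a spelled-number parser.

    spelled_parser is assumed to be a callable such as text2num(text, lang)
    that raises ValueError when it cannot parse the input.
    """
    try:
        return int(text)
    except ValueError:
        return spelled_parser(text, lang)
```

Usage would look like `to_int('10', 'fr', text2num)`. The mixed-style case is not covered by this wrapper.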

luismavs commented Nov 6, 2020

Hi,
In news articles (at least in Portuguese), it is relatively common to find mixed styles such as:

"Sismo na China provocou 20 mil mortos e 26 mil feridos"
https://www.rtp.pt/noticias/mundo/sismo-na-china-provocou-20-mil-mortos-e-26-mil-feridos_v185370

"Violentas explosões abalam Beirute. 100 mortos, 4 mil feridos e "muitos desaparecidos""
https://www.dn.pt/mundo/violenta-explosao-em-beirute-12495480.html

Text2num renders these as "20 1000", "26 1000", "4 000"

Would it be possible to improve on this?

@rtxm rtxm added the enhancement New feature or request label Jan 25, 2021
@NicolasMICAUX
Hi luismavs,
Have you found a simple solution for detecting mixed styles ("20 mil mortos", for example)?

luismavs commented Sep 6, 2022

Hi,
I have not looked at it again.
IMO, the simplest solution would be to add a post-processing step that re-casts anomalous outputs such as 4 1000 DDD0..., perhaps with a regex. Not that elegant, for sure...
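A rough sketch of that post-processing idea, using only the standard `re` module. The `merge_split_numbers` name is hypothetical, and the pattern is an assumption about what the anomalous output looks like: a 1 to 3 digit number, one space, then a bare 1000, 1000000 or 1000000000.

```python
import re

# A 1-3 digit number followed by a bare power of 1000.
# Limiting the first number to 3 digits avoids merging e.g. "2020 1000".
_SPLIT_NUMBER = re.compile(r"\b(\d{1,3}) (1(?:000){1,3})\b")


def merge_split_numbers(text):
    """Re-cast outputs such as '20 1000' into '20000'."""
    def _merge(match):
        return str(int(match.group(1)) * int(match.group(2)))
    return _SPLIT_NUMBER.sub(_merge, text)
```

For example, `merge_split_numbers("provocou 20 1000 mortos e 26 1000 feridos")` yields "provocou 20000 mortos e 26000 feridos". It can still misfire on text where two such numbers are legitimately adjacent, and other output shapes would need their own pattern.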


4 participants