Add support for partially written numbers and numbers in text2num #27

mxdev88 · 2020-04-27T11:26:09Z

Currently, partially written numbers throw ValueError. It would be an interesting addition to handle such cases.

>>> text2num('10 millions', 'fr')
ValueError: invalid literal for text2num: '10 millions'

Expected result:
10000000

Similarly, numbers written entirely in digits also throw ValueError instead of being converted to int.

>>> text2num('10', 'fr')
ValueError: invalid literal for text2num: '10'

Expected result:
10
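As a rough illustration of the requested behavior, here is a sketch that handles only the "digits followed by a scale word" form of the first example. The function `mixed_to_int` and the `FR_SCALES` table are hypothetical and not part of text2num. Fully spelled numbers such as "dix millions" are rejected here and would still go through text2num itself.

```python
import re
from decimal import Decimal

# Hypothetical French scale words; illustrative only, not text2num's vocabulary.
FR_SCALES = {
    "mille": 1_000,
    "million": 1_000_000,
    "millions": 1_000_000,
    "milliard": 1_000_000_000,
    "milliards": 1_000_000_000,
}

# Digits (optionally with a French decimal comma) followed by exactly one word.
_MIXED = re.compile(r"^\s*(\d+(?:,\d+)?)\s+(\w+)\s*$")


def mixed_to_int(text: str) -> int:
    """Convert '<digits> <scale word>' such as '10 millions' to an int.

    Raises ValueError for anything else, like text2num does for input
    it does not understand.
    """
    match = _MIXED.match(text)
    if match is None:
        raise ValueError(f"not a mixed-style number: {text!r}")
    digits, word = match.groups()
    scale = FR_SCALES.get(word.lower())
    if scale is None:
        raise ValueError(f"unknown scale word: {word!r}")
    # Decimal avoids float rounding for inputs like "2,5 milliards".
    value = Decimal(digits.replace(",", ".")) * scale
    if value != value.to_integral_value():
        raise ValueError(f"not a whole number: {text!r}")
    return int(value)
```

With this sketch, `mixed_to_int("10 millions")` returns 10000000, the expected result above.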


rtxm · 2020-05-12T11:00:59Z

That's on purpose. Converting digit strings to integers is already handled by Python's built-in int function, so text2num is specifically meant for converting spelled-out numbers to integers.

As for the mix of styles, what use cases are you thinking about?
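Given that split of responsibilities, a caller who wants to accept both forms can try int first and fall back to a spelled-number parser. This is a minimal sketch; `parse_number` is a hypothetical helper, and the parser is passed in as a callable so the example does not depend on how text2num is installed.

```python
from typing import Callable


def parse_number(text: str, spelled_parser: Callable[[str], int]) -> int:
    """Return an int for either a digit string or a spelled-out number.

    Digit strings such as '10' go through Python's built-in int();
    anything int() rejects is passed to spelled_parser, with
    surrounding whitespace removed.
    """
    stripped = text.strip()
    try:
        return int(stripped)
    except ValueError:
        return spelled_parser(stripped)
```

Assuming `from text_to_num import text2num`, a call would look like `parse_number("10", lambda s: text2num(s, "fr"))`. Note that int also accepts forms such as "+10" or "1_000", which may or may not be desirable.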

luismavs · 2020-11-06T12:06:59Z

Hi,
In news articles (at least in Portuguese), it is relatively common to find mixed styles such as:

"Sismo na China provocou 20 mil mortos e 26 mil feridos" ("Earthquake in China caused 20 thousand deaths and 26 thousand injuries")
https://www.rtp.pt/noticias/mundo/sismo-na-china-provocou-20-mil-mortos-e-26-mil-feridos_v185370

"Violentas explosões abalam Beirute. 100 mortos, 4 mil feridos e "muitos desaparecidos"" ("Violent explosions rock Beirut. 100 dead, 4 thousand injured and 'many missing'")
https://www.dn.pt/mundo/violenta-explosao-em-beirute-12495480.html

Text2num renders these as "20 1000", "26 1000", "4 000"

Would it be possible to improve on this?

NicolasMICAUX · 2022-07-22T13:10:09Z


Hi luismavs,
Have you found a simple way to detect mixed styles (for example, "20 mil mortos")?

luismavs · 2022-09-06T08:22:38Z

Hi,
I haven't looked into it again.
IMO, the simplest solution would be to add a post-processing step that re-casts anomalous output such as "4 1000", perhaps with a regex. Not very elegant, though.
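The regex post-processing idea could look roughly like this, assuming the converted text contains the scale as a bare "1000", "1000000" or "1000000000" token separated by a single space from the preceding digits, as in the outputs reported earlier in the thread. `merge_split_scales` is a hypothetical helper, not part of text2num.

```python
import re

# Digits, one space, then a bare power of 1000 (1000, 1000000 or 1000000000).
_SPLIT_SCALE = re.compile(r"\b(\d+) (1(?:000){1,3})\b")


def merge_split_scales(text: str) -> str:
    """Collapse split output such as '20 1000' into '20000'."""
    return _SPLIT_SCALE.sub(
        lambda m: str(int(m.group(1)) * int(m.group(2))), text
    )
```

Like any regex fix, it cannot tell a split scale from two genuinely adjacent numbers: "em 2020 1000 pessoas" would become "em 2020000 pessoas".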

rtxm added the "enhancement" (New feature or request) label on May 4, 2020

rtxm added the "question" (Further information is requested) label and removed the "enhancement" (New feature or request) label on May 12, 2020

rtxm added the "enhancement" (New feature or request) label on Jan 25, 2021
