Text analysis does not mark terms that end with a period #238

ovitovec · 2022-02-24T16:35:35Z

Probably the issue that causes the most problems is this: if a word ends with a period, for example in the sentence "..... was found missing.", text analysis does not detect it, even if "missing." is added as a search text. This applies to every term that is followed by a period with no space in between. It happens quite often, given English word order.
The analysis has the same problem with hyphenated words.
https://kbss.felk.cvut.cz/termit-csat/#/
Vocabulary: Slovník komponent a závad (components and defects)

This is a serious problem: terms are not being detected.

ovitovec · 2022-02-25T08:00:00Z

TermIt also fails to mark a term that is not separated by a space on both sides. If there is a typo, a hyphen, a period, a word joined on without a space, and so on, the term is generally not marked. For example: "worklight assy-the parts was found missing"
"worklight assy" is among the search terms, but it was not marked.

ovitovec · 2022-02-25T08:00:30Z

This is related to the previous point and concerns words with a hyphen inside, e.g. cover-end, retainer-clip, spring-assy. Even if these words are assigned as search text, the analysis does not detect them.
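To make the expected behaviour concrete, here is a minimal sketch of matching that treats any non-alphanumeric character (space, period, hyphen) as a term boundary. It is only an illustration of the desired result, not how TermIt or its text analysis service works; `find_terms` and the label list are hypothetical names.

```python
import re

def find_terms(text, labels):
    """Find vocabulary labels in text, allowing punctuation as a boundary.

    A label matches when it is not directly preceded or followed by a
    letter or digit, so "missing." and "assy-the" still yield a match.
    """
    hits = []
    for label in labels:
        pattern = r"(?<![A-Za-z0-9])" + re.escape(label) + r"(?![A-Za-z0-9])"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.start(), match.end(), label))
    return sorted(hits)

labels = ["worklight assy", "missing", "cover-end"]
for hit in find_terms("worklight assy-the parts was found missing.", labels):
    print(hit)
```

With this boundary rule, hyphenated labels such as cover-end are matched as a whole, while a label embedded in a longer word is still rejected.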

ahmadjana · 2022-06-09T21:30:40Z

@blcham
Analyzing the problem from a technical perspective: it is a MorphoDiTa issue.
It takes the chunk and tokenizes it. A token can come out as word+dot, so the word and the dot are treated as one token.

blcham · 2022-06-10T08:23:21Z

@ahmadjana So what do you suggest?

ahmadjana · 2022-06-13T21:58:02Z

@blcham
Because it is a MorphoDiTa issue, we could report it to the MorphoDiTa people.
Or maybe we can take the tokens and check whether each one ends with a dot.
If it does, remove the dots from the token.
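A minimal sketch of that second option, assuming the tokens arrive as plain strings. The function name `strip_trailing_dots` is hypothetical and this does not use the MorphoDiTa API.

```python
def strip_trailing_dots(tokens):
    """Remove trailing '.' characters from each token.

    Tokens that consist only of dots (e.g. "...") are dropped entirely.
    """
    cleaned = []
    for token in tokens:
        stripped = token.rstrip(".")
        if stripped:
            cleaned.append(stripped)
    return cleaned

print(strip_trailing_dots(["was", "found", "missing."]))
# ['was', 'found', 'missing']
```

One caveat: this also rewrites abbreviations such as "e.g." to "e.g", so it is probably safer to keep the original token around as well.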

blcham · 2022-06-17T16:46:26Z

It seems to me that this is quite a standard problem in NLP, so I would expect MorphoDiTa to have a solution for it (maybe some configuration?).

"Or maybe we can take the tokens and check whether each one ends with a dot. If it does, remove the dots from the token."

I guess it depends on which part of the NLP process this can be applied in. Maybe we can apply it when checking whether the token is in our vocabulary?
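A rough sketch of that idea, assuming the vocabulary is available as a set of label strings. `lookup_term` is a hypothetical helper, not an existing function in the code base. The exact token is tried first, so labels that legitimately contain punctuation still match.

```python
import string

def lookup_term(token, vocabulary):
    """Return the vocabulary label matching token, or None.

    The token is tried as-is first; only if that fails is surrounding
    punctuation stripped and the lookup repeated.
    """
    if token in vocabulary:
        return token
    candidate = token.strip(string.punctuation)
    if candidate and candidate in vocabulary:
        return candidate
    return None

vocabulary = {"missing", "e.g."}
print(lookup_term("missing.", vocabulary))
print(lookup_term("e.g.", vocabulary))
```

Doing the normalization only at lookup time leaves the tokenizer output untouched, so the rest of the pipeline is not affected.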

ovitovec added the "bug" label (Something isn't working) on Feb 24, 2022

MichalMed added the "text analysis" label (the ticket shows up in TermIt but concerns text analysis) on Mar 15, 2022
