bug: Warning: Duplicate word in word2vec file #887

Open
bact opened this issue Dec 11, 2023 · 0 comments

Labels
bug bugs in the library

bact commented Dec 11, 2023

Description

Running the unit tests emits hundreds of warnings like this one:

2023-12-11:03:40:47 WARNING  [gensim.models.keyedvectors:1909] duplicate word 'ต่าง' in word2vec file, ignoring all but first

Expected results

No warning.

Current results

(partial)

2023-12-11:03:40:47 WARNING  [gensim.models.keyedvectors:1909] duplicate word 'ต่าง' in word2vec file, ignoring all but first
2023-12-11:03:40:47 WARNING  [gensim.models.keyedvectors:1909] duplicate word '	' in word2vec file, ignoring all but first
...
2023-12-11:03:40:57 WARNING  [gensim.models.keyedvectors:1909] duplicate word '' in word2vec file, ignoring all but first
2023-12-11:03:40:58 WARNING  [gensim.models.keyedvectors:1909] duplicate word 'หยับ' in word2vec file, ignoring all but first
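
To see which entries trigger the warning, the vocabulary file can be scanned directly. The following is only a rough sketch: it assumes the vectors are stored in the plain-text word2vec layout (one header line with vocabulary size and dimension, then one `token v1 v2 ...` line per entry), which this issue does not confirm, and `find_duplicate_tokens` is a hypothetical helper, not part of PyThaiNLP or gensim.

```python
from collections import Counter
from typing import Dict, Iterable


def find_duplicate_tokens(lines: Iterable[str]) -> Dict[str, int]:
    """Return {token: occurrences} for tokens listed more than once.

    Assumption: plain-text word2vec layout, i.e. a header line followed by
    one "token v1 v2 ..." line per entry, fields separated by single spaces.
    """
    rows = iter(lines)
    next(rows, None)  # skip the "<vocab_size> <dimensions>" header line
    counts: Counter = Counter()
    for row in rows:
        row = row.rstrip("\n")
        if not row:
            continue
        # Everything before the first space is treated as the token, so an
        # empty or whitespace-like token is counted like any other entry.
        token = row.split(" ", 1)[0]
        counts[token] += 1
    return {token: n for token, n in counts.items() if n > 1}


# Tiny in-memory sample; with a real file, pass an open text handle instead,
# e.g. open(path, encoding="utf-8"), where `path` is whatever file is loaded.
sample = [
    "4 2\n",
    "ต่าง 0.1 0.2\n",
    "หยับ 0.3 0.4\n",
    "ต่าง 0.5 0.6\n",
    " 0.7 0.8\n",
]
duplicates = find_duplicate_tokens(sample)
```

In the sample, only the token that appears twice is reported. The empty token on the last row appears once, so it is not flagged.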

Steps to reproduce

Run the unit tests.
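
To count how many of these warnings a run produces, a logging handler can be attached to the logger named in the output above (`gensim.models.keyedvectors`). This is a minimal sketch using only the standard library; `DuplicateWordCounter` is a hypothetical name, and the warning at the end is simulated so the snippet runs without gensim installed.

```python
import logging


class DuplicateWordCounter(logging.Handler):
    """Collect log messages that start with 'duplicate word '."""

    def __init__(self) -> None:
        super().__init__(level=logging.WARNING)
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        message = record.getMessage()
        if message.startswith("duplicate word "):
            self.messages.append(message)


counter = DuplicateWordCounter()
gensim_logger = logging.getLogger("gensim.models.keyedvectors")
gensim_logger.addHandler(counter)
try:
    # In a real run, load the vectors or invoke the tests here.
    # Simulated record (assumption: same message shape as in the report):
    gensim_logger.warning(
        "duplicate word '%s' in word2vec file, ignoring all but first", "x"
    )
finally:
    gensim_logger.removeHandler(counter)

duplicate_warning_count = len(counter.messages)
```

The handler only observes the records. It does not suppress them, so the warnings still reach any other configured handlers.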

PyThaiNLP version

dev

Python version

3.8

Operating system and version

n/a

More info

No response

Possible solution

No response

Files

No response

@bact bact added the bug bugs in the library label Dec 11, 2023
@bact bact added this to the 4.0 milestone Dec 11, 2023
@bact bact changed the title bug: Duplicate word in word2vec file bug: Warning: Duplicate word in word2vec file Dec 11, 2023
@github-project-automation github-project-automation bot moved this to To do in PyThaiNLP Aug 29, 2024