question about the tokenizer and keywords extractor tool #13

Open
jiangliqin opened this issue Dec 27, 2021 · 1 comment

Comments

@jiangliqin

Hi, I used the default jieba tokenizer and the gensim/jieba keyword extraction tools to preprocess the corpus, but my results are not as good as yours. For example:
Mine: ['杨清', '孩子', '网友', '母亲', '小孩', '失望透顶', '父母', '发消息']
Yours: [ "王乐乐", "杨清柠", "奶粉", "外孙", "分手", "孩子"]

Could you explain in more detail which tokenizer and keyword extraction tools you used?
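One difference between the two lists is worth noting: the reference keywords contain the full name 杨清柠, while the jieba output contains only 杨清, which suggests the name was split during segmentation. In jieba, names like this can be registered with `jieba.add_word` or `jieba.load_userdict`. The sketch below does not use jieba. It swaps in a simple forward maximum matching segmenter (not jieba's own algorithm, which searches a dictionary-based DAG and uses an HMM for unseen words) with a hypothetical toy dictionary, only to show how a missing dictionary entry can split a name.

```python
def fmm_segment(text: str, vocab: set, max_len: int = 4) -> list:
    """Forward maximum matching: at each position, take the longest word
    found in `vocab`, falling back to a single character if none matches."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit
        # or a single character remains.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy dictionary; it is not jieba's real dictionary.
base_vocab = {"杨清", "孩子", "奶粉", "外孙"}
text = "杨清柠孩子"

print(fmm_segment(text, base_vocab))              # name split: 杨清 + 柠
print(fmm_segment(text, base_vocab | {"杨清柠"}))  # name kept whole
```

Without the entry 杨清柠, the segmenter emits 杨清 followed by a stray 柠. Adding the full name to the dictionary keeps it as one token, which is the role a jieba user dictionary plays.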

@yahiko-l

Stop words? Are you filtering them out before extracting keywords?
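For context, `jieba.analyse.extract_tags` ranks terms by TF-IDF, and `jieba.analyse.set_stop_words` can load a stop-word file for it. The following is a minimal pure-Python sketch of TF-IDF keyword ranking with a stop-word filter, assuming the text is already segmented. It is not jieba's implementation (jieba ships its own IDF table), and the three-document corpus is a hypothetical example built from words in this thread.

```python
import math
from collections import Counter


def tfidf_keywords(doc_tokens, corpus, stop_words, top_k=5):
    """Rank one document's tokens by TF-IDF after dropping stop words.

    doc_tokens: tokens of the document to score.
    corpus: list of token lists used to estimate document frequencies.
    """
    kept = [t for t in doc_tokens if t not in stop_words and t.strip()]
    if not kept:
        return []
    tf = Counter(kept)
    doc_sets = [set(d) for d in corpus]
    n_docs = len(corpus)

    def idf(term):
        df = sum(1 for d in doc_sets if term in d)
        # Smoothed IDF so a term absent from the corpus cannot divide by zero.
        return math.log((1 + n_docs) / (1 + df)) + 1

    scores = {t: (c / len(kept)) * idf(t) for t, c in tf.items()}
    # Highest score first; ties broken by the token itself for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


# Hypothetical pre-segmented corpus.
corpus = [
    ["王乐乐", "的", "奶粉", "的", "孩子"],
    ["网友", "的", "孩子"],
    ["父母", "的", "孩子"],
]
print(tfidf_keywords(corpus[0], corpus, stop_words=set()))
print(tfidf_keywords(corpus[0], corpus, stop_words={"的"}))
```

Without a stop-word list, the frequent particle 的 ranks first and pushes content words down. With it filtered out, 奶粉 and 王乐乐 lead the list. This is only one possible cause of the gap between the two outputs; segmentation and the IDF statistics also matter.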
