This repository has been archived by the owner on Apr 10, 2019. It is now read-only.

Calculating sentence similarity by word mover's distance

chakki-works/maccha

maccha

maccha is a project that calculates sentence similarity using word mover's distance.

So far, it supports only Japanese text.
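
To make the idea concrete, here is a minimal sketch of a relaxed word mover's distance, in which every word moves all of its weight to the nearest word in the other sentence. This is a lower bound on the exact distance, not the exact optimal-transport computation, and it is not taken from maccha's source. The `TOY_VECTORS` table, the function names, and the choice to drop out-of-vocabulary tokens are all assumptions made for illustration.

```python
import math

# Toy 2-D "word vectors", invented for illustration only.
# Real word vectors have many more dimensions and come from a trained model.
TOY_VECTORS = {
    "install": (1.0, 0.0),
    "setup": (0.9, 0.1),
    "php": (0.0, 1.0),
    "python": (0.1, 0.9),
}


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def one_sided_cost(src, dst, vectors):
    """Average distance from each word in src to its nearest word in dst."""
    total = sum(
        min(euclidean(vectors[s], vectors[d]) for d in dst) for s in src
    )
    return total / len(src)


def relaxed_wmd(tokens_a, tokens_b, vectors):
    """Relaxed word mover's distance (a lower bound on the exact distance).

    Tokens without a vector are dropped; that is an assumption of this sketch.
    """
    a = [t for t in tokens_a if t in vectors]
    b = [t for t in tokens_b if t in vectors]
    if not a or not b:
        raise ValueError("no in-vocabulary tokens to compare")
    # Take the tighter of the two one-sided bounds.
    return max(one_sided_cost(a, b, vectors), one_sided_cost(b, a, vectors))


if __name__ == "__main__":
    print(relaxed_wmd(["php", "install"], ["php", "install"], TOY_VECTORS))
    print(relaxed_wmd(["php", "install"], ["python", "setup"], TOY_VECTORS))
```

Two token lists with the same in-vocabulary words get a distance of 0.0, and the value grows as the nearest matching words move further apart in vector space. An exact word mover's distance would instead solve a transport problem over all word pairs.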

Install

To install the required modules, run:

$ pip install -r requirements.txt

maccha also requires NEologd (the mecab-ipadic-NEologd dictionary for MeCab). It is not covered by requirements.txt, so please install it separately.

Setup

First, download the word vectors and the vocabulary dictionary, and store them in the data directory.

Both are provided in qiita_vectors.zip.

Once the download has finished, unzip the file into maccha/data.

Execution

Run the test to check that everything works correctly:

$ python -m unittest tests.word_mover

If the following messages are displayed, everything is fine:

Distance between "JavaScript" and "JavaScript 2014" is 2.087188959121704.
Distance between "DexIndexOverflowExceptionと戦った話" and "AWS×Imagick×facedetectで困った話" is 2.034774008499384.
Distance between "ゆるっとローカル環境を作る" and "ローカル環境を作る。" is 0.0.
Distance between "PHP5.6のインストール" and "PHP5.4をインストール" is 0.0.

License

MIT

Contact

[email protected]
