maccha is a project that calculates sentence similarity using the word mover's distance.
So far, it supports only Japanese.
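To give a feel for the idea, here is a minimal sketch of the relaxed word mover's distance, a cheap lower bound on the exact distance in which every word simply sends all of its weight to the nearest word of the other sentence. It is not maccha's implementation: the 2-d vectors in `TOY_VECTORS`, the English tokens, and the function names are all made up for illustration, and the exact distance would additionally require solving a transport problem.

```python
import math
from collections import Counter

# Toy 2-d embeddings, invented for illustration only (assumption);
# real word vectors have many more dimensions.
TOY_VECTORS = {
    "local": (1.0, 0.0),
    "environment": (0.0, 1.0),
    "build": (1.0, 1.0),
    "install": (2.0, 1.0),
}


def nbow(tokens):
    """Normalized bag of words: each word's share of the sentence's total weight."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def one_sided(src, dst, vectors):
    """Cost when every source word sends all of its weight to its nearest destination word."""
    return sum(
        weight * min(math.dist(vectors[word], vectors[other]) for other in dst)
        for word, weight in src.items()
    )


def relaxed_wmd(tokens_a, tokens_b, vectors=TOY_VECTORS):
    """Relaxed word mover's distance: the larger of the two one-sided costs."""
    a, b = nbow(tokens_a), nbow(tokens_b)
    return max(one_sided(a, b, vectors), one_sided(b, a, vectors))


if __name__ == "__main__":
    same = ["local", "environment", "build"]
    print(relaxed_wmd(same, same))
    print(relaxed_wmd(["local", "environment"], ["install", "environment"]))
```

Two sentences made of the same words get a distance of 0.0 in this sketch, and the distance grows as the words of one sentence move further from the words of the other in vector space.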
To install the required modules, simply run:
$ pip install -r requirements.txt
maccha also requires NEologd, which is not installed by the command above. Please install it separately.
First, download the word vectors and the vocabulary dictionary, and store them in the data directory.
To get the files, download qiita_vectors.zip.
Once the download has finished, unzip the archive into maccha/data.
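If you prefer to do the extraction from Python, a small sketch using the standard zipfile module could look like this. It assumes the archive was saved as qiita_vectors.zip in the current directory and that you run it from the repository root; the helper name `unzip_vectors` is hypothetical and not part of maccha.

```python
import zipfile
from pathlib import Path


def unzip_vectors(archive="qiita_vectors.zip", dest="maccha/data"):
    """Extract every file in the archive into dest and return the archived names."""
    dest_dir = Path(dest)
    # Create the data directory if it does not exist yet.
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()


if __name__ == "__main__":
    # Assumed locations; adjust both paths to your setup.
    for name in unzip_vectors():
        print(name)
```

Printing the extracted names is a quick way to check that the files ended up in maccha/data before running the tests.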
Please run the test to see if it works correctly:
$ python -m unittest tests.word_mover
If the following messages are displayed, everything is fine!
Distance between "JavaScript" and "JavaScript 2014" is 2.087188959121704.
Distance between "DexIndexOverflowExceptionと戦った話" and "AWS×Imagick×facedetectで困った話" is 2.034774008499384.
Distance between "ゆるっとローカル環境を作る" and "ローカル環境を作る。" is 0.0.
Distance between "PHP5.6のインストール" and "PHP5.4をインストール" is 0.0.