jrae is a re-implementation of the semi-supervised recursive autoencoder in Java. This package also contains code to demonstrate its usage.
More details are available at http://www.socher.org/index.php/Main/Semi-SupervisedRecursiveAutoencodersForPredictingSentimentDistributions
Also read http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/ for a neat explanation of recursive deep representations.
In short, the semi-supervised recursive autoencoder is a feature learning algorithm that learns an encoding for text data, which can then be used for classification. The jrae package is fairly comprehensive: it includes code for learning the features as well as for performing basic classification, and it is parallelized to run on a multi-core machine.
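To make the idea of "learning an encoding" concrete, here is a minimal sketch of a single autoencoder step: two child vectors are concatenated, encoded into one parent vector of the same dimension, decoded again, and scored by how well the children are reconstructed. It is a simplified illustration of the technique described in the paper linked above, not jrae's actual API. The class name `RaeSketch` and all method names are hypothetical, plain arrays stand in for jblas matrices, and the sketch leaves out training, normalization, and the supervised label layer.

```java
public class RaeSketch {
    // Hypothetical parameter names: w1/b1 encode two children into a parent,
    // w2/b2 decode the parent back into the two children.
    final double[][] w1; // d x 2d
    final double[] b1;   // d
    final double[][] w2; // 2d x d
    final double[] b2;   // 2d

    RaeSketch(double[][] w1, double[] b1, double[][] w2, double[] b2) {
        this.w1 = w1;
        this.b1 = b1;
        this.w2 = w2;
        this.b2 = b2;
    }

    // Concatenate two child vectors into [a; b].
    static double[] concat(double[] a, double[] b) {
        double[] out = new double[a.length + b.length];
        System.arraycopy(a, 0, out, 0, a.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    // Compute w * x + b.
    static double[] affine(double[][] w, double[] x, double[] b) {
        double[] out = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            double sum = b[i];
            for (int j = 0; j < x.length; j++) {
                sum += w[i][j] * x[j];
            }
            out[i] = sum;
        }
        return out;
    }

    // Parent representation: tanh(w1 * [c1; c2] + b1).
    double[] encode(double[] c1, double[] c2) {
        double[] p = affine(w1, concat(c1, c2), b1);
        for (int i = 0; i < p.length; i++) {
            p[i] = Math.tanh(p[i]);
        }
        return p;
    }

    // Half the squared distance between the children and their reconstruction.
    double reconstructionError(double[] c1, double[] c2) {
        double[] input = concat(c1, c2);
        double[] output = affine(w2, encode(c1, c2), b2);
        double sum = 0.0;
        for (int i = 0; i < input.length; i++) {
            double diff = input[i] - output[i];
            sum += diff * diff;
        }
        return 0.5 * sum;
    }

    public static void main(String[] args) {
        // Toy 2-dimensional word vectors and arbitrary fixed weights.
        double[][] w1 = {{0.1, 0.2, 0.3, 0.4}, {0.4, 0.3, 0.2, 0.1}};
        double[][] w2 = {{0.5, 0.0}, {0.0, 0.5}, {0.5, 0.5}, {0.1, 0.1}};
        RaeSketch rae = new RaeSketch(w1, new double[2], w2, new double[4]);
        double[] parent = rae.encode(new double[] {1, 0}, new double[] {0, 1});
        System.out.println("parent dimension: " + parent.length);
        System.out.println("reconstruction error: "
                + rae.reconstructionError(new double[] {1, 0}, new double[] {0, 1}));
    }
}
```

Because the parent has the same dimension as each child, the same step can be applied again to combine a parent with a neighbouring word or phrase, which is what makes the autoencoder recursive.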
The package includes a demo of movie review classification, on which the algorithm attains state-of-the-art results. Please use the rc3 release (https://github.com/sancha/jrae/releases/tag/rc3) for your experiments, and use the master branch only for contributions. The master branch includes some unsupported code.
The core feature of the recursive autoencoder is learning a representation of words and sentences. Google recently released a similar tool; you are encouraged to try out the word2vec project at http://code.google.com/p/word2vec/
Stanford has an official code package integrated into Stanford CoreNLP. Please check http://nlp.stanford.edu/sentiment/code.html for updates.
The RAE package relies on the jblas package for its linear algebra operations. jblas and the other required libraries are included in the lib directory:
- jblas
- junit4
- log4j
- jmatio
Including the jblas jar file may not be sufficient. JBLAS requires either
LAPACK or ATLAS. Check out https://github.com/mikiobraun/jblas if you run
into trouble. If you are running Ubuntu, run sudo apt-get install libgfortran3.
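If you are unsure whether jblas is set up correctly, a quick probe like the following may help narrow down the problem. It is only a sketch: the class name `JblasProbe` and its `probe` method are made up for this example, and it assumes that jblas loads its native bindings when the class `org.jblas.NativeBlas` is initialized. It uses reflection so that it compiles even when the jblas jar is not on the classpath.

```java
public class JblasProbe {
    // Try to load and initialize a class by name and describe what happened.
    static String probe(String className) {
        try {
            Class.forName(className); // also runs the class's static initializer
            return "ok";
        } catch (ClassNotFoundException e) {
            // The jar that should contain the class is not on the classpath.
            return "missing from classpath";
        } catch (LinkageError e) {
            // The class was found, but initialization failed, for example
            // because a native library could not be loaded.
            return "found, but failed to initialize: " + e;
        }
    }

    public static void main(String[] args) {
        // Assumption: org.jblas.NativeBlas is where jblas loads its native code.
        System.out.println("jblas: " + probe("org.jblas.NativeBlas"));
    }
}
```

Run it with the jars from the lib directory on the classpath. A "missing from classpath" result points at the jar itself, while an initialization failure suggests a problem with the native libraries mentioned above.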
If you encounter any bugs, please report them on GitHub.
- Author: Sanjeev Satheesh [email protected]
- Created: 2012 February 20
- Keywords: java, sentiment analysis, machine learning, nlp
- URL: http://github.com/sancha/jrae