LIMA is multiplatform. It has been developed under GNU/Linux and ported to MS Windows. Its build procedure under Linux is described below. Build instructions for Windows are still to be written, but can be inferred from the AppVeyor CI configuration file.
Build dependencies:
- Tools: cmake, ninja, a C++ compiler (tested with gcc and clang), gawk, NLTK;
- Libraries and development packages for: boost, Qt5 and Qwt.
Optional dependencies:
- python3:
- enchant: for orthographic correction;
- qhttpserver: for the LIMA HTTP/JSON API;
- svmtool++: for the SVM-based PoS tagger;
- TensorFlow, Eigen and Protobuf: for neural network-based modules (currently Named Entity Recognition and soon parsing too);
- tre: for the approximate string matching module.
Under Ubuntu, most of these dependencies are installed with the following packages:
$ sudo apt-get install python-nltk gawk cmake ninja-build qt5-default libqt5xmlpatterns5 \
libqt5xmlpatterns5-dev qttools5-dev build-essential libboost-all-dev libenchant-dev \
mesa-common-dev libgl1-mesa-dev libglu1-mesa-dev libasan0 qml-module-qt-labs-folderlistmodel \
libqwt-qt5-dev qtscript5-dev qtxmlpatterns5-dev-tools \
qml-module-qt-labs-settings qtdeclarative5-dev python3-dev libtre-dev
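Once the packages are installed, a quick check can confirm that the build tools listed above are reachable. The following is only a minimal sketch: missing_tools is a hypothetical helper, not part of LIMA, and the command names (in particular g++ for the C++ compiler) are assumptions about a typical Ubuntu setup.

```shell
# Hypothetical helper: print the name of every given command that is
# not found on PATH. Prints nothing when all commands are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Build tools named in the dependency list (binary names are assumptions).
missing_tools cmake ninja gawk g++
```

An empty output means every listed tool was found. It says nothing about the libraries and development packages, which cmake checks at configure time.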
qhttpserver can be downloaded and installed from https://github.com/aymara/qhttpserver/releases
svmtool++ can be downloaded and installed from https://github.com/aymara/svmtool-cpp/releases
To compile SVMTool models, you also need svm_light:
$ mkdir svm_light && cd svm_light
$ wget http://osmot.cs.cornell.edu/svm_light/current/svm_light.tar.gz
$ tar xvzf svm_light.tar.gz
$ make
$ sudo cp svm_classify svm_learn /usr/bin
For TensorFlow, we use a specially compiled version. It can be installed from our PPA on Ubuntu 18.04 and later:
$ sudo add-apt-repository ppa:limapublisher/ppa
$ sudo apt-get update
$ sudo apt install libtensorflow-for-lima-dev
Modified sources of TensorFlow are here.
As we were not able to find a Free part-of-speech tagged English corpus, LIMA depends, for analyzing English, on freely available but not Free data that you will have to download and prepare yourself. This data is an extract of the Penn Treebank corpus, available for fair use in the NLTK data. To install it, please refer to http://nltk.org/data.html. Under Ubuntu, this can be done as follows:
$ sudo apt-get install python-nltk
$ python
>>> import nltk
>>> nltk.download()
d dependency_treebank
Then prepare the data for use with LIMA by running the following commands:
$ cd $HOME/nltk_data/corpora/dependency_treebank
$ cat wsj_*.dp | grep -v "^$" > nltk-ptb.dp
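The preparation step above can also be wrapped in a small function that fails when the resulting file is missing or empty. This is a sketch only: prepare_ptb is a hypothetical helper, the corpus path is the default NLTK location assumed in the commands above, and exporting NLTK_PTB_DP_FILE here is an assumption (setenv-lima.sh may already set it).

```shell
# Hypothetical helper: concatenate all wsj_*.dp files of a corpus
# directory into one file, dropping empty lines (same pipeline as above).
# $1: directory containing the wsj_*.dp files; $2: output file.
prepare_ptb() {
    cat "$1"/wsj_*.dp | grep -v '^$' > "$2"
    # Succeed only if the output file exists and is not empty.
    [ -s "$2" ]
}

# Assumed default NLTK data location.
corpus="$HOME/nltk_data/corpora/dependency_treebank"
if [ -d "$corpus" ]; then
    prepare_ptb "$corpus" "$corpus/nltk-ptb.dp" &&
        export NLTK_PTB_DP_FILE="$corpus/nltk-ptb.dp"
fi
```

If the corpus directory does not exist, the block does nothing, so it is safe to run before the NLTK download has completed.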
Move to the root of the LIMA git repository, e.g.:
$ cd $HOME/lima
You need to set up a few environment variables. For this purpose, you can source the setenv-lima.sh script from the root of the LIMA git repository (please check the values it sets beforehand):
$ source ./setenv-lima.sh -m release
Finally, from the LIMA repository root, run:
$ ./gbuild.sh -m Release
By default, LIMA is built without neural network-based modules (i.e. without TensorFlow). To build LIMA with neural network-based modules, use the -T option:
$ ./gbuild.sh -m Release -T
This builds LIMA in release mode, ensuring the best performance. To report bugs, for example, you should build LIMA in debug mode. To do so, just omit the -m Release option when invoking setenv-lima.sh and gbuild.sh.
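The options described above can be summarized in a small helper that prints the gbuild.sh arguments for a given build. This is a sketch: gbuild_args is a hypothetical function, not part of LIMA, and combining a debug build with -T is an assumption that this document does not confirm.

```shell
# Hypothetical helper: print the gbuild.sh options for a given build.
# $1: "release" or "debug" (debug means no -m option, as described above).
# $2: "tf" to enable the neural network-based modules (-T), optional.
gbuild_args() {
    args=""
    if [ "$1" = "release" ]; then
        args="-m Release"
    fi
    if [ "${2:-}" = "tf" ]; then
        # Add a separating space only when args is not empty.
        args="${args:+$args }-T"
    fi
    printf '%s\n' "$args"
}

# Example, to be run from the LIMA repository root:
# ./gbuild.sh $(gbuild_args release tf)
```

The command substitution in the example is left unquoted on purpose, so that "-m Release -T" is split into separate arguments.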
Alternatively, you can:
- define the following environment variables manually:
  - LIMA_DIST: binaries and libraries;
  - LIMA_EXTERNALS: dependencies;
  - LIMA_RESOURCES: any kind of resources (including training data);
  - LIMA_CONF: configuration folder;
  - LINGUISTIC_DATA_ROOT: path to the lima_linguisticdata project root;
  - NLTK_PTB_DP_FILE: path to the Penn Treebank extract from NLTK (see above);
- set PATH and LD_LIBRARY_PATH:
  export PATH=$LIMA_DIST/bin:$LIMA_EXTERNALS/bin:$PATH
  export LD_LIBRARY_PATH=$LIMA_EXTERNALS/lib:$LIMA_DIST/lib
- run gbuild.sh
- If you use your own compiled boost libraries alongside the system boost libraries, and cmake fails on lima_linguisticprocessings because it finds the headers of your boost version but uses the system libraries, add the following definition at the beginning of the root CMakeLists.txt of each subproject: set(Boost_NO_SYSTEM_PATHS ON)
- If some packages are not found at configure time (when running cmake), double-check the dependency packages you have installed. If they are all present, we may have omitted a dependency from this list. In that case, don't hesitate to open an issue or to submit a merge request that solves the problem.
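The manual environment setup listed above (environment variables, PATH and LD_LIBRARY_PATH) can be sketched as a single function. Everything about the directory layout here is hypothetical: lima_env, the dist, externals, resources and conf sub-directories, and the NLTK path are illustration values to adapt to your installation. Unlike the export line in the list, this sketch also keeps any LD_LIBRARY_PATH that is already set.

```shell
# Hypothetical helper: export the LIMA build environment.
# $1: installation prefix; $2: lima_linguisticdata project root.
# The sub-directory names below are assumptions, not LIMA conventions.
lima_env() {
    export LIMA_DIST="$1/dist"
    export LIMA_EXTERNALS="$1/externals"
    export LIMA_RESOURCES="$1/resources"
    export LIMA_CONF="$1/conf"
    export LINGUISTIC_DATA_ROOT="$2"
    # Assumed default NLTK data location.
    export NLTK_PTB_DP_FILE="$HOME/nltk_data/corpora/dependency_treebank/nltk-ptb.dp"
    export PATH="$LIMA_DIST/bin:$LIMA_EXTERNALS/bin:$PATH"
    # Append the previous value only if LD_LIBRARY_PATH was already set.
    export LD_LIBRARY_PATH="$LIMA_EXTERNALS/lib:$LIMA_DIST/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

# Example (paths are placeholders):
# lima_env "$HOME/lima-install" "$HOME/lima/lima_linguisticdata"
```

After calling the function in the current shell, gbuild.sh can be run from the repository root as described earlier.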