LIMA - Libre Multilingual Analyzer

LIMA is multiplatform. It has been developed under GNU/Linux and ported to MS Windows. Its build procedure under Linux is described below. Build instructions for Windows are still to be written, but can be inferred from the AppVeyor CI configuration file. This CI configuration file also contains pointers to (hopefully) up-to-date instructions for building under the latest stable Debian and Ubuntu. Check them if the instructions below fail.

LIMA has occasionally been built on macOS, but there is no standard procedure to do so.

Install

Build dependencies:

  • Tools: cmake, ninja, a C++ compiler (tested with gcc and clang), gawk, NLTK;
  • Libraries and development packages for: Boost, Qt6 and Qwt.

Optional dependencies:

  • python3;
  • enchant: for orthographic correction;
  • qhttpserver: for the LIMA HTTP/JSON API;
  • svmtool++: for the SVM-based PoS tagger;
  • TensorFlow, Eigen and Protobuf: for the neural network-based modules (currently Named Entity Recognition, and soon parsing too);
  • tre: for the approximate string matching module.

Under Ubuntu, most of these dependencies can be installed with the following packages:

sudo apt-get update && sudo apt-get install -y locales unzip bash coreutils apt-utils lsb-release git gcc g++ build-essential make cmake cmake-data curl python3-nltk gawk wget python3 python3-pip ninja-build qt6-base-dev qt6-base-dev-tools libqt6concurrent6 qml6-module-qtqml qt6-tools-dev qt6-declarative-dev qt6-declarative-dev-tools qt6-multimedia-dev libtre-dev libboost-all-dev nodejs npm libicu-dev libeigen3-dev dos2unix python-is-python3 nvidia-cuda-toolkit nvidia-cudnn python3-arpy python3-requests python3-tqdm

qhttpserver can be downloaded and installed from https://github.com/aymara/qhttpserver/releases

svmtool++ can be downloaded and installed from https://github.com/aymara/svmtool-cpp/releases
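If no binary package matches your distribution, a minimal sketch of installing one of them from its GitHub releases page could look like the following (this assumes a Debian package is published there; the version and file names below are placeholders, so check the releases page for the actual assets):

# hypothetical asset name: replace <version> and <package> with what the releases page offers
wget https://github.com/aymara/qhttpserver/releases/download/<version>/<package>.deb
sudo dpkg -i <package>.deb
sudo apt-get install -f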

For TensorFlow, we use a specially compiled version. It can be installed from our PPA on Ubuntu versions starting from 18.04:

sudo add-apt-repository ppa:limapublisher/ppa
sudo apt-get update
sudo apt install libtensorflow-for-lima-dev

Modified sources of TensorFlow are here.

As we were not able to find a Free part-of-speech-tagged English corpus, LIMA depends, for analyzing English, on freely available but not Free data that you will have to download and prepare yourself. This data is an extract of the Penn Treebank corpus, available for fair use in the NLTK data. To install it, please refer to http://nltk.org/data.html. Under Ubuntu, this can be done as follows:

npm install -g json
sudo sed -ie "s|DEFAULT_URL = 'http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml'|DEFAULT_URL = 'http://nltk.github.com/nltk_data/'|" /usr/lib/python3/*/nltk/downloader.py
python3 -m nltk.downloader -d nltk_data dependency_treebank

Then prepare the data for use with LIMA by running the following command:

cat nltk_data/corpora/dependency_treebank/wsj_*.dp | grep -v "^$" > nltk_data/corpora/dependency_treebank/nltk-ptb.dp

⚠️ If you haven't already downloaded the LIMA git repository (source code), please do it now:

cd $HOME
git clone https://github.com/aymara/lima.git

Move to the root of the LIMA git repository and initialize its submodules:

cd $HOME/lima
git submodule init
git submodule update

Download libtorch:

cd extern
./download_libtorch.sh
cd ..

You need to set up a few environment variables. For this purpose, you can source the setenv-lima.sh script from the root of the LIMA git repository (please check the values it sets before sourcing it):

source ./setenv-lima.sh -m release

Finally, from the LIMA repository root, run:

./gbuild.sh -m Release -d ON

By default, LIMA is built with the neural network-based modules (i.e. with TensorFlow). To build LIMA without these modules, use the -T option:

./gbuild.sh -m Release -T

This builds LIMA in release mode, ensuring the best performance. To report bugs, for example, you should build LIMA in debug mode. To do so, just omit the -m Release option when invoking setenv-lima.sh and gbuild.sh. You can also use the -h option of gbuild.sh to see the other possibilities (deactivating package generation, optimizing for your computer, etc.).
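As described above, a debug build amounts to running the same two steps without the -m release / -m Release options, from the LIMA repository root:

source ./setenv-lima.sh
./gbuild.sh -d ON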

After installing LIMA, if you have built the neural network-based modules (the default, see above), you must install the models for one of the 60+ supported languages.
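As a sketch only (the helper name and option below are assumptions taken from LIMA's Python tooling, not from this document; check the project README for the exact procedure), installing the English models might look like:

# assumed helper from the aymara package on PyPI; verify the actual name and options
pip3 install aymara
lima_models.py -l english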

Build troubleshooting

  • If you use your own compiled Boost libraries alongside the system Boost libraries AND cmake fails on lima_linguisticprocessings, indicating that it found your Boost version's headers but uses the system libraries, add the following definition at the beginning of the root CMakeLists.txt of each subproject: set(Boost_NO_SYSTEM_PATHS ON)
  • If some packages are not found at configure time (when running cmake), double-check the dependency packages you have installed. If they look OK, maybe we failed to list a dependency. In that case, don't hesitate to open an issue, or submit a merge request that solves the problem.