Building wav2letter++

Build Requirements

  • A C++ compiler with good C++11 support (e.g. g++ >= 4.8)
  • cmake (version 3.5.1 or later) and make
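As a quick sanity check of the requirements above, the following sketch compares the installed tool versions against the stated minimums. The helper names `version_ge` and `check_build_tools` are illustrative and not part of wav2letter++, and the comparison assumes `sort -V` (GNU coreutils) is available.

```shell
# version_ge A B: succeeds if version A >= version B.
# Assumes GNU `sort -V` for version-aware ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Hypothetical helper: checks g++, cmake and make against the minimums listed above.
check_build_tools() {
    local status=0 cmake_version gxx_version

    if command -v cmake >/dev/null 2>&1; then
        # First line of `cmake --version` looks like "cmake version 3.16.3".
        cmake_version=$(cmake --version | head -n1 | awk '{print $3}')
        version_ge "$cmake_version" 3.5.1 ||
            { echo "cmake $cmake_version is older than 3.5.1" >&2; status=1; }
    else
        echo "cmake not found" >&2; status=1
    fi

    if command -v g++ >/dev/null 2>&1; then
        gxx_version=$(g++ -dumpversion)
        version_ge "$gxx_version" 4.8 ||
            { echo "g++ $gxx_version is older than 4.8" >&2; status=1; }
    else
        echo "g++ not found" >&2; status=1
    fi

    command -v make >/dev/null 2>&1 || { echo "make not found" >&2; status=1; }
    return "$status"
}
```

Calling `check_build_tools` before configuring prints one message per missing or outdated tool and returns a non-zero status if any check fails.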

Dependencies

  • flashlight is required. flashlight must be built with distributed training enabled.
  • libsndfile is required for loading audio. If using wav2letter++ with flac files, libsndfile must be built with Ogg, Vorbis and FLAC libraries.
  • Intel's Math Kernel Library is required for featurization.
  • FFTW is required for featurization.
  • KenLM is required for the decoder. One of LZMA, BZip2, or Z is required for LM compression with KenLM.
  • gflags is required.
  • glog is required.

The following dependencies are automatically downloaded/built on build:

  • gtest and gmock 1.8.1 are built if building tests.
  • If using the CUDA criterion backend (see below), NVIDIA cub 1.8.0 is downloaded and linked to criterion CUDA kernels.

Optional Dependencies

  • flashlight requires CUDA >= 9.2; if building wav2letter++ with the CUDA criterion backend, CUDA >= 9.2 is required. Using CUDA 9.2 is recommended.
  • If building with the CPU criterion backend, wav2letter++ will try to compile with OpenMP, for better performance.

Build Options

Option                  Allowed Values       Default Value
W2L_CRITERION_BACKEND   CUDA, CPU            CUDA
W2L_BUILD_TESTS         ON, OFF              ON
CMAKE_BUILD_TYPE        CMake build types    Debug
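The options above are passed to cmake as `-D` definitions. The sketch below assembles them with the documented defaults; `w2l_cmake_flags` is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: prints cmake -D flags for the three build options.
# Arguments (all optional): backend, tests, build type.
# Defaults follow the table above: CUDA, ON, Debug.
w2l_cmake_flags() {
    local backend="${1:-CUDA}" tests="${2:-ON}" build_type="${3:-Debug}"

    case "$backend" in
        CUDA|CPU) ;;
        *) echo "W2L_CRITERION_BACKEND must be CUDA or CPU, got: $backend" >&2
           return 1 ;;
    esac
    case "$tests" in
        ON|OFF) ;;
        *) echo "W2L_BUILD_TESTS must be ON or OFF, got: $tests" >&2
           return 1 ;;
    esac

    # CMAKE_BUILD_TYPE is passed through unchecked; cmake defines the valid values.
    printf '%s %s %s\n' \
        "-DW2L_CRITERION_BACKEND=$backend" \
        "-DW2L_BUILD_TESTS=$tests" \
        "-DCMAKE_BUILD_TYPE=$build_type"
}
```

For example, `cmake .. $(w2l_cmake_flags CPU OFF Release)` would configure a release build with the CPU backend and no tests.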

General Build Instructions

First, clone the repository:

git clone --recursive https://github.com/facebookresearch/wav2letter.git

and follow the build instructions for your specific OS.

There is no install procedure currently supported for wav2letter++. Building produces three binaries in the build directory:

  • Train: given a dataset of input audio and corresponding transcriptions in sub-word units (graphemes, phonemes, etc), trains the acoustic model.
  • Test: performs inference on a given dataset with an acoustic model.
  • Decode: given an acoustic model/pre-computed network emissions and a language model, computes the most likely sequence of words for a given dataset.
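Since there is no install step, one way to confirm a build succeeded is to check that the three binaries exist. This is a minimal sketch that assumes the binaries are named exactly `Train`, `Test` and `Decode` and sit directly in the build directory; `check_w2l_binaries` is an illustrative name.

```shell
# Hypothetical helper: verifies that Train, Test and Decode are present and
# executable in the given build directory (defaults to ./build).
check_w2l_binaries() {
    local build_dir="${1:-build}" missing=0 bin
    for bin in Train Test Decode; do
        if [ -x "$build_dir/$bin" ]; then
            echo "found:   $build_dir/$bin"
        else
            echo "missing: $build_dir/$bin" >&2
            missing=1
        fi
    done
    # Non-zero status if any binary is missing.
    return "$missing"
}
```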

Building on Linux

wav2letter++ has been tested on Ubuntu 16.04 and CentOS 7.5.

Assuming you have ArrayFire, flashlight, libsndfile, and KenLM built/installed, install the dependencies below with apt (or your distribution's package manager):

sudo apt-get update
# Packages, in order: audio encoding libs for libsndfile, FFTW for Fourier
# transforms, compression libraries and Boost for KenLM, gflags, glog.
sudo apt-get install \
    libasound2-dev \
    libflac-dev \
    libogg-dev \
    libtool \
    libvorbis-dev \
    libfftw3-dev \
    zlib1g-dev \
    libbz2-dev \
    liblzma-dev \
    libboost-all-dev \
    libgflags-dev \
    libgflags2v5 \
    libgoogle-glog-dev \
    libgoogle-glog0v5

MKL and KenLM aren't easily discovered by CMake by default; export environment variables to make sure they're found. On most Linux-based systems, MKL is installed in /opt/intel/mkl. Since KenLM doesn't support an install step, after building KenLM, point CMake to wherever you downloaded and built KenLM:

export MKLROOT=/opt/intel/mkl # or path to MKL
export KENLM_ROOT_DIR=[path to KenLM]
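Before running cmake, it can help to confirm that both variables point at real directories. The following is a sketch only; `check_root` and `check_dep_roots` are hypothetical helper names, and the check looks only at whether each directory exists, not at its contents.

```shell
# check_root NAME VALUE: succeeds if VALUE is a non-empty path to a directory.
check_root() {
    if [ -z "$2" ]; then
        echo "$1 is not set" >&2
        return 1
    fi
    if [ ! -d "$2" ]; then
        echo "$1 points to a missing directory: $2" >&2
        return 1
    fi
    echo "$1=$2"
}

# Checks both variables and reports every problem before returning.
check_dep_roots() {
    local status=0
    check_root MKLROOT "${MKLROOT:-}" || status=1
    check_root KENLM_ROOT_DIR "${KENLM_ROOT_DIR:-}" || status=1
    return "$status"
}
```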

Once you've downloaded wav2letter++ and built and installed the required dependencies:

# in your wav2letter++ directory
mkdir -p build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DW2L_CRITERION_BACKEND=[backend] # Replace backend with CUDA or CPU
make -j4 # (or any number of threads)
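The same steps can be wrapped in a small script. The sketch below makes two assumptions that are not part of the original instructions: it guesses the backend from whether `nvcc` is on the PATH (which says nothing about how flashlight itself was built), and it uses `nproc` to choose the number of make jobs, falling back to 4. All function names are illustrative.

```shell
# Assumption: an nvcc on the PATH suggests the CUDA backend is wanted.
pick_backend() {
    if command -v nvcc >/dev/null 2>&1; then
        echo CUDA
    else
        echo CPU
    fi
}

# Number of parallel make jobs: CPU count if nproc exists, otherwise 4.
job_count() {
    nproc 2>/dev/null || echo 4
}

# Run from the wav2letter++ directory. An explicit backend may be passed
# as the first argument to override the guess.
build_w2l() {
    # Subshell keeps the caller's working directory unchanged.
    (
        mkdir -p build &&
        cd build &&
        cmake .. -DCMAKE_BUILD_TYPE=Release \
                 -DW2L_CRITERION_BACKEND="${1:-$(pick_backend)}" &&
        make -j"$(job_count)"
    )
}
```

Running `build_w2l CPU` forces the CPU backend regardless of what `pick_backend` would choose.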

Building/Running with Docker

wav2letter++ and its dependencies can also be built with the provided Dockerfiles. Both CUDA and CPU backends are supported with Docker.

To build wav2letter++ with Docker:

  • Install Docker and, if using the CUDA backend, nvidia-docker

  • Run the Docker image with the CUDA or CPU backend in a new container:

    # with CUDA backend
    sudo docker run --runtime=nvidia --rm -itd --ipc=host --name w2l wav2letter/wav2letter:cuda-latest
    # or with CPU backend
    sudo docker run --rm -itd --ipc=host --name w2l wav2letter/wav2letter:cpu-latest
    sudo docker exec -it w2l bash
    
  • To run tests inside a container:

    cd /root/wav2letter/build && make test
    
  • Build the Docker image from source (using --no-cache will provide the latest version of flashlight inside the image if you previously built the image for an earlier version of wav2letter):

    git clone --recursive https://github.com/facebookresearch/wav2letter.git
    cd wav2letter
    # for CUDA backend
    sudo docker build --no-cache -f ./Dockerfile-CUDA -t wav2letter .
    # for CPU backend
    sudo docker build --no-cache -f ./Dockerfile-CPU -t wav2letter .
    

    For logging during training/testing/decoding inside a container, use the --logtostderr=1 flag.
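The container steps above can also be scripted for non-interactive use. This sketch reuses the image tags, container name and build path shown above; the `DOCKER` variable and the function names are assumptions added for illustration. Setting `DOCKER=echo` prints the commands instead of running them.

```shell
# Command used to invoke Docker; override with DOCKER=echo for a dry run.
DOCKER=${DOCKER:-sudo docker}

# Starts a detached container named w2l with the chosen backend (cuda or cpu).
start_w2l() {
    local backend="${1:-cuda}"
    case "$backend" in
        cuda) $DOCKER run --runtime=nvidia --rm -itd --ipc=host --name w2l \
                  wav2letter/wav2letter:cuda-latest ;;
        cpu)  $DOCKER run --rm -itd --ipc=host --name w2l \
                  wav2letter/wav2letter:cpu-latest ;;
        *)    echo "backend must be cuda or cpu, got: $backend" >&2
              return 1 ;;
    esac
}

# Runs the test suite inside the running w2l container without an interactive shell.
run_w2l_tests() {
    $DOCKER exec w2l bash -c 'cd /root/wav2letter/build && make test'
}
```

A typical sequence would be `start_w2l cpu` followed by `run_w2l_tests`.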