This repo contains documentation and resources for MLA's implementation of Kaldi speech-to-text software.
-
Get MLA Amazon Web Services Console Login Credentials
-
Click on EC2 under AWS Services
-
Go to Launch Instance
-
Select Amazon Linux
-
Select t2.xlarge
-
Configure Instance Details
-
Shutdown Behavior: Stop
-
Enable Termination Protection
-
Click Next: Add Storage
-
For Root Volume Type
-
Size: 64
-
Volume Type: General Purpose
-
Click Next: Add Tags
-
Kaldi
-
Step 6: Configure Security Group
-
SSH/Protocol/Port Range 22/ Anywhere
-
Launch!
-
Create a new key pair: kaldi-aws-ec2-user
-
Save pem file
-
Launch Instance
-
chmod key file 600
-
ssh in to instance
-
sudo yum update
-
install git: sudo yum install git-all
-
git clone ...kaldi.git kaldi --origin upstream
-
cd in to kaldi/tools directory
-
extras/check_dependencies.sh
-
sudo yum install zlib-devel gcc automake autoconf patch libtool
-
sudo yum install atlas.x86_64
-
sudo yum install gcc-c++
-
run 'make -j 4'
-
snapshot image
-
cd kaldi/src
-
run ./configure --shared
-
run make depend -j 4
-
run make -j 4
-
PUA Installation (https://github.com/popuparchive/american-archive-kaldi)
-
cd $HOME
-
git clone https://github.com/popuparchive/american-archive-kaldi
-
Download the latest PUA Models, final result should be (re-)named exp
-
curl -O 'https://cytranet.dl.sourceforge.net/project/popuparchive-kaldi/exp2.tar.gz'
-
NOTE: THIS WAS PROBLEMATIC, REQUIRING MANY HOURS AND "CONTINUE" SESSIONS LIKE THIS:
-
Unpack the fully-formed exp2.tar.gz to become directory 'exp'
- tar -xzf exp2.tar.gz exp
-
Move the PUA Models into the american-archive-kaldi/sample_experiment
- mv exp american-archive-kaldi/sample_experiment/
-
Create symlinks between structures in kaldi and american-archive-kaldi
- ln -s kaldi/egs/wsj/s5/steps american-archive-kaldi/sample_experiment/
- ln -s kaldi/egs/wsj/s5/utils american-archive-kaldi/sample_experiment/
- ln -s kaldi/egs/wsj/s5/steps american-archive-kaldi/sample_experiment/exp/
- ln -s kaldi/egs/wsj/s5/utils american-archive-kaldi/sample_experiment/exp/
Install system dependencies for perl - sudo yum install perl-core.x86_64 - sudo yum install perl-JSON.noarch - sudo yum install perl-CPAN.noarch # optional unless the following statement fails - cpan JSON # initializes CPAN and does confirm/install JSON - perl -MCPAN -e shell - install File::Slurp::Tiny - install Data::Dump - exit
-
Install SOX in root directory
-
Run ./configure in sox-[version] directory
-
Run make -s
-
Run sudo make install
-
Run make installcheck
-
Install FFMPEG in root directory
-
Sudo yum install yasm
-
Run ./configure in ffmpeg directory
-
Run make
-
Run sudo make install
-
Ensure CMUSeg is installed
-
cd american-archive-kaldi/sample_experiment/
-
./install-cmuseg.sh
-
Install IRSTLM
-
cd kaldi/tools/extras
-
run install ./install_irstlm.sh
-
vi .bash_profile
-
Add command the following command to .bash_profile: . ./kaldi/tools/extras/env.sh
-
Install sclite
-
cd kaldi/tools/sctk-2.4.10
-
configure CFLAGS variable in makefiles to compile sclite for a 64-bit environment, adding -m64 to:
- sctk-2.4.10/src/asclite/core/makefile: CFLAGS = -g -Os -m64
- sctk-2.4.10/src/asclite/test/makefile: CFLAGS = -Os -Wall -Wconversion -m64
- sctk-2.4.10/src/sclite/makefile: CFLAGS = -Os -m64
- NOTE: the README cites src/rfilter1/makefile as well, but it contained no CFLAGS variable
-
Run make config
-
Run make all
-
Run make check
-
Run make install
-
Run make doc
-
Deal with python errors thrown when using american-archive-kaldi/run_kaldi.py
-
Upgrade pip and install ftfy
- NOTE: the following assumes use of python v2
- sudo pip install --upgrade pip
- sudo ln -s /usr/local/bin/pip /usr/bin/pip
- sudo pip install ftfy==4.4.1
- sudo pip install PY3
- sudo pip install -U six
-
More idiocy about inadequate permissions:
-
list inventory of now-executable files [ec2-user@ip-172-30-0-129 sample_experiment]$ find . -type f -name '*sh' -executable ./install-cmuseg.sh ./set-kaldi-path.sh ./run.sh
-
list non-executable files ending with '.sh' and make them executable [ec2-user@ip-172-30-0-129 sample_experiment]$ find . -type f -name '*sh' -not -executable ./path.sh ./tools/CMUseg_0.5/src/lib/sphere/src/scripts/install.sh ./tools/CMUseg_0.5/src/lib/sphere/src/scripts/gen_make.sh ./tools/CMUseg_0.5/src/lib/sphere/src/scripts/h_header.sh ./tools/CMUseg_0.5/src/lib/sphere/src/scripts/raw2nist.sh ./tools/CMUseg_0.5/src/lib/sphere/src/scripts/gen_setv.sh ./tools/CMUseg_0.5/src/lib/sphere/src/scripts/gen_lord.sh ./tools/CMUseg_0.5/src/lib/sphere/bin/h_header.sh ./tools/CMUseg_0.5/src/lib/sphere/bin/raw2nist.sh ./tools/CMUseg_0.5/src/lib/sphere/bin/gen_setv.sh ./tools/CMUseg_0.5/src/lib/sphere/bin/gen_lord.sh ./add_grammar.sh ./prep_lang_local.sh ./scripts/create_big_lm.sh ./scripts/qq.sh ./scripts/create_lm.sh [ec2-user@ip-172-30-0-129 sample_experiment]$ find . -type f -name '*sh' -not -executable -exec chmod +x {} ;
find /home/ec2-user/american-archive-kaldi/sample_experiment/tools/CMUseg_0.5/bin/linux -type f -exec chmod +x {} ; chmod +x american-archive-kaldi/sample_experiment/scripts/*
-
Begin idiocy involving perl -Use CPAN to install perl4 core lib for 'getopts.pl' reference in CMUseg.pl script -perl -MCPAN -e shell cpan > install Perl4::CoreLibs
-
Steps after following PUA install guide • In american-archive-kaldi/sample_experiment/run.sh, set the following option to reduce the number of simultaneous jobs to 1: nj=1 • In kaldi/egs/wsj/s5/utils/run.pl, set the following option: $max_jobs_run = 10; • In kaldi/egs/wsj/s5/steps/decode_fmllr.sh and kaldi/egs/wsj/s5/steps/tandem/decode_fmllr.sh, set the following options in the section where they appear together: nj=1 max_active=1000 • In decode.sh, set the following option: nj=2 in these locations:
-
kaldi/egs/wsj/s5/steps/nnet/decode.sh
-
kaldi/egs/wsj/s5/steps/online/decode.sh
-
kaldi/egs/wsj/s5/steps/online/nnet2/decode.sh
-
kaldi/egs/wsj/s5/steps/online/nnet3/decode.sh
-
kaldi/egs/wsj/s5/steps/decode.sh
-
kaldi/egs/wsj/s5/steps/tandem/decode.sh
-
kaldi/egs/wsj/s5/steps/nnet3/decode.sh
-
kaldi/egs/wsj/s5/steps/nnet2/decode.sh
-
kaldi/egs/aspire/s5/local/multi_condition/decode.sh