retiring CMU Sphinx support
gooofy committed May 2, 2019
1 parent d8f7e2e commit c55da2b
Showing 4 changed files with 11 additions and 508 deletions.
5 changes: 0 additions & 5 deletions Makefile
@@ -7,11 +7,6 @@ kaldi:
./speech_kaldi_export.py
pushd data/dst/speech/de/kaldi && ./run.sh && popd

sphinx:
rm -rf data/dst/speech/de/cmusphinx
./speech_sphinx_export.py
pushd data/dst/speech/de/cmusphinx && ./sphinx-run.sh && popd

sequitur:
rm -rf data/dst/speech/de/sequitur/
./speech_sequitur_export.py
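Each Makefile target above follows the same pattern: remove the destination directory, regenerate it with the corresponding export script, then run the generated training script in place via `pushd`/`popd`. A minimal sketch of that pattern (the directory, file, and echo below stand in for the real export and training scripts; all names here are hypothetical):

```shell
# Export-then-train pattern used by the Makefile targets (sketch only).
DST=/tmp/demo_export_dst          # hypothetical destination directory
rm -rf "$DST"                     # start from a clean export
mkdir -p "$DST"                   # these two lines stand in for an export
echo "train" > "$DST/run.sh"      # script that generates run.sh in place
pushd "$DST" > /dev/null          # run the generated script where it lives
cat run.sh
popd > /dev/null
```

The `rm -rf` up front matters: the export scripts regenerate the whole directory, so stale files from a previous run must not survive.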
145 changes: 7 additions & 138 deletions README.md
@@ -3,10 +3,10 @@
Python scripts to compute audio and language models from voxforge.org speech data and many other sources.
Models that can be built include:

* CMU Sphinx continuous and PTM audio models
* Kaldi nnet3 chain audio models
* KenLM language models in ARPA format
* sequitur g2p models
* wav2letter++ models

*Important*: Please note that these scripts in no way form a complete application ready for end-user consumption.
However, if you are a developer interested in natural language processing you may find some of them useful.
@@ -56,8 +56,6 @@ Table of Contents
* [English NNet3 Chain Models](#english-nnet3-chain-models)
* [German NNet3 Chain Models](#german-nnet3-chain-models)
* [Model Adaptation](#model-adaptation)
* [CMU Sphinx Models](#cmu-sphinx-models)
* [Running pocketsphinx](#running-pocketsphinx)
* [wav2letter\+\+ models](#wav2letter-models)
* [Audiobook Segmentation and Transcription (Manual)](#audiobook-segmentation-and-transcription-manual)
* [(0/3) Convert Audio to WAVE Format](#03-convert-audio-to-wave-format)
@@ -105,25 +103,17 @@ Our pre-built ASR models can be downloaded here: [ASR Models](http://goofy.zamia
+ `kaldi-generic-en-tri2b_chain`
GMM Model, trained on the same data as the above two models - meant for auto segmentation tasks.
+ Kaldi ASR, German:
+ `kaldi-generic-de-tdnn_sp`
Large nnet3-chain model, trained on ~260 hours of audio. Has decent background noise resistance and can
also be used on phone recordings.
+ `kaldi-generic-de-tdnn_f`
Large nnet3-chain model, trained on ~400 hours of audio. Has decent background noise resistance and can
also be used on phone recordings.
+ `kaldi-generic-de-tdnn_250`
Same as the large model but less resource intensive, suitable for use in embedded applications (e.g. a Raspberry Pi 3).
+ `kaldi-generic-de-tri2b_chain`
GMM Model, trained on the same data as the above two models - meant for auto segmentation tasks.
+ CMU Sphinx, English:
+ `cmusphinx-cont-generic-en`
Large model, trained on ~800 hours of audio. Has decent background noise resistance and can
also be used on phone recordings.
+ `cmusphinx-ptm-generic-en`
Same as the large model but less resource intensive, suitable for use in embedded applications.
+ CMU Sphinx, German:
+ `cmusphinx-cont-generic-de`
Large model, trained on ~260 hours of audio. Has decent background noise resistance and can
also be used on phone recordings.
+ `cmusphinx-ptm-generic-de`
Same as the large model but less resource intensive, suitable for use in embedded applications.
+ wav2letter++, German:
+ `w2l-generic-de`
Large model, trained on ~400 hours of audio. Has decent background noise resistance and can
also be used on phone recordings.

*NOTE*: It is important to realize that these models can and should be adapted to your application domain. See
[Model Adaptation](#model-adaptation) for details.
@@ -378,37 +368,12 @@ Requirements
*Note*: probably incomplete.
* Python 2.7 with nltk, numpy, ...
* CMU Sphinx
* KenLM
* kaldi
* wav2letter
* wav2letter++
* py-nltools
* sox
To set up a Conda environment named `gooofy-speech` with all Python
dependencies installed, run

    $ conda env create -f environment.yml

To activate the environment, run

    $ source activate gooofy-speech

To deactivate the environment, run

    $ source deactivate

*Note*: The Conda environment was created on a Linux machine, so it may not
work on other machines.

While the environment is activated, you may want to install additional packages
with `conda install` or `pip install`. After doing so, update `environment.yml`
with

    $ ./update_conda_env.sh

Afterwards you can commit the changes to the repository.
Setup Notes
===========
@@ -893,102 +858,6 @@ cd ../../../../..
```
CMU Sphinx Models
=================
The following recipe trains a continuous CMU Sphinx model for German.
Before running it, make sure all prerequisites are met (see above for instructions on these):
- language model `generic_de_lang_model_small` built
- some or all speech corpora of `voxforge_de`, `gspv2`, `forschergeist` and `zamia_de` are installed, converted and scanned.
- optionally noise augmented corpora: `voxforge_de_noisy`, `voxforge_de_phone`, `zamia_de_noisy` and `zamia_de_phone`
```bash
./speech_sphinx_export.py generic-de2 dict-de.ipa generic_de_lang_model_small voxforge_de gspv2 [ forschergeist zamia_de ...]
cd data/dst/asr-models/cmusphinx_cont/generic-de
./sphinx-run.sh
```
Complete export run (without noise augmented corpora):
```bash
./speech_sphinx_export.py generic-de dict-de.ipa generic_de_lang_model_small voxforge_de gspv2 forschergeist zamia_de m_ailabs_de
```
Complete export run with noise augmented corpora included, for an English model:
```bash
./speech_sphinx_export.py -l en generic-en dict-en.ipa generic_en_lang_model_small voxforge_en librispeech zamia_en cv_corpus_v1 ljspeech m_ailabs_en tedlium3
```
For resource constrained applications, PTM models can be trained:
```bash
./speech_sphinx_export.py generic-de dict-de.ipa generic_de_lang_model_small voxforge_de gspv2 [ forschergeist zamia_de ...]
cd data/dst/asr-models/cmusphinx_ptm/generic-de
./sphinx-run.sh
```
Running pocketsphinx
--------------------
*IMPORTANT*: In order to use our pre-built models you have to use an up-to-date CMU Sphinx. Unfortunately, at the time
of this writing even the latest "5prealpha" release is outdated. Until the CMU Sphinx project publishes a new release,
we highly recommend checking it out and building it yourself from their GitHub repository.
Here are some sample invocations for pocketsphinx which should help get you started using our models:
```bash
pocketsphinx_continuous -lw 10 -fwdflatlw 10 -bestpathlw 10 -beam 1e-80 \
-wbeam 1e-40 -fwdflatbeam 1e-80 -fwdflatwbeam 1e-40 \
-pbeam 1e-80 -lpbeam 1e-80 -lponlybeam 1e-80 \
-wip 0.2 -agc none -varnorm no -cmn current \
-lowerf 130 -upperf 6800 -nfilt 25 \
-transform dct -lifter 22 -ncep 13 \
-hmm ${MODELDIR}/model_parameters/voxforge.cd_cont_8000 \
-dict ${MODELDIR}/etc/voxforge.dic \
-lm ${MODELDIR}/etc/voxforge.lm.bin \
-infile $WAVFILE

sphinx_fe -c fileids -di wav -do mfcc \
-part 1 -npart 1 -ei wav -eo mfc -nist no -raw no -mswav yes \
-samprate 16000 -lowerf 130 -upperf 6800 -nfilt 25 -transform dct -lifter 22

pocketsphinx_batch -hmm ${MODELDIR}/model_parameters/voxforge.cd_cont_8000 \
-feat 1s_c_d_dd \
-ceplen 13 \
-ncep 13 \
-lw 10 \
-fwdflatlw 10 \
-bestpathlw 10 \
-beam 1e-80 \
-wbeam 1e-40 \
-fwdflatbeam 1e-80 \
-fwdflatwbeam 1e-40 \
-pbeam 1e-80 \
-lpbeam 1e-80 \
-lponlybeam 1e-80 \
-dict ${MODELDIR}/etc/voxforge.dic \
-wip 0.2 \
-ctl fileids \
-cepdir ./mfcc \
-cepext .mfc \
-hyp test_batch.match \
-logfn test_batch.log \
-agc none -varnorm no -cmn current -lm ${MODELDIR}/etc/voxforge.lm.bin
```
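The `fileids` control file passed to `sphinx_fe` and `pocketsphinx_batch` above lists one utterance per line, relative to the input directory and without the file extension. A minimal sketch to generate it from a directory of WAV files (the `wav/` directory and utterance names here are hypothetical placeholders):

```shell
# Build a 'fileids' control file: one id per line, no directory prefix and
# no .wav extension (matching -di wav / -ei wav in the sphinx_fe call).
mkdir -p wav
: > wav/utt1.wav        # empty placeholder WAVs, for illustration only
: > wav/utt2.wav
for f in wav/*.wav ; do
    basename "$f" .wav
done > fileids
cat fileids
```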
You can download a complete tarball with example scripts and WAV files here:
http://goofy.zamia.org/voxforge/misc/sphinx-example.tgz
*NOTE*: According to https://github.com/cmusphinx/pocketsphinx/issues/116,
pocketsphinx\_continuous produces worse results than pocketsphinx\_batch with the same model and parameters.
wav2letter++ models
===================
66 changes: 4 additions & 62 deletions speech_dist.sh
@@ -1,7 +1,7 @@
#!/bin/bash

if [ $# -lt 2 ] ; then
echo "usage: $0 [-c] <model> [kaldi <experiment>|sphinx_cont|sphinx_ptm|sequitur|lm|voice <epoch>|w2l <experiment>]"
echo "usage: $0 [-c] <model> [kaldi <experiment>|sequitur|lm|voice <epoch>|w2l <experiment>]"
exit 1
fi

@@ -30,7 +30,7 @@ WHAT=$2
if [ $WHAT = "kaldi" ] ; then

if [ $# != 3 ] ; then
echo "usage: $0 [-c] <model> [kaldi <experiment>|sphinx_cont|sphinx_ptm|sequitur|lm|voice <epoch>|w2l <experiment>]"
echo "usage: $0 [-c] <model> [kaldi <experiment>|sequitur|lm|voice <epoch>|w2l <experiment>]"
exit 2
fi

@@ -112,64 +112,6 @@ if [ $WHAT = "kaldi" ] ; then

fi

if [ $WHAT = "sphinx_cont" ] ; then

#
# cont sphinx model
#

DISTDIR=data/dist/asr-models

AMNAME="cmusphinx-cont-${MODEL}-${REVISION}"
echo "$AMNAME ..."

mkdir -p "$DISTDIR/$AMNAME"
mkdir -p "$DISTDIR/$AMNAME/model_parameters"

cp -r data/dst/asr-models/cmusphinx_cont/${MODEL}/model_parameters/voxforge.cd_cont_* "$DISTDIR/$AMNAME/model_parameters"
cp -r data/dst/asr-models/cmusphinx_cont/${MODEL}/etc "$DISTDIR/$AMNAME"
cp data/dst/asr-models/cmusphinx_cont/${MODEL}/voxforge.html "$DISTDIR/$AMNAME"
cp README.md "$DISTDIR/$AMNAME"
cp LICENSE "$DISTDIR/$AMNAME"
cp AUTHORS "$DISTDIR/$AMNAME"

pushd $DISTDIR
tar cfv "$AMNAME.tar" $AMNAME
xz -v -8 -T 12 "$AMNAME.tar"
popd

rm -r "$DISTDIR/$AMNAME"
fi

if [ $WHAT = "sphinx_ptm" ] ; then

#
# ptm sphinx model
#

DISTDIR=data/dist/asr-models

AMNAME="cmusphinx-ptm-${MODEL}-${REVISION}"
echo "$AMNAME ..."

mkdir -p "$DISTDIR/$AMNAME"
mkdir -p "$DISTDIR/$AMNAME/model_parameters"

cp -r data/dst/asr-models/cmusphinx_ptm/${MODEL}/model_parameters/voxforge.cd_ptm_5000 "$DISTDIR/$AMNAME/model_parameters"
cp -r data/dst/asr-models/cmusphinx_ptm/${MODEL}/etc "$DISTDIR/$AMNAME"
cp data/dst/asr-models/cmusphinx_ptm/${MODEL}/voxforge.html "$DISTDIR/$AMNAME"
cp README.md "$DISTDIR/$AMNAME"
cp LICENSE "$DISTDIR/$AMNAME"
cp AUTHORS "$DISTDIR/$AMNAME"

pushd $DISTDIR
tar cfv "$AMNAME.tar" $AMNAME
xz -v -8 -T 12 "$AMNAME.tar"
popd

rm -r "$DISTDIR/$AMNAME"
fi

if [ $WHAT = "lm" ] ; then
#
# KenLM
@@ -199,7 +141,7 @@ fi
if [ $WHAT = "voice" ] ; then

if [ $# != 3 ] ; then
echo "usage: $0 [-c] <model> [kaldi <experiment>|sphinx_cont|sphinx_ptm|sequitur|lm|voice <epoch>]"
echo "usage: $0 [-c] <model> [kaldi <experiment>|sequitur|lm|voice <epoch>]"
exit 2
fi

@@ -232,7 +174,7 @@ fi
if [ $WHAT = "w2l" ] ; then

if [ $# != 3 ] ; then
echo "usage: $0 [-c] <model> [kaldi <experiment>|sphinx_cont|sphinx_ptm|sequitur|lm|voice <epoch>|w2l <experiment>]"
echo "usage: $0 [-c] <model> [kaldi <experiment>|sequitur|lm|voice <epoch>|w2l <experiment>]"
exit 2
fi

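After this commit, the mode dispatch in `speech_dist.sh` reduces to the shape sketched below. This is a simplified illustration of the argument handling only: the real script also takes a `-c` flag and performs the actual packaging, it uses `if` chains rather than `case`, and the function name and messages here are hypothetical.

```shell
# Simplified sketch of speech_dist.sh mode dispatch with Sphinx modes removed.
dist_usage() {
    echo "usage: speech_dist.sh [-c] <model> [kaldi <experiment>|sequitur|lm|voice <epoch>|w2l <experiment>]"
}

speech_dist() {
    if [ $# -lt 2 ] ; then
        dist_usage ; return 1
    fi
    MODEL=$1
    WHAT=$2
    case $WHAT in
        kaldi|voice|w2l)              # these modes require a third argument
            if [ $# != 3 ] ; then
                dist_usage ; return 2
            fi
            echo "packaging $WHAT model $MODEL ($3)" ;;
        sequitur|lm)
            echo "packaging $WHAT model $MODEL" ;;
        *)
            dist_usage ; return 1 ;;
    esac
}

speech_dist generic-de kaldi tdnn_f
```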