Release V2
Supplied below are pre-trained networks that can be used for evaluation on academic datasets. Do not expect these models to perform well on your own data! They are heavily tuned to the datasets they were trained on.
Most results are reported using greedy decoding; for LibriSpeech, WER/CER using a language model are also included. Expect a well-trained language model to reduce WER/CER substantially.
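WER and CER throughout are the usual word- and character-level edit-distance rates. As a rough illustration only (this is not the exact code in test.py), they can be computed like this:

```python
# Minimal WER/CER sketch (illustrative; not the repo's implementation).
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, using a single DP row."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[len(hyp)]

def wer(reference, hypothesis):
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)

print(wer("the cat sat", "the cat sit"))  # one substituted word out of three -> ~33.3
```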
Improvements:
- Removed TorchAudio in favor of SciPy for loading audio, based on speed comparisons and ease of installation (a loading sketch follows this list)
- Improved the NVIDIA Apex integration to make mixed-precision training easier to use
- New pre-trained models trained with mixed precision
- Documentation and improvements on how to tune and use the LibriSpeech LMs, plus results with the 3-gram model
- Evaluation fixes for fairer comparison
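As a rough illustration of the SciPy-based loading mentioned above (the repo's actual loader may normalize or handle channels differently; the 16-bit PCM assumption here is mine):

```python
# Sketch of SciPy-based WAV loading, assuming 16-bit PCM input.
import numpy as np
from scipy.io import wavfile

def load_audio(path):
    sample_rate, sound = wavfile.read(path)    # int16 samples for 16-bit PCM
    sound = sound.astype(np.float32) / 32767   # scale roughly to [-1, 1]
    if sound.ndim > 1:                         # mix down multi-channel audio
        sound = sound.mean(axis=1)
    return sound, sample_rate
```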
Commit Hash used for training and testing.
AN4
Training command:
python train.py --rnn-type lstm --hidden-size 1024 --hidden-layers 5 --train-manifest data/an4_train_manifest.csv --val-manifest data/an4_val_manifest.csv --epochs 70 --num-workers 16 --cuda --learning-anneal 1.01 --batch-size 32 --no-sortaGrad --visdom --opt-level O1 --loss-scale 1 --id an4 --checkpoint --save-folder deepspeech.pytorch/an4/ --model-path deepspeech.pytorch/an4/deepspeech_final.pth
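The --opt-level O1 and --loss-scale 1 flags in the command above map onto NVIDIA Apex's amp API. A simplified sketch of that usage follows; model, optimizer, criterion, and train_loader stand in for the repo's actual objects, and this is not the full train.py loop:

```python
# Simplified sketch of the Apex amp calls behind --opt-level O1 --loss-scale 1.
from apex import amp

def train_with_amp(model, optimizer, criterion, train_loader, epochs):
    # amp.initialize patches the model/optimizer for mixed precision (called once).
    model, optimizer = amp.initialize(model, optimizer, opt_level="O1", loss_scale=1.0)
    for _ in range(epochs):
        for inputs, input_sizes, targets, target_sizes in train_loader:
            optimizer.zero_grad()
            out, output_sizes = model(inputs, input_sizes)
            loss = criterion(out, targets, output_sizes, target_sizes)
            # Backward through the scaled loss so FP16 gradients stay representable.
            with amp.scale_loss(loss, optimizer) as scaled_loss:
                scaled_loss.backward()
            optimizer.step()
```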
Test Command:
python test.py --model-path an4_pretrained_v2.pth --test-manifest data/an4_val_manifest.csv --cuda --half
Dataset | WER | CER |
---|---|---|
AN4 test | 10.349 | 7.076 |
Download here.
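The files passed to --train-manifest, --val-manifest, and --test-manifest are plain CSVs pairing each audio clip with its transcript file. Assuming the usual deepspeech.pytorch layout of one wav_path,txt_path pair per line with no header (the paths below are made up for illustration), a manifest can be generated like this:

```python
# Hypothetical manifest-writing snippet; paths are illustrative only.
import csv

rows = [
    ("an4_dataset/val/an4/wav/cen1-fash-b.wav", "an4_dataset/val/an4/txt/cen1-fash-b.txt"),
    ("an4_dataset/val/an4/wav/cen2-fash-b.wav", "an4_dataset/val/an4/txt/cen2-fash-b.txt"),
]
with open("data/an4_val_manifest.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```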
Librispeech
Training command:
python train.py --rnn-type lstm --hidden-size 1024 --hidden-layers 5 --train-manifest data/libri_train_manifest.csv --val-manifest data/libri_val_manifest.csv --epochs 60 --num-workers 16 --cuda --learning-anneal 1.01 --batch-size 64 --no-sortaGrad --visdom --opt-level O1 --loss-scale 1 --id libri --checkpoint --save-folder deepspeech.pytorch/librispeech/ --model-path deepspeech.pytorch/librispeech/deepspeech_final.pth
Test Command:
python test.py --model-path librispeech_pretrained_v2.pth --test-manifest data/libri_test_clean.csv --cuda --half
python test.py --model-path librispeech_pretrained_v2.pth --test-manifest data/libri_test_other.csv --cuda --half
Dataset | WER | CER |
---|---|---|
Librispeech clean | 9.919 | 3.307 |
Librispeech other | 28.116 | 12.040 |
Results with the 3-gram ARPA LM and tuned alpha/beta values (alpha=1.97, beta=4.36, beam-width=1024):
Test Command:
python test.py --test-manifest libri_test_clean.csv --lm-path 3-gram.pruned.3e-7.arpa --decoder beam --alpha 1.97 --beta 4.36 --model-path librispeech_pretrained_v2.pth --lm-workers 8 --num-workers 16 --cuda --half --beam-width 1024
python test.py --test-manifest libri_test_other.csv --lm-path 3-gram.pruned.3e-7.arpa --decoder beam --alpha 1.97 --beta 4.36 --model-path librispeech_pretrained_v2.pth --lm-workers 8 --num-workers 16 --cuda --half --beam-width 1024
Dataset | WER | CER |
---|---|---|
Librispeech clean | 6.654 | 2.705 |
Librispeech other | 19.889 | 10.467 |
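For context on the alpha/beta/beam-width knobs used above: in a KenLM-weighted CTC beam search, candidate transcripts are typically ranked by their acoustic log-probability plus alpha times the LM log-probability plus a beta bonus per word. A toy sketch of that scoring rule (the scores and sentences are made up; real decoding happens inside the beam-search decoder, not like this):

```python
# Toy sketch of the shallow-fusion score that the alpha/beta flags control.
def fused_score(acoustic_logprob, lm_logprob, num_words, alpha=1.97, beta=4.36):
    # alpha weights the language model; beta is a per-word insertion bonus.
    return acoustic_logprob + alpha * lm_logprob + beta * num_words

# Two hypothetical beam candidates: (text, acoustic log-prob, LM log-prob, word count).
candidates = [
    ("the cat sat on the mat", -12.1, -8.3, 6),
    ("the cat sat on the matt", -11.8, -11.0, 6),
]
best = max(candidates, key=lambda c: fused_score(c[1], c[2], c[3]))
print(best[0])  # the LM term favors the more likely word sequence
```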
Download here.
TEDLIUM
Training command:
python train.py --rnn-type lstm --hidden-size 1024 --hidden-layers 5 --train-manifest data/ted_train_manifest.csv --val-manifest data/ted_val_manifest.csv --epochs 60 --num-workers 16 --cuda --learning-anneal 1.01 --batch-size 64 --no-sortaGrad --visdom --opt-level O1 --loss-scale 1 --id ted --checkpoint --save-folder deepspeech.pytorch/tedlium/ --model-path deepspeech.pytorch/tedlium/deepspeech_final.pth
Test Command:
python test.py --model-path ted_pretrained_v2.pth --test-manifest data/ted_test_manifest.csv --cuda --half
Dataset | WER | CER |
---|---|---|
Ted test | 30.886 | 11.196 |
Download here.