From 93e96b89cbe522dd71008939a03fe3a8b59275bc Mon Sep 17 00:00:00 2001
From: Yingbo
Date: Thu, 9 Sep 2021 20:16:30 -0700
Subject: [PATCH] add link to code

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 97df64b..e460db4 100644
--- a/README.md
+++ b/README.md
@@ -57,14 +57,14 @@ WER are we? An attempt at tracking states of the art(s) and recent results on sp
 
 | WER (SWB) | WER (CH) | Paper | Published | Notes |
 | :------- | :------- | :------------- | :-------- | :-----: |
-| 4.9% | 9.5% | [An investigation of phone-based subword units for end-to-end speech recognition](https://arxiv.org/abs/2004.04290) | April 2020 | 2 CNN + 24 layers Transformer encoder and 12 layers Transformer decoder model with char BPE and phoneme BPE units. |
+| 4.9% | 9.5% | [An investigation of phone-based subword units for end-to-end speech recognition](https://arxiv.org/abs/2004.04290) | April 2020 | 2 CNN + 24 layers Transformer encoder and 12 layers Transformer decoder model with char BPE and phoneme BPE units. Code available [here](https://github.com/salesforce/TransformerASR) |
 | 5.0% | 9.1% | [The CAPIO 2017 Conversational Speech Recognition System](https://arxiv.org/abs/1801.00059) | December 2017 | 2 Dense LSTMs + 3 CNN-bLSTMs across 3 phonesets from [previous Capio paper](https://pdfs.semanticscholar.org/d0ec/cd60d800308cd6e59810769b92b40961c09a.pdf) & AM adaptation using parameter averaging (5.6% SWB / 10.5% CH single systems) |
 | 5.1% | 9.9% | [Language Modeling with Highway LSTM](https://arxiv.org/abs/1709.06436) | September 2017 | HW-LSTM LM trained with Switchboard+Fisher+Gigaword+Broadcast News+Conversations, AM from [previous IBM paper](https://arxiv.org/abs/1703.02136) |
 | 5.1% | | [The Microsoft 2017 Conversational Speech Recognition System](https://arxiv.org/abs/1708.06073) | August 2017 | ~2016 system + character-based dialog session aware (turns of speech) LSTM LM |
 | 5.3% | 10.1% | [Deep Learning-based Telephony Speech Recognition in the Wild](https://pdfs.semanticscholar.org/d0ec/cd60d800308cd6e59810769b92b40961c09a.pdf) | August 2017 | Ensemble of 3 CNN-bLSTM (5.7% SWB / 11.3% CH single systems) |
 | 5.5% | 10.3% | [English Conversational Telephone Speech Recognition by Humans and Machines](https://arxiv.org/abs/1703.02136) | March 2017 | ResNet + BiLSTMs acoustic model, with 40d FMLLR + i-Vector inputs, trained on SWB+Fisher+CH, n-gram + model-M + LSTM + Strided (à trous) convs-based LM trained on Switchboard+Fisher+Gigaword+Broadcast |
 | 6.3% | 11.9% | [The Microsoft 2016 Conversational Speech Recognition System](http://arxiv.org/pdf/1609.03528v1.pdf) | September 2016 | VGG/Resnet/LACE/BiLSTM acoustic model trained on SWB+Fisher+CH, N-gram + RNNLM language model trained on Switchboard+Fisher+Gigaword+Broadcast |
-| 6.3% | 13.3% | [An investigation of phone-based subword units for end-to-end speech recognition](https://arxiv.org/abs/2004.04290) | April 2020 | 2 CNN + 24 layers Transformer encoder and 12 layers Transformer decoder model with char BPE and phoneme BPE units. Trained only on SWBD 300 hours. |
+| 6.3% | 13.3% | [An investigation of phone-based subword units for end-to-end speech recognition](https://arxiv.org/abs/2004.04290) | April 2020 | 2 CNN + 24 layers Transformer encoder and 12 layers Transformer decoder model with char BPE and phoneme BPE units. Trained only on SWBD 300 hours. Code available [here](https://github.com/salesforce/TransformerASR) |
 | 6.6% | 12.2% | [The IBM 2016 English Conversational Telephone Speech Recognition System](http://arxiv.org/pdf/1604.08242v2.pdf) | June 2016 | RNN + VGG + LSTM acoustic model trained on SWB+Fisher+CH, N-gram + "model M" + NNLM language model |
 | 6.8% | 14.1% | [SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition](https://arxiv.org/abs/1904.08779) | April 2019 | Listen Attend Spell |
 | 8.5% | 13% | [Purely sequence-trained neural networks for ASR based on lattice-free MMI](http://www.danielpovey.com/files/2016_interspeech_mmi.pdf) | September 2016 | HMM-BLSTM trained with MMI + data augmentation (speed) + iVectors + 3 regularizations + Fisher |
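
For context on the rows this patch annotates: below is a minimal PyTorch sketch of the architecture the table notes describe (2 conv layers feeding a 24-layer Transformer encoder and a 12-layer Transformer decoder over BPE units). This is not the Salesforce code linked above; all hyperparameters here (d_model, heads, conv strides, vocab size, feature dimensions) are illustrative assumptions, not values taken from the paper.

```python
# Sketch only: hyperparameters below are assumptions, not the paper's values.
import torch
import torch.nn as nn

class ConvSubsampler(nn.Module):
    """Two stride-2 convolutions over log-mel features (~4x time reduction)."""
    def __init__(self, n_mels=80, d_model=512, channels=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        self.proj = nn.Linear(channels * (n_mels // 4), d_model)

    def forward(self, feats):                      # feats: (batch, time, n_mels)
        x = self.conv(feats.unsqueeze(1))          # (batch, C, time/4, n_mels/4)
        b, c, t, f = x.shape
        return self.proj(x.permute(0, 2, 1, 3).reshape(b, t, c * f))

class TransformerASR(nn.Module):
    def __init__(self, vocab_size=1000, d_model=512, nhead=8):
        super().__init__()
        self.frontend = ConvSubsampler(d_model=d_model)
        self.embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=24, num_decoder_layers=12, batch_first=True,
        )
        self.out = nn.Linear(d_model, vocab_size)  # logits over the BPE vocabulary

    def forward(self, feats, tokens):
        src = self.frontend(feats)                 # subsampled acoustic frames
        tgt = self.embed(tokens)                   # shifted BPE target tokens
        mask = self.transformer.generate_square_subsequent_mask(tokens.size(1))
        return self.out(self.transformer(src, tgt, tgt_mask=mask))

# Smoke test on random inputs.
model = TransformerASR()
feats = torch.randn(2, 160, 80)                    # 2 utterances, 160 frames, 80 mels
tokens = torch.randint(0, 1000, (2, 20))           # BPE token prefixes
print(model(feats, tokens).shape)                  # torch.Size([2, 20, 1000])
```

The char-BPE vs. phone-BPE distinction in the notes only changes the tokenizer that produces `tokens` (and hence `vocab_size`); the network itself is unchanged.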