---

copyright:
lastupdated: "2024-05-10"

subcollection: speech-to-text

---
{{site.data.keyword.attribute-definition-list}}
# References
{: #references}
For more information about the research behind the {{site.data.keyword.speechtotextfull}} service, see the following documents. {{site.data.keyword.IBM}} researchers wrote or contributed to all of these papers.
{: shortdesc}
- Audhkhasi, Kartik, Bhuvana Ramabhadran, George Saon, Michael Picheny, and David Nahamoo. Direct Acoustics-to-Word Models for English Conversational Speech Recognition.{: external} Proceedings of Interspeech 2017 (August 2017): pp. 959-963.
  {: #audhkhasi2017}
- Audhkhasi, Kartik, Brian Kingsbury, Bhuvana Ramabhadran, George Saon, and Michael Picheny. Building competitive direct acoustics-to-word models for English conversational speech recognition.{: external} Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2018).
  {: #audhkhasi2018}
- Bahl, Lalit R., Frederick Jelinek, and Robert L. Mercer. A Maximum Likelihood Approach to Continuous Speech Recognition.{: external} IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 5(2) (March 1983): pp. 179-190.
  {: #bahl1983}
- Fukuda, Takashi, Masayuki Suzuki, Gakuto Kurata, Samuel Thomas, Jia Cui, and Bhuvana Ramabhadran. Efficient Knowledge Distillation from an Ensemble of Teachers.{: external} Proceedings of Interspeech 2017 (August 2017): pp. 3697-3701.
  {: #fukuda2017}
- Graves, Alex. Sequence Transduction with Recurrent Neural Networks.{: external} International Conference on Machine Learning (ICML), Workshop on Representation Learning (November 2012).
  {: #graves2012}
- Hinton, Geoffrey, Li Deng, Dong Yu, George E. Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N. Sainath, and Brian Kingsbury. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups.{: external} IEEE Signal Processing Magazine, Vol. 29(6) (November 2012): pp. 82-97.
  {: #hinton2012}
- Jelinek, Frederick. The Development of an Experimental Discrete Dictation Recognizer.{: external} Proceedings of the IEEE, Vol. 73(11) (November 1985): pp. 1616-1624.
  {: #jelinek1985}
- Kurata, Gakuto, Abhinav Sethy, Bhuvana Ramabhadran, and George Saon. Empirical Exploration of Novel Architectures and Objectives for Language Models.{: external} Proceedings of Interspeech 2017 (August 2017): pp. 279-283.
  {: #kurata2017a}
- Kurata, Gakuto, Bhuvana Ramabhadran, George Saon, and Abhinav Sethy. Language Modeling with Highway LSTM.{: external} Proceedings of IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (2017).
  {: #kurata2017b}
- Kurata, Gakuto, and Kartik Audhkhasi. Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation.{: external} Accepted by Interspeech 2019.
  {: #kurata2019}
- Padmanabhan, Mukund, and Michael Picheny. Large-Vocabulary Speech Recognition Algorithms.{: external} Computer, Vol. 35(4) (2002): pp. 42-50.
  {: #padmanabhan2002}
- Picheny, Michael, David Nahamoo, Vaibhava Goel, Brian Kingsbury, Bhuvana Ramabhadran, Steven J. Rennie, and George Saon. Trends and Advances in Speech Recognition.{: external} {{site.data.keyword.IBM_notm}} Journal of Research and Development, Vol. 55(5) (October 2011): pp. 2:1-2:18.
  {: #picheny2011}
- Saon, George, Zoltán Tüske, Daniel Bolanos, and Brian Kingsbury. Advancing RNN Transducer Technology for Speech Recognition.{: external} Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (March 2021).
  {: #saon2021}
- Saon, George, Zoltán Tüske, and Kartik Audhkhasi. Alignment-Length Synchronous Decoding for RNN Transducer.{: external} Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (May 2020).
  {: #saon2020}
- Saon, George, Zoltán Tüske, Kartik Audhkhasi, and Brian Kingsbury. Sequence Noise Injected Training for End-to-end Speech Recognition.{: external} Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (May 2019).
  {: #saon2019}
- Saon, George, Gakuto Kurata, Tom Sercu, Kartik Audhkhasi, Samuel Thomas, Dimitrios Dimitriadis, Xiaodong Cui, Bhuvana Ramabhadran, Michael Picheny, Lynn-Li Lim, Bergul Roomi, and Phil Hall. English Conversational Telephone Speech Recognition by Humans and Machines.{: external} Proceedings of Interspeech 2017 (August 2017): pp. 132-136.
  {: #saon2017}
- Saon, George, Tom Sercu, Steven Rennie, and Hong-Kwang J. Kuo. The {{site.data.keyword.IBM_notm}} 2016 English Conversational Telephone Speech Recognition System.{: external} Submitted to Interspeech 2016 (2016).
  {: #saon2016}
- Saon, George, Hong-Kwang J. Kuo, Steven Rennie, and Michael Picheny. The {{site.data.keyword.IBM_notm}} 2015 English Conversational Telephone Speech Recognition System.{: external} Submitted to Interspeech 2015 (2015).
  {: #saon2015}
- Soltau, Hagen, George Saon, and Tara N. Sainath. Joint Training of Convolutional and Non-Convolutional Neural Networks.{: external} Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy (May 2014): pp. 5572-5576.
  {: #soltau2014}
- Suzuki, Masayuki, Nobuyasu Itoh, Tohru Nagano, Gakuto Kurata, and Samuel Thomas. Improvements to N-gram Language Model Using Text Generated from Neural Language Model.{: external} Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (May 2019).
  {: #suzuki2019}
- Thomas, Samuel, Masayuki Suzuki, Yinghui Huang, Gakuto Kurata, Zoltán Tüske, George Saon, Brian Kingsbury, and Michael Picheny. English Broadcast News Speech Recognition by Humans and Machines.{: external} Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2019).
  {: #thomas2019}
- Audhkhasi, Kartik, George Saon, Zoltán Tüske, Brian Kingsbury, and Michael Picheny. Forget a Bit to Learn Better: Soft Forgetting for CTC-Based Automatic Speech Recognition.{: external} Proceedings of Interspeech 2019 (September 2019): pp. 2618-2622.
  {: #audhkhasi2019}