This is an end-to-end voicebot that aims to answer open-domain questions, and is intended for use as a benchmarking tool
- python 3.6
- pytorch (1.1.0)
- tensorflow (1.12)
- wikipedia (1.14)
- deepspeech (0.5.0)
- spacy (2.1.5)
- gingerit (0.8.0)
- pytorch-pretrained-bert (0.6.2)
- playsound (1.2.2)
- sounddevice (0.3.13)
- soundfile (0.10.2)
- inflect (2.1)
- librosa (0.7.0)
- matplotlib (3.1.1)
- unidecode (1.1.1)
- numpy (1.17.0)
We recommend running this inside a virtual environment to avoid dependency conflicts (e.g., with numpy).
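The pinned versions above can be captured in a requirements.txt for the virtual environment; a minimal sketch (the pip package names are assumptions and may need adjusting, e.g. pytorch installs as torch):

```python
# Write the pinned dependency list above to requirements.txt, so a fresh
# virtual environment can be set up with `pip install -r requirements.txt`.
# NOTE: pip package names below are assumptions, not taken from the project.
pinned = [
    "torch==1.1.0",        # pytorch
    "tensorflow==1.12",
    "wikipedia==1.14",
    "deepspeech==0.5.0",
    "spacy==2.1.5",
    "gingerit==0.8.0",
    "pytorch-pretrained-bert==0.6.2",
    "playsound==1.2.2",
    "sounddevice==0.3.13",
    "soundfile==0.10.2",
    "inflect==2.1",
    "librosa==0.7.0",
    "matplotlib==3.1.1",
    "unidecode==1.1.1",
    "numpy==1.17.0",
]

with open("requirements.txt", "w") as f:
    f.write("\n".join(pinned) + "\n")
```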
You can install any spaCy NER model you prefer (we used 'en_core_web_md') with:
- python -m spacy download en_core_web_md (Note: run this in an elevated command prompt with Admin permissions)
You will also require the following pretrained models.
An info.txt file is located in every directory where a specific model is required. Extract the model archives and place their contents in the corresponding project folders (BERT, DeepSpeech/Models, and Tacotron_TTS/tacotron-models-data, respectively). WaveRNN should be extracted under the Vocoder_WaveRNN folder.
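A pre-flight check along these lines can confirm the model folders are populated before launch; the folder names are taken from the note above, and the helper itself is illustrative rather than part of the project:

```python
from pathlib import Path

# Folder names per the extraction note above.
MODEL_DIRS = [
    "BERT",
    "DeepSpeech/Models",
    "Tacotron_TTS/tacotron-models-data",
    "Vocoder_WaveRNN",
]

def missing_model_dirs(root="."):
    """Return the model folders that are missing or still empty."""
    root = Path(root)
    return [d for d in MODEL_DIRS
            if not (root / d).is_dir() or not any((root / d).iterdir())]
```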
Open-domain QA also requires an internet connection to fetch information from Wikipedia.
Run the Voicebot file to start the application. You will be prompted to select a TTS system after the other models have loaded.
The WaveRNN + Tacotron pipeline is very resource-heavy and produces poor results on systems with 8 GB of RAM. Its speech is much more natural-sounding, but often contains garbage audio towards the end. The standalone Tacotron is much lighter and degrades less on systems with fewer resources.
Once the TTS has loaded, you will be prompted to select the running mode: either a microphone for input audio, or a folder of audio files for testing. To add your own audio to the test set, simply place the wav file in the test-audio folder. For best results, use an American male voice at a normal or slow speed setting from a site like this.
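Files dropped into the test folder can be sanity-checked before a run; a stdlib-only sketch, assuming DeepSpeech's usual 16 kHz, 16-bit mono WAV input (the helper name and the folder default are hypothetical):

```python
import wave
from pathlib import Path

def check_test_audio(folder="test-audio"):
    """Flag WAV files that are not 16 kHz, mono, 16-bit.

    Returns {filename: (rate, channels, sample_width_bytes)} for each
    file that does not match the expected format.
    """
    problems = {}
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as w:
            fmt = (w.getframerate(), w.getnchannels(), w.getsampwidth())
            if fmt != (16000, 1, 2):
                problems[path.name] = fmt
    return problems
```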
Run the VoiceBot-windows.py file. Outputs can be accessed from the '/Vocoder_WaveRNN/WaveRNN_outputs' or '/Tacotron_TTS/Tacotron_outputs' subfolders.
Run the VoiceBot-linux.py file.
Note: the playsound and sounddevice libraries are not compatible with Ubuntu, so audio cannot be recorded from or played on the console; VoiceBot works only with questions pre-recorded in the 'test_audio' folder. Outputs can be accessed from the '/Vocoder_WaveRNN/WaveRNN_outputs' or '/Tacotron_TTS/Tacotron_outputs' subfolders.
- Mozilla DeepSpeech
- Keith Ito's Tacotron implementation
- Hugging Face's BERT for QA implementation
- Fatchord's WaveRNN
- BERT model trained by Surbhi Bhardwaj
Link to demo video here: https://drive.google.com/file/d/16pFeDjqDOCkVXW0cc09l_mkuxqgQjo8s/view?usp=drive_web