Conversational Robot

Robotics Club Summer Project 2020

Aim

The aim of this project was to build a talking bot that listens to the user's voice and generates meaningful, contextual responses according to their intent, much like a human conversation.

Ideation

The project was divided into three parts: speech recognition, response generation, and text-to-speech conversion.

Overall Pipeline of the Project

[Figure: overall pipeline]

Speech Recognition

We first built a Deep Speech 2-like architecture for this purpose. Eventually we used the google-speech-to-text (gstt) API to convert speech to text transcripts, with a word error rate (WER) of 4.7%.
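For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, insertions) between the recognised transcript and a reference transcript, divided by the number of reference words. The sketch below is an illustration only: the function name `word_error_rate` is hypothetical, and it is not the evaluation code used to obtain the 4.7% figure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words consumed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of four reference words.
print(word_error_rate("turn on the light", "turn on light"))  # 0.25
```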

Response Generation

The second step in our pipeline generates a conversational response once the input speech has been recognised. We tried two distinct response generation models, both trained on a subset of the OpenSubtitles dataset:

  1. Seq2Seq with Message Attention
  2. Topic Aware Seq2Seq with Message Attention
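Both models attend over the encoder states of the input message at each decoding step. The following is a minimal sketch of one such step in plain Python, not the project's implementation. Attention layers of this kind usually learn their scoring function; a plain dot product is swapped in here for brevity, and the names `softmax` and `message_attention` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def message_attention(decoder_state, encoder_states):
    """Return (weights, context) for a single decoding step.

    Assumption: dot-product scoring between the decoder state and
    each encoder hidden state of the input message.
    """
    scores = [
        sum(d * h for d, h in zip(decoder_state, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # The context vector is the attention-weighted sum of encoder states.
    context = [
        sum(w * state[k] for w, state in zip(weights, encoder_states))
        for k in range(dim)
    ]
    return weights, context


# Toy example: the decoder state aligns with the second encoder state.
weights, context = message_attention([0.0, 2.0], [[1.0, 0.0], [0.0, 1.0]])
```

In the topic-aware variant described in the Topic Aware Neural Response Generation paper, a second attention over topic words is computed alongside the message attention, and both contexts feed the decoder.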

Text to speech conversion

We used the google-text-to-speech (gtts) API to convert the text of each response back to speech.
The Response Generator's textual response is written to a temporary mp3 file, which is then played with pyglet.
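A rough sketch of this step is shown below, assuming the `gTTS` and `pyglet` packages are installed, network access is available, and pyglet can decode mp3 on the host (typically via FFmpeg). The helper names `make_temp_mp3_path` and `speak` are hypothetical and are not taken from the project's code.

```python
import os
import tempfile
import time


def make_temp_mp3_path() -> str:
    """Reserve a temporary .mp3 path for the synthesized audio."""
    handle, path = tempfile.mkstemp(suffix=".mp3")
    os.close(handle)
    return path


def speak(text: str, lang: str = "en") -> None:
    """Synthesize `text` with gTTS, play it with pyglet, then delete the file."""
    # Imported lazily so the helper above works without the audio stack.
    from gtts import gTTS
    import pyglet

    path = make_temp_mp3_path()
    try:
        gTTS(text=text, lang=lang).save(path)  # requires network access
        sound = pyglet.media.load(path, streaming=False)
        sound.play()
        time.sleep(sound.duration)  # block until playback has finished
    finally:
        os.remove(path)
```

Calling `speak("Hello, how are you?")` would synthesize and play the sentence, then remove the temporary file.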

Installation

Install the required dependencies :

$ cd ConversationalRobot/integration
$ pip install -r requirements.txt

Download the model weights and parameters from here.

Usage

usage: eval_script.py

The bot starts up and begins accepting speech input.
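Conceptually, the running bot repeats a listen, respond, speak cycle. The sketch below illustrates that loop with injectable stand-ins for the three stages. It is an assumption about the control flow, not the contents of `eval_script.py`, and `run_conversation` and its parameters are hypothetical names.

```python
from typing import Callable, List, Optional


def run_conversation(
    listen: Callable[[], Optional[str]],
    respond: Callable[[str], str],
    speak: Callable[[str], None],
) -> List[str]:
    """Run listen -> respond -> speak until `listen` returns None."""
    replies = []
    while True:
        transcript = listen()
        if transcript is None:
            break  # input source exhausted; stop the bot
        if not transcript.strip():
            continue  # nothing was recognised; listen again
        reply = respond(transcript)
        speak(reply)
        replies.append(reply)
    return replies


# Stub stages stand in for speech recognition, the response model, and TTS.
inputs = iter(["hello", "", "how are you"])
spoken = []
replies = run_conversation(
    listen=lambda: next(inputs, None),
    respond=lambda text: f"you said: {text}",
    speak=spoken.append,
)
```

Passing the stages in as callables keeps the loop testable without a microphone, model weights, or audio output.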

Documentation

Here's the documentation of the project.

Demonstration

Here's a video demonstrating the bot in action, including its tkinter GUI.

References

  1. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

  2. Topic Aware Neural Response Generation

    • Link: https://arxiv.org/abs/1606.08340
    • Authors: Chen Xing, Wei Wu, Yu Wu, Jie Liu, Yalou Huang, Ming Zhou, Wei-Ying Ma
    • Tags: Neural response generation; Sequence to sequence model; Topic aware conversation model; Joint attention; Biased response generation
    • Published: 21 Jun 2016 (v1), 19 Sep 2016 (v2)

  3. Topic Modelling and Event Identification from Twitter Textual Data

    • Link: https://arxiv.org/abs/1608.02519
    • Authors: Marina Sokolova, Kanyi Huang, Stan Matwin, Joshua Ramisch, Vera Sazonova, Renee Black, Chris Orwa, Sidney Ochieng, Nanjira Sambuli
    • Tags: Latent Dirichlet Allocation; Topic Models; Statistical machine translation
    • Published: 8 Aug 2016

  4. OpenSubtitles (Dataset)