COPYRIGHT(C) 2020 - Transportation, Bots, and Disability Lab - CMU
Code released under MIT.
Contact - Zhi - [email protected]
A collection of ROS packages that handle audio processing from capture to recognition (utterances). The collection consists of the following packages:
This package contains the ROS messages used throughout the collection.
Currently, this package republishes the audio signal from audio_capture using our own message (tbd_audio_msgs/AudioStamped), which encodes the same data but adds the originating time in the header.
This package is a wrapper for WebRTCVADPy, which performs voice activity detection (VAD) on the received stamped audio.
This package is a wrapper for Mozilla's open source implementation of DeepSpeech. It takes in both the VAD output and the stamped audio and publishes the detected utterances.
This package is a wrapper for Amazon's AWS Transcribe service. It takes in both the VAD output and the stamped audio and publishes the detected utterances.
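As a rough illustration of how the VAD and recognition packages fit together, the sketch below splits raw audio into fixed-size frames, classifies each frame as speech or non-speech, and joins consecutive speech frames into utterances. It is not the code used in this repository: `split_frames`, `group_utterances`, and the `is_speech` callback are hypothetical names, and 16 kHz, 16-bit mono PCM is an assumed input format. The 30 ms frame length is one of the frame sizes (10, 20, or 30 ms) that the `webrtcvad` Python package accepts.

```python
from typing import Callable, Iterable, Iterator, List

SAMPLE_WIDTH_BYTES = 2  # assumption: 16-bit mono PCM


def split_frames(pcm: bytes, sample_rate: int = 16000,
                 frame_ms: int = 30) -> Iterator[bytes]:
    """Yield consecutive fixed-size frames; a trailing partial frame is dropped."""
    frame_bytes = sample_rate * frame_ms // 1000 * SAMPLE_WIDTH_BYTES
    for start in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        yield pcm[start:start + frame_bytes]


def group_utterances(frames: Iterable[bytes],
                     is_speech: Callable[[bytes], bool]) -> List[bytes]:
    """Join runs of consecutive speech frames into one byte string per utterance."""
    utterances: List[bytes] = []
    current: List[bytes] = []
    for frame in frames:
        if is_speech(frame):
            current.append(frame)
        elif current:
            # A non-speech frame closes the utterance collected so far.
            utterances.append(b"".join(current))
            current = []
    if current:
        utterances.append(b"".join(current))
    return utterances


# With the webrtcvad package installed, the callback could be built like this
# (left as a comment so the sketch runs without the dependency):
#   import webrtcvad
#   vad = webrtcvad.Vad(2)  # aggressiveness mode, 0 to 3
#   is_speech = lambda frame: vad.is_speech(frame, 16000)
```

Each returned byte string could then be handed to a recognizer such as DeepSpeech or AWS Transcribe. A practical detector would usually also tolerate short pauses inside an utterance instead of cutting at the first non-speech frame.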
- Install ROS Melodic.
- Install these ROS dependencies:
sudo apt install ros-melodic-audio-common*
sudo apt install ros-melodic-audio-capture*
- Install Python 3 dependencies:
sudo apt install python3-venv
- Create a new ROS workspace and a Python 3 virtual environment.
mkdir catkin_ws && cd catkin_ws
python3 -m venv venv
source venv/bin/activate
- Install the following Python 3 dependencies into the virtual environment:
pip install webrtcvad deepspeech==0.7.4 rospkg empy alloylib
- Create and navigate to the src directory.
mkdir src && cd src
- Clone the tbd_audio_stack repo into src.
git clone https://github.com/CMU-TBD/tbd_audio_stack.git
- Download the DeepSpeech model files that match the installed version (0.7.4). The commands below assume you are still in the src directory from the previous steps.
cd tbd_audio_stack/tbd_audio_recognition_deepspeech && mkdir models && cd models
wget https://github.com/mozilla/DeepSpeech/releases/download/v0.7.4/deepspeech-0.7.4-models.pbmm
wget https://github.com/mozilla/DeepSpeech/releases/download/v0.7.4/deepspeech-0.7.4-models.scorer
- Go back to the workspace's root directory, then build and run the project. Make sure the Python 3 virtual environment is active.
cd ~/<path_to_your_workspace>/catkin_ws
catkin build -DPYTHON_VERSION=3
source devel/setup.bash
roslaunch tbd_audio_recognition_deepspeech run_recognition.launch
- Everything should now run correctly. To see the text output, run the following command and speak into your computer's microphone:
rostopic echo /utterance
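If recognition does not start, one thing worth checking is that both model files were actually downloaded. The snippet below is a minimal sketch for that check; `missing_model_files` is a hypothetical helper, and the file names are taken from the download URLs above. It assumes the package looks for the files in the `models` directory created during installation.

```python
from pathlib import Path
from typing import List, Union

# File names taken from the v0.7.4 release URLs in the installation steps.
EXPECTED_FILES = (
    "deepspeech-0.7.4-models.pbmm",
    "deepspeech-0.7.4-models.scorer",
)


def missing_model_files(models_dir: Union[str, Path]) -> List[str]:
    """Return the expected model file names that are absent from models_dir."""
    directory = Path(models_dir)
    return [name for name in EXPECTED_FILES
            if not (directory / name).is_file()]
```

For example, calling `missing_model_files("src/tbd_audio_stack/tbd_audio_recognition_deepspeech/models")` from the workspace root returns an empty list when both files are present.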