Speech pipeline

A Python project for Linux that can:

recognize human speech (German or English) from either a microphone or a video file,
then translate it into English or German,
then convert the translation into speech (text-to-speech).
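
The three stages above can be pictured as a chain of functions. The following is only a minimal sketch of that data flow, not the code in process_speech.py: the names `other_language` and `run_pipeline` are hypothetical, the stages are passed in as plain callables so that real recognition, translation, and text-to-speech wrappers could be substituted, and it assumes that translating always means switching between German ("de") and English ("en").

```python
from typing import Callable


def other_language(source_lang: str) -> str:
    """Hypothetical helper: return the translation target for a source language."""
    # Assumption: only German and English are handled, and the target
    # is always the other one of the two.
    pairs = {"de": "en", "en": "de"}
    if source_lang not in pairs:
        raise ValueError(f"unsupported language: {source_lang!r}")
    return pairs[source_lang]


def run_pipeline(
    audio: bytes,
    source_lang: str,
    recognize: Callable[[bytes, str], str],
    translate: Callable[[str, str, str], str],
    synthesize: Callable[[str, str], bytes],
) -> bytes:
    """Chain speech recognition, translation, and text-to-speech."""
    target_lang = other_language(source_lang)
    # Stage 1: audio in the source language becomes text.
    text = recognize(audio, source_lang)
    # Stage 2: the text is translated into the target language.
    translated = translate(text, source_lang, target_lang)
    # Stage 3: the translated text becomes audio again.
    return synthesize(translated, target_lang)
```

Keeping the stages as parameters means each one can be exercised with a small stand-in function before any model has been downloaded.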

Installation

Tested on Ubuntu 22.04.1 LTS with Python 3.10.4 and pip 22.2.2

Clone the repository, change into its directory, and run bash install.sh
Confirm the installation of the required programs when prompted
Activate the virtual environment: source ~/venv_speech_pipeline/bin/activate

Models

All machine learning models will automatically be downloaded the first time they are needed:

Vosk models in ~/.cache/vosk/ (more than 1 GB each)
Marian models in the working directory (the Git repository directory)
TTS models in ~/.local/share/tts/
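
Because the downloads are large, it can help to check how much disk space the model caches already occupy. The snippet below is a small sketch using only the standard library. The two home-directory paths are the ones listed above, the helper names `dir_size_bytes` and `report_model_dirs` are made up for this example, and the Marian models are left out because their location depends on the working directory.

```python
from pathlib import Path

# Cache locations named in this README. The Marian models are omitted
# because they are stored relative to the working directory.
MODEL_DIRS = {
    "Vosk": Path.home() / ".cache" / "vosk",
    "TTS": Path.home() / ".local" / "share" / "tts",
}


def dir_size_bytes(path: Path) -> int:
    """Return the total size of all regular files below path (0 if missing)."""
    if not path.is_dir():
        return 0
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


def report_model_dirs(dirs: dict[str, Path] = MODEL_DIRS) -> dict[str, int]:
    """Map each model family to the number of bytes its cache currently uses."""
    return {name: dir_size_bytes(path) for name, path in dirs.items()}
```

Calling `report_model_dirs()` before the first run should report 0 bytes for directories that do not exist yet, and a larger value once the corresponding models have been downloaded.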

Usage

Run python3 process_speech.py {mic,video} --help for more information

From a video file

Run python3 process_speech.py video [file]

From a microphone

Run python3 process_speech.py mic
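
The `{mic,video}` notation in the help command is the way Python's argparse displays subcommands. As an illustration of that command-line shape only, here is a minimal sketch with one subcommand per input source. It is not the parser from process_speech.py: the function name `build_parser`, the attribute names `source` and `file`, and the choice to make the video file a required argument are all assumptions, and the real script may accept further options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per input source."""
    parser = argparse.ArgumentParser(prog="process_speech.py")
    subparsers = parser.add_subparsers(dest="source", required=True)

    # "mic" needs no positional argument in this sketch.
    subparsers.add_parser("mic", help="read speech from a microphone")

    # "video" takes the path of the video file to process
    # (assumed to be required here).
    video = subparsers.add_parser("video", help="read speech from a video file")
    video.add_argument("file", help="path to the video file")
    return parser
```

With this layout, `build_parser().parse_args(["video", "tagesschau.mp4"])` yields a namespace whose `source` is `"video"` and whose `file` is `"tagesschau.mp4"`, and each subcommand gets its own `--help` output.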

