Linux Python project to:
- recognize human speech (German or English), from either a microphone or a video
- then translate it to English or German
- then convert it into speech (text-to-speech).
Tested on Ubuntu 22.04.1 LTS with Python 3.10.4 and pip 22.2.2
- Clone the repository, change into it, and run
bash install.sh
- Confirm the installation of the programs it needs
- Activate the virtual environment
source ~/venv_speech_pipeline/bin/activate
All machine learning models will automatically be downloaded the first time they are needed:
- Vosk models in
~/.cache/vosk/
(more than 1 GB each)
- Marian models in the working (git) directory
- TTS models in
~/.local/share/tts/
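To see which models have already been downloaded and how much disk space they use, something like the following sketch may help. It is not part of the repository; the helper names `dir_size_bytes` and `report` are made up for illustration, and only the two cache paths listed above are taken from this README.

```python
from pathlib import Path

# Cache locations as listed in this README. The Marian models are stored in
# the working directory instead, so they are not covered here.
MODEL_DIRS = {
    "vosk": Path.home() / ".cache" / "vosk",
    "tts": Path.home() / ".local" / "share" / "tts",
}


def dir_size_bytes(path: Path) -> int:
    """Return the total size of all regular files below path (0 if missing)."""
    if not path.is_dir():
        return 0
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


def report(dirs=MODEL_DIRS):
    """Build one human-readable status line per model directory."""
    lines = []
    for name, path in dirs.items():
        status = "present" if path.is_dir() else "not downloaded yet"
        size_mb = dir_size_bytes(path) / 1e6
        lines.append(f"{name}: {path} ({status}, {size_mb:.1f} MB)")
    return lines


print("\n".join(report()))
```

A directory reported as "not downloaded yet" is expected before the first run, since the models are fetched on demand.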
Run python3 process_speech.py {mic,video} --help
for more information
Run python3 process_speech.py video [file]
to process speech from a video file
Run python3 process_speech.py mic
to process speech from the microphone
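For orientation, the `mic` and `video` subcommands could be wired up with `argparse` roughly as below. This is a sketch under assumptions, not the actual contents of process_speech.py: the function name `build_parser`, the `source` attribute, and the help texts are hypothetical, and `file` is treated as optional only because the usage line writes it in brackets.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per input source (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="process_speech.py")
    subparsers = parser.add_subparsers(dest="source", required=True)

    # "mic" takes no positional arguments in this sketch.
    subparsers.add_parser("mic", help="recognize speech from the microphone")

    # "video" takes the file to read; optional here to mirror "video [file]".
    video = subparsers.add_parser("video", help="recognize speech from a video file")
    video.add_argument("file", nargs="?", default=None, help="path to the video file")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is
# reproducible; "talk.mp4" is a placeholder file name.
args = build_parser().parse_args(["video", "talk.mp4"])
print(args.source, args.file)
```

With subparsers set up this way, `{mic,video} --help` prints the options of the chosen subcommand, which matches the help invocation shown above.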