Home

This wiki is a work in progress! It may contain errors or be missing details.

Synthalingua Wiki

Welcome to the Synthalingua wiki! Here you'll find detailed information on how to use and troubleshoot Synthalingua, a powerful AI-powered real-time audio translation tool.

Table of Contents

Getting Started
- System Requirements
- Installation
Usage
Troubleshooting
Additional Information
Contributing

Getting Started

System Requirements

Synthalingua's hardware requirements are listed below, from the minimum needed to run it up to the configuration for best performance:

Requirement	Minimum	Moderate	Recommended	Best Performance
CPU Cores	2	6	8	16
CPU Clock Speed (GHz)	2.5 or higher	3.0 or higher	3.5 or higher	4.0 or higher
RAM (GB)	4 or higher	8 or higher	16 or higher	16 or higher
GPU VRAM (GB)	2 or higher	6 or higher	8 or higher	12 or higher
Free Disk Space (GB)	10 or higher	10 or higher	10 or higher	10 or higher
GPU (suggested)	Nvidia GTX 1050 or higher	Nvidia GTX 1660 or higher	Nvidia RTX 3070 or higher	Nvidia RTX 3090 or higher

Notes:

Nvidia GPUs are supported on Linux and Windows.
Nvidia GPU is suggested but not required.
AMD GPUs are supported on Linux, not Windows.
A microphone is optional. You can use the --stream flag to stream audio from an HLS stream.
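
As a rough illustration of reading the table above, the following Python sketch maps a GPU's VRAM to the matching tier. The helper name `vram_tier` and the idea of classifying on VRAM alone are assumptions made for this example; Synthalingua does not ship such a helper.

```python
# Hypothetical helper: thresholds are copied from the "GPU VRAM (GB)" row above.
VRAM_TIERS = [
    (12, "Best Performance"),
    (8, "Recommended"),
    (6, "Moderate"),
    (2, "Minimum"),
]


def vram_tier(vram_gb):
    """Return the highest tier whose VRAM threshold is met."""
    for threshold, name in VRAM_TIERS:
        if vram_gb >= threshold:
            return name
    return "Below minimum"
```

For example, `vram_tier(8)` returns "Recommended". The other rows (CPU, RAM, disk space) would need their own checks.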

Installation

Install Python: Download and install Python 3.10.9. Ensure you select the "Add Python to PATH" option during installation.
Install Git: Download and install Git. Using default settings is recommended.
Install FFMPEG: Follow the instructions provided here to install FFMPEG.
Install CUDA (Optional): If you plan to utilize your Nvidia GPU, download and install CUDA from here.
Run Setup Script:
- On Windows: Execute the setup.bat file.
- On Linux: Execute the setup.bash file. Ensure you have gcc and portaudio19-dev (or portaudio-devel for some systems) installed.
Run Synthalingua: Execute the newly created batch file or bash script. You can modify this file to customize the settings.
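
Before running the setup script, it can help to confirm that the tools from the steps above are reachable on PATH. The sketch below is not part of Synthalingua; the function name `missing_tools` and the executable names it checks (`git`, `ffmpeg`) are assumptions about how these tools are typically installed.

```python
import shutil
import sys


def missing_tools(tools=("git", "ffmpeg")):
    """Return the names from `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    # Show which Python interpreter version is running this check.
    print("Python version:", sys.version.split()[0])
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("git and ffmpeg were found on PATH.")
```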

Usage

Command Line Arguments

Synthalingua utilizes command line arguments to configure its behavior. Below is a table detailing the available arguments:

Flag	Description
`--ram`	Change the amount of RAM to use. Default is 4GB. Choices are "1GB", "2GB", "4GB", "6GB", "12GB".
`--ramforce`	Force the script to use the desired amount of VRAM. May cause the script to crash if not enough VRAM is available.
`--energy_threshold`	Set the energy level the microphone must pick up before audio is detected. Default is 100. Choose a value from 1 to 1000; higher values make audio detection harder to trigger.
`--mic_calibration_time`	How long to calibrate the microphone, in seconds. To skip the user prompt, pass 0 and the time will be set to 5 seconds.
`--record_timeout`	Set the time in seconds for real-time recording. Default is 2 seconds.
`--phrase_timeout`	Set the time in seconds for empty space between recordings before considering it a new line in the transcription. Default is 1 second.
`--translate`	Translate the transcriptions to English. Enables translation.
`--transcribe`	Transcribe the audio to a set target language. Requires `--target_language`.
`--target_language`	Select the language to translate to. Available choices are a list of languages in ISO 639-1 format, as well as their English names.
`--language`	Select the language to translate from. Available choices are a list of languages in ISO 639-1 format, as well as their English names.
`--auto_model_swap`	Automatically swap the model based on the detected language. Enables automatic model swapping.
`--device`	Select the device to use for the model. Default is "cuda" if available. Available options are "cpu" and "cuda". When set to "cpu", you can choose any RAM size as long as you have enough RAM. The CPU option is optimized for multi-threading, so a CPU with many cores (for example 16 cores, 32 threads) can give good results.
`--cuda_device`	Select the CUDA device to use for the model. Default is 0.
`--discord_webhook`	Set the Discord webhook to send the transcription to.
`--list_microphones`	List available microphones and exit.
`--set_microphone`	Set the default microphone to use. You can set the name or its ID number from the list.
`--microphone_enabled`	Enables microphone usage. Add `true` after the flag.
`--auto_language_lock`	Automatically lock the language after it has been detected 5 times, which helps reduce latency. Use this flag if the speech is not in English and you do not know which language is being spoken.
`--model_dir`	Set the folder where models are stored. Default is the "model" folder.
`--use_finetune`	Use fine-tuned model. This will increase accuracy, but will also increase latency. Additional VRAM/RAM usage is required.
`--no_log`	Show only the most recent translation/transcription rather than a log-style list.
`--updatebranch`	Choose which branch of the repo to check for updates. Default is master; choices are master, dev-testing, and bleeding-under-work. To turn off update checks, use disable. bleeding-under-work contains the latest changes and can break at any time.
`--keep_temp`	Keeps audio files in the out folder. This will take up space over time though.
`--portnumber`	Set the port number for the web server. If no number is set then the web server will not start.
`--retry`	Retries translations and transcription if they fail.
`--about`	Show information about the app.
`--save_transcript`	Saves the transcript to a text file.
`--save_folder`	Set the folder to save the transcript to.
`--stream`	Stream audio from an HLS stream.
`--stream_language`	Language of the stream. Default is English.
`--stream_target_language`	Language to translate the stream to. Default is English. Needed for `--stream_transcribe`
`--stream_translate`	Translate the stream.
`--stream_transcribe`	Transcribe the stream to a different language. Use `--stream_target_language` to change the output.
`--stream_original_text`	Show the detected original text.
`--stream_chunks`	How many chunks to split the stream into. Default is 5; a value between 3 and 5 is recommended. YouTube streams should use 1 or 2, and Twitch streams 5 to 10. Higher numbers are more accurate but make the stream translation and transcription slower and more delayed.
`--cookies`	Cookies file name without the extension, for example twitch, youtube, twitchacc1, twitchacczed.
`--makecaptions`	Set the program to captions mode. Requires `--file_input`, `--file_output`, and `--file_output_name`.
`--file_input`	Location of the input file to make captions for. Almost all video/audio formats are supported (uses ffmpeg).
`--file_output`	Location of folder to export the captions
`--file_output_name`	File name to export as, without any extension.
`--ignorelist`	Path to a text file of words or phrases to exclude from the output. Usage: `--ignorelist "C:\quoted\path\to\wordlist.txt"`
`--condition_on_previous_text`	Helps keep the model from repeating itself, but may slow down processing.
`--remote_hls_password_id`	Password ID for the web server, usually something like 'id' or 'key'. The program default is `key`, so when the server asks for an id/password, the Synthalingua value takes the form `key=000000`, where `key` is the ID and `000000` is a placeholder for the password (16 characters long).
`--remote_hls_password`	Password for the hls webserver.
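
To make the `--phrase_timeout` description more concrete, here is a minimal sketch of how a silence gap could decide where a new transcript line starts. It is an illustration only, not Synthalingua's actual code: the function name `group_into_lines` and the `(start, end, text)` chunk layout are assumptions.

```python
def group_into_lines(chunks, phrase_timeout=1.0):
    """Group recognised chunks into transcript lines.

    `chunks` is a list of (start_seconds, end_seconds, text) tuples in
    chronological order (an assumed layout). A new line starts whenever the
    silence since the previous chunk is longer than `phrase_timeout`.
    """
    lines = []
    previous_end = None
    for start, end, text in chunks:
        if previous_end is None or start - previous_end > phrase_timeout:
            lines.append(text)  # gap too long: start a new line
        else:
            lines[-1] += " " + text  # short gap: continue the current line
        previous_end = end
    return lines
```

With the default of 1 second, chunks separated by 0.3 seconds of silence stay on the same line, while a 2 second gap starts a new one.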

Examples

Caption Generation:

python transcribe_audio.py --ram 12gb --makecaptions --file_input="C:\Users\username\Downloads\video.mp4" --file_output="C:\Users\username\Downloads" --file_output_name="captions" --language Japanese --device cuda
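
If you launch caption mode from another script, the three required file arguments can be checked before the command is built. The sketch below is an illustration only: `build_caption_command` is a hypothetical helper, and it simply assembles the same flags shown in the example above.

```python
def build_caption_command(file_input, file_output, file_output_name,
                          language="Japanese", ram="12gb", device="cuda"):
    """Assemble the argument list for caption mode (hypothetical helper)."""
    required = {
        "file_input": file_input,
        "file_output": file_output,
        "file_output_name": file_output_name,
    }
    # --makecaptions needs all three file arguments, so fail early if one is empty.
    empty = [name for name, value in required.items() if not value]
    if empty:
        raise ValueError("--makecaptions also needs: " + ", ".join(empty))
    return [
        "python", "transcribe_audio.py",
        "--ram", ram,
        "--makecaptions",
        f"--file_input={file_input}",
        f"--file_output={file_output}",
        f"--file_output_name={file_output_name}",
        "--language", language,
        "--device", device,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the folder that contains transcribe_audio.py.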

Live Stream Translation:

python transcribe_audio.py --ram 12gb --stream_translate --stream_language Japanese --stream https://www.twitch.tv/somestreamerhere

Discord Integration:

python transcribe_audio.py --ram 6gb --translate --language ja --discord_webhook "https://discord.com/api/webhooks/1234567890/1234567890" --energy_threshold 300

Setting Microphone:

List microphones: python transcribe_audio.py --list_microphones
Set microphone: python transcribe_audio.py --set_microphone "Microphone Name" or python transcribe_audio.py --set_microphone 2 (using index)

Web Server

Start the web server using the --portnumber flag:

python transcribe_audio.py --portnumber 4000

Access the web interface at http://localhost:4000. Use query parameters to control element visibility:

?showoriginal: Show original detected text.
?showtranslation: Show translated text.
?showtranscription: Show transcribed text.
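
The sketch below builds such a URL from the parameters listed above. The helper name `overlay_url` is hypothetical, and joining several parameters with `&` is an assumption based on standard query-string syntax rather than something confirmed for Synthalingua's web server.

```python
ALLOWED_FLAGS = {"showoriginal", "showtranslation", "showtranscription"}


def overlay_url(port, *flags, host="localhost"):
    """Return the web interface URL with the chosen visibility parameters."""
    unknown = [flag for flag in flags if flag not in ALLOWED_FLAGS]
    if unknown:
        raise ValueError("Unknown query parameter(s): " + ", ".join(unknown))
    base = f"http://{host}:{port}"
    if not flags:
        return base
    # Parameters are valueless switches, e.g. ?showoriginal&showtranslation
    return base + "/?" + "&".join(flags)
```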

Word Block List

Use the --ignorelist flag to specify a text file containing words or phrases to exclude from the output:

python transcribe_audio.py --ignorelist "C:\path\to\wordlist.txt"
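
Conceptually, a block list drops any output that matches an entry in the file. The following sketch shows one way such filtering could work. It is not Synthalingua's actual implementation: the one-entry-per-line file layout, the case-insensitive substring matching, and the names `load_ignore_list` and `is_blocked` are all assumptions.

```python
from pathlib import Path


def load_ignore_list(path):
    """Read one word or phrase per line, skipping blank lines (assumed layout)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip().lower() for line in lines if line.strip()}


def is_blocked(text, ignore_list):
    """Return True if any ignored word or phrase occurs in the text."""
    lowered = text.lower()
    return any(entry in lowered for entry in ignore_list)
```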

Cookies

Place cookie files in the "cookies" folder in Netscape format (.txt). Use the --cookies flag to specify the filename without the extension:

python transcribe_audio.py --cookies twitchacc1
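
The mapping from a `--cookies` value to a file on disk can be pictured as follows. This is a sketch under assumptions: the helper names are hypothetical, and the "cookies" folder is taken to be relative to the folder you run Synthalingua from. Loading the file with Python's standard `http.cookiejar.MozillaCookieJar` is just one way to check that it parses as Netscape format.

```python
from http.cookiejar import LoadError, MozillaCookieJar
from pathlib import Path


def cookie_file_for(name, folder="cookies"):
    """Return the expected path for a --cookies value such as 'twitchacc1'."""
    return Path(folder) / f"{name}.txt"


def is_netscape_cookie_file(path):
    """Return True if the file exists and loads as a Netscape-format cookie file."""
    jar = MozillaCookieJar(str(path))
    try:
        jar.load(ignore_discard=True, ignore_expires=True)
    except (OSError, LoadError):
        # Missing file, unreadable file, or wrong format.
        return False
    return True
```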

Troubleshooting

Refer to the Troubleshooting section in the main README for solutions to common issues.

Additional Information

Models: Synthalingua utilizes fine-tuned models based on OpenAI's Whisper.
Support: For assistance or to report issues, please create an issue on the GitHub repository.

Contributing

We welcome contributions to Synthalingua! Please refer to the Contribution Guidelines for information on how to contribute.
