Home
Welcome to the Synthalingua wiki! Here you'll find detailed information on how to use and troubleshoot Synthalingua, a powerful AI-powered real-time audio translation tool.
Synthalingua requires a system that meets the following requirements:
Requirement | Minimum | Moderate | Recommended | Best Performance |
---|---|---|---|---|
CPU Cores | 2 | 6 | 8 | 16 |
CPU Clock Speed (GHz) | 2.5 or higher | 3.0 or higher | 3.5 or higher | 4.0 or higher |
RAM (GB) | 4 or higher | 8 or higher | 16 or higher | 16 or higher |
GPU VRAM (GB) | 2 or higher | 6 or higher | 8 or higher | 12 or higher |
Free Disk Space (GB) | 10 or higher | 10 or higher | 10 or higher | 10 or higher |
GPU (suggested) | Nvidia GTX 1050 or higher | Nvidia GTX 1660 or higher | Nvidia RTX 3070 or higher | Nvidia RTX 3090 or higher |
Notes:
- Nvidia GPU support on Linux and Windows
- Nvidia GPU is suggested but not required.
- AMD GPUs are supported on Linux, not Windows.
- A microphone is optional. You can use the `--stream` flag to stream audio from an HLS stream instead.
- Install Python: Download and install Python 3.10.9. Ensure you select the "Add Python to PATH" option during installation.
- Install Git: Download and install Git. Using default settings is recommended.
- Install FFMPEG: Follow the instructions provided here to install FFMPEG.
- Install CUDA (Optional): If you plan to utilize your Nvidia GPU, download and install CUDA from here.
- Run Setup Script:
  - On Windows: Execute the `setup.bat` file.
  - On Linux: Execute the `setup.bash` file. Ensure you have `gcc` and `portaudio19-dev` (or `portaudio-devel` on some systems) installed.
- Run Synthalingua: Execute the newly created batch file or bash script. You can modify this file to customize the settings.
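A minimal sketch of the Linux flow described above, assuming Debian/Ubuntu package names (on Windows the equivalent is simply running `setup.bat`):

```bash
# Install the build prerequisites required by the setup script
# (Debian/Ubuntu names shown; use your distribution's equivalents)
sudo apt install gcc portaudio19-dev

# Run the setup script from the Synthalingua folder
bash setup.bash
```

Afterwards, execute the newly created bash script (or batch file on Windows) as described in the last step; its exact name depends on what the setup script generates, and it can be edited to customize the settings.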
Synthalingua utilizes command line arguments to configure its behavior. Below is a table detailing the available arguments:
Flag | Description |
---|---|
`--ram` | Change the amount of RAM to use. Default is 4GB. Choices are "1GB", "2GB", "4GB", "6GB", "12GB". |
`--ramforce` | Force the script to use the desired VRAM. May cause the script to crash if there is not enough VRAM available. |
`--energy_threshold` | Set the energy level the microphone must detect. Default is 100. Choose a value from 1 to 1000; the higher the value, the harder it is to trigger audio detection. |
`--mic_calibration_time` | How long to calibrate the microphone for, in seconds. Type 0 to skip user input; the time will then be set to 5 seconds. |
`--record_timeout` | Set the time in seconds for real-time recording. Default is 2 seconds. |
`--phrase_timeout` | Set the time in seconds of empty space between recordings before starting a new line in the transcription. Default is 1 second. |
`--translate` | Translate the transcriptions to English. Enables translation. |
`--transcribe` | Transcribe the audio to a set target language. The `--target_language` flag is required. |
`--target_language` | Select the language to translate to. Available choices are a list of languages in ISO 639-1 format, as well as their English names. |
`--language` | Select the language to translate from. Available choices are a list of languages in ISO 639-1 format, as well as their English names. |
`--auto_model_swap` | Automatically swap the model based on the detected language. Enables automatic model swapping. |
`--device` | Select the device to use for the model. Default is "cuda" if available. Available options are "cpu" and "cuda". When set to CPU, you can choose any RAM size as long as you have enough RAM. The CPU option is optimized for multi-threading, so with, for example, 16 cores and 32 threads, you can see good results. |
`--cuda_device` | Select the CUDA device to use for the model. Default is 0. |
`--discord_webhook` | Set the Discord webhook to send the transcription to. |
`--list_microphones` | List available microphones and exit. |
`--set_microphone` | Set the default microphone to use. You can give either its name or its ID number from the list. |
`--microphone_enabled` | Enables microphone usage. Add `true` after the flag. |
`--auto_language_lock` | Automatically lock the language after it has been detected 5 times. Enables automatic language locking and helps reduce latency. Use this flag if you are working with non-English audio and do not know the currently spoken language. |
`--model_dir` | Default location is the "model" folder. Use this argument to change the location. |
`--use_finetune` | Use the fine-tuned model. This will increase accuracy, but will also increase latency. Additional VRAM/RAM usage is required. |
`--no_log` | Show only the most recent translation/transcription rather than a log-style list. |
`--updatebranch` | Set which branch of the repo to check for updates. Default is master; choices are master, dev-testing, and bleeding-under-work. To turn off update checks, use disable. bleeding-under-work contains the latest changes and can break at any time. |
`--keep_temp` | Keep audio files in the out folder. Note that this will take up space over time. |
`--portnumber` | Set the port number for the web server. If no number is set, the web server will not start. |
`--retry` | Retry translations and transcriptions if they fail. |
`--about` | Show information about the app. |
`--save_transcript` | Save the transcript to a text file. |
`--save_folder` | Set the folder to save the transcript to. |
`--stream` | Stream audio from an HLS stream. |
`--stream_language` | Language of the stream. Default is English. |
`--stream_target_language` | Language to translate the stream to. Default is English. Needed for `--stream_transcribe`. |
`--stream_translate` | Translate the stream. |
`--stream_transcribe` | Transcribe the stream to a different language. Use `--stream_target_language` to change the output. |
`--stream_original_text` | Show the detected original text. |
`--stream_chunks` | How many chunks to split the stream into. Default is 5; values between 3 and 5 are recommended. YouTube streams should use 1 or 2, Twitch streams 5 to 10. The higher the number, the more accurate, but also the slower and more delayed the stream translation and transcription will be. |
`--cookies` | Cookie file name, e.g. twitch, youtube, twitchacc1, twitchacczed. |
`--makecaptions` | Set the program to captions mode. Requires `--file_input`, `--file_output`, and `--file_output_name`. |
`--file_input` | Location of the input file to make captions for. Almost all video/audio formats are supported (uses FFmpeg). |
`--file_output` | Location of the folder to export the captions to. |
`--file_output_name` | File name to export as, without any extension. |
`--ignorelist` | Usage is `--ignorelist "C:\quoted\path\to\wordlist.txt"`. |
`--condition_on_previous_text` | Helps keep the model from repeating itself, but may slow down the process. |
`--remote_hls_password_id` | Password ID for the web server, usually something like 'id' or 'key'. 'key' is the program's default, so when an id/password is requested, Synthalingua uses the form key=<password>, where 'key' is the id and the password is 16 characters long. |
`--remote_hls_password` | Password for the HLS web server. |
Caption Generation:
python transcribe_audio.py --ram 12gb --makecaptions --file_input="C:\Users\username\Downloads\video.mp4" --file_output="C:\Users\username\Downloads" --file_output_name="captions" --language Japanese --device cuda
Live Stream Translation:
python transcribe_audio.py --ram 12gb --stream_translate --stream_language Japanese --stream https://www.twitch.tv/somestreamerhere
Discord Integration:
python transcribe_audio.py --ram 6gb --translate --language ja --discord_webhook "https://discord.com/api/webhooks/1234567890/1234567890" --energy_threshold 300
Setting Microphone:
- List microphones: `python transcribe_audio.py --list_microphones`
- Set microphone: `python transcribe_audio.py --set_microphone "Microphone Name"` or `python transcribe_audio.py --set_microphone 2` (using the index)
Start the web server using the `--portnumber` flag:
python transcribe_audio.py --portnumber 4000
Access the web interface at http://localhost:4000. Use query parameters to control element visibility:
- `?showoriginal`: Show original detected text.
- `?showtranslation`: Show translated text.
- `?showtranscription`: Show transcribed text.
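For example, with the server started on port 4000 as above, the individual views would be reached at URLs like the following (shown purely as an illustration of the query parameters):

```
http://localhost:4000/?showoriginal
http://localhost:4000/?showtranslation
http://localhost:4000/?showtranscription
```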
Use the `--ignorelist` flag to specify a text file containing words or phrases to exclude from the output:
python transcribe_audio.py --ignorelist "C:\path\to\wordlist.txt"
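The exact wordlist format is not spelled out here; purely as an assumed illustration, a wordlist.txt with one word or phrase per line might look like:

```
um
uh
thanks for watching
```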
Place cookie files in the "cookies" folder in Netscape format (.txt). Use the `--cookies` flag to specify the filename without the extension:
python transcribe_audio.py --cookies twitchacc1
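For reference, a Netscape-format cookie file (what browser "export cookies.txt" tools typically produce) is plain text with one tab-separated cookie per line. A hypothetical cookies/twitchacc1.txt could look roughly like this, with placeholder values:

```
# Netscape HTTP Cookie File
.twitch.tv	TRUE	/	TRUE	1767225600	auth-token	PLACEHOLDER_TOKEN_VALUE
```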
Refer to the Troubleshooting section in the main README for solutions to common issues.
- Models: Synthalingua utilizes fine-tuned models based on OpenAI's Whisper.
- Support: For assistance or to report issues, please create an issue on the GitHub repository.
We welcome contributions to Synthalingua! Please refer to the Contribution Guidelines for information on how to contribute.