Home
Welcome to the Synthalingua wiki! Here you'll find detailed information on how to use and troubleshoot Synthalingua, a powerful AI-powered real-time audio translation tool.
Synthalingua requires a system that meets the following minimum requirements:
Requirement | Minimum | Moderate | Recommended | Best Performance |
---|---|---|---|---|
CPU Cores | 2 | 6 | 8 | 16 |
CPU Clock Speed (GHz) | 2.5 or higher | 3.0 or higher | 3.5 or higher | 4.0 or higher |
RAM (GB) | 4 or higher | 8 or higher | 16 or higher | 16 or higher |
GPU VRAM (GB) | 2 or higher | 6 or higher | 8 or higher | 12 or higher |
Free Disk Space (GB) | 10 or higher | 10 or higher | 10 or higher | 10 or higher |
GPU (suggested) | Nvidia GTX 1050 or higher | Nvidia GTX 1660 or higher | Nvidia RTX 3070 or higher | Nvidia RTX 3090 or higher |
Notes:
- Nvidia GPUs are supported on Linux and Windows.
- An Nvidia GPU is suggested but not required.
- AMD GPUs are supported on Linux, but not on Windows.
- A microphone is optional. You can use the `--stream` flag to stream audio from an HLS stream instead.
- Install Python: Download and install Python 3.10.9. Ensure you select the "Add Python to PATH" option during installation.
- Install Git: Download and install Git. Using default settings is recommended.
- Install FFMPEG: Follow the instructions provided here to install FFMPEG.
- Install CUDA (Optional): If you plan to utilize your Nvidia GPU, download and install CUDA from here.
- Run Setup Script:
  - On Windows: Execute the `setup.bat` file.
  - On Linux: Execute the `setup.bash` file. Ensure you have `gcc` and `portaudio19-dev` (or `portaudio-devel` for some systems) installed.
- Run Synthalingua: Execute the newly created batch file or bash script. You can modify this file to customize the settings.
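Putting the steps above together, a typical first-time setup on Windows might look like the following. This is a minimal sketch: the clone URL is an assumption (substitute the actual repository URL you are using), and the name of the generated batch file may differ on your system.

```
:: Clone the repository (URL assumed -- replace with your actual clone URL)
git clone https://github.com/cyberofficial/Synthalingua.git
cd Synthalingua

:: Run the setup script, which prepares dependencies and creates the launcher batch file
setup.bat

:: Afterwards, run the generated batch file, or call the script directly, e.g.:
python transcribe_audio.py --about
```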
Synthalingua utilizes command line arguments to configure its behavior. Below is a table detailing the available arguments:
Flag | Description
---|---
`--ram` | Specify the amount of RAM to allocate. Default: 4GB. Options: "1GB", "2GB", "4GB", "6GB", "12GB".
`--ramforce` | Force the script to use the specified VRAM amount. Caution: May lead to crashes if insufficient VRAM is available.
`--energy_threshold` | Set the microphone's audio detection sensitivity. Default: 100. Range: 1-1000 (higher values decrease sensitivity).
`--mic_calibration_time` | Duration in seconds for microphone calibration. Set to 0 to skip user input and use the default 5 seconds.
`--record_timeout` | Real-time recording duration in seconds. Default: 2 seconds.
`--phrase_timeout` | Silence duration in seconds between recordings before considering it a new line. Default: 1 second.
`--translate` | Enable translation of transcriptions to English.
`--transcribe` | Enable transcription of audio to the specified target language. Requires the `--target_language` flag.
`--target_language` | Specify the target language for translation or transcription. Use ISO 639-1 language codes or their English names.
`--language` | Specify the source language for translation. Use ISO 639-1 language codes or their English names.
`--auto_model_swap` | Enable automatic model switching based on the detected language.
`--device` | Select the processing unit for the model. Default: "cuda" (if available). Options: "cpu", "cuda".
`--cuda_device` | Specify the CUDA device ID to utilize. Default: 0.
`--discord_webhook` | Set the Discord webhook URL to receive transcriptions.
`--list_microphones` | Display a list of available microphones and exit.
`--set_microphone` | Set the default microphone using its name or ID from the list generated by `--list_microphones`.
`--microphone_enabled` | Enable or disable microphone usage. Use `true` or `false` after the flag.
`--auto_language_lock` | Automatically lock the language after 5 detections based on the detected language. Improves latency.
`--use_finetune` | Utilize the fine-tuned model for increased accuracy (at the cost of higher latency and resource usage).
`--no_log` | Display only the most recent translation/transcription instead of a log-style output.
`--updatebranch` | Specify the repository branch to check for updates. Default: "master". Options: "master", "dev-testing", "bleeding-under-work", "disable".
`--keep_temp` | Retain audio files in the "out" folder. Note: This will consume storage space over time.
`--portnumber` | Set the port number for the web server. If not specified, the web server will not start.
`--retry` | Enable retrying translations and transcriptions in case of failures.
`--about` | Display information about the application.
`--save_transcript` | Enable saving the transcript to a text file.
`--save_folder` | Specify the folder to save the transcript to.
`--stream` | Stream audio from a specified HLS stream URL.
`--stream_language` | Specify the language of the audio stream. Default: English.
`--stream_target_language` | Specify the target language for stream translation or transcription. Default: English.
`--stream_translate` | Enable translation of the audio stream.
`--stream_transcribe` | Enable transcription of the audio stream to the specified target language.
`--stream_original_text` | Display the detected original text from the stream.
`--stream_chunks` | Specify the number of chunks to split the stream into. Default: 5 (recommended range: 3-5 for most streams, 1-2 for YouTube, 5-10 for Twitch).
`--cookies` | Specify the filename of the cookies file (without extension) located in the "cookies" folder.
`--makecaptions` | Enable caption generation mode. Requires the `--file_input`, `--file_output`, and `--file_output_name` flags.
`--file_input` | Specify the path to the input audio/video file for caption generation.
`--file_output` | Specify the folder to save the generated captions to.
`--file_output_name` | Specify the filename for the generated captions (without extension).
`--ignorelist` | Specify the path to a text file containing a list of words or phrases to ignore.
`--condition_on_previous_text` | Enable conditioning the model on previous text to reduce repetition (may impact speed).
`--remote_hls_password_id` | Specify the password ID for accessing password-protected HLS streams. Default: "key".
`--remote_hls_password` | Specify the password for accessing password-protected HLS streams.
Caption Generation:
python transcribe_audio.py --ram 12gb --makecaptions --file_input="C:\Users\username\Downloads\video.mp4" --file_output="C:\Users\username\Downloads" --file_output_name="captions" --language Japanese --device cuda
Live Stream Translation:
python transcribe_audio.py --ram 12gb --stream_translate --stream_language Japanese --stream https://www.twitch.tv/somestreamerhere
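Live Stream Transcription (a sketch assembled only from the stream flags documented in the table above; the Twitch URL is a placeholder):
python transcribe_audio.py --ram 12gb --stream_transcribe --stream_language Japanese --stream_target_language Spanish --stream https://www.twitch.tv/somestreamerhere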
Discord Integration:
python transcribe_audio.py --ram 6gb --translate --language ja --discord_webhook "https://discord.com/api/webhooks/1234567890/1234567890" --energy_threshold 300
Setting Microphone:
- List microphones: `python transcribe_audio.py --list_microphones`
- Set microphone: `python transcribe_audio.py --set_microphone "Microphone Name"` or `python transcribe_audio.py --set_microphone 2` (using the index)
Start the web server using the `--portnumber` flag:
python transcribe_audio.py --portnumber 4000
Access the web interface at http://localhost:4000. Use query parameters to control element visibility:
- `?showoriginal`: Show original detected text.
- `?showtranslation`: Show translated text.
- `?showtranscription`: Show transcribed text.
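For example (combining multiple parameters with `&` is an assumption based on standard query-string syntax; verify it against your build):
http://localhost:4000/?showtranslation
http://localhost:4000/?showoriginal&showtranslation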
Use the `--ignorelist` flag to specify a text file containing words or phrases to exclude from the output:
python transcribe_audio.py --ignorelist "C:\path\to\wordlist.txt"
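The wordlist format is not spelled out here; a reasonable sketch is a plain text file with one word or phrase per line (an assumption to verify against your version), for example:

```
um
uh
[Music]
Thanks for watching
```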
Place cookie files in the "cookies" folder in Netscape format (`.txt`). Use the `--cookies` flag to specify the filename without the extension:
python transcribe_audio.py --cookies twitchacc1
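With the command above, the expected layout would look roughly like this (a sketch assuming the "cookies" folder sits alongside transcribe_audio.py):

```
cookies/
  twitchacc1.txt   (exported in Netscape cookie format)
```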
Refer to the Troubleshooting section in the main README for solutions to common issues.
- Models: Synthalingua utilizes fine-tuned models based on OpenAI's Whisper.
- Support: For assistance or to report issues, please create an issue on the GitHub repository.
We welcome contributions to Synthalingua! Please refer to the Contribution Guidelines for information on how to contribute.