cyberofficial edited this page Sep 23, 2024 · 6 revisions

This wiki is a work in progress! It may contain errors and omit details I missed.

Synthalingua Wiki

Welcome to the Synthalingua wiki! Here you'll find detailed information on how to use and troubleshoot Synthalingua, a powerful AI-powered real-time audio translation tool.

Getting Started

System Requirements

The table below lists the hardware Synthalingua needs at four performance tiers, from the minimum to best performance:

| Requirement | Minimum | Moderate | Recommended | Best Performance |
| --- | --- | --- | --- | --- |
| CPU Cores | 2 | 6 | 8 | 16 |
| CPU Clock Speed (GHz) | 2.5 or higher | 3.0 or higher | 3.5 or higher | 4.0 or higher |
| RAM (GB) | 4 or higher | 8 or higher | 16 or higher | 16 or higher |
| GPU VRAM (GB) | 2 or higher | 6 or higher | 8 or higher | 12 or higher |
| Free Disk Space (GB) | 10 or higher | 10 or higher | 10 or higher | 10 or higher |
| GPU (suggested) | Nvidia GTX 1050 or higher | Nvidia GTX 1660 or higher | Nvidia RTX 3070 or higher | Nvidia RTX 3090 or higher |

Notes:

  • Nvidia GPUs are supported on Linux and Windows.
  • An Nvidia GPU is suggested but not required.
  • AMD GPUs are supported on Linux but not on Windows.
  • A microphone is optional. You can use the --stream flag to stream audio from an HLS stream.
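
As a rough aid, the Minimum column can be checked from Python before installing. The sketch below is not part of Synthalingua; it only covers CPU count and free disk space, and the function and constant names are made up for illustration. Note that os.cpu_count() reports logical CPUs, which may be higher than the physical core count in the table.

```python
import os
import shutil

# Values taken from the Minimum column of the requirements table.
MIN_CPU_CORES = 2
MIN_FREE_DISK_GB = 10


def check_minimums(path="."):
    """Return a dict mapping each check name to True (met) or False (not met)."""
    cores = os.cpu_count() or 0  # os.cpu_count() can return None
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "cpu_cores": cores >= MIN_CPU_CORES,
        "free_disk": free_gb >= MIN_FREE_DISK_GB,
    }


if __name__ == "__main__":
    for name, ok in check_minimums().items():
        print(f"{name}: {'ok' if ok else 'below minimum'}")
```

RAM and VRAM are left out because the standard library has no portable way to query them.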

Installation

  1. Install Python: Download and install Python 3.10.9. Ensure you select the "Add Python to PATH" option during installation.
  2. Install Git: Download and install Git. Using default settings is recommended.
  3. Install FFMPEG: Follow the instructions provided here to install FFMPEG.
  4. Install CUDA (Optional): If you plan to utilize your Nvidia GPU, download and install CUDA from here.
  5. Run Setup Script:
    • On Windows: Execute the setup.bat file.
    • On Linux: Execute the setup.bash file. Ensure you have gcc and portaudio19-dev (or portaudio-devel for some systems) installed.
  6. Run Synthalingua: Execute the newly created batch file or bash script. You can modify this file to customize the settings.
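
Before running the setup script, it can help to confirm that the tools from steps 1 to 3 are reachable on PATH. This is a minimal sketch, not something the setup script does itself; the command names are assumptions (on some systems Python is installed as python3).

```python
import shutil

# Command names are assumptions; adjust them for your system.
REQUIRED_TOOLS = ("python", "git", "ffmpeg")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found.")
```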

Usage

Command Line Arguments

Synthalingua utilizes command line arguments to configure its behavior. Below is a table detailing the available arguments:

| Flag | Description |
| --- | --- |
| --ram | Change the amount of RAM to use. Default is 4GB. Choices are "1GB", "2GB", "4GB", "6GB", "12GB". |
| --ramforce | Force the script to use the requested amount of VRAM. May crash the script if not enough VRAM is available. |
| --energy_threshold | Set the energy level the microphone must detect. Default is 100. Choose a value from 1 to 1000; higher values make audio detection harder to trigger. |
| --mic_calibration_time | How long to calibrate the microphone, in seconds. To skip the user prompt, enter 0 and the time is set to 5 seconds. |
| --record_timeout | Set the time in seconds for real-time recording. Default is 2 seconds. |
| --phrase_timeout | Set how many seconds of silence between recordings start a new line in the transcription. Default is 1 second. |
| --translate | Translate the transcriptions to English. |
| --transcribe | Transcribe the audio to a set target language. Requires --target_language. |
| --target_language | Select the language to translate to. Choices are ISO 639-1 language codes or the languages' English names. |
| --language | Select the language to translate from. Choices are ISO 639-1 language codes or the languages' English names. |
| --auto_model_swap | Automatically swap the model based on the detected language. |
| --device | Select the device to use for the model. Options are "cpu" and "cuda". Default is "cuda" if available. With "cpu" you can choose any RAM size as long as the system has enough RAM. The CPU option is optimized for multi-threading, so a CPU with many cores (for example 16 cores, 32 threads) can give good results. |
| --cuda_device | Select the CUDA device to use for the model. Default is 0. |
| --discord_webhook | Set the Discord webhook to send the transcription to. |
| --list_microphones | List available microphones and exit. |
| --set_microphone | Set the default microphone to use, by name or by its ID number from the list. |
| --microphone_enabled | Enable microphone usage. Add true after the flag. |
| --auto_language_lock | Automatically lock the language after 5 detections, which helps reduce latency. Use this flag if the audio is not in English and you do not know which language is being spoken. |
| --model_dir | Change the folder where models are stored. Default is the "model" folder. |
| --use_finetune | Use the fine-tuned model. This increases accuracy but also latency, and requires additional VRAM/RAM. |
| --no_log | Show only the most recent translation/transcription rather than a log-style list. |
| --updatebranch | Select which repository branch to check for updates. Choices are master, dev-testing, and bleeding-under-work; default is master. Use disable to turn off update checks. bleeding-under-work contains the latest changes and can break at any time. |
| --keep_temp | Keep audio files in the out folder. This takes up disk space over time. |
| --portnumber | Set the port number for the web server. If no number is set, the web server does not start. |
| --retry | Retry translations and transcriptions if they fail. |
| --about | Show information about the app. |
| --save_transcript | Save the transcript to a text file. |
| --save_folder | Set the folder to save the transcript to. |
| --stream | Stream audio from an HLS stream. |
| --stream_language | Language of the stream. Default is English. |
| --stream_target_language | Language to translate the stream to. Default is English. Required for --stream_transcribe. |
| --stream_translate | Translate the stream. |
| --stream_transcribe | Transcribe the stream to a different language. Use --stream_target_language to change the output language. |
| --stream_original_text | Show the detected original text. |
| --stream_chunks | How many chunks to split the stream into. Default is 5; a value between 3 and 5 is recommended. YouTube streams should use 1 or 2, Twitch streams 5 to 10. Higher values are more accurate but make the stream translation and transcription slower and more delayed. |
| --cookies | Cookies file name, for example twitch, youtube, twitchacc1, or twitchacczed. |
| --makecaptions | Set the program to captions mode. Requires --file_input, --file_output, and --file_output_name. |
| --file_input | Location of the input file to make captions for. Almost all video/audio formats are supported (uses ffmpeg). |
| --file_output | Folder to export the captions to. |
| --file_output_name | File name to export as, without an extension. |
| --ignorelist | Path to a word list file. Usage: --ignorelist "C:\quoted\path\to\wordlist.txt" |
| --condition_on_previous_text | Helps keep the model from repeating itself, but may slow down processing. |
| --remote_hls_password_id | Password ID for the web server, usually 'id' or 'key'. The program default is key, so when the server asks for an ID and password, Synthalingua sends key=0000000, where key is the ID and 0000000 stands for the 16-character password. |
| --remote_hls_password | Password for the HLS web server. |

Examples

Caption Generation:

python transcribe_audio.py --ram 12gb --makecaptions --file_input="C:\Users\username\Downloads\video.mp4" --file_output="C:\Users\username\Downloads" --file_output_name="captions" --language Japanese --device cuda
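
When captioning many files, the same command can be assembled from Python. The helper below is a hypothetical wrapper, not part of Synthalingua: it builds the argument list shown above and enforces that the three file flags required by --makecaptions are present. Passing a list to subprocess avoids shell quoting problems with paths that contain spaces.

```python
import subprocess
import sys


def build_caption_command(file_input, file_output, file_output_name,
                          language=None, ram="12gb", device="cuda"):
    """Build the argument list for caption mode.

    --makecaptions requires --file_input, --file_output and
    --file_output_name, so all three are mandatory here.
    """
    for label, value in (("file_input", file_input),
                         ("file_output", file_output),
                         ("file_output_name", file_output_name)):
        if not value:
            raise ValueError(f"{label} is required in caption mode")
    cmd = [
        sys.executable, "transcribe_audio.py",
        "--ram", ram,
        "--makecaptions",
        f"--file_input={file_input}",
        f"--file_output={file_output}",
        f"--file_output_name={file_output_name}",
        "--device", device,
    ]
    if language:
        cmd += ["--language", language]
    return cmd


# Example (runs Synthalingua from the current directory):
# subprocess.run(build_caption_command(
#     r"C:\Users\username\Downloads\video.mp4",
#     r"C:\Users\username\Downloads", "captions",
#     language="Japanese"), check=True)
```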

Live Stream Translation:

python transcribe_audio.py --ram 12gb --stream_translate --stream_language Japanese --stream https://www.twitch.tv/somestreamerhere
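
The --stream_chunks guidance in the flag table (1 or 2 for YouTube, 5 to 10 for Twitch, default 5) can be turned into a small lookup. This is only a sketch of that guidance: the function name is hypothetical, the host matching is an assumption, and the values chosen are the low end of each documented range.

```python
from urllib.parse import urlparse


def _host_matches(host, domain):
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def suggest_stream_chunks(stream_url):
    """Pick a --stream_chunks value from the stream's host name."""
    host = (urlparse(stream_url).hostname or "").lower()
    if _host_matches(host, "youtube.com") or host == "youtu.be":
        return 2  # documented range for YouTube: 1 or 2
    if _host_matches(host, "twitch.tv"):
        return 5  # documented range for Twitch: 5 to 10
    return 5      # documented default


if __name__ == "__main__":
    url = "https://www.twitch.tv/somestreamerhere"
    print("--stream_chunks", suggest_stream_chunks(url))
```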

Discord Integration:

python transcribe_audio.py --ram 6gb --translate --language ja --discord_webhook "https://discord.com/api/webhooks/1234567890/1234567890" --energy_threshold 300
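
To check a webhook URL independently of Synthalingua, you can post a test message yourself. The sketch below shows the general shape of a Discord webhook request (a JSON body with a "content" field, capped at 2000 characters); it does not necessarily match how Synthalingua sends its messages. The function name and the User-Agent string are made up for illustration.

```python
import json
import urllib.request

DISCORD_MAX_CONTENT = 2000  # Discord's limit for the "content" field


def build_webhook_request(webhook_url, text):
    """Build (but do not send) a POST request carrying `text` to a webhook."""
    payload = {"content": text[:DISCORD_MAX_CONTENT]}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Hypothetical value; some servers reject urllib's default agent.
            "User-Agent": "webhook-test/0.1",
        },
        method="POST",
    )


# To actually send the message:
# urllib.request.urlopen(build_webhook_request(url, "Test message"))
```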

Setting Microphone:

  1. List microphones: python transcribe_audio.py --list_microphones
  2. Set microphone: python transcribe_audio.py --set_microphone "Microphone Name" or python transcribe_audio.py --set_microphone 2 (using index)

Web Server

Start the web server using the --portnumber flag:

python transcribe_audio.py --portnumber 4000

Access the web interface at http://localhost:4000. Use query parameters to control element visibility:

  • ?showoriginal: Show original detected text.
  • ?showtranslation: Show translated text.
  • ?showtranscription: Show transcribed text.
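
If you generate links to the web interface (for example for a browser source in streaming software), a helper like the one below can build them. It is a sketch under two assumptions not stated above: that the parameters take no value, and that several can be combined with "&". The function name is hypothetical.

```python
VALID_VIEWS = ("showoriginal", "showtranslation", "showtranscription")


def build_view_url(port, *views, host="localhost"):
    """Return the web interface URL with the chosen visibility parameters."""
    unknown = [view for view in views if view not in VALID_VIEWS]
    if unknown:
        raise ValueError(f"unknown parameter(s): {', '.join(unknown)}")
    base = f"http://{host}:{port}"
    if not views:
        return base
    return f"{base}/?{'&'.join(views)}"


if __name__ == "__main__":
    print(build_view_url(4000, "showoriginal", "showtranslation"))
```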

Word Block List

Use the --ignorelist flag to specify a text file containing words or phrases to exclude from the output:

python transcribe_audio.py --ignorelist "C:\path\to\wordlist.txt"
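
To illustrate the idea of a block list, the sketch below loads a word list and tests a line of output against it. The file layout (one word or phrase per line) and the case-insensitive substring match are assumptions made for this example; Synthalingua's actual matching rules may differ.

```python
def load_ignore_list(path):
    """Read one word or phrase per line, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def is_ignored(text, ignore_list):
    """Return True if `text` contains any listed phrase (case-insensitive)."""
    lowered = text.lower()
    return any(phrase.lower() in lowered for phrase in ignore_list)


# Example:
# words = load_ignore_list(r"C:\path\to\wordlist.txt")
# if not is_ignored(line, words):
#     print(line)
```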

Cookies

Place cookie files in the "cookies" folder in Netscape format (.txt). Use the --cookies flag to specify the filename without the extension:

python transcribe_audio.py --cookies twitchacc1
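
A cookie file can be sanity-checked before use with Python's standard library, which reads the Netscape format. The sketch below maps a --cookies value to its file in the "cookies" folder and tries to load it; the function names are hypothetical and this is not how Synthalingua itself reads cookies.

```python
from http.cookiejar import MozillaCookieJar
from pathlib import Path


def cookie_path(name, folder="cookies"):
    """Map a --cookies value such as "twitchacc1" to cookies/twitchacc1.txt."""
    return Path(folder) / f"{name}.txt"


def load_cookies(name, folder="cookies"):
    """Load a Netscape-format cookie file.

    Raises http.cookiejar.LoadError if the file is not in Netscape
    format, or FileNotFoundError if it does not exist.
    """
    jar = MozillaCookieJar(str(cookie_path(name, folder)))
    jar.load(ignore_discard=True, ignore_expires=True)
    return jar


# Example:
# print(len(load_cookies("twitchacc1")), "cookies loaded")
```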

Troubleshooting

Refer to the Troubleshooting section in the main README for solutions to common issues.

Additional Information

  • Models: Synthalingua utilizes fine-tuned models based on OpenAI's Whisper.
  • Support: For assistance or to report issues, please create an issue on the GitHub repository.

Contributing

We welcome contributions to Synthalingua! Please refer to the Contribution Guidelines for information on how to contribute.