aalramadan/TransVisio

TransVisio

💡What does it do?

TransVisio translates text between multiple languages using Large Language Models (LLMs). It accepts inputs in various formats (e.g., a subtitle file or a video) and extracts the text from them for translation.

Models Supported

  • GPT 4o
  • GPT 4 Turbo
  • GPT 3.5 Turbo
  • Gemini 1.5 Pro
  • Gemini 1.5 Flash
  • Whisper 20231117 (Online)
  • Faster-Whisper v1.0.3 (Offline)

Features

  • Inputs supported:
      • Subtitle files (*.srt *.ass *.ssa).
      • Video files (*.mp4 *.mkv *.webm *.flv *.avi *.mov *.wmv *.m4v).
      • Audio files (*.wav *.ogg *.mp3 *.aac *.flac *.m4a *.oga *.opus).
      • Excel files (*.xlsx *.csv).

  • You can save video/audio transcription to Excel.

  • You can specify the number of input sentences.

  • You can pause/resume translation at any point.

  • You can reverse the direction of the translated output.

  • You can remove and/or edit the translated output and the input. The tool will automatically align the rows.

  • You can specify the Start Time and Duration for video/audio inputs.

  • You can adjust the Temperature setting, which controls the randomness of the model’s output. A lower value makes the output more deterministic and focused, while a higher value makes the output more diverse and creative.

  • Light and Dark themes.
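
To illustrate the Temperature setting listed above, here is a minimal sketch of how temperature is commonly applied when sampling from a language model: the raw scores are divided by the temperature before being turned into probabilities. This is not TransVisio's code, and hosted models apply the setting on the server side; the function name `softmax_with_temperature` and the example scores are assumptions made for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by a temperature.

    Hypothetical helper for illustration only; not part of TransVisio.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate tokens.
logits = [2.0, 1.0, 0.5]
low = softmax_with_temperature(logits, 0.2)   # sharper distribution
high = softmax_with_temperature(logits, 2.0)  # flatter distribution
```

With the low temperature, almost all of the probability mass lands on the highest-scoring candidate, so repeated runs tend to give the same output. With the high temperature, the probabilities are closer together, so the output varies more between runs.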

Note:
Make sure that you specify the Start Time and Duration before selecting the video/audio input.
Online Whisper requires an API key and is limited to 25 MB input size.
Offline Whisper does not require a key, but must download a model (e.g., tiny, small, etc.) on the first use.
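
The two Whisper constraints above can be sketched as a simple pre-check. This is only an illustration under stated assumptions: the names `fits_online_whisper` and `choose_backend` are hypothetical, the sketch treats "25 MB" as 25 × 1024 × 1024 bytes, and it is not how TransVisio itself selects a backend.

```python
import os

# Assumption: "25 MB" is interpreted as 25 * 1024 * 1024 bytes.
MAX_ONLINE_BYTES = 25 * 1024 * 1024


def fits_online_whisper(size_bytes, limit=MAX_ONLINE_BYTES):
    """Return True if an input of this size is within the online limit."""
    return size_bytes <= limit


def choose_backend(size_bytes, has_api_key):
    """Pick 'online' only when a key is available and the input is small enough.

    Hypothetical helper; falls back to 'offline' in every other case.
    """
    if has_api_key and fits_online_whisper(size_bytes):
        return "online"
    return "offline"


def file_size(path):
    """Size of a local file in bytes, suitable for passing to choose_backend."""
    return os.path.getsize(path)
```

For a local file, something like `choose_backend(file_size("talk.mp3"), has_api_key=True)` would return `"online"` only if the file is within the limit; `talk.mp3` is a placeholder name.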

Demo

Animation

Disclaimer

TransVisio is part of a collaborative research project funded by the Abdul Hameed Shoman Foundation (Agreement Number: 230800351).
Hosting Institution: English Language and Translation Department, Applied Science Private University.
