Awesome Tatar

en

About

A curated list of awesome libraries, resources, services, and datasets for the Tatar language.

Table of Contents

LLMs

  • tweety-tatar-base - LLM for the Tatar language, converted from the Mistral-7B-Instruct-v0.2 model trained by MistralAI
  • mGPT-1.3B-tatar - A model derived from the base mGPT-XL (1.3B) model, which SberAI originally trained on 61 languages from 25 language families using Wikipedia and the C4 corpus.

Parallel corpora

Monocorpora

Audio datasets

Other datasets

  • SART - Datasets of Similarity, Analogies, and Relatedness for the Tatar language.

Cyrillic-Latin converters

Text-to-speech & speech-to-text

  • MMS-1b-tatar - Fine-tuned ASR model for the Tatar language.
  • speech.tatar - Read-aloud service powered by the Institute of Applied Semiotics.
  • Tatsoft ASR - API for an automatic speech recognition system for the Tatar language, provided by Tatsoft.
  • Tatsoft TTS - API for a text-to-speech synthesis system for the Tatar language, provided by Tatsoft.
  • TatarSCR - An open-source Tatar Speech Commands Dataset
  • Silero Models - Pre-trained STT / TTS models with Tatar language support. A minimal working example can be found here.
  • Massively Multilingual Speech - Open-source STT / TTS initiative for thousands of languages.
  • TurkicTTS - A multilingual text-to-speech synthesis system for 10 Turkic languages.
  • RHVoice - A free and open-source speech synthesizer with Tatar language support.

Language corpus

Language analyzers

Volunteer localization projects

Localization guides

Browser plugins

  • tatarspeech (beta) - Real-time YouTube video translation into Tatar.