Multilingual support #72

Open: wants to merge 24 commits into base: main

Kostis-S-Z
Contributor

@Kostis-S-Z Kostis-S-Z commented Dec 18, 2024

What's changing

The overall goal of this PR is to add support for more languages.

In the process, the following changes / additions were made:

  • Add a unified TTS model interface, TTSModel, to hide the complexity of the different TTS models
  • Create a registry for TTS model loading functions: TTS_LOADERS
  • Create a registry for TTS model inference functions: TTS_INFERENCE
  • Validate config.yaml's text_to_speech_model with validate_text_to_speech_model
  • Re-write test_load_tts_model to use parametrization
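
The registry approach listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: only the names TTSModel, TTS_LOADERS, TTS_INFERENCE and validate_text_to_speech_model come from the description, while the fields, function signatures and the "dummy/echo-tts" model id are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class TTSModel:
    """Hypothetical unified wrapper; the field names are assumptions."""

    model: Any
    model_id: str
    sample_rate: int

    def text_to_speech(self, text: str, voice_profile: str) -> Any:
        # Dispatch to the inference function registered for this model id.
        return TTS_INFERENCE[self.model_id](self.model, text, voice_profile)


# Registries keyed by model id (signatures are assumed for illustration).
TTS_LOADERS: Dict[str, Callable[[], TTSModel]] = {}
TTS_INFERENCE: Dict[str, Callable[[Any, str, str], Any]] = {}


def _load_dummy() -> TTSModel:
    # Stand-in loader: a real one would download and initialise a model.
    return TTSModel(model=object(), model_id="dummy/echo-tts", sample_rate=16_000)


def _dummy_inference(model: Any, text: str, voice_profile: str) -> str:
    # Stand-in inference: a real one would return an audio waveform.
    return f"[{voice_profile}] {text}"


TTS_LOADERS["dummy/echo-tts"] = _load_dummy
TTS_INFERENCE["dummy/echo-tts"] = _dummy_inference


def validate_text_to_speech_model(value: str) -> str:
    """Reject model ids that have no registered loader."""
    if value not in TTS_LOADERS:
        supported = ", ".join(sorted(TTS_LOADERS))
        raise ValueError(f"Unsupported TTS model '{value}'. Supported: {supported}")
    return value


def load_tts_model(model_id: str) -> TTSModel:
    """Validate the id, then call the loader registered for it."""
    return TTS_LOADERS[validate_text_to_speech_model(model_id)]()
```

With this shape, supporting a new model means registering one loader and one inference function, and a parametrized test can iterate over the keys of TTS_LOADERS.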

Closes #29

How to test it

Steps to test the changes:

  1. Clone the repository: git clone https://github.com/Kostis-S-Z/document-to-podcast.git
  2. Move to the directory: cd document-to-podcast
  3. Change branch: git checkout multilingual-support
  4. Install the package: pip install -e .
  5. Pick a model and languages from the list below (under Model IDs / Languages tested:)
  6. Edit example_data/config.yaml (or create a copy) and change:
     • input_file: Use a file that has text in a language of your choice
     • text_to_speech_model: Use one of the model IDs defined below, based on the language you are testing
     • text_to_text_prompt: Rewrite or translate it into the testing language
     • speaker/description: Rewrite or translate it into the testing language
     • voice_profile: Use one of the predefined profiles for the testing language, from the list below
  7. Run: document-to-podcast --from_config example_data/config.yaml
  8. If you don't want to wait for the whole podcast to be generated, you can stop it midway by pressing Ctrl+C in the terminal where the process is running. It will stop and save the script and audio generated up to that point.
  9. Verify by checking podcast.txt and podcast.wav
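
The final verification step could also be scripted. The helper below is a hypothetical sketch, not part of the package: it assumes both files sit in the same folder and that podcast.wav is PCM-encoded, which is what Python's standard wave module can read.

```python
import wave
from pathlib import Path


def check_podcast_outputs(folder: str) -> float:
    """Return the audio duration in seconds, raising if either file is unusable."""
    script = Path(folder) / "podcast.txt"
    audio = Path(folder) / "podcast.wav"

    # The script should contain at least some non-whitespace text.
    if not script.read_text(encoding="utf-8").strip():
        raise ValueError(f"{script} is empty")

    # Read only the header information needed to compute the duration.
    with wave.open(str(audio), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()

    if frames == 0:
        raise ValueError(f"{audio} contains no audio frames")
    return frames / rate
```

A run interrupted with Ctrl+C should still pass this check, only with a shorter duration.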

Model IDs / Languages tested:

  • parler-tts/parler-tts-mini-multilingual-v1.1
    • Portuguese (Sophia & Nicholas)
    • Dutch (Mark & Jessica)
    • French (Daniel & Christine)
    • German (Nicole & Michelle)
    • Italian (Julia & Richard)
    • Polish (Alex & Natalie)
    • Spanish (Steven & Olivia)
  • ai4bharat/indic-parler-tts
    • Hindi (Rohit & Divya)
    • Telugu (Prakash & Lalitha)
  • suno/bark
    • Spanish (v2/es_speaker_0 & v2/es_speaker_8)
  • OuteTTS-0.2-500M
    • Korean (female_1 & male_1)
Additional notes for reviewers

It's expected that some languages will work better than others. It's also a common issue that a voice might not be consistent across speaker turns (for example, Speaker 1 might sound one way at first, and then their voice might change later).

I already...

  • Tested the changes in a working environment to ensure they work as expected
  • Added some tests for any new functionality
  • Updated the documentation (both comments in code and under /docs)

pyproject.toml (outdated, resolved)
example_data/config_bark.yaml (resolved)
Contributor

Not sure if we want to add multiple configs here; I guess that's a discussion to have with the developer hub. All of these seem like potential "Use Case" / "Customization" examples.

@Kostis-S-Z Kostis-S-Z self-assigned this Dec 18, 2024
@Kostis-S-Z Kostis-S-Z linked an issue Dec 18, 2024 that may be closed by this pull request
@Kostis-S-Z Kostis-S-Z marked this pull request as ready for review December 19, 2024 13:40
@Kostis-S-Z Kostis-S-Z requested a review from a team December 19, 2024 13:50
Development

Successfully merging this pull request may close these issues.

Add multi-langage support
2 participants