Multilingual support #72

Kostis-S-Z · 2024-12-18T13:05:32Z

What's changing

The overall goal of this PR is to add support for more languages.

In the process, the following changes / additions were made:

Add a unified TTS model interface, TTSModel, to hide the complexity of the different TTS models
Create a registry for TTS model loading functions: TTS_LOADERS
Create a registry for TTS model inference functions: TTS_INFERENCE
Validate the text_to_speech_model value in config.yaml with validate_text_to_speech_model
Rewrite test_load_tts_model to use parametrization
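
For reviewers who have not opened the diff yet, here is a minimal sketch of how a unified interface and the two registries might fit together. The names TTSModel, TTS_LOADERS, TTS_INFERENCE and validate_text_to_speech_model come from this PR, but every field, signature and body below is an assumption made for illustration, and "dummy/tts-model" is a hypothetical model id rather than one of the supported models.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class TTSModel:
    """Uniform wrapper around a loaded TTS model (fields are assumed, not taken from the PR)."""
    model: Any
    model_id: str
    sample_rate: int
    custom_args: Dict[str, Any] = field(default_factory=dict)


def _load_dummy(model_id: str, **kwargs: Any) -> TTSModel:
    # Stand-in loader: a real loader would download and initialise the model weights.
    return TTSModel(model=object(), model_id=model_id, sample_rate=24_000, custom_args=kwargs)


def _dummy_inference(text: str, model: TTSModel, voice_profile: str) -> List[float]:
    # Stand-in inference: returns one silent sample per input character.
    return [0.0] * len(text)


# Registries keyed by model id; supporting a new model means adding one entry to each.
TTS_LOADERS: Dict[str, Callable[..., TTSModel]] = {"dummy/tts-model": _load_dummy}
TTS_INFERENCE: Dict[str, Callable[..., List[float]]] = {"dummy/tts-model": _dummy_inference}


def validate_text_to_speech_model(model_id: str) -> str:
    """Reject model ids that have no registered loader."""
    if model_id not in TTS_LOADERS:
        raise ValueError(f"Unknown model '{model_id}'. Supported: {sorted(TTS_LOADERS)}")
    return model_id


def load_tts_model(model_id: str, **kwargs: Any) -> TTSModel:
    """Look up the loader for model_id and pass model-specific options through as kwargs."""
    validate_text_to_speech_model(model_id)
    return TTS_LOADERS[model_id](model_id, **kwargs)


def text_to_speech(text: str, model: TTSModel, voice_profile: str) -> List[float]:
    """Dispatch to the inference function registered for this model."""
    return TTS_INFERENCE[model.model_id](text, model, voice_profile)
```

With this shape, calling code only ever touches load_tts_model and text_to_speech, so model-specific details stay inside the registered functions.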

Closes #29

How to test it

Steps to test the changes:

Clone the repository: git clone https://github.com/Kostis-S-Z/document-to-podcast.git
Move to the directory: cd document-to-podcast
Change branch: git checkout multilingual-support
Install the package: pip install -e .
Pick a model and languages from the list below (under Model IDs / Languages tested:)
Edit example_data/config.yaml (or create a copy) and change:

input_file: Use a file that has text in a language of your choice
text_to_speech_model: Use one of the model IDs defined below, based on the language you are testing
text_to_text_prompt: Rewrite or translate it into the language you are testing
speaker/description: Rewrite or translate it into the language you are testing
voice_profile: Use one of the predefined profiles from the list below, based on the language you are testing

Run: document-to-podcast --from_config example_data/config.yaml
If you don't want to wait for the whole podcast to be generated, you can stop it midway by pressing Ctrl+C in the terminal where the process is running. It will stop and save the script and audio generated up to that point.
Verify by checking podcast.txt and podcast.wav
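
The Ctrl+C behaviour mentioned in the steps above can be sketched roughly as follows. This is not the code from this PR: generate_podcast and the synthesize callback are hypothetical names, and the sketch only shows the general idea of catching KeyboardInterrupt and keeping what was produced so far.

```python
from typing import Callable, Iterable, List, Tuple


def generate_podcast(
    lines: Iterable[str],
    synthesize: Callable[[str], List[float]],
) -> Tuple[List[str], List[float]]:
    """Synthesize each script line in turn; on Ctrl+C, return the partial result."""
    script: List[str] = []
    audio: List[float] = []
    try:
        for line in lines:
            samples = synthesize(line)  # may take a long time per line
            script.append(line)
            audio.extend(samples)
    except KeyboardInterrupt:
        # Stop generating, but fall through so the caller can still save the output.
        print("Interrupted: keeping the script and audio generated so far.")
    return script, audio
```

The caller would then write the returned script and audio to podcast.txt and podcast.wav whether or not generation ran to completion.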

Model IDs / Languages tested:

Additional notes for reviewers

It's expected that some languages will work better than others. It's also a common issue that a speaker's voice might not be consistent across their turns (for example, Speaker 1 might sound one way at first and then their voice might change).

I already...

Tested the changes in a working environment to ensure they work as expected
Added some tests for any new functionality
Updated the documentation (both comments in code and under /docs)

pyproject.toml

src/document_to_podcast/inference/text_to_speech.py

example_data/config_bark.yaml

daavoo · 2024-12-18T14:01:36Z

example_data/config_bark.yaml

Not sure if we want to add multiple configs here; I guess it's a discussion to have with the developer hub. All of these seem like potential "Use Case" / "Customization" examples.

Co-authored-by: David de la Iglesia Castro <[email protected]>

Kostis-S-Z added 13 commits December 17, 2024 16:09

[WIP] Add bark and parler multi support

286a93c

Add config files for other models to easily test across models

14b69bf

Use model loading wrapper function for download_models.py

20ab8e9

Make sure transformers>4.31.0 (required for bark model)

ee38e10

Add parler dependency

890c684

Use TTSModelWrapper for demo code

8cc7b0d

Use TTSModelWrapper for cli

dcbb254

Add outetts_language attribute

b0d40bc

Add TTSModelWrapper

5e47b1e

Update text_to_speech.py

945c44f

Pass model-specific variables as **kwargs

4565fb8

Rename TTSModelWrapper to TTSInterface

01d0e7a

Update language argument to kwargs

5af3e72

daavoo reviewed Dec 18, 2024

pyproject.toml (outdated, resolved)

src/document_to_podcast/inference/text_to_speech.py (resolved)

example_data/config_bark.yaml (resolved)

daavoo reviewed Dec 18, 2024

Kostis-S-Z and others added 2 commits December 18, 2024 14:33

Remove parler from dependencies

e3a3f17

Co-authored-by: David de la Iglesia Castro <[email protected]>

Merge branch 'mozilla-ai:main' into multilingual-support

a918574

Kostis-S-Z self-assigned this Dec 18, 2024

Kostis-S-Z linked an issue Dec 18, 2024 that may be closed by this pull request

Add multi-langage support #29

Open

Kostis-S-Z mentioned this pull request Dec 18, 2024

Enable user to exit podcast generation gracefully #74

Open

3 tasks

Kostis-S-Z added 9 commits December 19, 2024 11:55

Separate inference from TTSModel

fb814fa

Make sure config model is properly registered

672c0e0

Decouple loading & inference of TTS model

28b02b8

Decouple loading & inference of TTS model

b489e0d

Enable user to exit podcast generation gracefully

dc89668

Add Q2 Oute version to TTS_LOADERS

0d143eb

Add comment for support in TTS_INFERENCE

e9ca498

Update test_model_loaders.py

47112a0

Update test_text_to_speech.py

ec0fe5a

Kostis-S-Z marked this pull request as ready for review December 19, 2024 13:40

Kostis-S-Z requested a review from a team December 19, 2024 13:50
