Emotional-Text-to-Speech / pytorch-dc-tts Public

forked from tugstugi/pytorch-dc-tts

Text to Speech with PyTorch (English and Mongolian)


PyTorch Implementation of DC-TTS for Emotional TTS

This fork has been modified to support transfer learning for low-resource emotional TTS, as described here.

Training

Install the dependencies using pip install -r requirements.txt
Preprocess the EmoV-DB dataset using process_emovdb.py
Change the logdir argument in hyperparams.py. Other parameters can optionally be edited, but DO NOT edit these hyperparameters.
Add the path to the pre-trained Text2Mel model in the logdir
Comment out this line if you are not running train-text2mel.py for the first time.
Run the training script: python train-text2mel.py --dataset=emovdb
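
Before launching training, it can help to confirm that the pre-trained Text2Mel checkpoint really is in the logdir. The following is a minimal sketch, not part of this repository: the find_checkpoints and training_command helpers, the "logdir" path, and the "*.pth" file pattern are all assumptions made for illustration, so adjust them to match the value set in hyperparams.py and the actual checkpoint file name.

```python
from pathlib import Path


def find_checkpoints(logdir, pattern="*.pth"):
    """Return checkpoint files found directly inside logdir, sorted by path.

    The "*.pth" pattern is an assumption; change it to match your files.
    """
    root = Path(logdir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.glob(pattern) if p.is_file())


def training_command(dataset="emovdb"):
    """Build the training command from the steps above as an argument list."""
    return ["python", "train-text2mel.py", f"--dataset={dataset}"]


if __name__ == "__main__":
    logdir = "logdir"  # hypothetical path; use the logdir set in hyperparams.py
    checkpoints = find_checkpoints(logdir)
    if not checkpoints:
        print(f"No checkpoint found in {logdir!r}; add the pre-trained Text2Mel model first.")
    else:
        print("Found:", ", ".join(p.name for p in checkpoints))
        print("Run:", " ".join(training_command()))
```

The script only reports what it finds and prints the command; it does not start training itself.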

Synthesis

Write the sentences that you want to generate here
Add the checkpoint for the fine-tuned Text2Mel model in place of this line
Edit the paths for the output.
Run the synthesis script: python synthesize.py --dataset=emovdb

Readme of the original repository

PyTorch implementation of Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention, based partially on the following projects:

https://github.com/Kyubyong/dc_tts (audio preprocessing)
https://github.com/r9y9/deepvoice3_pytorch (data loader sampler)

Online Text-To-Speech Demo

The following notebooks are executable on https://colab.research.google.com :

For audio samples and pretrained models, visit the above notebook links.

Training/Synthesizing English Text-To-Speech

The English TTS uses the LJ-Speech dataset.

Download the dataset: python dl_and_preprop_dataset.py --dataset=ljspeech
Train the Text2Mel model: python train-text2mel.py --dataset=ljspeech
Train the SSRN model: python train-ssrn.py --dataset=ljspeech
Synthesize sentences: python synthesize.py --dataset=ljspeech
- The WAV files are saved in the samples folder.
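
The four steps above can also be chained from a small driver script. This is a sketch under stated assumptions rather than something shipped with the repository: build_pipeline and run_pipeline are hypothetical helper names, and only the script names and the --dataset flag come from the steps above. Training can take a long time, so the sketch defaults to a dry run that only prints the commands.

```python
import subprocess
import sys

# Script names and the --dataset flag are taken from the steps above.
PIPELINE_SCRIPTS = [
    "dl_and_preprop_dataset.py",  # download and preprocess the dataset
    "train-text2mel.py",          # train the Text2Mel model
    "train-ssrn.py",              # train the SSRN model
    "synthesize.py",              # synthesize sentences
]


def build_pipeline(dataset):
    """Return one argument list per step, in the order listed above."""
    return [[sys.executable, script, f"--dataset={dataset}"]
            for script in PIPELINE_SCRIPTS]


def run_pipeline(dataset, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_pipeline(dataset):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline if a step exits with an error.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline("ljspeech")  # dry run: prints the four commands
```

Passing "mbspeech" instead of "ljspeech" yields the equivalent commands for the Mongolian dataset.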

Training/Synthesizing Mongolian Text-To-Speech

The Mongolian text-to-speech uses 5 hours of audio from the Mongolian Bible.

Download the dataset: python dl_and_preprop_dataset.py --dataset=mbspeech
Train the Text2Mel model: python train-text2mel.py --dataset=mbspeech
Train the SSRN model: python train-ssrn.py --dataset=mbspeech
Synthesize sentences: python synthesize.py --dataset=mbspeech
- The WAV files are saved in the samples folder.
