Dapwner/GST-Tacotron

 
 

GST-Tacotron-Pytorch

A PyTorch implementation of Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis

[Figure: model architecture]

Update

Added support for the Blizzard dataset.

Requirements

pip3 install -r requirements.txt

File structure

  • Hyperparameters.py --- hyperparameters
  • Network.py --- encoder and decoder
  • Modules.py --- building-block modules for Tacotron
  • Loss.py --- loss function
  • Data.py --- dataset loader
  • utils.py --- utility functions for data I/O
  • Synthesis.py --- speech generation

How to train

  • Download a multispeaker dataset
  • Preprocess your data and implement your get_XX_data function in Data.py
  • Set hyperparameters in Hyperparameters.py
  • Make a directory named log as follows:
--- log
|    |
|    --- log[log_number]
|
--- code
     |
     --- Tacotron
             |
             --- train.py
             |
             --- Network.py
             |
           ......
  • Run train.py
python3 train.py [log_number] [dataset_size] [start_epoch]

[log_number]: the number of the log directory to use
[dataset_size]: an integer, or all to use the whole dataset
[start_epoch]: the epoch to start training from (0 to start from scratch)
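
The three positional arguments above can be read with argparse. The sketch below is not the code in train.py; parse_train_args and log_dir_for are hypothetical helpers, and it assumes that [log_number] maps to a directory named log/log<number> (for example log/log0), as the directory tree suggests.

```python
import argparse
from pathlib import Path


def parse_train_args(argv):
    """Parse [log_number] [dataset_size] [start_epoch] (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Sketch of the train.py command line")
    parser.add_argument("log_number", type=int, help="log directory number")
    parser.add_argument("dataset_size", help="an integer, or 'all'")
    parser.add_argument("start_epoch", type=int, help="0 to start from scratch")
    args = parser.parse_args(argv)
    # In this sketch, None stands for "use the whole dataset".
    if args.dataset_size == "all":
        args.dataset_size = None
    else:
        args.dataset_size = int(args.dataset_size)
    return args


def log_dir_for(project_root, log_number):
    """Return <project_root>/log/log<log_number>, creating it if needed.

    The naming scheme is an assumption based on the directory tree.
    """
    path = Path(project_root) / "log" / f"log{log_number}"
    path.mkdir(parents=True, exist_ok=True)
    return path
```

With this sketch, parse_train_args(["0", "all", "0"]) yields log_number 0, no dataset size limit, and start_epoch 0.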

For example:
python3 train.py 0 all 0
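
For the preprocessing step, a get_XX_data function might look roughly like the sketch below. Everything in it is an assumption: the name get_example_data, the metadata.csv file with "<utterance id>|<transcript>" lines, the wavs subdirectory, and the list of (wav_path, text) pairs it returns. Check Data.py for the format the training code actually expects.

```python
from pathlib import Path


def get_example_data(data_dir, metadata_name="metadata.csv", dataset_size=None):
    """Hypothetical loader: returns a list of (wav_path, text) pairs."""
    data_dir = Path(data_dir)
    pairs = []
    with open(data_dir / metadata_name, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Assumed line format: "<utterance id>|<transcript>"
            utt_id, text = line.split("|", 1)
            pairs.append((data_dir / "wavs" / f"{utt_id}.wav", text))
    # dataset_size=None mirrors the "all" option; an int keeps the first N items.
    return pairs if dataset_size is None else pairs[:dataset_size]
```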

How to generate wav

Run generate.py. Before running, replace the text in generate.py with any Chinese sentence you like.

The provided pretrained model was trained on a Chinese dataset, so it currently supports only Chinese.
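
Since the pretrained model only handles Chinese, it can help to check the input text before synthesis. The helper below is a hypothetical sketch, not part of generate.py. It only tests for characters in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) and rejects ASCII letters, so it is a rough filter rather than real language detection.

```python
def looks_chinese(text):
    """Rough check: at least one Han character and no ASCII letters."""
    # Basic CJK Unified Ideographs block only; rarer extension blocks are ignored.
    has_han = any("\u4e00" <= ch <= "\u9fff" for ch in text)
    has_latin = any(ch.isascii() and ch.isalpha() for ch in text)
    return has_han and not has_latin
```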
