StarGAN-VC

This is a PyTorch implementation of the paper StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks.

Converted voice examples are in the samples and results_2019-06-10 directories.

Dependencies

Python 3.6+
pytorch 1.0
librosa
pyworld
tensorboardX
scikit-learn

Usage

Download dataset

Download the VCC 2016 dataset to the current directory:

python download.py

The downloaded zip files are extracted to ./data/vcc2016_training and ./data/evaluation_all.

Training set: In the paper, the authors choose four speakers from ./data/vcc2016_training, so we move the corresponding folders (e.g. SF1, SF2, TM1, TM2) to ./data/speakers.
Testing set: In the paper, the authors choose four speakers from ./data/evaluation_all, so we move the corresponding folders (e.g. SF1, SF2, TM1, TM2) to ./data/speakers_test.
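
The folder selection described above can be scripted. The following is a minimal sketch, not part of this repo: the helper name `select_speakers` is hypothetical, and it copies the folders rather than moving them, so the extracted dataset is left untouched.

```python
import shutil
from pathlib import Path

# Speaker folders named in this README; change the list to pick other speakers.
SPEAKERS = ["SF1", "SF2", "TM1", "TM2"]


def select_speakers(src_root, dst_root, speakers=SPEAKERS):
    """Copy each speaker folder from src_root into dst_root.

    Returns the names that were actually copied. Speakers that are missing
    in src_root or already present in dst_root are skipped.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    dst_root.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in speakers:
        src, dst = src_root / name, dst_root / name
        if not src.is_dir() or dst.exists():
            continue
        shutil.copytree(src, dst)
        copied.append(name)
    return copied
```

Under these assumptions, `select_speakers("./data/vcc2016_training", "./data/speakers")` and `select_speakers("./data/evaluation_all", "./data/speakers_test")` would produce the layout shown below.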

The data directory now looks like this:

data
├── speakers  (training set)
│   ├── SF1
│   ├── SF2
│   ├── TM1
│   └── TM2
├── speakers_test (testing set)
│   ├── SF1
│   ├── SF2
│   ├── TM1
│   └── TM2
├── vcc2016_training (vcc 2016 training set)
│   ├── ...
├── evaluation_all (vcc 2016 evaluation set, we use it as testing set)
│   ├── ...

Preprocess

Extract features (mcep, f0, ap) from each speech clip. The features are stored as npy files. We also calculate the statistical characteristics for each speaker.

python preprocess.py

This step may take a few minutes.
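
The README does not say which per-speaker statistics are computed. The sketch below assumes a common choice for this kind of pipeline: the mean and standard deviation of log-F0 over voiced frames, and of each mcep dimension. It also shows how such statistics are typically used, namely to normalize mcep features and to map F0 from a source to a target speaker with a log-Gaussian normalized transformation. All function names and dictionary keys are hypothetical and are not taken from preprocess.py.

```python
import numpy as np


def speaker_stats(f0_list, mcep_list):
    """Compute per-speaker statistics from already extracted features.

    f0_list:   list of 1-D arrays of F0 in Hz (0 marks unvoiced frames)
    mcep_list: list of 2-D arrays shaped (frames, dims)
    """
    f0 = np.concatenate(f0_list)
    log_f0 = np.log(f0[f0 > 0])  # voiced frames only
    mcep = np.concatenate(mcep_list, axis=0)
    return {
        "log_f0_mean": log_f0.mean(),
        "log_f0_std": log_f0.std(),
        "mcep_mean": mcep.mean(axis=0),
        "mcep_std": mcep.std(axis=0),
    }


def normalize_mcep(mcep, stats):
    """Standardize mcep features with the speaker's own statistics."""
    return (mcep - stats["mcep_mean"]) / stats["mcep_std"]


def convert_f0(f0, src, trg):
    """Map F0 from the source to the target speaker in the log domain.

    Unvoiced frames (F0 == 0) stay at 0.
    """
    out = np.zeros_like(f0, dtype=np.float64)
    voiced = f0 > 0
    z = (np.log(f0[voiced]) - src["log_f0_mean"]) / src["log_f0_std"]
    out[voiced] = np.exp(z * trg["log_f0_std"] + trg["log_f0_mean"])
    return out
```

Converting with identical source and target statistics leaves F0 unchanged, which is a quick sanity check for any implementation of this mapping.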

Train

python main.py

Convert

python main.py --mode test --test_iters 200000 --src_speaker TM1 --trg_speaker "['TM1','SF1']"
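
In the command above, --trg_speaker is passed as the quoted string form of a Python list. How main.py parses it is not shown in this README. The following is a minimal sketch of one safe way to read such an argument, using `ast.literal_eval` as an `argparse` type function. The helper names and all default values are assumptions for illustration.

```python
import argparse
import ast


def parse_speaker_list(text):
    """Turn a string such as "['TM1','SF1']" into a list of speaker names."""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        raise argparse.ArgumentTypeError("not a Python list literal: %r" % (text,))
    if not isinstance(value, (list, tuple)) or not all(isinstance(s, str) for s in value):
        raise argparse.ArgumentTypeError("expected a list of speaker names: %r" % (text,))
    return list(value)


def build_parser():
    """Build a parser for the flags used in this README (defaults are assumed)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--mode", default="train", choices=["train", "test"])
    parser.add_argument("--test_iters", type=int, default=200000)
    parser.add_argument("--src_speaker", default="TM1")
    parser.add_argument("--trg_speaker", type=parse_speaker_list, default=["TM1", "SF1"])
    return parser
```

`ast.literal_eval` only accepts Python literals, so unlike `eval` it cannot execute arbitrary code supplied on the command line.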

Network structure

Note: Our implementation follows the network structure of the original paper, while the pytorch-StarGAN-VC code uses StarGAN's network. Both can produce good audio quality.

Reference

tensorflow StarGAN-VC code

StarGAN code

CycleGAN-VC code

pytorch-StarGAN-VC code

StarGAN-VC paper

StarGAN paper

CycleGAN paper

Update 2019/06/10

The previous implementation used the network structure of the original paper. To achieve better conversion results, this update makes the following modifications:

Modify the classifier so that it trains without problems
Update the loss function
Modify the discriminator activation function to tanh

If you find this repo useful, please give it a star!

Your encouragement is my biggest motivation!
