This is the project of team 27 for KAIST CS470: Introduction to A.I. It is based on the original repository at https://github.com/liusongxiang/StarGAN-Voice-Conversion.
This is a PyTorch implementation of the paper StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks (https://arxiv.org/abs/1806.02169). Note that the model architecture differs slightly from the one in the original paper.
- Python 3.6 (or 3.5)
- PyTorch 1.7.0
- pyworld
- tqdm
- librosa
- tensorboardX and tensorboard
- soundfile
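A quick way to confirm that the packages above are importable before running anything is a check like the sketch below. The module names are the usual import names for these packages (PyTorch, for example, is imported as `torch`); the helper `missing_modules` is hypothetical and not part of this repository.

```python
import importlib.util

# Assumed import names for the dependencies listed above. Some differ from
# the pip package name (PyTorch is imported as "torch").
REQUIRED_MODULES = [
    "torch", "pyworld", "tqdm", "librosa",
    "tensorboardX", "tensorboard", "soundfile",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies found.")
```

`find_spec` only locates a module without importing it, so the check is fast even for heavy packages such as PyTorch.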
Before training: you can skip training and start directly from the Convert step. If you want to start from the beginning, delete the "data/VCTK-Corpus" folder first.
Download the VCTK corpus and unzip it into ./data:
mkdir -p ./data
wget -O VCTK-Corpus.zip "https://datashare.is.ed.ac.uk/bitstream/handle/10283/2651/VCTK-Corpus.zip?sequence=2&isAllowed=y"
unzip VCTK-Corpus.zip -d ./data
If the downloaded archive is a .tar.gz file, extract it with:
tar -xzvf VCTK-Corpus.tar.gz -C ./data
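As a format-agnostic alternative to the two commands above, Python's standard library can extract either archive type: `shutil.unpack_archive` picks the extractor from the file extension. The helper name `extract_vctk` is hypothetical.

```python
import shutil
from pathlib import Path


def extract_vctk(archive_path, data_dir="./data"):
    """Extract a VCTK archive (.zip or .tar.gz) into data_dir and return data_dir.

    shutil.unpack_archive infers the archive format from the file name,
    so the same call handles both download formats.
    """
    Path(data_dir).mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(str(archive_path), str(data_dir))
    return Path(data_dir)
```

Usage: `extract_vctk("VCTK-Corpus.zip")` or `extract_vctk("VCTK-Corpus.tar.gz")`, run from the repository root.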
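Preprocess the audio. The model uses mel-cepstral coefficients (MCEPs) as features; the command below resamples the 48 kHz recordings to 16 kHz and writes MCEP features for the training and test sets: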
python preprocess.py --sample_rate 16000 \
--origin_wavpath data/VCTK-Corpus/wav48 \
--target_wavpath data/VCTK-Corpus/wav16 \
--mc_dir_train data/mc/train \
--mc_dir_test data/mc/test
Alternatively, run preprocess.ipynb in Colab.
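StarGAN-VC implementations typically normalize each speaker's MCEPs to zero mean and unit variance per coefficient, using statistics computed during preprocessing. The pure-Python sketch below shows that computation under this assumption; the function names are hypothetical and this is not the code in preprocess.py. A real pipeline would do the same on NumPy arrays (e.g. `mc.mean(axis=0)`).

```python
import math


def mcep_stats(frames):
    """Per-dimension mean and standard deviation over one speaker's MCEP frames.

    frames: list of equal-length lists, one list of coefficients per frame.
    """
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dims)
    ]
    return means, stds


def normalize(frames, means, stds, eps=1e-8):
    """Z-score every coefficient; eps guards against a zero standard deviation."""
    return [
        [(x - m) / (s + eps) for x, m, s in zip(frame, means, stds)]
        for frame in frames
    ]
```

Storing the per-speaker means and standard deviations lets the conversion step undo the normalization with the target speaker's statistics.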
Train the model with the command below. Note: you may need to stop training early, either when the test samples generated during training sound good or based on the training loss curves.
python main.py
Alternatively, run main.ipynb in Colab.
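If you prefer an automatic rule to watching the curves, a patience-based check like the hypothetical `EarlyStopping` class below could be fed a smoothed loss value at each logging step. This is an assumption about how you might automate the decision, not something main.py does; GAN losses are noisy, so listening to the samples remains the main criterion.

```python
class EarlyStopping:
    """Signal a stop once a monitored loss has not improved for `patience` checks."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience      # checks allowed without improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best = float("inf")
        self.bad_checks = 0

    def step(self, loss):
        """Record a new loss value; return True if training should stop."""
        if loss < self.best - self.min_delta:
            self.best = loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience
```

With `patience=2`, the sequence 1.0, 0.9, 0.95, 0.92 triggers a stop on the fourth value, because the best loss (0.9) was not beaten twice in a row.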
- We use the Python library spleeter to split a song file into vocal and accompaniment tracks. For example, to split a song file named song.mp3 and store the results in the output folder:
spleeter separate -i song.mp3 -o output
- We use pydub to merge the converted vocal file with the MR (the instrumental accompaniment track).
Alternatively, run split.ipynb in Colab.
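With pydub, the merge is essentially `AudioSegment.from_file(vocal).overlay(AudioSegment.from_file(mr))` followed by `export`. To show what that overlay amounts to, here is a standard-library sketch that sums two WAV files sample by sample. It assumes both files are 16-bit PCM with the same sample rate and channel count, and unlike pydub's `overlay` (which keeps the length of the first segment) it truncates to the shorter input. The function name `mix_wavs` is hypothetical.

```python
import array
import sys
import wave


def mix_wavs(vocal_path, mr_path, out_path):
    """Overlay two 16-bit PCM WAV files, clipping sums to the int16 range.

    Assumes matching sample rate and channel count; the output is as long
    as the shorter input.
    """
    with wave.open(vocal_path, "rb") as v, wave.open(mr_path, "rb") as m:
        if v.getsampwidth() != 2 or m.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM is supported")
        if (v.getframerate(), v.getnchannels()) != (m.getframerate(), m.getnchannels()):
            raise ValueError("sample rate and channel count must match")
        params = v.getparams()
        a = array.array("h", v.readframes(v.getnframes()))
        b = array.array("h", m.readframes(m.getnframes()))

    if sys.byteorder == "big":
        # WAV PCM data is little-endian; convert to native order for arithmetic.
        a.byteswap()
        b.byteswap()

    # Add interleaved samples pairwise and clip to avoid int16 overflow.
    mixed = array.array("h", (max(-32768, min(32767, x + y)) for x, y in zip(a, b)))

    if sys.byteorder == "big":
        mixed.byteswap()

    with wave.open(out_path, "wb") as out:
        out.setnchannels(params.nchannels)
        out.setsampwidth(params.sampwidth)
        out.setframerate(params.framerate)
        out.writeframes(mixed.tobytes())
```

Clipping keeps loud passages from wrapping around; in practice you may also want to lower the MR's gain before mixing so the vocal stays audible.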
Important:
- Save the content (source) song as ./data/VCTK-Corpus/cs/p229/p229_009.wav; that is, rename the content song to p229_009.wav.
- Save the target singer's songs in ./data/VCTK-Corpus/cs/p232. These files keep their original names.
- Make sure p232 contains only songs by the target singer, and clear the directory whenever you want to convert to a different singer.
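The layout rules above can be scripted so the rename and the clearing of p232 are never forgotten. The helper `prepare_conversion_inputs` below is a hypothetical sketch that copies the content song to p229/p229_009.wav, empties p232, and copies the target singer's songs in under their own names.

```python
import shutil
from pathlib import Path

CS_DIR = Path("./data/VCTK-Corpus/cs")


def prepare_conversion_inputs(content_song, target_songs, cs_dir=CS_DIR):
    """Lay out the files the conversion step expects and return (src_dir, tgt_dir)."""
    cs_dir = Path(cs_dir)
    src_dir = cs_dir / "p229"
    tgt_dir = cs_dir / "p232"

    # The content (source) song must be named p229_009.wav.
    src_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(str(content_song), str(src_dir / "p229_009.wav"))

    # Clear p232 so it holds only the chosen singer's songs.
    if tgt_dir.exists():
        shutil.rmtree(str(tgt_dir))
    tgt_dir.mkdir(parents=True)
    for song in target_songs:
        shutil.copy(str(song), str(tgt_dir / Path(song).name))
    return src_dir, tgt_dir
```

Usage (file names are placeholders): `prepare_conversion_inputs("my_vocal.wav", ["singer_song1.wav", "singer_song2.wav"])`.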
To convert, restore a trained model from a saved training step. For example, to restore the model saved at step 100000:
python convert.py --resume_iters 100000
Alternatively, run convert.ipynb in Colab.
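To choose a value for --resume_iters, you can look up the newest checkpoint in the directory your training run saves models to. The hypothetical helper below assumes generator checkpoints are named `<step>-G.ckpt`, a naming scheme taken from the StarGAN PyTorch code base; check the file names your run actually produces before relying on it.

```python
import re
from pathlib import Path

# Assumed checkpoint naming scheme: "<step>-G.ckpt".
CKPT_PATTERN = re.compile(r"^(\d+)-G\.ckpt$")


def latest_step(model_dir):
    """Return the largest step with a saved generator checkpoint, or None if there is none."""
    steps = []
    for path in Path(model_dir).iterdir():
        match = CKPT_PATTERN.match(path.name)
        if match:
            steps.append(int(match.group(1)))
    return max(steps) if steps else None
```

The returned number is what you would pass as `python convert.py --resume_iters <step>`.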
We implement the signal-to-noise ratio (SNR) metric to measure how much noise the model's output contains. You can run SNR.ipynb in Colab.
To evaluate a converted song, save it in the ./SNR folder.
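The exact formula used in SNR.ipynb is not shown here. For reference, the standard definition treats the difference between a clean reference and an estimate as noise: SNR = 10 · log10(Σ reference² / Σ (reference − estimate)²) dB. The sketch below implements that definition, assuming both signals are equal-length sample sequences that are already time-aligned; the name `snr_db` is hypothetical.

```python
import math


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB, treating (reference - estimate) as noise."""
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same length")
    signal_power = sum(x * x for x in reference)
    noise_power = sum((x - y) ** 2 for x, y in zip(reference, estimate))
    if noise_power == 0:
        return math.inf  # identical signals: no noise at all
    return 10 * math.log10(signal_power / noise_power)
```

Higher values mean less noise: an estimate whose error energy is 1% of the signal energy scores 20 dB.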