Classify the speaker of the given features. The goals are to learn how to use a Transformer and how to tune its parameters.
The original dataset is VoxCeleb1.
We randomly select 600 speakers from VoxCeleb1, then preprocess the raw waveforms into mel-spectrograms. You can download the preprocessed dataset from Google Drive.
Arguments:
- data_dir: The path to the data directory.
- metadata_path: The path to the metadata.
- segment_len: The length of each audio segment used for training.
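To make the role of segment_len concrete, here is a minimal sketch of cropping one utterance to a fixed-length training segment. The helper name crop_segment and the random-start policy are assumptions made for illustration, not necessarily what the dataset code does. The function accepts any sliceable sequence of frames, such as a torch.Tensor of shape (num_frames, n_mels) or a plain list.

```python
import random


def crop_segment(mel, segment_len, rng=random):
    """Return a contiguous window of at most segment_len frames from mel.

    Hypothetical helper: mel is any sliceable sequence of frames, for
    example a torch.Tensor of shape (num_frames, n_mels) or a list.
    """
    if len(mel) <= segment_len:
        # Too short to crop: return the utterance unchanged.
        return mel
    # Pick a random start so the whole window fits inside the utterance.
    start = rng.randint(0, len(mel) - segment_len)
    return mel[start:start + segment_len]
```

Utterances shorter than segment_len come back unchanged in this sketch, so batching them together with cropped segments would still require padding.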
The layout of the dataset directory is shown below, where each uttr-{random string}.pt is a PyTorch data file containing valid mel-spectrogram data.
data directory/
├── mapping.json
├── metadata.json
├── testdata.json
└── uttr-{random string}.pt
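As a quick sanity check of this layout, the sketch below lists the utterance files and reports which of the three JSON files are missing. The function names list_utterance_files and missing_json_files are hypothetical, and the sketch assumes all files sit directly under the data directory, as the tree suggests. It does not read the JSON contents, since their schema is not described here.

```python
from pathlib import Path


def list_utterance_files(data_dir):
    """Return the sorted uttr-*.pt paths directly under data_dir."""
    return sorted(Path(data_dir).glob("uttr-*.pt"))


def missing_json_files(
    data_dir,
    names=("mapping.json", "metadata.json", "testdata.json"),
):
    """Return the expected JSON file names that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in names if not (root / name).is_file()]
```

Each returned path could then be passed to torch.load to read the stored mel-spectrogram.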
This is also a solution to the ML2021Spring HW4 assignment.