
Predict the speaker's class from the given speech. Trained and tested on VoxCeleb1.

bonjour-npy/Speaker-Classification

Speaker Classification

Overview

Classify the speaker from the given audio features. The project also serves as practice in using a Transformer and tuning its hyperparameters.

Dataset

The original dataset is VoxCeleb1.

We randomly select 600 speakers from VoxCeleb1, then preprocess the raw waveforms into mel-spectrograms. You can download the preprocessed dataset from Google Drive.

Arguments:

  • data_dir: The path to the data directory.

  • metadata_path: The path to the metadata.

  • segment_len: The length of each audio segment used for training.
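As a rough illustration of what segment_len might control, here is a minimal sketch of cropping an utterance to a fixed-length training segment. It assumes that segment_len is counted in mel-spectrogram frames and that long utterances are cropped at a random offset; neither detail is confirmed above. The function name crop_segment is hypothetical and not taken from this repository.

```python
import random
from typing import List, Optional, Sequence


def crop_segment(
    frames: Sequence[Sequence[float]],
    segment_len: int,
    rng: Optional[random.Random] = None,
) -> List[Sequence[float]]:
    """Return at most `segment_len` consecutive frames from `frames`.

    Assumption: `segment_len` is counted in mel-spectrogram frames, and
    longer utterances are cropped at a random offset.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    rng = rng or random.Random()
    if len(frames) <= segment_len:
        # Short utterance: keep every frame; padding is left to the caller.
        return list(frames)
    # Long utterance: choose a random start so the window fits entirely.
    start = rng.randint(0, len(frames) - segment_len)
    return list(frames[start:start + segment_len])
```

With real data, the frames would come from one of the uttr-{random string}.pt files rather than a Python list, but the windowing logic would be the same.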

The layout of the dataset directory is shown below, where each uttr-{random string}.pt is a PyTorch data file containing a valid mel-spectrogram.

data directory/
├── mapping.json
├── metadata.json
├── testdata.json
└── uttr-{random string}.pt
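
To check that a downloaded copy matches this layout, a small sanity check along the following lines may help. It is only a sketch: the helper name summarize_data_dir is hypothetical, and it inspects file names alone because the schema of the JSON files is not described here.

```python
from pathlib import Path
from typing import Dict, List


def summarize_data_dir(data_dir: str) -> Dict[str, List[str]]:
    """List the JSON index files and utterance files found in `data_dir`.

    Only file names are examined; no assumption is made about the
    contents of the JSON files or the tensors stored in the .pt files.
    """
    root = Path(data_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"{data_dir} is not a directory")
    # mapping.json, metadata.json, and testdata.json are expected here.
    index_files = sorted(p.name for p in root.glob("*.json"))
    # Utterance files follow the uttr-{random string}.pt naming pattern.
    utterance_files = sorted(p.name for p in root.glob("uttr-*.pt"))
    return {"index_files": index_files, "utterance_files": utterance_files}
```

Each utterance file listed this way could then presumably be read with torch.load, assuming it was saved with torch.save.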

Related

This project is also a solution to the ML2021Spring HW4 assignment.
