This repository contains the code used in the paper: From Raw Speech to Fixed Representations: A Comprehensive Evaluation of Speech Embedding Techniques
The small-scale models are:
- Audio word2vec
- Speech2vec
- LEE
- Siamese
The root directory contains the scripts for pre-training them, while the task-specific directories (gender_id and emotion_id) contain the code for fine-tuning them. These models are implemented in PyTorch.
The task-specific directories also contain scripts for fine-tuning the self-supervised models used in this study. We provide code only for the base Wav2vec2 model and Whisper, since the code for the remaining models is almost identical.
To use another pre-trained self-supervised model, replace the line `wav2vec2_hub: facebook/wav2vec2-base` in the `hyperparams.yaml` file with the identifier of the desired model. These models are implemented using the SpeechBrain toolkit.
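For example, swapping in a different Hugging Face checkpoint (the XLS-R identifier below is purely illustrative, not one of the models evaluated in the paper) would look like this in `hyperparams.yaml`:

```yaml
# hyperparams.yaml -- point this at any compatible Hugging Face checkpoint
# (facebook/wav2vec2-xls-r-300m is only an illustrative example)
wav2vec2_hub: facebook/wav2vec2-xls-r-300m
```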
The repository also contains scripts for running the Integrated Gradients algorithm to select the 10% most important embedding dimensions.
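As a rough illustration of that selection step (a minimal NumPy sketch, not the repository's actual implementation, which would typically run against the trained PyTorch models): Integrated Gradients averages the model's gradients along a straight path from a baseline to the input, multiplies by the input-baseline difference, and the dimensions with the largest absolute attributions are kept. The toy quadratic model and the `grad_fn` below are assumptions for demonstration only.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Midpoint-rule approximation of the Integrated Gradients path integral."""
    alphas = (np.arange(steps) + 0.5) / steps          # midpoints in (0, 1)
    path = baseline + alphas[:, None] * (x - baseline)  # points along the path
    avg_grad = np.stack([grad_fn(p) for p in path]).mean(axis=0)
    return (x - baseline) * avg_grad                    # per-dimension attribution

def top_k_percent(attributions, pct=0.10):
    """Indices of the top `pct` fraction of dimensions by |attribution|."""
    k = max(1, int(round(len(attributions) * pct)))
    return np.argsort(np.abs(attributions))[::-1][:k]

# Toy differentiable "model" f(x) = sum(w * x**2), with gradient 2 * w * x.
rng = np.random.default_rng(0)
w = rng.uniform(0.1, 1.0, size=50)
x = rng.normal(size=50)
grad_fn = lambda p: 2 * w * p

attr = integrated_gradients(grad_fn, x, baseline=np.zeros_like(x))
important = top_k_percent(attr, 0.10)  # indices of the top 10% of dimensions
```

A useful sanity check on any IG implementation is the completeness property: the attributions should sum to f(x) - f(baseline).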
The following self-supervised models are used in the study:
- English:
- Finnish:
- French: