This repository contains the code used in the paper: From Raw Speech to Fixed Representations: A Comprehensive Evaluation of Speech Embedding Techniques
The small-scale models are:
- Audio word2vec
- Speech2vec
- LEE
- Siamese
The root directory contains the scripts for pre-training them, while the task-specific directories (gender_id and emotion_id) contain the code for fine-tuning them. These models are implemented in PyTorch.
The task-specific directories also contain scripts for fine-tuning the self-supervised models used in this study. We provide code only for the base Wav2vec2 model and Whisper, since the code for the remaining models is almost identical.
To use another pre-trained self-supervised model, replace the line `wav2vec2_hub: facebook/wav2vec2-base` in the `hyperparams.yaml` file with the identifier of the desired model. These models are implemented using the SpeechBrain toolkit.
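For example, swapping in a different Hugging Face checkpoint (the XLS-R identifier below is purely illustrative, not one of the models evaluated in the paper) would look like this in `hyperparams.yaml`:

```yaml
# hyperparams.yaml -- point this at any compatible Hugging Face checkpoint
# (facebook/wav2vec2-xls-r-300m is only an illustrative example)
wav2vec2_hub: facebook/wav2vec2-xls-r-300m
```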
The repository also contains scripts for running the Integrated Gradients algorithm to select the 10% most important embedding dimensions.
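As a rough illustration of that selection step (a minimal NumPy sketch, not the repository's actual implementation, which would typically run against the trained PyTorch models): Integrated Gradients averages the model's gradients along a straight path from a baseline to the input, multiplies by the input-baseline difference, and the dimensions with the largest absolute attributions are kept. The toy quadratic model and the `grad_fn` below are assumptions for demonstration only.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Midpoint-rule approximation of the Integrated Gradients path integral."""
    alphas = (np.arange(steps) + 0.5) / steps          # midpoints in (0, 1)
    path = baseline + alphas[:, None] * (x - baseline)  # points along the path
    avg_grad = np.stack([grad_fn(p) for p in path]).mean(axis=0)
    return (x - baseline) * avg_grad                    # per-dimension attribution

def top_k_percent(attributions, pct=0.10):
    """Indices of the top `pct` fraction of dimensions by |attribution|."""
    k = max(1, int(round(len(attributions) * pct)))
    return np.argsort(np.abs(attributions))[::-1][:k]

# Toy differentiable "model" f(x) = sum(w * x**2), with gradient 2 * w * x.
rng = np.random.default_rng(0)
w = rng.uniform(0.1, 1.0, size=50)
x = rng.normal(size=50)
grad_fn = lambda p: 2 * w * p

attr = integrated_gradients(grad_fn, x, baseline=np.zeros_like(x))
important = top_k_percent(attr, 0.10)  # indices of the top 10% of dimensions
```

A useful sanity check on any IG implementation is the completeness property: the attributions should sum to f(x) - f(baseline).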
The following self-supervised models are used in the study:
- English:
- Finnish:
- French: