NumberOfSpeakerEstimation

The goal of this project is to estimate the number of speakers that appear in an audio fragment. The first step is to replicate the study "CountNet: Estimating the Number of Concurrent Speakers Using Supervised Learning". On top of that, two experiments are run with this model:

Investigating the effect of the number of unique speakers in the dataset on this model
Investigating the effect of multilingual data on the performance of the model and its ability to generalize to unseen data
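For the first experiment, the training data has to be limited to a fixed number of unique speakers. The sketch below shows one way this could be done; it is not taken from the notebooks. It assumes LibriSpeech-style file names (`<speaker>-<chapter>-<utterance>.flac`), and the helper names `speaker_id` and `restrict_to_n_speakers` are hypothetical.

```python
import random
from collections import defaultdict
from pathlib import PurePosixPath


def speaker_id(path):
    """Return the speaker ID encoded in a LibriSpeech-style file name.

    Assumes names of the form <speaker>-<chapter>-<utterance>.flac.
    """
    return PurePosixPath(path).name.split("-")[0]


def restrict_to_n_speakers(paths, n_speakers, seed=0):
    """Keep only the utterances of a random subset of n_speakers speakers."""
    by_speaker = defaultdict(list)
    for p in paths:
        by_speaker[speaker_id(p)].append(p)
    if n_speakers > len(by_speaker):
        raise ValueError("requested more speakers than are available")
    # A fixed seed keeps the chosen speaker subset reproducible.
    rng = random.Random(seed)
    chosen = set(rng.sample(sorted(by_speaker), n_speakers))
    # Preserve the original ordering of the file list.
    return [p for p in paths if speaker_id(p) in chosen]
```

Calling `restrict_to_n_speakers(all_paths, 250)` would return only the files of 250 randomly chosen speakers, which can then be passed to the dataset creation step.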

Structure of this repo

This repo contains the following folders:

In model, the code for the general model can be found.
In pretrained_models, some pretrained models of this project can be found. More specifically:
- model-best-baseline.h5 is the baseline model, trained completely on LibriSpeech-360 Clean.
- model-best-{250, 750, 750}.h5 are the models used for investigating the effect of the number of unique speakers in the dataset.
- multilingual-model-best.h5 is the model trained on the multilingual dataset and used to investigate whether it performs better than the baseline model.
In src, all Python files containing code can be found.

Besides these folders, there are the following notebooks:

Creating Dataset.ipynb demonstrates the full pipeline of creating the dataset used for training the baseline model.
Experimental Datasets Unique Speakers.ipynb demonstrates all code used for testing the effect of the number of unique speakers in the dataset on the performance of the model.
Experiments with Multilingual Datasets.ipynb demonstrates all code used for testing the effect of training the model on a multilingual dataset on the performance of the model.
SaliencyMaps.ipynb contains all code used to create the saliency maps for the models.
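Creating Dataset.ipynb is the authoritative description of how the training data is built. As a rough illustration only, labelled fragments for speaker counting are often made by summing one utterance from each of k different speakers and using k as the label. The sketch below assumes that approach; it is not taken from the notebook, and `mix_speakers` and its parameters are hypothetical names. Audio is represented as plain lists of floats to keep the example dependency-free.

```python
import random


def mix_speakers(utterances_by_speaker, k, n_samples, seed=0):
    """Sum one utterance from each of k distinct speakers.

    Returns (mixture, k), where mixture has n_samples samples and is
    peak-normalized to the range [-1, 1] (silence stays all zeros).
    """
    rng = random.Random(seed)
    mixture = [0.0] * n_samples
    # Sampling without replacement guarantees k *different* speakers.
    speakers = rng.sample(sorted(utterances_by_speaker), k)
    for speaker in speakers:
        utterance = rng.choice(utterances_by_speaker[speaker])
        # Truncate to the fragment length; shorter utterances are zero-padded.
        for i, sample in enumerate(utterance[:n_samples]):
            mixture[i] += sample
    peak = max((abs(x) for x in mixture), default=0.0)
    if peak > 0:
        mixture = [x / peak for x in mixture]
    return mixture, k
```

With k set to 0 the function returns silence with label 0, which covers the case of a fragment without any speaker.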

How do I use this repo?

Running the notebooks is fairly straightforward. To train a model, make sure the correct path to the data is set in model_trainer.py, then run the command ./run_model.sh train. This will initiate a run on Weights & Biases. After training, make sure to download the trained model from this specific run.
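A wrong data path typically only surfaces after the run has already started. The sketch below is an optional wrapper, not part of this repo: it checks that a data directory exists and only then calls run_model.sh. The function names `build_command` and `launch` are hypothetical, and `data_dir` must be the same path you set by hand in model_trainer.py, since the wrapper does not pass it on to the script.

```python
import subprocess
from pathlib import Path


def build_command(mode=None, script="./run_model.sh"):
    """Build the argument list for run_model.sh, e.g. mode="train"."""
    return [script] if mode is None else [script, mode]


def launch(data_dir, mode=None):
    """Run run_model.sh only if data_dir exists; return its exit code."""
    if not Path(data_dir).is_dir():
        raise FileNotFoundError(f"data directory not found: {data_dir}")
    # check=False: the caller inspects the exit code instead of catching errors.
    return subprocess.run(build_command(mode), check=False).returncode
```

For example, `launch("/path/to/data", "train")` fails fast with a FileNotFoundError when the directory is missing, and otherwise behaves like running ./run_model.sh train from the repository root.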

To test a trained model, make sure that the correct paths to the test set and to the pretrained model are both set in model_test.py. After that, run the command: ./run_model.sh.
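model_test.py determines how this repo actually scores a model. If you want to compare predicted and true speaker counts yourself, two simple measures are the mean absolute error and the exact-match accuracy. The sketch below assumes you already have both lists of integer counts; whether model_test.py reports these same metrics is an assumption, and the function names are hypothetical.

```python
def mean_absolute_error(true_counts, predicted_counts):
    """Average absolute difference between true and predicted speaker counts."""
    if len(true_counts) != len(predicted_counts):
        raise ValueError("both lists must have the same length")
    if not true_counts:
        raise ValueError("at least one example is required")
    total = sum(abs(t - p) for t, p in zip(true_counts, predicted_counts))
    return total / len(true_counts)


def accuracy(true_counts, predicted_counts):
    """Fraction of fragments whose speaker count is predicted exactly."""
    if len(true_counts) != len(predicted_counts):
        raise ValueError("both lists must have the same length")
    if not true_counts:
        raise ValueError("at least one example is required")
    hits = sum(1 for t, p in zip(true_counts, predicted_counts) if t == p)
    return hits / len(true_counts)
```

The mean absolute error shows how far off the estimates are on average, while accuracy only counts exact matches; reporting both makes it easier to compare the baseline with the experimental models.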

Avuerro/NumberOfSpeakerEstimation

Folders and files

Latest commit

History

Repository files navigation

NumberOfSpeakerEstimation

Structure of this repo

How do I use this repo?

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages