-
I am a huge fan of your work on pyannote, and I greatly admire both the idea behind this repository and the effort that goes into creating and maintaining it. This issue concerns speed. The pretrained speaker diarization pipeline produces extremely fine-grained segmentation when applied to audio files, but it is really slow, especially when running on CPU instead of a GPU, and diarization takes a long time. Even after downloading the models to the local system, there is no noticeable difference in speed. In some cases the real-time factor on CPU is between 0.5 and 1, so a file of 1-hour duration takes 30 to 60 minutes. Are there any pretrained pipelines that work nearly as well but are significantly faster? I would also like to know why it is so slow even though the downloaded models seem rather small, and whether there is some way to make it faster. Currently, I believe the audio file is fed to the end-to-end pipeline as a raw waveform. Is it possible to feed MFCCs or other features extracted from the audio file directly to the pipeline, to obtain results similar to those published in your paper while perhaps making it faster?
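For reference, here is a minimal sketch of how I am applying and timing the pipeline, so the real-time factor numbers above are reproducible. It assumes the torch.hub entry point documented in the pyannote-audio README; the audio path and duration are placeholders:

```python
import time
import torch

# Load the pretrained speaker diarization pipeline
# (torch.hub entry point from the pyannote-audio README).
pipeline = torch.hub.load('pyannote/pyannote-audio', 'dia')

start = time.time()
diarization = pipeline({'audio': 'audio.wav'})  # placeholder path
elapsed = time.time() - start

# Real-time factor = processing time / audio duration.
audio_duration = 3600.0  # seconds; replace with the actual file duration
print(f'real-time factor: {elapsed / audio_duration:.2f}')
```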
-
Answering your three questions in order:
No, there isn’t any faster pretrained pipeline.
You could increase the step to reduce the overlap between consecutive chunks in the SAD and SCD steps, as illustrated in the sketch below.
No. You would have to retrain a model to accept precomputed features.
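To see why increasing the step helps: the SAD and SCD models score the file in overlapping sliding chunks, so the number of forward passes scales inversely with the step. A small arithmetic sketch (the chunk duration and step values here are illustrative, not the pipeline's actual defaults):

```python
# Count the overlapping chunks needed to cover an audio file.
# Larger step -> fewer chunks -> fewer forward passes -> faster inference,
# at the cost of coarser scores and less overlap-averaging.

def num_chunks(file_duration: float, chunk_duration: float, step: float) -> int:
    """Chunks of `chunk_duration` seconds taken every `step` seconds."""
    return int(round((file_duration - chunk_duration) / step)) + 1

file_duration = 3600.0   # a 1-hour file
chunk_duration = 2.0     # seconds per chunk (illustrative)

for step in (0.1, 0.5, 1.0, 2.0):
    print(f'step={step:>4}s -> {num_chunks(file_duration, chunk_duration, step):>6} chunks')
```

Doubling the step roughly halves the work; once the step equals the chunk duration there is no overlap left to average over, so scores get noisier.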
-
Hello, thank you for your admirable work.