-
I am a huge fan of your work on pyannote, and I greatly admire both the idea behind this repository and the effort that goes into creating and maintaining it. This issue concerns speed. The pretrained speaker diarization pipeline produces extremely fine-grained segmentation when applied to audio files, but it is really slow, especially when running on CPU instead of a GPU, and diarization takes a long time. Even after downloading the models to the local system, there is no noticeable difference in speed. In some cases the real-time factor on CPU is between 0.5 and 1, so a file of 1-hour duration takes 30 to 60 minutes. Are there any pretrained pipelines that work nearly as well but are significantly faster? I would also like to know why it is so slow even though the downloaded models seem rather small, and whether there is some way to make it faster. Currently, I believe the audio file is fed to the end-to-end pipeline as a raw waveform. Is it possible to feed MFCCs or other features extracted from the audio file directly to the pipeline, to obtain results similar to those published in your paper while perhaps making it faster?
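For reference, here is a minimal sketch of how I am applying and timing the pipeline, so the real-time factor numbers above are reproducible. It assumes the torch.hub entry point documented in the pyannote-audio README; the audio path and duration are placeholders:

```python
import time
import torch

# Load the pretrained speaker diarization pipeline
# (torch.hub entry point from the pyannote-audio README).
pipeline = torch.hub.load('pyannote/pyannote-audio', 'dia')

start = time.time()
diarization = pipeline({'audio': 'audio.wav'})  # placeholder path
elapsed = time.time() - start

# Real-time factor = processing time / audio duration.
audio_duration = 3600.0  # seconds; replace with the actual file duration
print(f'real-time factor: {elapsed / audio_duration:.2f}')
```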
-
Answering your three questions in order:
No, there isn’t any faster pretrained pipeline.
You could increase the step to reduce the overlap between consecutive chunks in the SAD and SCD steps, as illustrated in the sketch below.
No. You would have to retrain a model to accept precomputed features.
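To see why increasing the step helps: the SAD and SCD models score the file in overlapping sliding chunks, so the number of forward passes scales inversely with the step. A small arithmetic sketch (the chunk duration and step values here are illustrative, not the pipeline's actual defaults):

```python
# Count the overlapping chunks needed to cover an audio file.
# Larger step -> fewer chunks -> fewer forward passes -> faster inference,
# at the cost of coarser scores and less overlap-averaging.

def num_chunks(file_duration: float, chunk_duration: float, step: float) -> int:
    """Chunks of `chunk_duration` seconds taken every `step` seconds."""
    return int(round((file_duration - chunk_duration) / step)) + 1

file_duration = 3600.0   # a 1-hour file
chunk_duration = 2.0     # seconds per chunk (illustrative)

for step in (0.1, 0.5, 1.0, 2.0):
    print(f'step={step:>4}s -> {num_chunks(file_duration, chunk_duration, step):>6} chunks')
```

Doubling the step roughly halves the work; once the step equals the chunk duration there is no overlap left to average over, so scores get noisier.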
-
Hello, thank you for your admirable work.