VAD inference time #949

kuruvachankgeorge · 2022-04-11T01:08:49Z

Hi folks! I'm wondering whether anyone has compared the inference time (latency) of the pyannote VAD with other popular VAD algorithms, such as Silero-VAD. The paper compares their performance in terms of false alarm and missed detection rates, but says nothing explicit about latency or computational load. It would be great if you could share these details as well. Also, does the pyannote segmentation model support ONNX or TensorRT (to reduce inference time)? My intention is to integrate the VAD into my ASR engine for real-time inference, replacing the WebRTC VAD for better speech/silence segmentation. Thanks!

hbredin (Maintainer) · 2022-04-11T08:01:14Z

Thanks for your interest in pyannote.

Improving inference time is definitely not the priority right now, as pyannote.audio was not initially designed for real-time processing.

That being said:

- this blog post may help you set up streaming VAD with pyannote;
- this paper provides some numbers (though it does more than just VAD).
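A common way to run an offline, window-based segmentation model in a streaming fashion is to score overlapping windows and average each frame's scores before thresholding. The sketch below illustrates that general idea only; it is not necessarily the exact method in the blog post above. `score_window` is a hypothetical stand-in for a frame-level speech scorer such as the pyannote segmentation model, and the window, step, and threshold values are illustrative rather than pyannote defaults. For clarity it processes a complete sequence; a true streaming variant would emit a frame's decision once no later window can cover it.

```python
from typing import Callable, List, Sequence


def sliding_window_vad(
    frames: Sequence[float],
    score_window: Callable[[Sequence[float]], List[float]],
    window: int = 50,
    step: int = 10,
    threshold: float = 0.5,
) -> List[bool]:
    """Score overlapping windows, average each frame's scores, then threshold."""
    if not 0 < step <= window:
        # step > window would leave frames that no window covers
        raise ValueError("step must satisfy 0 < step <= window")
    n = len(frames)
    totals = [0.0] * n  # running sum of scores per frame
    counts = [0] * n    # number of windows that covered each frame
    start = 0
    while start < n:
        end = min(start + window, n)
        scores = score_window(frames[start:end])  # one score per frame in the window
        for i, s in enumerate(scores):
            totals[start + i] += s
            counts[start + i] += 1
        if end == n:  # last window reached the end of the sequence
            break
        start += step
    return [c > 0 and t / c > threshold for t, c in zip(totals, counts)]


def energy_scores(window: Sequence[float]) -> List[float]:
    """Toy placeholder scorer (hypothetical): absolute value clipped to [0, 1]."""
    return [min(1.0, abs(x)) for x in window]


if __name__ == "__main__":
    frames = [0.0] * 20 + [1.0] * 20 + [0.0] * 20
    print(sliding_window_vad(frames, energy_scores, window=10, step=5))
```

Averaging across overlapping windows smooths decisions near window boundaries, where a model sees the least context; the trade-off is extra compute per frame (roughly `window / step` model calls cover each frame).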

Feel free to contribute a proper (independent) benchmark.
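A benchmark of this kind usually reports the real-time factor (RTF), i.e. processing time divided by audio duration (below 1.0 means faster than real time), plus the worst per-chunk latency, which is what matters for a real-time ASR pipeline. Below is a minimal, library-agnostic harness sketch, assuming a hypothetical `vad_fn` that accepts one chunk of samples; pyannote, Silero-VAD, or WebRTC VAD would each need a small wrapper behind it. The `energy_vad` function is a toy placeholder, and no benchmark numbers are implied.

```python
import time
from typing import Callable, Sequence, Tuple


def real_time_factor(
    vad_fn: Callable[[Sequence[float]], object],
    samples: Sequence[float],
    sample_rate: int,
    chunk_seconds: float = 0.5,
    repeats: int = 3,
) -> Tuple[float, float]:
    """Return (RTF, worst per-chunk latency in seconds).

    RTF uses the fastest of `repeats` full passes to reduce timing noise;
    the worst chunk latency is taken over all passes.
    """
    if sample_rate <= 0 or len(samples) == 0 or repeats < 1:
        raise ValueError("need sample_rate > 0, non-empty audio, repeats >= 1")
    chunk = max(1, int(chunk_seconds * sample_rate))
    duration = len(samples) / sample_rate
    best_total = float("inf")
    worst_chunk = 0.0
    for _ in range(repeats):
        pass_start = time.perf_counter()
        for start in range(0, len(samples), chunk):
            t0 = time.perf_counter()
            vad_fn(samples[start:start + chunk])  # result ignored: timing only
            worst_chunk = max(worst_chunk, time.perf_counter() - t0)
        best_total = min(best_total, time.perf_counter() - pass_start)
    return best_total / duration, worst_chunk


def energy_vad(chunk: Sequence[float], threshold: float = 0.01) -> bool:
    """Toy placeholder VAD (hypothetical): mean energy above a fixed threshold."""
    if len(chunk) == 0:
        return False
    return sum(x * x for x in chunk) / len(chunk) > threshold


if __name__ == "__main__":
    sr = 16000
    audio = [0.0] * (2 * sr)  # 2 s of silence as dummy input
    rtf, worst = real_time_factor(energy_vad, audio, sr)
    print(f"RTF={rtf:.4f}, worst chunk latency={worst * 1000:.2f} ms")
```

For a fair comparison, each VAD should run on the same audio, chunk size, and hardware, with any model loading done before timing starts.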

kuruvachankgeorge (Author) · Apr 12, 2022

Thanks @hbredin