I have a use case where I want to run speech recognition and then diarization. I've been exploring pyannote for the latter and seeing good accuracy. Is it possible to pass the word timings already obtained from speech recognition into pyannote (i.e., to skip the VAD step and improve speed)?
Replies: 1 comment
Not out of the box. Also, the speaker diarization pipeline does not rely on a VAD step per se.
So you would be better off rewriting a pipeline from scratch.
I am also curious about your use case and the "good accuracy" you got.
Feel free to drop me an email to tell me a bit more :)
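For anyone attempting the from-scratch route, one possible first step is turning the recognizer's word timings into speech regions that a custom pipeline could consume in place of a detection stage. The sketch below is plain Python and does not call pyannote at all. The function name `words_to_speech_regions`, the `(text, start, end)` word format, and the 0.5 s `max_gap` threshold are all assumptions made for illustration, not part of any library.

```python
from typing import List, Tuple

# Hypothetical word format: (text, start_seconds, end_seconds).
Word = Tuple[str, float, float]
Region = Tuple[float, float]


def words_to_speech_regions(words: List[Word], max_gap: float = 0.5) -> List[Region]:
    """Merge word timings into speech regions.

    Consecutive words separated by at most `max_gap` seconds are merged
    into one region. The 0.5 s default is an arbitrary assumption.
    """
    regions: List[Region] = []
    for _, start, end in sorted(words, key=lambda w: w[1]):
        if regions and start - regions[-1][1] <= max_gap:
            # Close enough to the previous region: extend it.
            prev_start, prev_end = regions[-1]
            regions[-1] = (prev_start, max(prev_end, end))
        else:
            # Gap is too large (or this is the first word): start a new region.
            regions.append((start, end))
    return regions


words = [("hello", 0.00, 0.40), ("there", 0.45, 0.80), ("yes", 2.10, 2.35)]
print(words_to_speech_regions(words))  # [(0.0, 0.8), (2.1, 2.35)]
```

The resulting regions would still need to be fed to embedding extraction and clustering steps that you write yourself, since the stock pipeline does not accept them.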