This is related to #1205, #1218, #1085, and many other issues/discussions. Same answer from my side: feel free to contribute a PR adding this feature, as this seems to be a recurrent request...
Hi everyone!
I am using the speaker diarization module to match podcast episode transcripts with the speakers, and it works very well, but it would be really nice to only have to match speaker names to the recognized speakers once.
So I was wondering: instead of calling the pretrained speaker-diarization pipeline on each separate audio file and thus re-fitting the clusters every time, is it possible to fit the speaker embedding cluster centroids once and reuse them for new audio files, simply assigning the new embeddings to the original clusters?
The diarization time also doesn't seem to scale linearly (it takes 3:50 min to diarize a 1 h audio sample on an RTX 3060, but 20 min for a 3 h one), and for the full 5 h episodes it sometimes fails outright, so I suspect the clustering step is the cause.
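The centroid-reuse idea above can be sketched roughly as follows. This is not a pyannote API; it is a minimal illustration assuming you have already extracted per-segment speaker embeddings (e.g. with an embedding model), using random vectors as placeholders: fit k-means once on a reference episode, store the centroids, then label new segments by nearest centroid instead of re-clustering.

```python
# Hedged sketch: fit speaker centroids once, then reuse them on new episodes.
# `ref_embeddings` and `new_embeddings` stand in for real per-segment speaker
# embeddings; the shapes and the speaker count (3) are arbitrary assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_distances

rng = np.random.default_rng(0)
ref_embeddings = rng.normal(size=(200, 512))  # placeholder embeddings

# Fit the clusters ONCE, on the reference episode's segments.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(ref_embeddings)
centroids = km.cluster_centers_  # persist these for later episodes

def assign_speakers(embeddings, centroids):
    """Label each new segment with the index of its closest stored centroid."""
    distances = cosine_distances(embeddings, centroids)
    return distances.argmin(axis=1)

# On a new episode: no re-clustering, just nearest-centroid assignment.
new_embeddings = rng.normal(size=(50, 512))  # placeholder embeddings
labels = assign_speakers(new_embeddings, centroids)
print(labels.shape)  # one speaker id per segment
```

Whether this works in practice depends on the embeddings being comparable across recordings (same model, similar channel conditions); a distance threshold would also be needed to flag speakers that were not present in the reference episode.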