Replies: 2 comments
-
Here are a few links that might be relevant.
-
Hi, I am working with pyannote's VAD model.

The VAD in pyannote does not work well for streaming: when I feed it 2 s, 1 s, or 0.5 s audio chunks, the results are poor. Reading the SincNet code, the first layer is `self.wav_norm1d = nn.InstanceNorm1d(1, affine=True)`. The audio is loaded with torchaudio, which automatically normalizes the whole file, and I found that the resulting values do not fall exactly within (-1, 1) or (0, 1).

If I want to train this model as a streaming recognition model, can I skip standardizing the entire audio when reading the samples and instead standardize each local chunk internally? Or, more generally, do you have any suggestions for converting this model to streaming recognition?

Thanks
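
A minimal sketch of the question being asked (not pyannote's actual training code): because `nn.InstanceNorm1d` computes statistics per input instance, applying it to each local chunk independently already yields a chunk-internal standardization, and the result differs from normalizing the whole waveform at once wherever local statistics deviate from global ones. The chunk length and sample rate below are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same layer type as SincNet's first layer (wav_norm1d).
norm = nn.InstanceNorm1d(1, affine=True)

# Dummy waveform: (batch, channel, samples), 1 s at an assumed 16 kHz.
waveform = torch.randn(1, 1, 16000)

# Whole-file normalization, as in offline processing.
full = norm(waveform)

# Chunk-wise normalization, as a streaming model would have to do:
# each 0.5 s chunk is standardized using only its own statistics.
chunks = waveform.split(8000, dim=-1)
streamed = torch.cat([norm(c) for c in chunks], dim=-1)

# The two results differ wherever a chunk's mean/variance deviates
# from the whole file's mean/variance.
print("max |full - streamed|:", (full - streamed).abs().max().item())
```

Note that `InstanceNorm1d` (with `track_running_stats=False`, the default) never uses global dataset statistics, so the mismatch between training and streaming inference comes only from the chunk length over which the per-instance statistics are computed.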