Replies: 2 comments
-
Here are a few links that might be relevant.
-
Hi, I am working with pyannote's VAD model.

The VAD in pyannote does not work well for streaming: when I feed it 2 s, 1 s, or 0.5 s audio chunks, the results are poor. Reading the SincNet code, the first layer is `self.wav_norm1d = nn.InstanceNorm1d(1, affine=True)`. The audio is loaded with torchaudio, which automatically normalizes the whole file, and I found that the resulting values do not fall exactly within (-1, 1) or (0, 1).

If I want to train this model as a streaming recognition model, can I skip standardizing the entire audio when reading the samples and instead standardize each local chunk internally? Or, more generally, do you have any suggestions for converting this model to streaming recognition?

Thanks
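
A minimal sketch of the question being asked (not pyannote's actual training code): because `nn.InstanceNorm1d` computes statistics per input instance, applying it to each local chunk independently already yields a chunk-internal standardization, and the result differs from normalizing the whole waveform at once wherever local statistics deviate from global ones. The chunk length and sample rate below are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same layer type as SincNet's first layer (wav_norm1d).
norm = nn.InstanceNorm1d(1, affine=True)

# Dummy waveform: (batch, channel, samples), 1 s at an assumed 16 kHz.
waveform = torch.randn(1, 1, 16000)

# Whole-file normalization, as in offline processing.
full = norm(waveform)

# Chunk-wise normalization, as a streaming model would have to do:
# each 0.5 s chunk is standardized using only its own statistics.
chunks = waveform.split(8000, dim=-1)
streamed = torch.cat([norm(c) for c in chunks], dim=-1)

# The two results differ wherever a chunk's mean/variance deviates
# from the whole file's mean/variance.
print("max |full - streamed|:", (full - streamed).abs().max().item())
```

Note that `InstanceNorm1d` (with `track_running_stats=False`, the default) never uses global dataset statistics, so the mismatch between training and streaming inference comes only from the chunk length over which the per-instance statistics are computed.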