Realtime? (low-latency streaming inference) #43
Hi, thanks for asking! Unfortunately, the current model is not able to do real-time transcription. Real-time operation would need a special streaming architecture, which is not implemented in the current model.
Fair enough. I'm curious about the theoretical minimum latency of the model. I see there is a
Hi, sorry for the late reply. For this model, the minimum latency would be 0.025 + 0.01 + 0.01 s, because the windows overlap by 0.01 s. And of course you also need to account for the time spent on feature extraction and inference.
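Spelling out the arithmetic above (window length, plus hop, plus the 0.01 s overlap context), before any feature-extraction or inference time is added:

```python
# Framing parameters quoted in the reply above (all in seconds).
win_len = 0.025          # analysis window length
hop = 0.010              # frame shift
overlap_context = 0.010  # extra context because windows overlap by 0.01 s

min_latency = win_len + hop + overlap_context
print(min_latency)  # 0.045 s theoretical minimum, excluding compute time
```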
Hi, continuing on Will's thread about real-time audio streaming, we ran into a bit of a blocker. The lifter function (pm.feature.lifter) seems to change its output based on the length of the array passed as "cepstra": the same array with fewer elements comes back with different values. Is there an obvious way to make this function invariant to input array length, or do we need to keep state and use a rolling-average approach? Thanks
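For reference, the textbook sinusoidal MFCC lifter scales each cepstral coefficient by a factor that depends only on the coefficient index, not on the number of frames, so it is frame-count invariant by construction. A minimal sketch of that standard form (an assumption for illustration; not necessarily what pm.feature.lifter does internally):

```python
import numpy as np

def sinusoidal_lifter(cepstra, L=22):
    """Standard MFCC liftering: cepstra has shape (num_frames, num_coeffs).

    The lift factor depends only on the coefficient index, so each frame
    is transformed independently of how many frames were passed in.
    """
    ncoeff = cepstra.shape[1]
    lift = 1 + (L / 2.0) * np.sin(np.pi * np.arange(ncoeff) / L)
    return cepstra * lift

rng = np.random.default_rng(0)
c = rng.standard_normal((10, 13))
full = sinusoidal_lifter(c)        # lifter over all 10 frames
half = sinusoidal_lifter(c[:5])    # lifter over the first 5 frames only
print(np.allclose(full[:5], half))  # True: per-frame output is unchanged
```

If pm.feature.lifter gives different per-frame values on truncated input, it may be applying some utterance-level statistic (e.g. a mean or variance normalization) on top of the lift, in which case a running estimate of that statistic would be the usual streaming workaround.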
Hello, just to clarify: is it because the current model uses a bidirectional LSTM that this is not possible?
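The backward direction of a bidirectional LSTM is the core obstacle: every output depends on future frames, so no output can be emitted until the whole utterance has been seen. A tiny numpy stand-in for just the backward recurrence (not the actual model) shows that the output at t=0 changes when only the last input frame changes:

```python
import numpy as np

def backward_rnn(x, w=0.5):
    """Toy backward recurrence: h[t] = tanh(x[t] + w * h[t+1]),
    computed right-to-left, like the reverse direction of a BiLSTM."""
    h = np.zeros(len(x))
    acc = 0.0
    for t in range(len(x) - 1, -1, -1):
        acc = np.tanh(x[t] + w * acc)
        h[t] = acc
    return h

x = np.zeros(5)
a = backward_rnn(x)
x2 = x.copy()
x2[-1] = 1.0                 # perturb only the final frame
b = backward_rnn(x2)
print(a[0] != b[0])          # True: the earliest output depends on the last input
```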
Thanks for allosaurus, my experiments with it have been fruitful so far. Very impressive work!
I'm curious about whether the architecture of this package is suitable for operating on streaming audio at reasonably low latency.
I haven't dug much further than what I needed to load a file with pydub and get some output, but I'm happy to dig further. I thought it would be a good idea to start a conversation about this: perhaps the system and models are totally unsuitable for real time, or perhaps it might just require a bit of engineering effort on my side.
Thanks in advance
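For anyone experimenting in the meantime, a common pseudo-streaming workaround with an offline model is to run inference on overlapping chunks and merge the outputs. A sketch of the chunking half, where the chunk and overlap lengths are illustrative assumptions (not tuned for this model) and the per-chunk recognize call is hypothetical:

```python
def chunk_spans(num_samples, sr=16000, chunk_s=2.0, overlap_s=0.5):
    """Yield (start, end) sample spans covering the signal with overlap.

    chunk_s and overlap_s are illustrative values, not tuned for allosaurus.
    """
    chunk = int(chunk_s * sr)
    step = chunk - int(overlap_s * sr)
    start = 0
    while start < num_samples:
        yield start, min(start + chunk, num_samples)
        if start + chunk >= num_samples:
            break
        start += step

# For a 5 s signal at 16 kHz, consecutive spans overlap by 0.5 s:
spans = list(chunk_spans(5 * 16000))
print(spans)  # [(0, 32000), (24000, 56000), (48000, 80000)]

# Each span could then be fed to the model, e.g. (hypothetical call):
# for s, e in spans:
#     phones = model.recognize(samples[s:e])
```

Deduplicating phones in the overlapped regions is the fiddly part, and the latency floor is still bounded by the chunk length, so this is a stopgap rather than true streaming.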