-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handling of real-time, low latency signals without discontinuities #97
Comments
Interesting problem, and one I expect to face eventually as well. My instinct is that this would be best solved by a separate, specialized Handling overflow is an interesting problem. A few approaches come to mind:
Option 1 would give the best behavior when your sender is running at a slightly different sample rate, which is always true in principle, but I'm not sure the size of difference you'll tend to see in practice is large enough to justify the complexity. No other solution gracefully handles a sender that's producing samples slightly slower than expected. Resampling does impact quality some, but not much; oddio's 3D spatialization uses linear resampling pervasively to implement propagation delay and it hasn't caused any issues. I've considered offering pluggable resamplers of various quality vs. cost tradeoffs, but there hasn't been a practical need. Options 2 and 3 could look identical to the user if 3 was implemented as an opaque newtype around a We should not involve non-realtime threads in the loop for underruns as in your option 2, as there's no guarantee they'll get around to taking action before the underrun occurs. |
For buffer underruns there is also the option of repeating a chunk of frames (with a crossfade). It might be better than using a fade to 0 for some applications. Even better a dynamic approach could be used, repeat the last chunk for a small number of times and then do a fade to 0 in case the frame source is still stuck, to make it less annoying. For buffer overflows I think the modular approach (with a What do you think about discarding the queue of signals and recreating it every time there is a new batch (roughly every 10ms)? At the end I can contribute an example where everything is put together. |
I'm leery of schemes that rely on juggling high-level
The intention there is to address cases where e.g. audio is being streamed at 48001Hz, but played back at 47999Hz. This is practically guaranteed in real-time streaming, since no two audio devices' clocks will be perfectly synchronized. Resampling to mitigate this will not noticeably impact pitch. A different mechanism (fade-outs or repeating) is needed to handle lost data, but this should invisibly prevent all genuine real-time overruns. However, it's possible that desync in practice is so small that we can ignore the risk of overruns entirely. Research is needed to confirm, but my pessimistic assumption is that clocks will vary enough for this to matter.
Those manifest as underruns, which can be handled more simply, as discussed.
We should be able to do better. |
Yes, that was my intention. All the operations are deferred, it can work without acting in real-time. Only the underrun "coda" (cycle + fade out) is pre-queued, overrun crossfades can be applied without issue in the signal record thread at any time.
Maybe I should have been more specific, I was mostly talking about late packets. in ALVR (my project), late packets are as likely as dropped packets; late packets usually manifest as underrun + overrun shortly after. The overrun is handled by crossfading with a more recent segment of the buffer and the middle frames are discarded. So, I could create a new Signal that renders a cycle + fade out in case of underrun, but given its complexity and that the cycle idea was met with hesitation in the ALVR Discord server, I will make multiple tests before finalizing the PR. |
Right; I'm not convinced that constantly re-queueing underrun signals that we rarely actually reach is the way to go.
I see what you mean. In the specific case of real-time streaming, I think it's an implementation error for this to overrun. The logical playback cursor should advance continuously whether or not there's data, because we know in advance that the sender is doing the same, regardless of communication issues. If a block of audio arrives late, we therefore must not play it from the start, since that would correspond to moving the playback cursor backwards in time. Instead, it might be played from halfway through, or even skipped entirely if it's old enough. This ensures that a later timely block won't overrun the buffer because we didn't fill up the buffer with obsolete data to begin with, and only requires fade-out/fade-in across missing data. |
That's a good point. Currently my application does not have any timestamp for audio data, there are just packet indices that are hidden inside the socket wrapper implementation; we can only get a notification of a packet drop (because it was truly dropped by the network or if a packet with a later index arrived first). I think the underrun + overrun happens when multiple packets get queued up in the network. So to implement your suggestion, we should send timestamps with each packet, make a history-averaged prediction of the time origin of the stream, and discard packets with timestamp outliers. Probably choosing a time origin at the beginning of the stream might be problematic in case latency fluctuates during streaming. |
Something like that, yeah. A good timestamp format is a u64 sample number. Maybe that's what the API should work in terms of.
Yeah, ideally a change in latency should be handled gracefully. Perhaps by tracking the buffer size (normalized by sample rate) and crossfading or slightly adjusting playback rate if it passes a certain threshold above or below the target size, except at startup when playback should just pause until the desired size is reached. |
An easy way to adjust playback rate would be to skip or double every 1 in, say, 20 samples. Empirically this sounds okay, especially since we're only doing it in the very rare case where latency changes significantly. |
Thank you for all the feedback and suggestions. If you already did some tests, when you remove 1 sample every 20 does it sound generally higher pitch or does it create some high pitch noise, keeping roughly the same pitch for the rest of the frequency spectrum? |
It sounds slightly sped up, i.e. higher pitch, while doubling makes it lower. At 1 in 100 the difference is inaudible, but it's slightly noticeable still at 1 in 50. There's no clicking/popping. Assuming we want this to be impossible to notice, at 1% speedup/slowdown by this mechanism a 10ms change in latency is corrected for within 1 second. That seems pretty good to me, but I'm not strongly opposed to crossfades either, which have the advantage of resolving in constant time at the cost of outright skipping some (probably inconsequential) amount of data. I feel like skipping/doubling samples is probably a little cheaper and easier to implement, but it's a slow path anyway. |
I might add an option for skipping/doubling samples to match the target average buffer usage, but crossfades cannot be removed entirely, since network disruptions happen in bursts and should be resolved rather quickly (<1s) for a good user experience. Anyway, I'm putting this on hold since I'm working on something else at the moment (OpenXR) but I'll return to this right after that. |
I guess I'm thinking of fadeout/in as not quite the same thing as a crossfade, but I suppose the logic could be mostly shared. 👍 |
My project currently uses ad-hoc code to gracefully handle playback of an inconsistent stream of frames. It can handle buffer overflows, underflows and discontinuities (caused by packet drops)
https://github.com/alvr-org/ALVR/blob/master/alvr/audio/src/lib.rs
Now I would like to add support for mixing different audio sources. Oddio seems suited for this task, and at the same time I would use it to refactor and simplify the existing code. The problem is that the
Stream
signal API seems lacking for my use-case. It seems it cannot gracefully handle interruptions caused by buffer underflows, instead it would stop abruptly causing a “pop”, and there is no way of resuming the stream or detecting when the stream buffer has emptied.The most sensible solution for me (idea n.1) is to make
Stream
never return true foris_finished()
. Integrate support for a ramp down when the buffer has fewer than N frames, then resume with a rump up when the buffer has been filled enough (halfway?). Optionally support interrupting a ramp down and resume it with a ramp up if frames become available soon enough. Other types of discontinuities such as buffer overflow can be detected by the current API and can be handled with the help of aFader
.Another option (idea n.2) is to add support for polling when a
Stream
is going to run out of frames and let the user do a cross-fade with a 0 signal. But actually it would be better if this is handled with a callback. This might be more complex to implement and might not fit right with the current API.Which option is better? I would be available to make a PR.
On another note, I think the method of resampling used inside
Stream
might distort the signal, especially high frequencies.The text was updated successfully, but these errors were encountered: