-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Perform client side interpolation on macOS for cubeb_stream_get_position #99
Perform client side interpolation on macOS for cubeb_stream_get_position #99
Conversation
bd87193
to
4166e39
Compare
cargo-count.dot
Outdated
@@ -0,0 +1,102 @@ | |||
digraph dependencies { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is it?
I guess it's something for other purposes. Could it be added in another PR if it's not necessary part to fix this issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A mistake, please disregard.
src/backend/mod.rs
Outdated
let now = unsafe { mach_absolute_time() }; | ||
let diff = now - self.output_callback_tstamp.load(Ordering::SeqCst); | ||
let frames = (host_time_to_ns(diff) | ||
* self.core_stream_data.output_stream_params.rate() as u64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some concerns here:
- Why
output_stream_params.rate()
is chosen overoutput_hw_rate
?output_hw_rate
is retrieved fromoutput_unit
andaudiounit_output_callback
is tied tooutput_unit
. I wonder if usingoutput_hw_rate
can get better accuracy. - The
frames
here must be less thanframes_queued - frames_played
, which is the frames the resampler needs from the previous callback to the next callback. It's better to add an assertion to guarantee we won't jump to the future. output_stream_params.rate()
is the frames that will be consumed within 1 second, in the ideal case. However, the frames that will be consumed in reality are calculated by the resampler. Is it better to use theoutframes
and time-diff to calculate the frame-diff here instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- External cubeb APIs only even surface the user rate, never the hardware rate, because the hardware rate is unknown to the user and can change at any time. Here, we have a number of nanoseconds we want to convert in frames, at the user rate.
- In practice this can never go much higher than the output callback buffer size, by construction. It can go higher than the audio output buffer callback if the audio output callback is delayed but that would be a bug.
- It's what this is doing?
frame_played
comes fromframes_queued
, which is an integration over the number of frames the resampler produces, so it's in the user rate. We use time rather than frame because that's how client-side interpolation work. I don't think I understand what you're asking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me try explaining 3 in another way.
Assume the output rate is 48000
. (48000 frames / second = 0.000048 / nanosecond).
Suppose audiounit_output_callback
comes at time 9580492198733
, and
- the resampler asks for
557
frames. - The scheduled output time is
9580506217751
(from(*tstamp).mHostTime
inaudiounit_output_callback
)
From client-side's perspective, we may expect (scheduled-output-time - current-time) * sample-rate =(9580506217751 - 9580492198733) * 0.000048 = 673
frames would be consumed since the output-rate is 48000
. But from the resampler's perspective, only 557
would be consumed instead.
Now, if we try to get the position at time 9580503767252
.
The time-diff calculated here is (current time - output callback timestamp) = 9580503767252 - 9580492198733 = 11568519
(ns), and the frame-diff calculated in this patch is 11568519 * 0.000048 = 555
frames. (It's almost all the frames asked by the resampler in the previous callback) This calculation is based on client-side's perspective.
From resampler's perspective, if the frames are consumed linearly during a period,
we may only consume (current time - output callback timestamp) / (scheduled output time - output callback timestamp) * frames-comsumed-by-resampler-in-a-period = (9580503767252 - 9580492198733) / (9580506217751 - 9580492198733) * 557 = (11568519 / 14019018) * 557 = 460
frames.
So here is what I confused: The position calculation is estimated from two different perspectives at the same time, after applying this change.
The position is now calculated by played_frame + estimated frames cunsumed.
The played_frame we use is based on how much frames consumed by the resampler. This is from resampler's perspective. However, the estimated frames cunsumed is calculated from the client-side's perspective, based on the output-sample-rate.
Would you mind explaining why the position is calculated from two different perspectives at the same time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The played_frame we use is based on how much frames consumed by the resampler. This is from resampler's perspective. However, the estimated frames cunsumed is calculated from the client-side's perspective, based on the output-sample-rate.
played_frames
is the number of frames output by the resampler, in the client rate, not consumed: https://searchfox.org/mozilla-central/source/media/libcubeb/src/cubeb_resampler.h#59
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me the calculation here is still mixed up with two different perspectives, even played_frames
is the output by the resampler in the client rate. It has two client-rates here. One is output from resampler, the other is sample rate (self.core_stream_data.output_stream_params.rate()
above).
In the above example, suppose audiounit_output_callback comes
at time 9580492198733
, and
- the resampler asks for 557 frames.
- The scheduled output time is
9580506217751
(from(*tstamp).mHostTime
inaudiounit_output_callback
)
The frames needed from time 9580492198733
to time 9580506217751
is (9580506217751 - 9580492198733) * 0.000048 = 673
frames, where 0.000048
is the sample-rate (frames / 1 ns), but the resampler only output 557
frames instead. That is, the frame-rate output by the resample is 557 / (9580506217751 - 9580492198733)
= 0.000039732
(frames / 1 ns) = 39732
(frames / 1 s)
Why the interpolated_frames
is based on the sample rate (48000
(frames / 1 s) in the above case), but the base frame (played_frame
) is based on the frames output by the resampler (39732
(frames / 1 s) in the above case)?
If the sample-rate is the correct rate that should be used to compute the position, could the position be computed by: (the time position API is called - the time the stream starts) * sample rate, instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could introduce an arithmetic error ... make the
interpolated_frames > frames_queued - frames_played
.
To confirm this issue, I add an assert!(interpolated_frames <= frames_queued - frames_played)
in ChunMinChang@26eceb0. The assertion would be hit when running cargo test test_dial_tone -- --nocapture
on my macbook pro 2017.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those are the same rate.
Does this hold for the duplex stream? Or it's only true when the stream is output-only?
This function is not used for duplex streams. It's a property of the output stream.
I see the
target_sample_rate
is set to input's rate if the stream is duplex.I'd prefer to add the
interpolated_frames
when the stream is output-only if those rates are only the same when the stream is output-only.The system clock and the audio clock are not in the same domain, they drift apart.
Not sure if I understand correctly. So system clock is the clock used in the client-side domain, while the audio clock is the clock used in the audio-callback / resampler domain?
The system clock is for example mach_absolute_time()
. The audio clock is the clock created by summing the buffer size each callback. Both clock measure time. The first one is in one domain, the other one in another domain. They are not the same speed, depending on the configuration, one will be slower or faster than the other.
Let me summarize the discussion here. Suppose the audio stream is output-only.
played_frames is the number of frames output by the resampler, in the client rate, ..
So the original definition of the
fn position
is the frames output by the resampler. Let the definition be denoted by frames_resampler:frames_resampler = rate_resampler * time_resampler, where
* _rate_resampler_ is the _frames_ output by the _resampler_ within _1_ second * _time_resampler_ is the _time_ clocked by the _resampler_ (_audio clock_ ?), counted from callback to callback.
In this patch, the frames returned from
fn position
becomes: frames_resampler + interpolated_frames. The interpolated_frames is calculated by:interpolated_frames = rate_output-stream * time_client-side, where
* _rate_output-stream_ is the _sample rate_ requested from _client-side_ [here](https://searchfox.org/mozilla-central/rev/89814940895946b48b4c04c702efd2c676ec8e7e/media/libcubeb/include/cubeb.h#245) * _time_client-side_ is the _time_ clocked by the _client-side_ (_system clock_ ?), calculated by: _the client-side's timestamp at `fn position` is called - the client-side's timestamp at the latest audio callback_
So the frames returned from
fn position
becomes:frames_resampler + interpolated_frames = rate_resampler * time_resampler + rate_output-stream * time_client-side
Those are the same rate.
Since rate_resampler is same as rate_output-stream, the result of
fn position
is:R * time_resampler + R * time_client-side = R * (time_resampler + time_client-side),
where R = rate_output-stream = rate_resampler.The system clock and the audio clock are not in the same domain, they drift apart.
Suppose the drift between the client-side's clock and the resampler/audio's clock is fixed by d seconds , then client-side timestamp = resampler timestamp + d.
We don't need to suppose, we know this is not the case. A clock drift is not a fixed duration, it's just the fact that one clock is slightly faster or slower than the other. On my computer, it's about 90ms difference in 1h30, between counting the time via integrating over frames_queued
or frames_played
and mach_absolute_time()
. Please look at the link I provided, you'll see that, for example, playing 44100 frames at 44100Hz is not exactly one second. That's about 0.0016% difference between the system clock and the audio clock.
Therefore, time_client-side = (the resampler timestamp at
fn position
is called + d) - (the resampler timestamp at the latest audio callback + d) = the resampler timestamp atfn position
is called - the resampler timestamp at the latest audio callback, which leads the result equal to rate_resampler * (the resampler's time atfn position
is called - the resampler's time when audio starts). So the result meets the expectation: thefn position
is the frames output by the resampler, from audio starts to the timefn position
is called.
No it doesn't work in the long run, the system clock and the audio clock are not in the same domain.
I guess this is what this patch wants?
No, this patch interpolates the audio clock via the system clock, so that the resolution of the position()
is better.
However, one concern I have here is that the drift between the client-side's clock and the resampler/audio's clock is not fixed.
Therefore, time_client-side = (the resampler timestamp at
fn position
is called + x) - (the resampler timestamp at the latest audio callback + y), where x, y are the time drift between audio clock and system clock atfn position
is called and at the latest audio callback respectively, and x ≠ y. It could introduce an arithmetic error: (x-y) * rate_resampler frames and make theinterpolated_frames > frames_queued - frames_played
.
Of course it does, it depends on exactly when you call position()
compared to when frames_queued
and frames_played
are updated, it's always possible to call position
right in the middle of them being updated. As a side note I'm not convinced having two variables for this serves any purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those are the same rate.
I prefer adding a comment for this, stating the output-rate is the same as the resampler's rate in the output-only case. An alternative is to add one function in resampler struct to return the resampler's rate.
Does this hold for the duplex stream? Or it's only true when the stream is output-only?
This function is not used for duplex streams. It's a property of the output stream.
It's better to add a output-only check then, if interpolated_frames
is only for output-only stream in the else
block, something like:
let base_frames = (frames_played - current_output_latency_frames)
if self.has_input() {
return base_frames;
}
let interpolated_frames = ...
base_frames + interpolated_frames
Otherwise, fn position()
will return an unexpected result if the stream is duplex. And it's clear the interpolation
is only for output-only stream (since resample-rate is the output-rate in this case).
Please look at the link I provided, you'll see that, for example, playing 44100 frames at 44100Hz is not exactly one second
Thanks for the detailed explanation and the plot! I did see that link and that's why I concluded those drifts can lead a result that makes interpolated_frames > frames_queued - frames_played
.
Of course it does, it depends on exactly when you call
position()
compared to whenframes_queued
andframes_played
are updated, it's always possible to callposition
right in the middle of them being updated. As a side note I'm not convinced having two variables for this serves any purpose.
If the frames_queued
or outframes
output from resampler
could be saved as a variable, then we could do something like:
if interpolated_frames > outframes {
interpolated_frames = outframes
}
or replace outframes
by frames_queued - frames_played
.
The benefit is that we can make sure the
frames_played + interpolated_frames
at time T_1 won't be over the frames_queued
at T_2, where T_2 > T_1.
Or at least adding some comments stating that the frames_played + interpolated_frames
is possibly larger than frames_queued
at least. It's natural to guess the interpolated_frames
is in range [0, frames_queued - frames_played] (inclusively) without reading the discussion here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise,
fn position()
will return an unexpected result if the stream is duplex. And it's clear theinterpolation
is only for output-only stream (since resample-rate is the output-rate in this case).
How so? position
is always about the output stream, regardless of it's duplex or not. Also we're always in the resampled rate (which is always the same as the user rate). If we're doing a duplex streams, then input and output rate are the same, so I don't really understand the purpose of doing https://github.com/ChunMinChang/cubeb-coreaudio-rs/blob/4cdbceadae88c77dfc77183b7edda183f32cd28c/src/backend/mod.rs#L2809-L2814 compared to using the output rate. But it doesn't matter.
The same code in the windows backend makes a bit more sense, but the comment is wrong https://github.com/kinetiknz/cubeb/blob/master/src/cubeb_wasapi.cpp#L2188-L2196
The benefit is that we can make sure the
frames_played + interpolated_frames
at time T_1 won't be over theframes_queued
at T_2, where T_2 > T_1.
There is some value in keeping the clock monotonic, I agree. The test you wrote fails because we're updating those variables after resampling, but we're taking the timestamp before resampling, and resampling takes time (you can see for example that your assertion fails a lot less in release compared to debug). In any case we're clamping in other backends as well:
https://github.com/kinetiknz/cubeb/blob/master/src/cubeb_wasapi.cpp#L2612, so I'm adding that here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How so?
position
is always about the output stream, regardless of it's duplex or not. Also we're always in the resampled rate (which is always the same as the user rate). If we're doing a duplex streams, then input and output rate are the same, so I don't really understand the purpose of doinghttps://github.com/ChunMinChang/cubeb-coreaudio-rs/blob/4cdbceadae88c77dfc77183b7edda183f32cd28c/src/backend/mod.rs#L2809-L2814
compared to using the output rate. But it doesn't matter.The same code in the windows backend makes a bit more sense, but the comment is wrong https://github.com/kinetiknz/cubeb/blob/master/src/cubeb_wasapi.cpp#L2188-L2196
It comes from the C backend here when the code is rewritten. The original C implementation is in cubeb PR 78. If it's an unnecessary legacy, it's better to remove it so it's less confusing.
The input rate and output rate would be the same if they passed from cubeb.c since the rates are checked here. However, the rates could be different in the tests of this crate, so I guess it's necessary to open a PR to fix it (and add an assertion). (Update: I open #100)
BTW, if the input and output rates are the same. Is resampler still needed?
The benefit is that we can make sure the
frames_played + interpolated_frames
at time T_1 won't be over theframes_queued
at T_2, where T_2 > T_1.There is some value in keeping the clock monotonic, I agree. The test you wrote fails because we're updating those variables after resampling, but we're taking the timestamp before resampling, and resampling takes time (you can see for example that your assertion fails a lot less in release compared to debug). In any case we're clamping in other backends as well:
https://github.com/kinetiknz/cubeb/blob/master/src/cubeb_wasapi.cpp#L2612, so I'm adding that here.
Good to see prev_position
is adopted. It makes the position monotonically increasing. The benefit of using prev_position
instead of storing frames_queued
or outframes
is that no atomic
or mutex
is needed so probably it's a better solution.
4166e39
to
aa4be46
Compare
aa4be46
to
c162fe3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
We could probably remove the check for current_output_latency_frames != 0
so the interpolation also works when current_output_latency_frames == 0
.
Please run Cargo fmt
so the tests can be run.
position | ||
} else { frames_played }; | ||
|
||
// Ensure mononicity of the clock even when changing output device. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is better.
Looks like this is ready to go in, once the rustfmt failures are addressed? I've also just landed bug 1578042, so we should be seeing a difference in the number of dropped frames once this lands, for example in the YouTube "Stats for nerds" overlay. Thanks for driving this to completion! I have two additional issues with the way this works, but I'll file those as follow-up bugs. (They're about using the output timestamp rather than the time of the invocation of the audio callback, and about synchronizing the updates to the different atomics. I'll add more detail in the bugs.) |
No we can't. If we do this, and the stream take a rather long time to start, then we'll interpolate while the stream is not started yet and then when we start getting callbacks, and because we have this enforcement of monotonicity, the clock will stall for some time. |
I'm not happy either. I'm testing various other versions. |
I filed the issue about the synchronized update as bug 1654151, and listed the other two issues in this comment. |
This could happen even after applying the current change I guess? If To make sure the position always returns |
No. |
|
Ok. Then it's another bug. All this is too complicated. |
|
c162fe3
to
8660cc7
Compare
@ChunMinChang can you have another look. This is based on #101, and uses a way to send both number atomically proposed by @mstange, but this is split for easier reviewing compared to his version, and it's rebased on the latest version of this branch. |
e691527
to
c002edc
Compare
Credits goes to <[email protected]> for the wait-free thread communication, this is just a cleanup/rebase/rename.
c002edc
to
b8fc04d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good, thanks!
Some minor fixes:
frames_played
is replaced byoutput_callback_timing_data_read
so it should be removed from AudioUnitStream.- It's better to call
mach_absolute_time()
only once inaudiounit_output_callback
All other cubeb backend return, from
cubeb_stream_get_position
, something that is interpolated using a system clock, from the callback times.This is a problem if we're using big buffers for playing the audio out, such as in Firefox in the media playback case (i.e. not Web Audio API/WebRTC, etc.): it means the resolution of the audio clock is not very high, and this can cause problem when scheduling video frames.
It seems that in the current state of things we're saved by the fact that on macOS, buffer sizes are small (I think because otherwise the clock resolution is not very high and this causes issues) and so this is not terrible regardless of if we need it to fix the issues @mstange is fixing in BMO#1649859: it means we'll be able to increase the buffer size for playback content, and potentially save some CPU or battery, by having the audio output callback called less often.