
Perform client side interpolation on macOS for cubeb_stream_get_position #99

Merged (3 commits) on Jul 24, 2020

Conversation

@padenot (Collaborator) commented Jul 2, 2020

All other cubeb backends return, from cubeb_stream_get_position, a value that is interpolated from the callback times using a system clock.

This is a problem if we're using big buffers for playing the audio out, such as in Firefox in the media playback case (i.e. not Web Audio API/WebRTC, etc.): it means the resolution of the audio clock is not very high, and this can cause problems when scheduling video frames.

It seems that in the current state of things we're saved by the fact that buffer sizes are small on macOS (I think precisely because otherwise the clock resolution would be too low and cause issues), so this change is worthwhile regardless of whether we need it to fix the issues @mstange is addressing in BMO#1649859: it means we'll be able to increase the buffer size for playback content, and potentially save some CPU or battery, by having the audio output callback called less often.

@padenot padenot force-pushed the client-side-interpolation branch from bd87193 to 4166e39 on July 2, 2020 18:13
@padenot padenot requested a review from ChunMinChang July 3, 2020 14:02
cargo-count.dot Outdated
@@ -0,0 +1,102 @@
digraph dependencies {
Member:
What is it?

I guess it's something for other purposes. Could it be added in another PR if it's not a necessary part of fixing this issue?

Collaborator Author:

A mistake, please disregard.

let now = unsafe { mach_absolute_time() };
let diff = now - self.output_callback_tstamp.load(Ordering::SeqCst);
let frames = (host_time_to_ns(diff)
* self.core_stream_data.output_stream_params.rate() as u64)
Member:

I have some concerns here:

  1. Why is output_stream_params.rate() chosen over output_hw_rate? output_hw_rate is retrieved from output_unit, and audiounit_output_callback is tied to output_unit, so I wonder if using output_hw_rate would give better accuracy.
  2. The frames here must be less than frames_queued - frames_played, which is the number of frames the resampler needs between the previous callback and the next one. It would be better to add an assertion to guarantee we won't jump into the future.
  3. output_stream_params.rate() is the number of frames that will be consumed within 1 second in the ideal case. However, the frames actually consumed are determined by the resampler. Would it be better to use the outframes and the time diff to calculate the frame diff here instead?

Collaborator Author:

  1. External cubeb APIs only ever surface the user rate, never the hardware rate, because the hardware rate is unknown to the user and can change at any time. Here, we have a number of nanoseconds we want to convert to frames, at the user rate.
  2. In practice this can never go much higher than the output callback buffer size, by construction. It could go higher if the audio output callback is delayed, but that would be a bug.
  3. That's what this is doing. frames_played comes from frames_queued, which is an integration over the number of frames the resampler produces, so it's in the user rate. We use time rather than frames because that's how client-side interpolation works. I don't think I understand what you're asking.

@ChunMinChang (Member), Jul 8, 2020:

Let me try explaining 3 in another way.

Assume the output rate is 48000. (48000 frames / second = 0.000048 / nanosecond).

Suppose audiounit_output_callback comes at time 9580492198733, and

  • the resampler asks for 557 frames.
  • The scheduled output time is 9580506217751 (from (*tstamp).mHostTime in audiounit_output_callback)

From the client side's perspective, we may expect (scheduled-output-time - current-time) * sample-rate = (9580506217751 - 9580492198733) * 0.000048 = 673 frames to be consumed, since the output rate is 48000. But from the resampler's perspective, only 557 will be consumed instead.

Now, suppose we try to get the position at time 9580503767252. The time diff calculated here is (current time - output callback timestamp) = 9580503767252 - 9580492198733 = 11568519 (ns), and the frame diff calculated in this patch is 11568519 * 0.000048 = 555 frames (almost all the frames asked for by the resampler in the previous callback). This calculation is based on the client side's perspective.

From the resampler's perspective, if the frames are consumed linearly during a period, we would only consume (current time - output callback timestamp) / (scheduled output time - output callback timestamp) * frames-consumed-by-resampler-in-a-period = (9580503767252 - 9580492198733) / (9580506217751 - 9580492198733) * 557 = (11568519 / 14019018) * 557 = 460 frames.

So here is what confuses me: after applying this change, the position is estimated from two different perspectives at the same time.

The position is now calculated as played_frames + estimated frames consumed.

The played_frames we use is based on how many frames were consumed by the resampler; this is the resampler's perspective. However, the estimated frames consumed is calculated from the client side's perspective, based on the output sample rate.

Would you mind explaining why the position is calculated from two different perspectives at the same time?

Collaborator Author:

The played_frames we use is based on how many frames were consumed by the resampler; this is the resampler's perspective. However, the estimated frames consumed is calculated from the client side's perspective, based on the output sample rate.

played_frames is the number of frames output by the resampler, in the client rate, not consumed: https://searchfox.org/mozilla-central/source/media/libcubeb/src/cubeb_resampler.h#59

@ChunMinChang (Member), Jul 9, 2020:

It seems to me the calculation here still mixes two different perspectives, even if played_frames is the output of the resampler in the client rate. There are two client rates here: one is the rate of the output from the resampler, the other is the sample rate (self.core_stream_data.output_stream_params.rate() above).

In the above example, suppose audiounit_output_callback comes at time 9580492198733, and

  • the resampler asks for 557 frames.
  • The scheduled output time is 9580506217751 (from (*tstamp).mHostTime in audiounit_output_callback)

The frames needed from time 9580492198733 to time 9580506217751 is (9580506217751 - 9580492198733) * 0.000048 = 673 frames, where 0.000048 is the sample rate (frames / 1 ns), but the resampler only outputs 557 frames instead. That is, the frame rate output by the resampler is 557 / (9580506217751 - 9580492198733) = 0.000039732 (frames / 1 ns) = 39732 (frames / 1 s).

Why is interpolated_frames based on the sample rate (48000 frames / 1 s in the above case), while the base frame count (played_frames) is based on the frames output by the resampler (39732 frames / 1 s in the above case)?

If the sample rate is the correct rate to use to compute the position, could the position be computed as (the time the position API is called - the time the stream starts) * sample rate instead?

Member:

It could introduce an arithmetic error ... make the interpolated_frames > frames_queued - frames_played.

To confirm this issue, I added an assert!(interpolated_frames <= frames_queued - frames_played) in ChunMinChang@26eceb0. The assertion is hit when running cargo test test_dial_tone -- --nocapture on my MacBook Pro 2017.

Collaborator Author:

Those are the same rate.

Does this hold for duplex streams, or is it only true when the stream is output-only?

This function is not used for duplex streams. It's a property of the output stream.

https://github.com/ChunMinChang/cubeb-coreaudio-rs/blob/4cdbceadae88c77dfc77183b7edda183f32cd28c/src/backend/mod.rs#L2809-L2814

I see the target_sample_rate is set to input's rate if the stream is duplex.

I'd prefer to add the interpolated_frames only when the stream is output-only, if those rates are only the same in the output-only case.

The system clock and the audio clock are not in the same domain, they drift apart.

Not sure if I understand correctly. So the system clock is the clock used in the client-side domain, while the audio clock is the clock used in the audio-callback / resampler domain?

The system clock is, for example, mach_absolute_time(). The audio clock is the clock created by summing the buffer size each callback. Both clocks measure time, the first in one domain, the other in another. They do not run at the same speed; depending on the configuration, one will be slower or faster than the other.

Let me summarize the discussion here. Suppose the audio stream is output-only.

played_frames is the number of frames output by the resampler, in the client rate, ..

So the original definition of the fn position is the frames output by the resampler. Let the definition be denoted by frames_resampler:

frames_resampler = rate_resampler * time_resampler, where

  • rate_resampler is the number of frames output by the resampler within 1 second
  • time_resampler is the time clocked by the resampler (the audio clock?), counted from callback to callback

In this patch, the frames returned from fn position becomes: frames_resampler + interpolated_frames. The interpolated_frames is calculated by:

interpolated_frames = rate_output-stream * time_client-side, where

  • rate_output-stream is the sample rate requested from the client side here: https://searchfox.org/mozilla-central/rev/89814940895946b48b4c04c702efd2c676ec8e7e/media/libcubeb/include/cubeb.h#245
  • time_client-side is the time clocked by the client side (the system clock?), calculated as: the client side's timestamp when fn position is called - the client side's timestamp at the latest audio callback

So the frames returned from fn position becomes:

frames_resampler + interpolated_frames = rate_resampler * time_resampler + rate_output-stream * time_client-side

Those are the same rate.

Since rate_resampler is same as rate_output-stream, the result of fn position is:

R * time_resampler + R * time_client-side = R * (time_resampler + time_client-side),
where R = rate_output-stream = rate_resampler.

The system clock and the audio clock are not in the same domain, they drift apart.

Suppose the drift between the client side's clock and the resampler/audio clock is a fixed d seconds; then client-side timestamp = resampler timestamp + d.

We don't need to suppose; we know this is not the case. A clock drift is not a fixed duration; it's the fact that one clock is slightly faster or slower than the other. On my computer, it's about a 90 ms difference over 1h30 between counting time by integrating over frames_queued or frames_played and mach_absolute_time(). Please look at the link I provided: you'll see, for example, that playing 44100 frames at 44100 Hz is not exactly one second. That's about a 0.0016% difference between the system clock and the audio clock.

Therefore, time_client-side = (the resampler timestamp when fn position is called + d) - (the resampler timestamp at the latest audio callback + d) = the resampler timestamp when fn position is called - the resampler timestamp at the latest audio callback, which makes the result equal to rate_resampler * (the resampler's time when fn position is called - the resampler's time when audio starts). So the result meets the expectation: fn position returns the frames output by the resampler, from when audio starts to when fn position is called.

No, it doesn't work in the long run; the system clock and the audio clock are not in the same domain.

I guess this is what this patch wants?

No, this patch interpolates the audio clock via the system clock, so that the resolution of the position() is better.

However, one concern I have here is that the drift between the client-side's clock and the resampler/audio's clock is not fixed.

Therefore, time_client-side = (the resampler timestamp when fn position is called + x) - (the resampler timestamp at the latest audio callback + y), where x and y are the time drift between the audio clock and the system clock when fn position is called and at the latest audio callback respectively, and x ≠ y. It can introduce an arithmetic error of (x - y) * rate_resampler frames and make interpolated_frames > frames_queued - frames_played.

Of course it does: it depends on exactly when you call position() compared to when frames_queued and frames_played are updated; it's always possible to call position right in the middle of them being updated. As a side note, I'm not convinced that having two variables for this serves any purpose.

@ChunMinChang (Member), Jul 15, 2020:

Those are the same rate.

I'd prefer adding a comment for this, stating that the output rate is the same as the resampler's rate in the output-only case. An alternative is to add a function to the resampler struct that returns the resampler's rate.

Does this hold for the duplex stream? Or it's only true when the stream is output-only?

This function is not used for duplex streams. It's a property of the output stream.

It would be better to add an output-only check then, if interpolated_frames is only for output-only streams in the else block, something like:

let base_frames = frames_played - current_output_latency_frames;
if self.has_input() {
    return base_frames;
}
let interpolated_frames = /* ... */;
base_frames + interpolated_frames

Otherwise, fn position() will return an unexpected result if the stream is duplex, and this makes it clear the interpolation is only for output-only streams (since the resampler rate is the output rate in this case).

Please look at the link I provided, you'll see that, for example, playing 44100 frames at 44100Hz is not exactly one second

Thanks for the detailed explanation and the plot! I did see that link, and that's why I concluded those drifts can lead to a result where interpolated_frames > frames_queued - frames_played.

Of course it does, it depends on exactly when you call position() compared to when frames_queued and frames_played are updated, it's always possible to call position right in the middle of them being updated. As a side note I'm not convinced having two variables for this serves any purpose.

If frames_queued or the outframes output from the resampler could be saved as a variable, then we could do something like:

if interpolated_frames > outframes {
    interpolated_frames = outframes;
}

or replace outframes with frames_queued - frames_played.

The benefit is that we can make sure frames_played + interpolated_frames at time T_1 won't exceed frames_queued at T_2, where T_2 > T_1.

Or at least add a comment stating that frames_played + interpolated_frames can be larger than frames_queued. It's natural to guess that interpolated_frames is in the range [0, frames_queued - frames_played] (inclusive) without reading the discussion here.

Collaborator Author:

Otherwise, fn position() will return an unexpected result if the stream is duplex. And it's clear the interpolation is only for output-only stream (since resample-rate is the output-rate in this case).

How so? position is always about the output stream, regardless of whether it's duplex or not. Also, we're always in the resampled rate (which is always the same as the user rate). If we're doing a duplex stream, then the input and output rates are the same, so I don't really understand the purpose of doing https://github.com/ChunMinChang/cubeb-coreaudio-rs/blob/4cdbceadae88c77dfc77183b7edda183f32cd28c/src/backend/mod.rs#L2809-L2814 compared to using the output rate. But it doesn't matter.

The same code in the windows backend makes a bit more sense, but the comment is wrong https://github.com/kinetiknz/cubeb/blob/master/src/cubeb_wasapi.cpp#L2188-L2196

The benefit is that we can make sure frames_played + interpolated_frames at time T_1 won't exceed frames_queued at T_2, where T_2 > T_1.

There is some value in keeping the clock monotonic, I agree. The test you wrote fails because we're updating those variables after resampling, but we're taking the timestamp before resampling, and resampling takes time (you can see, for example, that your assertion fails a lot less in release compared to debug). In any case, we're clamping in other backends as well: https://github.com/kinetiknz/cubeb/blob/master/src/cubeb_wasapi.cpp#L2612, so I'm adding that here.

@ChunMinChang (Member), Jul 17, 2020:

How so? position is always about the output stream, regardless of whether it's duplex or not. Also, we're always in the resampled rate (which is always the same as the user rate). If we're doing a duplex stream, then the input and output rates are the same, so I don't really understand the purpose of doing

https://github.com/ChunMinChang/cubeb-coreaudio-rs/blob/4cdbceadae88c77dfc77183b7edda183f32cd28c/src/backend/mod.rs#L2809-L2814
compared to using the output rate. But it doesn't matter.

The same code in the windows backend makes a bit more sense, but the comment is wrong https://github.com/kinetiknz/cubeb/blob/master/src/cubeb_wasapi.cpp#L2188-L2196

It comes from the C backend here, from when the code was rewritten. The original C implementation is in cubeb PR 78. If it's unnecessary legacy, it's better to remove it so it's less confusing.

The input rate and output rate will be the same if they are passed from cubeb.c, since the rates are checked here. However, the rates could be different in the tests of this crate, so I guess it's necessary to open a PR to fix it (and add an assertion). (Update: I opened #100)

BTW, if the input and output rates are the same, is the resampler still needed?

The benefit is that we can make sure frames_played + interpolated_frames at time T_1 won't exceed frames_queued at T_2, where T_2 > T_1.

There is some value in keeping the clock monotonic, I agree. The test you wrote fails because we're updating those variables after resampling, but we're taking the timestamp before resampling, and resampling takes time (you can see, for example, that your assertion fails a lot less in release compared to debug). In any case, we're clamping in other backends as well: https://github.com/kinetiknz/cubeb/blob/master/src/cubeb_wasapi.cpp#L2612, so I'm adding that here.

Good to see prev_position adopted. It makes the position monotonically increasing. The benefit of using prev_position instead of storing frames_queued or outframes is that no atomic or mutex is needed, so it's probably a better solution.

@padenot padenot force-pushed the client-side-interpolation branch from 4166e39 to aa4be46 on July 8, 2020 08:57
@padenot padenot force-pushed the client-side-interpolation branch from aa4be46 to c162fe3 on July 16, 2020 15:39
@ChunMinChang (Member) left a comment:

Looks good to me.

We could probably remove the check for current_output_latency_frames != 0 so the interpolation also works when current_output_latency_frames == 0.

Please run cargo fmt so the tests can be run.

position
} else { frames_played };

// Ensure monotonicity of the clock even when changing output device.
Member:

Yeah, this is better.

@mstange (Contributor)

mstange commented Jul 20, 2020

Looks like this is ready to go in, once the rustfmt failures are addressed?

I've also just landed bug 1578042, so we should be seeing a difference in the number of dropped frames once this lands, for example in the YouTube "Stats for nerds" overlay.

Thanks for driving this to completion!

I have two additional issues with the way this works, but I'll file those as follow-up bugs. (They're about using the output timestamp rather than the time of the invocation of the audio callback, and about synchronizing the updates to the different atomics. I'll add more detail in the bugs.)

@padenot (Collaborator Author)

padenot commented Jul 21, 2020

We could probably remove the check for current_output_latency_frames != 0 so the interpolation also works when current_output_latency_frames == 0.

No, we can't. If we do this and the stream takes a rather long time to start, we'll interpolate while the stream hasn't started yet; then, once we start getting callbacks, because we enforce monotonicity, the clock will stall for some time.

@padenot (Collaborator Author)

padenot commented Jul 21, 2020

I have two additional issues with the way this works, but I'll file those as follow-up bugs. (They're about using the output timestamp rather than the time of the invocation of the audio callback, and about synchronizing the updates to the different atomics. I'll add more detail in the bugs.)

I'm not happy either. I'm testing various other versions.

@mstange (Contributor)

mstange commented Jul 21, 2020

I have two additional issues with the way this works, but I'll file those as follow-up bugs. (They're about using the output timestamp rather than the time of the invocation of the audio callback, and about synchronizing the updates to the different atomics. I'll add more detail in the bugs.)

I filed the issue about the synchronized update as bug 1654151, and listed the other two issues in this comment.

@ChunMinChang (Member)

No, we can't. If we do this and the stream takes a rather long time to start, we'll interpolate while the stream hasn't started yet; then, once we start getting callbacks, because we enforce monotonicity, the clock will stall for some time.

This could happen even after applying the current change, I guess?

If current_output_latency_frames > 0 but the callback hasn't started, position will now return something larger than 0.

To make sure position always returns 0 before the callback has started, using frames_played > 0 is a better option. current_output_latency_frames != 0 doesn't mean the callback has started; it's set when the output-side settings are initialized.

@padenot (Collaborator Author)

padenot commented Jul 21, 2020

This could happen even after applying the current change, I guess?

No. current_output_latency_frames is only ever set in the audio callback.

@ChunMinChang (Member)

This could happen even after applying the current change, I guess?

No. current_output_latency_frames is only ever set in the audio callback.

current_output_latency_frames is set in CoreStreamData::setup

https://github.com/ChunMinChang/cubeb-coreaudio-rs/blob/4cdbceadae88c77dfc77183b7edda183f32cd28c/src/backend/mod.rs#L2859-L2862

total_output_latency_frames is another one that is set in the audiounit_output_callback.

@padenot (Collaborator Author)

padenot commented Jul 21, 2020

Ok. Then it's another bug. All this is too complicated.

@padenot (Collaborator Author)

padenot commented Jul 21, 2020

Ok. Then it's another bug. All this is too complicated.

https://github.com/ChunMinChang/cubeb-coreaudio-rs/pull/101

@padenot padenot force-pushed the client-side-interpolation branch from c162fe3 to 8660cc7 on July 22, 2020 16:41
@padenot (Collaborator Author)

padenot commented Jul 22, 2020

@ChunMinChang, can you have another look? This is based on #101 and uses a way to send both numbers atomically proposed by @mstange, but it's split up for easier reviewing compared to his version, and it's rebased on the latest version of this branch.

@padenot padenot force-pushed the client-side-interpolation branch 2 times, most recently from e691527 to c002edc on July 23, 2020 12:05
padenot added 2 commits July 23, 2020 14:27
Credit goes to <[email protected]> for the wait-free thread communication; this is just a cleanup/rebase/rename.
@padenot padenot force-pushed the client-side-interpolation branch from c002edc to b8fc04d on July 23, 2020 12:27
@padenot padenot requested a review from ChunMinChang July 23, 2020 12:30
@ChunMinChang (Member) left a comment:

Overall looks good, thanks!

Some minor fixes:

  1. frames_played is replaced by output_callback_timing_data_read, so it should be removed from AudioUnitStream.
  2. It's better to call mach_absolute_time() only once in audiounit_output_callback.

@padenot padenot merged commit 67e468a into mozilla:trailblazer Jul 24, 2020