Problem
From my reading of the code, and from logging the PTS values of samples fed into the `MediaTarget`, it appears that samples are fetched from each track sequentially (round robin). This results in a stream similar to "A/V/A/V/A/V...". However, this doesn't take into account that the duration of each frame (aka sample) is likely to be very different. For a video with a frame rate of 30 fps, a single video frame will likely be presented while multiple audio frames are rendered via the `AudioTrack`.
This behaviour results in audio and video samples that should be presented at the same time being spread far apart in the output file. The default Android muxer (or other open source muxers) will be able to fix the interleaving. For small files this is likely not very noticeable, but it would still require players to seek back and forth within the file. The issue becomes more noticeable as the output file grows (it's worth noting that ExoPlayer has different per-track buffering logic compared to other video players). If we wanted to support a segmented output file, this behaviour is also problematic: we need each segment to contain the same duration for every contained stream.
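To make the mismatch concrete (assuming, for example, AAC audio at 44.1 kHz with 1024 samples per frame): each audio frame covers about 23.2 ms (1024 / 44100 s), while each video frame at 30 fps covers about 33.3 ms, so roughly three audio frames are produced for every two video frames. A strict round robin therefore steadily pushes audio samples further away from the video samples they should be presented with.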
Proposed Solution
One possible solution could be:
1. Modify `TrackTranscoder` to report the PTS of the last frame processed.
2. Modify `TransformationJob.processNextFrame` to be smarter about which `TrackTranscoder` to process next. We could define a maximum duration that a stream is allowed to be written ahead. If we observe that a stream (e.g. video) is now beyond that, we continue to process the audio track until it has caught up.
I took a look at Google's Media3 / Transformer and it looks like they're doing something very similar: https://github.com/androidx/media/blob/main/libraries/transformer/src/main/java/androidx/media3/transformer/MuxerWrapper.java#L57
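A rough sketch of what this check could look like, assuming `TrackTranscoder` is extended with a hypothetical `getLastWrittenPtsUs()` accessor (step 1 above) and that `processNextFrame` consults it before pulling from a track; the 500 ms threshold is just a placeholder:

```java
// Rough sketch only: getLastWrittenPtsUs() is the hypothetical accessor from
// step 1 above and does not exist on LiTr's TrackTranscoder today.
import java.util.List;

final class InterleavingSketch {

    // Max duration (in microseconds) a track may be written ahead of the
    // slowest track before we pause it; 500 ms is just a placeholder value.
    private static final long MAX_WRITE_AHEAD_US = 500_000;

    // Returns true when this track is too far ahead of the slowest track and
    // should be skipped on this round of processNextFrame.
    static boolean shouldSkipTrack(TrackTranscoder candidate, List<TrackTranscoder> allTracks) {
        long slowestPtsUs = Long.MAX_VALUE;
        for (TrackTranscoder transcoder : allTracks) {
            slowestPtsUs = Math.min(slowestPtsUs, transcoder.getLastWrittenPtsUs());
        }
        // Keep feeding the lagging track(s); pause any track that has run more
        // than MAX_WRITE_AHEAD_US ahead of the slowest one.
        return candidate.getLastWrittenPtsUs() - slowestPtsUs > MAX_WRITE_AHEAD_US;
    }
}
```

Bounding the write-ahead like this would also keep the tracks close enough together that a segmented output could contain roughly equal durations of each stream per segment.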