What is an EncodedVideoChunk or an EncodedAudioChunk? #829

erwanvivien · 2024-08-27T13:14:48Z


My app does this:

1. Read a .wav file
2. Slice it into chunks and create an AudioData from each slice
3. Pass those to audioEncoder.encode()
4. In the AudioEncoder output callback, receive an EncodedAudioChunk
5. Hand that EncodedAudioChunk to my muxer
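
For readers following along, the slicing step above might look roughly like the sketch below. It assumes interleaved 16-bit PCM that has already been extracted from the .wav container. `PcmSliceInit` and `slicePcm` are hypothetical names; the interface only mirrors the fields of the WebCodecs `AudioDataInit` dictionary so that the code runs outside a browser.

```typescript
// Local stand-in for the WebCodecs AudioDataInit dictionary (assumption:
// declared here only so the sketch type-checks without browser typings).
interface PcmSliceInit {
  format: "s16"; // interleaved signed 16-bit samples
  sampleRate: number;
  numberOfFrames: number; // one frame = one sample for every channel
  numberOfChannels: number;
  timestamp: number; // microseconds
  data: Int16Array;
}

// Cut interleaved PCM into slices that always end on a frame boundary.
function slicePcm(
  samples: Int16Array,
  sampleRate: number,
  numberOfChannels: number,
  framesPerSlice: number,
): PcmSliceInit[] {
  const totalFrames = Math.floor(samples.length / numberOfChannels);
  const slices: PcmSliceInit[] = [];
  for (let first = 0; first < totalFrames; first += framesPerSlice) {
    const numberOfFrames = Math.min(framesPerSlice, totalFrames - first);
    slices.push({
      format: "s16",
      sampleRate,
      numberOfFrames,
      numberOfChannels,
      // Presentation time of the first frame in this slice.
      timestamp: Math.round((first * 1e6) / sampleRate),
      data: samples.subarray(
        first * numberOfChannels,
        (first + numberOfFrames) * numberOfChannels,
      ),
    });
  }
  return slices;
}
```

In a browser, each element could then be passed to `new AudioData(init)` and the result to `audioEncoder.encode()`.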

Everything works, except that I don't understand what an EncodedAudioChunk actually is, and the same goes for its video equivalent, EncodedVideoChunk.

Could someone explain?

PS: Sub-question: can I take an mp3 file, split it into chunks (let's say blocks of 1024 bytes), and call each of those chunks an EncodedAudioChunk?

Answered by padenot


padenot · 2024-08-27T13:32:53Z

Maintainer

Encoded{Audio,Video}Chunk are sequences of compressed bytes that correspond precisely to a particular format, for a particular codec. They are often also called packets. For example, the packet format of the Opus audio codec is specified here: https://datatracker.ietf.org/doc/html/rfc6716#section-3.

For video, a packet corresponds to a single compressed image; for audio, it corresponds to a number of audio samples (typically on the order of hundreds to a few thousand).
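
To put numbers on the audio case, here is a minimal sketch that converts a packet's sample count into a duration in microseconds, the unit WebCodecs uses for timestamps. `packetDurationUs` is a hypothetical helper, and the sample counts in the comments are common per-packet values for those codecs rather than anything taken from this thread.

```typescript
// Duration in microseconds of one audio packet carrying
// `samplesPerChannel` samples per channel at `sampleRate` Hz.
function packetDurationUs(samplesPerChannel: number, sampleRate: number): number {
  return Math.round((samplesPerChannel * 1e6) / sampleRate);
}

// Typical packet sizes (assumed common configurations):
const opus20ms = packetDurationUs(960, 48000); // Opus, 20 ms packet at 48 kHz
const aacFrame = packetDurationUs(1024, 44100); // AAC-LC, 1024 samples per packet
const mp3Frame = packetDurationUs(1152, 44100); // MPEG-1 Layer III, 1152 samples per frame
```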

Wav (PCM) is a special case because it's not really compressed: as long as you take a group of bytes whose size is a multiple of the sample size (e.g. 2 bytes for 16-bit audio) multiplied by the number of channels, it's going to be a valid packet. Other codecs aren't like that. This means that for mp3, you very much can't divide a byte stream arbitrarily: you need to respect the mp3 framing, for example by demuxing it. Demuxing means transforming a media byte stream into a series of packets (called chunks in the WebCodecs API), sometimes preceded by a header containing some metadata.
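
As an illustration of the difference, the sketch below checks PCM alignment with a single modulo, then walks an mp3 byte stream frame by frame using the length encoded in each frame header. All function names are hypothetical. The mp3 part is deliberately naive: it assumes MPEG-1 Layer III only and resyncs byte by byte, so it can be fooled by sync-like bytes inside tags or audio data. A real demuxer is the safer choice.

```typescript
// PCM: any byte count that is a whole number of frames is a valid packet.
function isAlignedPcmSize(byteLength: number, bytesPerSample: number, channels: number): boolean {
  return byteLength % (bytesPerSample * channels) === 0;
}

// MPEG-1 Layer III header tables (index 0 = "free" bitrate, 15 = invalid).
const BITRATES_KBPS = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 0];
const SAMPLE_RATES_HZ = [44100, 48000, 32000, 0];

// Length in bytes of the frame starting at `offset`, or null if the bytes
// there are not an MPEG-1 Layer III header this sketch understands.
function mp3FrameLength(bytes: Uint8Array, offset: number): number | null {
  if (offset + 4 > bytes.length) return null;
  // 11 sync bits, version = MPEG-1, layer = III; the protection bit is ignored.
  if (bytes[offset] !== 0xff || (bytes[offset + 1] & 0xfe) !== 0xfa) return null;
  const bitrate = BITRATES_KBPS[bytes[offset + 2] >> 4] * 1000;
  const sampleRate = SAMPLE_RATES_HZ[(bytes[offset + 2] >> 2) & 0x3];
  if (bitrate === 0 || sampleRate === 0) return null;
  const padding = (bytes[offset + 2] >> 1) & 0x1;
  return Math.floor((144 * bitrate) / sampleRate) + padding;
}

// Split a byte stream into whole mp3 frames, skipping anything unrecognised.
function splitMp3Frames(bytes: Uint8Array): Uint8Array[] {
  const frames: Uint8Array[] = [];
  let offset = 0;
  while (offset < bytes.length) {
    const length = mp3FrameLength(bytes, offset);
    if (length === null) {
      offset += 1; // naive resync: try the next byte
      continue;
    }
    if (offset + length > bytes.length) break; // truncated final frame
    frames.push(bytes.subarray(offset, offset + length));
    offset += length;
  }
  return frames;
}
```

At 128 kbit/s and 44.1 kHz a frame is 417 or 418 bytes long, so fixed 1024-byte blocks would cut through frames. Each element returned by `splitMp3Frames`, on the other hand, is one complete packet, which is the kind of data an EncodedAudioChunk is expected to carry.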

1 reply

erwanvivien Aug 28, 2024
Author

Thanks a lot padenot, the explanation was very clear! I think the part I was missing was this:

correspond precisely to a particular format

So if I want to pass MP3 packets to my muxer, I need to demux the file first to extract those packets. Thanks!
