Showing 11 changed files with 233 additions and 108 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,161 @@ | ||
--- | ||
layout: "@/layouts/global.astro" | ||
title: Forward? Error? Correction? | ||
author: kixelated | ||
description: Concealing packet loss is harder than you think. | ||
cover: "/blog/forward-error-correction/mfw.jpeg" | ||
date: 2024-02-17 | ||
--- | ||
|
||
# Forward? Error? Correction? | ||
So I absolutely *dunked* on datagrams in the [last blog post](/blog/never-use-datagrams). | ||
Now it's time to dunk on the last remaining hope for datagrams: [Forward Error Correction](https://www.techtarget.com/searchmobilecomputing/definition/forward-error-correction) (FEC). | ||
|
||
## Opus | ||
[Opus](https://opus-codec.org/) is an amazing audio codec. | ||
Full disclosure, I haven't had the opportunity to work with it directly, since I was stuck in [AAC](https://en.wikipedia.org/wiki/Advanced_Audio_Coding) land at Twitch, so I'm talking out of my ass a bit. | ||
|
||
Opus has built-in support for FEC, which is neat. | ||
There are so many possible FEC schemes, many of which are patented, and I would do the subject a disservice if I tried to explain them. | ||
The idea is to send redundant data, so the receiver can paper over small amounts of packet loss. | ||
It's conceptually similar to [RAID](https://en.wikipedia.org/wiki/RAID) but for packets spread over time instead of hard drives. | ||
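
To make the RAID analogy concrete, here's a minimal sketch of about the simplest scheme there is: one XOR parity packet per group of equal-length packets. It's a toy for illustration, not what Opus actually does, and the function names (`make_parity`, `recover`) are made up for this example.

```python
from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: list[bytes]) -> bytes:
    """Build one parity packet for a group of equal-length packets."""
    return reduce(xor_bytes, packets)


def recover(received: list[Optional[bytes]], parity: bytes) -> list[bytes]:
    """Rebuild at most one lost packet (marked None) using the parity packet."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("a single parity packet can only repair one loss")
    repaired = list(received)
    if missing:
        present = [p for p in received if p is not None]
        # XOR of the parity with every surviving packet yields the lost one.
        repaired[missing[0]] = reduce(xor_bytes, present, parity)
    return repaired


group = [b"aaaa", b"bbbb", b"cccc"]
parity = make_parity(group)
print(recover([b"aaaa", None, b"cccc"], parity) == group)
```

The catch is already visible: the parity packet is extra data on the wire, and losing two packets from the same group is unrecoverable.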
|
||
<figure> | ||
![Interleaving](/blog/forward-error-correction/interleaving.png) | ||
<figcaption> | ||
[Source](https://en.wikipedia.org/wiki/Error_correction_code): | ||
Instead you get this real image from Wikipedia. | ||
</figcaption> | ||
</figure> | ||
|
||
Conveniently, audio "frames" are so small that they fit into a single datagram. | ||
So rather than deal with retransmissions at the disgusting transport layer, the encoder can just encode redundancy into the bitstream. | ||
RIP packet loss 1983-2024. | ||
|
||
But the audio codec is so the wrong layer for this. | ||
|
||
## Networks are Complicated | ||
I worked with some very smart people at [Twitch](https://www.twitch.tv/). | ||
However, I will never forget a presentation maybe 4 years ago where a very smart codec engineer pitched using FEC. | ||
|
||
There was a graph that showed the TCP throughput during random packet loss. | ||
Wow, TCP sure has a low bitrate at 30% packet loss. | ||
But look at this demo, we used UDP+FEC and made something faster than TCP! | ||
|
||
If somebody shows you any results based on simulated, random packet loss, you should politely tell them: **no, that's not how the internet works**. | ||
|
||
<figure> | ||
![Series of Tubes](/blog/forward-error-correction/tubes.png) | ||
<figcaption> | ||
**Fun fact**: the internet is not a series of tubes. | ||
</figcaption> | ||
</figure> | ||
|
||
Networking is not quantum mechanics. | ||
There are no dice involved and packet loss is *not random*. | ||
It depends on the underlying transport. | ||
|
||
Sometimes it occurs randomly due to signal interference. <br /> | ||
Sometimes it occurs in bursts due to batching. <br /> | ||
Sometimes it occurs due to congestion. <br /> | ||
Sometimes it occurs because ???. | ||
|
||
Unfortunately, there's no magic loophole on the internet. | ||
You can't send 10x the data to mask packet loss. | ||
|
||
In fact, if you ever see a number like 30% packet loss in the real world (yikes), it's likely due to congestion. | ||
You're sending 30% *too much* and fully saturating a link. | ||
The solution is to send *less* instead to drain network queues. 🤯 | ||
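
A back-of-the-envelope sketch of why piling FEC onto a congested link backfires. It assumes a deliberately crude model: a single flow, a bottleneck with fixed capacity, and every bit over capacity gets dropped once the queue is full. The numbers (7 Mbps link, 10 Mbps of media, 50% FEC overhead) are hypothetical.

```python
def steady_state_loss(send_mbps: float, capacity_mbps: float) -> float:
    """Fraction of traffic dropped at a saturated bottleneck (crude model)."""
    if send_mbps <= capacity_mbps:
        return 0.0
    return 1.0 - capacity_mbps / send_mbps


capacity = 7.0  # hypothetical bottleneck capacity in Mbps
media = 10.0    # hypothetical media bitrate in Mbps

# Media alone already overshoots the link.
print(steady_state_loss(media, capacity))        # about 0.30

# Adding 50% FEC overhead makes the overshoot, and the loss, worse.
print(steady_state_loss(media * 1.5, capacity))  # about 0.53

# Sending less than the capacity makes the loss go away.
print(steady_state_loss(6.0, capacity))
```

Under this model the redundancy doesn't mask the loss, it causes more of it. The link still only delivers 7 Mbps, and now a third of that is parity.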
|
||
**Fun fact**: That's the fundamental difference between loss-based congestion control (e.g. Reno, CUBIC) and delay-based congestion control (e.g. BBR, COPA). | ||
BBRv1 doesn't even use packet loss as a congestion signal; it's all about estimated bandwidth and RTT. | ||
|
||
## Expertise | ||
These packet loss misconceptions come up surprisingly often in the live video space. | ||
The hyperfocus on packet loss is a symptom of a larger problem: media experts suddenly have to become networking experts. | ||
|
||
Even modern media protocols are built directly on top of UDP, for example [WebRTC](https://webrtc.org/), [SRT](https://www.haivision.com/products/srt-secure-reliable-transport/), [Sye](https://nscreenmedia.com/amazon-buys-sye/), [RIST](https://www.rist.tv/). | ||
It's for a good reason, as the head-of-line blocking of TCP is a non-starter for real-time media. | ||
But with great power (UDP) comes great responsibility. | ||
|
||
<figure> | ||
![mfw](/blog/forward-error-correction/mfw.jpeg) | ||
<figcaption> | ||
[\> mfw](https://knowyourmeme.com/memes/im-going-to-die-spider-man-3-qte) a new protocol over UDP is announced. | ||
</figcaption> | ||
</figure> | ||
|
||
And the same mistakes keep getting repeated. | ||
I can't tell you the number of times I've talked to an engineer at a video conference who decries congestion control, and in the next breath claims FEC is the solution to all their problems. | ||
WebRTC gets a passing grade because the Google engineers are super smart, although its complexity is a testament to the difficulty of networking. | ||
|
||
This is one of the reasons why we need **Media over QUIC**. | ||
Let the network engineers handle the network and the media engineers handle the media. | ||
|
||
## End-to-End | ||
But my beef with FEC in Opus is more fundamental. | ||
|
||
When I speak into a microphone, the audio data is encoded into a packet via a codec like Opus. | ||
That packet then traverses multiple hops, potentially going over WiFi, Ethernet, 4G, fiber, satellites, etc. | ||
It switches between different cell towers, routers, ISPs, transit providers, business units, and who knows what else. | ||
Until finally, finally, the packet reaches ur Mom's iPhone and my words replay into her ear. | ||
Tell her I miss her. 😢 | ||
|
||
Unfortunately, each of those hops has different properties and packet loss scenarios. | ||
Many of them already have FEC built-in or don't need it at all. | ||
|
||
By performing FEC in the application layer, specifically the audio codec, we're making a decision that's **end-to-end**. | ||
It's suboptimal by definition because packet loss is a **hop-by-hop** property. | ||
|
||
## Hop-by-Hop | ||
If not the audio codec, where should we perform FEC instead? | ||
|
||
In my ideal world, each hop uses a tailored loss recovery mechanism. | ||
The choice depends on the properties of the hop, specifically whether it expects: | ||
- **burst loss**: delayed parity. | ||
- **random loss**: interleaved parity. | ||
- **low RTT**: retransmit packets. | ||
- **congestion**: drop packets. | ||
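
The list above can be read as a tiny decision function. This is only a sketch: the `Hop` fields, the 20ms "low RTT" threshold, and the order in which the checks run are all assumptions made for illustration. The mapping itself is taken straight from the list.

```python
from dataclasses import dataclass
from enum import Enum


class Recovery(Enum):
    DELAYED_PARITY = "delayed parity"
    INTERLEAVED_PARITY = "interleaved parity"
    RETRANSMIT = "retransmit packets"
    DROP = "drop packets"


@dataclass
class Hop:
    congested: bool    # is the hop currently saturated?
    rtt_ms: float      # round trip time across this hop only
    bursty_loss: bool  # True for burst loss, False for random loss


LOW_RTT_MS = 20.0  # hypothetical cutoff for "low RTT"


def choose_recovery(hop: Hop) -> Recovery:
    """Pick a loss recovery mechanism for one hop (assumed priority order)."""
    if hop.congested:
        return Recovery.DROP  # more data would only make congestion worse
    if hop.rtt_ms <= LOW_RTT_MS:
        return Recovery.RETRANSMIT  # a resend arrives quickly enough
    if hop.bursty_loss:
        return Recovery.DELAYED_PARITY
    return Recovery.INTERLEAVED_PARITY


print(choose_recovery(Hop(congested=False, rtt_ms=5.0, bursty_loss=True)).value)
```

The point is less the exact rules and more that the inputs are per-hop properties, which an end-to-end audio codec never gets to see.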
|
||
But at which layer? | ||
A protocol like WiFi is general purpose, so it uses a general purpose recovery mechanism. | ||
It doesn't know that audio packets are time-sensitive. | ||
|
||
There are ways to flag [QoS](https://en.wikipedia.org/wiki/Quality_of_service) in IP packets but ISP support is limited, as is the granularity. | ||
That's why it does make sense to perform additional FEC at a higher level, but again it should be hop-by-hop. | ||
|
||
## QUIC | ||
So I just dunked on FEC in Opus. | ||
"Don't do FEC in the audio codec, do it in QUIC instead." | ||
|
||
Well QUIC doesn't support FEC yet. | ||
There are [some proposals](https://www.ietf.org/archive/id/draft-michel-quic-fec-01.html) but I imagine it will be a long time before anything materializes. | ||
|
||
QUIC is primarily designed and used by CDN companies. | ||
Their whole purpose is to put edge nodes as close to the user as possible in order to improve the user experience. | ||
When your RTT to the Google/Cloudflare/Akamai/Fastly/etc edge is 20ms, then FEC is strictly worse than retransmissions. | ||
FEC can only ever be an improvement when `target_latency < 2*RTT`. | ||
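
Taking that rule of thumb at face value, the check is a one-liner. The function name and the example latencies are hypothetical. The only thing taken from the text is the `target_latency < 2*RTT` condition.

```python
def fec_can_help(target_latency_ms: float, rtt_ms: float) -> bool:
    """True when a retransmission can't fit within the latency budget."""
    return target_latency_ms < 2 * rtt_ms


# 20ms to a nearby CDN edge with a 100ms budget: just retransmit.
print(fec_can_help(target_latency_ms=100, rtt_ms=20))

# A hypothetical 300ms long-haul path with the same budget: FEC has a shot.
print(fec_can_help(target_latency_ms=100, rtt_ms=300))
```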
|
||
Additionally, there might not even be a need for FEC in QUIC. | ||
WebRTC supports [RED](https://webrtchacks.com/red-improving-audio-quality-with-redundancy/) which was [added to RTP in 1997](https://datatracker.ietf.org/doc/html/rfc2198). | ||
Why send parity data when you can just transmit the same packet multiple times? | ||
It's wasteful but it's simple. | ||
|
||
RED actually works natively in QUIC without any extensions. | ||
A QUIC library can send redundant [STREAM frames](https://www.rfc-editor.org/rfc/rfc9000.html#name-stream-frames) and the receiver will transparently discard duplicates. | ||
This might just be good enough for now. | ||
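
Here's a minimal sketch of why duplicates are harmless on the receive side. STREAM frames carry a byte offset, so a reassembly buffer can tell what it has already delivered. This toy `StreamReassembler` is hypothetical and far simpler than a real QUIC stack (no flow control, no limits, no FIN handling), and the sender side is not shown.

```python
class StreamReassembler:
    """Toy in-order reassembly that silently drops duplicate bytes."""

    def __init__(self) -> None:
        self._pending: dict[int, bytes] = {}  # offset -> data not yet delivered
        self._delivered = 0                   # next offset the application expects

    def on_frame(self, offset: int, data: bytes) -> bytes:
        """Accept one STREAM-like frame; return any newly contiguous bytes."""
        if offset + len(data) > self._delivered:
            # Keep the frame unless a longer copy at this offset is buffered.
            if len(data) > len(self._pending.get(offset, b"")):
                self._pending[offset] = data
        out = bytearray()
        while True:
            ready = [o for o in self._pending if o <= self._delivered]
            if not ready:
                return bytes(out)
            for o in ready:
                chunk = self._pending.pop(o)
                # Skip whatever part of the chunk was already delivered.
                tail = chunk[self._delivered - o:]
                out += tail
                self._delivered += len(tail)


rx = StreamReassembler()
print(rx.on_frame(0, b"aa"))  # new data is delivered
print(rx.on_frame(0, b"aa"))  # the redundant copy yields nothing
```

The application only ever sees each byte once, so the cost of redundancy is purely bandwidth.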
|
||
## Conclusion | ||
Audio is important. <br /> | ||
Networks are quite complicated.<br /> | ||
This is not haiku. | ||
|
||
FEC should not be in an audio codec. | ||
It should be closer to the source of packet loss. | ||
But at the end of the day, I'm just shoving blame down the stack. | ||
Do what works best for your users at whatever layer you have access to. | ||
|
||
Just please, never show me results based on random packet loss again. | ||
|
||
Written by [@kixelated](https://github.com/kixelated). | ||
<img src="/blog/kixelCat.png" class="inline w-16" /> |