FEC blog (#89)
kixelated authored Feb 18, 2024
1 parent 7745fc9 commit 2ab0226
Showing 11 changed files with 233 additions and 108 deletions.
Binary file added web/public/blog/forward-error-correction/mfw.jpeg
Binary file modified web/public/blog/never-use-datagrams/denver.jpeg
Binary file removed web/public/blog/never-use-datagrams/tubes.webp
2 changes: 1 addition & 1 deletion web/src/pages/blog/distribution-at-twitch.mdx
@@ -4,7 +4,7 @@ title: Distribution @ Twitch
author: kixelated
description: Eight years of progress at Twitch with various distribution protocols.
cover: "/blog/kixelCat.png"
date: 2021-10-13
date: 2022-02-15
---

# Source
161 changes: 161 additions & 0 deletions web/src/pages/blog/forward-error-correction.mdx
@@ -0,0 +1,161 @@
---
layout: "@/layouts/global.astro"
title: Forward? Error? Correction?
author: kixelated
description: Concealing packet loss is harder than you think.
cover: "/blog/forward-error-correction/mfw.jpeg"
date: 2024-02-17
---

# Forward? Error? Correction?
So I absolutely *dunked* on datagrams in the [last blog post](/blog/never-use-datagrams).
Now it's time to dunk on the last remaining hope for datagrams: [Forward Error Correction](https://www.techtarget.com/searchmobilecomputing/definition/forward-error-correction) (FEC).

## OPUS
[Opus](https://opus-codec.org/) is an amazing audio codec.
Full disclosure, I haven't had the opportunity to work with it directly, since I was stuck in [AAC](https://en.wikipedia.org/wiki/Advanced_Audio_Coding) land at Twitch, so I'm talking out of my ass a bit.

OPUS has built-in support for FEC which is neat.
There are so many possible FEC schemes, many of which are patented, and I would do the subject a disservice if I tried to explain them.
The idea is to send redundant data, so the receiver can paper over small amounts of packet loss.
It's conceptually similar to [RAID](https://en.wikipedia.org/wiki/RAID) but for packets spread over time instead of hard drives.
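
To make the RAID comparison concrete, here's a minimal sketch of the simplest possible scheme: one XOR parity packet per group. This is not what OPUS does internally; the function names and the fixed packet length are assumptions made for illustration.

```typescript
// XOR every packet in a group together to build a single parity packet.
// Assumes all packets are the same length (real schemes pad or carry lengths).
function xorParity(packets: Uint8Array[]): Uint8Array {
	const parity = new Uint8Array(packets[0].length)
	for (const packet of packets) {
		for (let i = 0; i < parity.length; i++) {
			parity[i] ^= packet[i]
		}
	}
	return parity
}

// Rebuild the one missing packet by XORing the parity with the survivors.
// This only works if exactly one packet in the group was lost.
function recoverMissing(survivors: Uint8Array[], parity: Uint8Array): Uint8Array {
	return xorParity([...survivors, parity])
}

const group = [new Uint8Array([1, 2, 3]), new Uint8Array([4, 5, 6]), new Uint8Array([7, 8, 9])]
const parity = xorParity(group)

// Pretend the middle packet was dropped somewhere on the way.
const recovered = recoverMissing([group[0], group[2]], parity)
```

Lose two packets from the same group and this falls apart, which is why the shape of the loss matters so much.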

<figure>
![Interleaving](/blog/forward-error-correction/interleaving.png)
<figcaption>
[Source](https://en.wikipedia.org/wiki/Error_correction_code):
Instead you get this real image from Wikipedia.
</figcaption>
</figure>

Conveniently, audio "frames" are so small that they fit into a single datagram.
So rather than deal with retransmissions at the disgusting transport layer, the encoder can just encode redundancy into the bitstream.
RIP packet loss 1983-2024.

But the audio codec is so the wrong layer for this.

## Networks are Complicated
I worked with some very smart people at [Twitch](https://www.twitch.tv/).
However, I will never forget a presentation maybe 4 years ago where a very smart codec engineer pitched using FEC.

There was a graph that showed the TCP throughput during random packet loss.
Wow, TCP sure has a low bitrate at 30% packet loss.
But look at this demo, we used UDP+FEC and made something faster than TCP!

If somebody shows you any results based on simulated, random packet loss, you should politely tell them: **no, that's not how the internet works**.

<figure>
![Series of Tubes](/blog/forward-error-correction/tubes.png)
<figcaption>
**Fun fact**: the internet is not a series of tubes.
</figcaption>
</figure>

Networking is not quantum mechanics.
There are no dice involved and packet loss is *not random*.
It depends on the underlying transport.

Sometimes it occurs randomly due to signal interference. <br />
Sometimes it occurs in bursts due to batching. <br />
Sometimes it occurs due to congestion. <br />
Sometimes it occurs because ???.
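
The pattern matters more than the percentage. As a rough sketch, assume a toy scheme with one parity packet per group that can repair exactly one loss per group (the parity packet itself getting lost is ignored). The loss patterns below are made up, not measurements.

```typescript
// Count how many lost packets a "one parity per group" scheme can repair.
// `lost[i]` is true when packet i was dropped; a group is repairable only if it lost exactly one packet.
function repairedPackets(lost: boolean[], groupSize: number): number {
	let repaired = 0
	for (let start = 0; start < lost.length; start += groupSize) {
		const losses = lost.slice(start, start + groupSize).filter(Boolean).length
		if (losses === 1) repaired += 1
	}
	return repaired
}

// Both patterns drop 3 of 12 packets (25% loss).
const spreadLoss = [true, false, false, false, false, true, false, false, false, false, true, false]
const burstLoss = [false, false, false, false, true, true, true, false, false, false, false, false]

const spreadRepaired = repairedPackets(spreadLoss, 4) // 3: one loss per group
const burstRepaired = repairedPackets(burstLoss, 4) // 0: all three losses land in one group
```

Same loss rate, completely different outcome. A demo tuned against evenly spread loss says very little about a burst.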

Unfortunately, there's no magic loophole on the internet.
You can't send 10x the data to mask packet loss.

In fact, if you ever see a number like 30% packet loss in the real world (yikes), it's likely due to congestion.
You're sending 30% *too much* and fully saturating a link.
The solution is to send *less* instead to drain network queues. 🤯
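
Here's the back-of-the-napkin version, using a deliberately crude model: a single saturated link that drops whatever doesn't fit. There are no queues or competing flows, and the bitrates are invented for illustration.

```typescript
// Steady-state loss on a saturated link: whatever doesn't fit gets dropped.
// A crude model with made-up numbers, not a network simulator.
function lossRate(offeredMbps: number, capacityMbps: number): number {
	return Math.max(0, 1 - capacityMbps / offeredMbps)
}

const capacityMbps = 7
const mediaMbps = 10

const before = lossRate(mediaMbps, capacityMbps) // about 0.3: 30% of what you send doesn't fit
const withFec = lossRate(mediaMbps * 1.5, capacityMbps) // about 0.53: 50% more redundancy, more loss
const backedOff = lossRate(6, capacityMbps) // 0: send less and the loss goes away
```

Under this model, piling redundancy onto a congested link just raises the loss rate for everybody.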

**Fun fact**: That's the fundamental difference between loss-based congestion control (ex. Reno, CUBIC) and delay-based congestion control (ex. BBR, COPA).
BBRv1 doesn't even use packet loss as a signal; it's all about RTT.

## Expertise
These packet loss misconceptions come up surprisingly often in the live video space.
The hyperfocus on packet loss is a symptom of a larger problem: media experts suddenly have to become networking experts.

Even modern media protocols are built directly on top of UDP, for example [WebRTC](https://webrtc.org/), [SRT](https://www.haivision.com/products/srt-secure-reliable-transport/), [Sye](https://nscreenmedia.com/amazon-buys-sye/), [RIST](https://www.rist.tv/).
It's for a good reason, as the head-of-line blocking of TCP is a non-starter for real-time media.
But with great power (UDP) comes great responsibility.

<figure>
![mfw](/blog/forward-error-correction/mfw.jpeg)
<figcaption>
[\> mfw](https://knowyourmeme.com/memes/im-going-to-die-spider-man-3-qte) a new protocol over UDP is announced.
</figcaption>
</figure>

And the same mistakes keep getting repeated.
I can't tell you the number of times I've talked to an engineer at a video conference who decries congestion control, and in the next breath claims FEC is the solution to all their problems.
WebRTC gets a passing grade because the Google engineers are super smart, although its complexity is a testament to the difficulty of networking.

This is one of the reasons why we need **Media over QUIC**.
Let the network engineers handle the network and the media engineers handle the media.

## End-to-End
But my beef with FEC in OPUS is more fundamental.

When I speak into a microphone, the audio data is encoded into a packet via a codec like OPUS.
That packet then traverses multiple hops, potentially going over WiFi, Ethernet, 4G, fiber, satellites, etc.
It switches between different cell towers, routers, ISPs, transit providers, business units, and who knows what else.
Until finally, finally, the packet reaches ur Mom's iPhone and my words replay into her ear.
Tell her I miss her. 😢

Unfortunately, each of those hops has different properties and packet loss scenarios.
Many of them already have FEC built-in or don't need it at all.

By performing FEC in the application layer, specifically the audio codec, we're making a decision that's **end-to-end**.
It's suboptimal by definition because packet loss is a **hop-by-hop** property.

## Hop-by-Hop
If not the audio codec, where should we perform FEC instead?

In my ideal world, each hop uses a tailored loss recovery mechanism.
This is based on the properties of the hop and what it expects:
- **burst loss**: delayed parity.
- **random loss**: interleaved parity.
- **low RTT**: retransmit packets.
- **congestion**: drop packets.
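
The list above can be sketched as a tiny decision function. Everything here is hypothetical: the `Hop` shape, the priority order, and the 10ms "low RTT" cutoff are assumptions for illustration, not anything a real network stack exposes.

```typescript
// Hypothetical description of a single hop; none of these names come from a real API.
interface Hop {
	rttMs: number
	congested: boolean
	lossPattern: "burst" | "random" | "none"
}

type Recovery = "drop" | "retransmit" | "delayed-parity" | "interleaved-parity" | "none"

// Pick a recovery mechanism for one hop, mirroring the list above.
// The priority order and the 10ms cutoff are assumptions.
function pickRecovery(hop: Hop, lowRttMs = 10): Recovery {
	if (hop.congested) return "drop" // more data only makes congestion worse
	if (hop.lossPattern === "none") return "none"
	if (hop.rttMs <= lowRttMs) return "retransmit" // cheap when the hop is short
	return hop.lossPattern === "burst" ? "delayed-parity" : "interleaved-parity"
}

const hops: Hop[] = [
	{ rttMs: 2, congested: false, lossPattern: "random" }, // e.g. WiFi
	{ rttMs: 40, congested: false, lossPattern: "burst" }, // e.g. a long-haul link
	{ rttMs: 15, congested: true, lossPattern: "random" }, // e.g. a saturated last mile
]

const plan = hops.map((hop) => pickRecovery(hop))
```

The point is that the answer changes per hop, which is exactly what a single end-to-end setting in the codec can't express.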

But at which layer?
A protocol like WiFi is general purpose, so it uses a general purpose recovery mechanism.
It doesn't know that audio packets are time-sensitive.

There are ways to flag [QoS](https://en.wikipedia.org/wiki/Quality_of_service) in IP packets but ISP support is limited, as is the granularity.
That's why it does make sense to perform additional FEC at a higher level, but again it should be hop-by-hop.

## QUIC
So I just dunked on FEC in OPUS.
"Don't do FEC in the audio codec, do it in QUIC instead."

Well QUIC doesn't support FEC yet.
There are [some proposals](https://www.ietf.org/archive/id/draft-michel-quic-fec-01.html) but I imagine it will be a long time before anything materializes.

QUIC is primarily designed and used by CDN companies.
Their whole purpose is to put edge nodes as close to the user as possible in order to improve the user experience.
When your RTT to the Google/CloudFlare/Akamai/Fastly/etc edge is 20ms, then FEC is strictly worse than retransmissions.
FEC can only ever be an improvement when `target_latency < 2*RTT`.
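
As a quick sanity check, here's that rule of thumb as code. The function name and the latency numbers are made up for illustration.

```typescript
// Rule of thumb from above: FEC can only win when the latency budget is under 2*RTT.
// Otherwise there's time to notice the loss and retransmit.
function fecCanHelp(targetLatencyMs: number, rttMs: number): boolean {
	return targetLatencyMs < 2 * rttMs
}

const nearbyEdge = fecCanHelp(100, 20) // false: 100ms budget, 20ms RTT, just retransmit
const farAway = fecCanHelp(100, 150) // true: a retransmit would blow the budget
```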

Additionally, there might not even be a need for FEC in QUIC.
WebRTC supports [RED](https://webrtchacks.com/red-improving-audio-quality-with-redundancy/) which was [added to RTP in 1997](https://datatracker.ietf.org/doc/html/rfc2198).
Why send parity data when you can just transmit the same packet multiple times?
It's wasteful but it's simple.

RED actually works natively in QUIC without any extensions.
A QUIC library can send redundant [STREAM frames](https://www.rfc-editor.org/rfc/rfc9000.html#name-stream-frames) and the receiver will transparently discard duplicates.
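
Here's a toy sketch of why duplicates are harmless on the receiving side. This is not the API of any real QUIC library; `StreamReceiver` is a hypothetical stand-in that assumes redundant frames reuse the exact same offset and length.

```typescript
// Toy stream reassembler: a hypothetical stand-in for what a QUIC library does internally.
// STREAM frames carry an offset, so a redundant copy of already-seen bytes is simply ignored.
class StreamReceiver {
	private chunks = new Map<number, Uint8Array>()
	duplicates = 0

	// Simplification: redundant frames are assumed to reuse the same offset and length.
	onStreamFrame(offset: number, data: Uint8Array): void {
		if (data.length === 0) return
		if (this.chunks.has(offset)) {
			this.duplicates += 1
			return
		}
		this.chunks.set(offset, data)
	}

	// Return the contiguous bytes received so far, starting from offset 0.
	read(): Uint8Array {
		const out: number[] = []
		let chunk = this.chunks.get(0)
		while (chunk !== undefined) {
			for (let i = 0; i < chunk.length; i++) out.push(chunk[i])
			chunk = this.chunks.get(out.length)
		}
		return new Uint8Array(out)
	}
}

const receiver = new StreamReceiver()
const first = new Uint8Array([1, 2, 3])
const second = new Uint8Array([4, 5])

// The sender transmits every frame twice; one copy of `second` is lost in transit.
receiver.onStreamFrame(0, first)
receiver.onStreamFrame(0, first) // redundant copy, discarded
receiver.onStreamFrame(3, second) // the surviving copy fills the gap

const received = receiver.read()
```

The application just reads bytes off the stream and never learns that anything was sent twice.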
This might just be good enough for now.

## Conclusion
Audio is important. <br />
Networks are quite complicated.<br />
This is not haiku.

FEC should not be in an audio codec.
It should be closer to the source of packet loss.
But at the end of the day, I'm just shoving blame down the stack.
Do what works best for your users at whatever layer you have access to.

Just please, never show me results based on random packet loss again.

Written by [@kixelated](https://github.com/kixelated).
<img src="/blog/kixelCat.png" class="inline w-16" />
10 changes: 8 additions & 2 deletions web/src/pages/blog/index.astro
@@ -8,16 +8,22 @@ interface Frontmatter {
cover: string
description: string
author: string
date: string
}
const allPosts = await Astro.glob<Frontmatter>("./*.mdx")
const posts = await Astro.glob<Frontmatter>("./*.mdx")
posts.sort((a, b) => {
const dateA = Date.parse(a.frontmatter.date)
const dateB = Date.parse(b.frontmatter.date)
return dateB - dateA
})
---

<MainLayout title="Blog">
<section>
<h1>Blog Posts</h1>
{
allPosts.map((post) => (
posts.map((post) => (
<article class="mb-6 rounded-lg grid grid-cols-2 hover:bg-blue-950 hover:scale-110 hover:translate-x-2 transition-all ease-in-out">
<a href={post.url} class="rounded-2xl">
<img class="object-cover h-48 w-96 rounded-2xl" src={post.frontmatter.cover} alt="blog cover" />