FEC blog (#89)

kixelated · Feb 18, 2024 · 2ab0226
1 parent 7745fc9
Showing 11 changed files with 233 additions and 108 deletions.
diff --git a/web/public/blog/forward-error-correction/interleaving.png b/web/public/blog/forward-error-correction/interleaving.png
diff --git a/web/public/blog/forward-error-correction/mfw.jpeg b/web/public/blog/forward-error-correction/mfw.jpeg
diff --git a/...public/blog/never-use-datagrams/tubes.png → ...c/blog/forward-error-correction/tubes.png b/...public/blog/never-use-datagrams/tubes.png → ...c/blog/forward-error-correction/tubes.png
diff --git a/web/public/blog/never-use-datagrams/denver.jpeg b/web/public/blog/never-use-datagrams/denver.jpeg
diff --git a/web/public/blog/never-use-datagrams/tubes.webp b/web/public/blog/never-use-datagrams/tubes.webp
diff --git a/web/src/pages/blog/distribution-at-twitch.mdx b/web/src/pages/blog/distribution-at-twitch.mdx
@@ -4,7 +4,7 @@ title: Distribution @ Twitch
 author: kixelated
 description: Eight years of progress at Twitch with various distribution protocols.
 cover: "/blog/kixelCat.png"
-date: 2021-10-13
+date: 2022-02-15
 ---
 
 # Source

diff --git a/web/src/pages/blog/forward-error-correction.mdx b/web/src/pages/blog/forward-error-correction.mdx
@@ -0,0 +1,161 @@
+---
+layout: "@/layouts/global.astro"
+title: Forward? Error? Correction?
+author: kixelated
+description: Concealing packet loss is harder than you think.
+cover: "/blog/forward-error-correction/mfw.jpeg"
+date: 2024-02-17
+---
+
+# Forward? Error? Correction?
+So I absolutely *dunked* on datagrams in the [last blog post](/blog/never-use-datagrams).
+Now it's time to dunk on the last remaining hope for datagrams: [Forward Error Correction](https://www.techtarget.com/searchmobilecomputing/definition/forward-error-correction) (FEC).
+
+## Opus
+[Opus](https://opus-codec.org/) is an amazing audio codec.
+Full disclosure, I haven't had the opportunity to work with it directly, since I was stuck in [AAC](https://en.wikipedia.org/wiki/Advanced_Audio_Coding) land at Twitch, so I'm talking out of my ass a bit.
+
+Opus has built-in support for FEC, which is neat.
+There are so many possible FEC schemes, many of which are patented, and I would do the subject a disservice if I tried to explain them.
+The idea is to send redundant data, so the receiver can paper over small amounts of packet loss.
+It's conceptually similar to [RAID](https://en.wikipedia.org/wiki/RAID) but for packets spread over time instead of hard drives.
+
+<figure>
+	![Interleaving](/blog/forward-error-correction/interleaving.png)
+	<figcaption>
+		[Source](https://en.wikipedia.org/wiki/Error_correction_code):
+		Instead you get this real image from Wikipedia.
+	</figcaption>
+</figure>
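+
+The RAID comparison can be made concrete with the simplest possible scheme: XOR parity across a small group of equal-length packets.
+This is a minimal sketch in the style of RAID parity, not the scheme Opus actually uses, and `xorParity` and `recoverGroup` are hypothetical helpers.
+One parity packet per group of N packets lets the receiver rebuild any *single* lost packet in that group, at the cost of 1/N extra bandwidth.
+
+```typescript
+// XOR every packet in a group together to produce one parity packet.
+// Assumes all packets in the group have the same length.
+function xorParity(packets: Uint8Array[]): Uint8Array {
+  const parity = new Uint8Array(packets[0].length);
+  for (const packet of packets) {
+    for (let i = 0; i < parity.length; i++) {
+      parity[i] ^= packet[i];
+    }
+  }
+  return parity;
+}
+
+// Rebuild the group from the packets that arrived (null = lost) plus the parity.
+// Returns null when two or more packets are missing, since one parity packet
+// can only repair a single loss.
+function recoverGroup(received: (Uint8Array | null)[], parity: Uint8Array): Uint8Array[] | null {
+  const missing: number[] = [];
+  const survivors: Uint8Array[] = [];
+  received.forEach((packet, i) => {
+    if (packet === null) missing.push(i);
+    else survivors.push(packet);
+  });
+  if (missing.length > 1) return null;
+
+  const group = received.slice();
+  if (missing.length === 1) {
+    // parity ^ (every surviving packet) leaves exactly the lost packet.
+    group[missing[0]] = xorParity([parity, ...survivors]);
+  }
+  return group as Uint8Array[];
+}
+
+const a = new Uint8Array([1, 2, 3]);
+const b = new Uint8Array([4, 5, 6]);
+const c = new Uint8Array([7, 8, 9]);
+const parity = xorParity([a, b, c]);
+
+// Packet b is lost in transit, but the receiver can rebuild it.
+console.log(recoverGroup([a, null, c], parity)?.[1].join(",")); // "4,5,6"
+```
+
+The catch: lose two packets from the same group and the parity packet is useless.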
+
+Conveniently, audio "frames" are so small that they fit into a single datagram.
+So rather than deal with retransmissions at the disgusting transport layer, the encoder can just encode redundancy into the bitstream.
+RIP packet loss 1983-2024.
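+
+A quick back-of-the-envelope check of that claim, assuming a constant bitrate and ignoring packet headers (`frameBytes` and the bitrates below are illustrative, not Opus defaults):
+
+```typescript
+// Bytes in one encoded audio frame at a constant bitrate.
+function frameBytes(bitrateBps: number, frameMs: number): number {
+  return (bitrateBps * frameMs) / 1000 / 8;
+}
+
+// QUIC requires the network path to carry UDP payloads of at least 1200 bytes (RFC 9000).
+const MIN_QUIC_UDP_PAYLOAD = 1200;
+
+console.log(frameBytes(32000, 20)); // 80
+console.log(frameBytes(128000, 60) <= MIN_QUIC_UDP_PAYLOAD); // true (960 bytes)
+```
+
+A 20ms frame at 32 kbps is only 80 bytes, and even a 60ms frame at 128 kbps fits within the 1200-byte minimum that QUIC requires a path to support.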
+
+But the audio codec is so the wrong layer for this.
+
+## Networks are Complicated
+I worked with some very smart people at [Twitch](https://www.twitch.tv/).
+However, I will never forget a presentation maybe 4 years ago where a very smart codec engineer pitched using FEC.
+
+There was a graph that showed the TCP throughput during random packet loss.
+Wow, TCP sure has a low bitrate at 30% packet loss.
+But look at this demo: we used UDP+FEC and made something faster than TCP!
+
+If somebody shows you any results based on simulated, random packet loss, you should politely tell them: **no, that's not how the internet works**.
+
+<figure>
+	![Series of Tubes](/blog/forward-error-correction/tubes.png)
+	<figcaption>
+		**Fun fact**: the internet is not a series of tubes.
+	</figcaption>
+</figure>
+
+Networking is not quantum mechanics.
+There are no dice involved and packet loss is *not random*.
+It depends on the underlying transport.
+
+Sometimes it occurs randomly due to signal interference. <br />
+Sometimes it occurs in bursts due to batching. <br />
+Sometimes it occurs due to congestion. <br />
+Sometimes it occurs because ???.
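+
+To see why random loss makes a misleading benchmark, here's a sketch comparing independent (Bernoulli) loss against a simplified two-state Gilbert-Elliott model, a standard way to model bursty loss.
+The transition probabilities and the seeded `mulberry32` PRNG are illustrative assumptions, not measurements of any real network.
+Both models drop about 5% of packets on average; the sketch counts how many groups of 4 packets lose two or more, which is more than one parity packet per group can repair.
+
+```typescript
+// Small seeded PRNG (mulberry32) so every run produces the same sequence.
+function mulberry32(seed: number): () => number {
+  let a = seed >>> 0;
+  return () => {
+    a = (a + 0x6d2b79f5) >>> 0;
+    let t = a;
+    t = Math.imul(t ^ (t >>> 15), t | 1);
+    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
+    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
+  };
+}
+
+// Independent (Bernoulli) loss: every packet is dropped with the same probability.
+function bernoulliLoss(n: number, lossRate: number, rand: () => number): boolean[] {
+  const lost: boolean[] = [];
+  for (let i = 0; i < n; i++) lost.push(rand() < lossRate);
+  return lost;
+}
+
+// Simplified Gilbert-Elliott: packets are always delivered in the "good" state
+// and always dropped in the "bad" state, so losses arrive in runs.
+function gilbertElliottLoss(
+  n: number,
+  pGoodToBad: number,
+  pBadToGood: number,
+  rand: () => number,
+): boolean[] {
+  const lost: boolean[] = [];
+  let bad = false;
+  for (let i = 0; i < n; i++) {
+    bad = bad ? rand() >= pBadToGood : rand() < pGoodToBad;
+    lost.push(bad);
+  }
+  return lost;
+}
+
+// Fraction of packets that were lost.
+function measuredLoss(lost: boolean[]): number {
+  return lost.filter(Boolean).length / lost.length;
+}
+
+// Fraction of consecutive groups that lose 2+ packets, i.e. more than a
+// single parity packet per group could repair.
+function unrecoverableFraction(lost: boolean[], groupSize: number): number {
+  let groups = 0;
+  let unrecoverable = 0;
+  for (let start = 0; start + groupSize <= lost.length; start += groupSize) {
+    let losses = 0;
+    for (let i = start; i < start + groupSize; i++) {
+      if (lost[i]) losses++;
+    }
+    groups++;
+    if (losses > 1) unrecoverable++;
+  }
+  return unrecoverable / groups;
+}
+
+const rand = mulberry32(42);
+const random = bernoulliLoss(100000, 0.05, rand);
+// Long-run loss = 0.0263 / (0.0263 + 0.5) ≈ 5%; mean burst length = 1 / 0.5 = 2 packets.
+const bursty = gilbertElliottLoss(100000, 0.0263, 0.5, rand);
+
+console.log("average loss:", measuredLoss(random), measuredLoss(bursty));
+console.log("groups of 4 with 2+ losses:", unrecoverableFraction(random, 4), unrecoverableFraction(bursty, 4));
+```
+
+Both sequences average close to 5% loss, but the bursty one should leave noticeably more groups unrecoverable, so a simulation built on random loss flatters FEC.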
+
+Unfortunately, there's no magic loophole on the internet.
+You can't send 10x the data to mask packet loss.
+
+In fact, if you ever see a number like 30% packet loss in the real world (yikes), it's likely due to congestion.
+Roughly 30% of what you're sending is *too much* for a fully saturated link.
+The solution is to send *less* instead to drain network queues. 🤯
+
+**Fun fact**: That's the fundamental difference between loss-based congestion control (ex. Reno, CUBIC) and delay-based congestion control (ex. BBR, Copa).
+BBRv1 doesn't even use packet loss as a signal; it's all about RTT and delivery rate.
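+
+To put numbers on the 30% example, here's a sketch assuming a single bottleneck link whose queue is already full, so anything sent beyond its capacity is dropped.
+`congestionLoss` and the link speeds are hypothetical, and real queues and congestion controllers are messier.
+
+```typescript
+// Fraction of traffic dropped at a single bottleneck whose queue is already
+// full: everything sent beyond the link's capacity is lost.
+function congestionLoss(sendRateMbps: number, capacityMbps: number): number {
+  if (sendRateMbps <= capacityMbps) return 0;
+  return (sendRateMbps - capacityMbps) / sendRateMbps;
+}
+
+// Sending 10 Mbps into a 7 Mbps link: 30% loss.
+console.log(congestionLoss(10, 7)); // 0.3
+// "Fix" it with 50% FEC overhead (15 Mbps total): loss rises to roughly 53%.
+console.log(congestionLoss(15, 7));
+// Send less instead: no loss at all.
+console.log(congestionLoss(7, 7)); // 0
+```
+
+In this model, 30% loss means sending about 1.43x the link's capacity, and piling redundancy on top only digs the hole deeper.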
+
+## Expertise
+These packet loss misconceptions come up surprisingly often in the live video space.
+The hyper focus on packet loss is a symptom of a larger problem: media experts suddenly have to become networking experts.
+
+Even modern media protocols are built directly on top of UDP, for example [WebRTC](https://webrtc.org/), [SRT](https://www.haivision.com/products/srt-secure-reliable-transport/), [Sye](https://nscreenmedia.com/amazon-buys-sye/), [RIST](https://www.rist.tv/).
+It's for a good reason: TCP's head-of-line blocking is a non-starter for real-time media.
+But with great power (UDP) comes great responsibility.
+
+<figure>
+	![mfw](/blog/forward-error-correction/mfw.jpeg)
+	<figcaption>
+		[\> mfw](https://knowyourmeme.com/memes/im-going-to-die-spider-man-3-qte) a new protocol over UDP is announced.
+	</figcaption>
+</figure>
+
+And the same mistakes keep getting repeated.
+I can't tell you the number of times I've talked to an engineer at a video conference who decries congestion control, and in the next breath claims FEC is the solution to all their problems.
+WebRTC gets a passing grade because the Google engineers are super smart, although its complexity is a testament to the difficulty of networking.
+
+This is one of the reasons why we need **Media over QUIC**.
+Let the network engineers handle the network and the media engineers handle the media.
+
+## End-to-End
+But my beef with FEC in Opus is more fundamental.
+
+When I speak into a microphone, the audio data is encoded into a packet via a codec like Opus.
+That packet then traverses multiple hops, potentially going over WiFi, Ethernet, 4G, fiber, satellites, etc.
+It switches between different cell towers, routers, ISPs, transit providers, business units, and who knows what else.
+Until finally, finally, the packet reaches ur Mom's iPhone and my words replay into her ear.
+Tell her I miss her. 😢
+
+Unfortunately, each of those hops has different properties and packet loss scenarios.
+Many of them already have FEC built-in or don't need it at all.
+
+By performing FEC in the application layer, specifically the audio codec, we're making a decision that's **end-to-end**.
+It's suboptimal by definition because packet loss is a **hop-by-hop** property.
+
+## Hop-by-Hop
+If not the audio codec, where should we perform FEC instead?
+
+In my ideal world, each hop uses a tailored loss recovery mechanism.
+The choice depends on the properties of the hop, namely whether it expects:
+- **burst loss**: delayed parity.
+- **random loss**: interleaved parity.
+- **low RTT**: retransmit packets.
+- **congestion**: drop packets.
+
+But at which layer?
+A protocol like WiFi is general purpose, so it uses a general purpose recovery mechanism.
+It doesn't know that audio packets are time-sensitive.
+
+There are ways to flag [QoS](https://en.wikipedia.org/wiki/Quality_of_service) in IP packets but ISP support is limited, as is the granularity.
+That's why it does make sense to perform additional FEC at a higher layer, but again, it should be hop-by-hop.
+
+## QUIC
+So I just dunked on FEC in Opus.
+"Don't do FEC in the audio codec, do it in QUIC instead."
+
+Well, QUIC doesn't support FEC yet.
+There are [some proposals](https://www.ietf.org/archive/id/draft-michel-quic-fec-01.html) but I imagine it will be a long time before anything materializes.
+
+QUIC is primarily designed and used by CDN companies.
+Their whole purpose is to put edge nodes as close to the user as possible in order to improve the user experience.
+When your RTT to the Google/Cloudflare/Akamai/Fastly/etc. edge is 20ms, then FEC is strictly worse than retransmissions.
+FEC can only ever be an improvement when `target_latency < 2*RTT`.
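+
+Plugging numbers into that inequality, with illustrative latency targets and a hypothetical `fecCanHelp` helper that simply encodes the rule of thumb above:
+
+```typescript
+// Rule of thumb from this post: FEC can only beat retransmission when the
+// latency budget is under two round trips.
+function fecCanHelp(targetLatencyMs: number, rttMs: number): boolean {
+  return targetLatencyMs < 2 * rttMs;
+}
+
+// 20ms RTT to a nearby CDN edge with a 100ms latency target: retransmit instead.
+console.log(fecCanHelp(100, 20)); // false
+// 150ms RTT across the globe with a 200ms latency target: FEC might help.
+console.log(fecCanHelp(200, 150)); // true
+```
+
+With a 20ms RTT to the edge, two round trips take only 40ms, so any latency target above that leaves room for a retransmission.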
+
+Additionally, there might not even be a need for FEC in QUIC.
+WebRTC supports [RED](https://webrtchacks.com/red-improving-audio-quality-with-redundancy/) which was [added to RTP in 1997](https://datatracker.ietf.org/doc/html/rfc2198).
+Why send parity data when you can just transmit the same packet multiple times?
+It's wasteful but it's simple.
+
+RED actually works natively in QUIC without any extensions.
+A QUIC library can send redundant [STREAM frames](https://www.rfc-editor.org/rfc/rfc9000.html#name-stream-frames) and the receiver will transparently discard duplicates.
+This might just be good enough for now.
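+
+Here's why duplicates are harmless, as a sketch built on a hypothetical receive buffer rather than any real QUIC library's API: every STREAM frame carries its byte offset, so the receiver can ignore any byte it already holds.
+On the sender side, RED is just sending the same frame twice, ideally in different packets.
+
+```typescript
+// A QUIC STREAM frame boils down to a byte offset within the stream plus data.
+interface StreamFrame {
+  offset: number;
+  data: Uint8Array;
+}
+
+// Hypothetical receive buffer. Bytes are keyed by stream offset, so a duplicate
+// frame only covers positions we already hold or have already delivered, and
+// those bytes are ignored.
+class StreamReceiver {
+  private pending = new Map<number, number>(); // stream offset -> byte value
+  private readOffset = 0; // everything before this was already delivered
+
+  onFrame(frame: StreamFrame): void {
+    frame.data.forEach((byte, i) => {
+      const pos = frame.offset + i;
+      if (pos >= this.readOffset && !this.pending.has(pos)) {
+        this.pending.set(pos, byte);
+      }
+    });
+  }
+
+  // Hand the application every contiguous byte starting at the read offset.
+  read(): Uint8Array {
+    const out: number[] = [];
+    let byte = this.pending.get(this.readOffset);
+    while (byte !== undefined) {
+      out.push(byte);
+      this.pending.delete(this.readOffset);
+      this.readOffset++;
+      byte = this.pending.get(this.readOffset);
+    }
+    return Uint8Array.from(out);
+  }
+}
+
+// RED-style sender: the same frame is simply sent twice.
+const audioFrame: StreamFrame = { offset: 0, data: new Uint8Array([1, 2, 3]) };
+const receiver = new StreamReceiver();
+receiver.onFrame(audioFrame); // first copy
+receiver.onFrame(audioFrame); // redundant copy, silently discarded
+console.log(receiver.read().join(",")); // "1,2,3"
+```
+
+A real implementation would track byte ranges rather than individual bytes, but the deduplication idea is the same.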
+
+## Conclusion
+Audio is important. <br />
+Networks are quite complicated. <br />
+This is not haiku.
+
+FEC should not be in an audio codec.
+It should be closer to the source of packet loss.
+But at the end of the day, I'm just shoving blame down the stack.
+Do what works best for your users at whatever layer you have access to.
+
+Just please, never show me results based on random packet loss again.
+
+Written by [@kixelated](https://github.com/kixelated).
+<img src="/blog/kixelCat.png" class="inline w-16" />
diff --git a/web/src/pages/blog/index.astro b/web/src/pages/blog/index.astro
@@ -8,16 +8,22 @@ interface Frontmatter {
 	cover: string
 	description: string
 	author: string
+	date: string
 }
 
-const allPosts = await Astro.glob<Frontmatter>("./*.mdx")
+const posts = await Astro.glob<Frontmatter>("./*.mdx")
+posts.sort((a, b) => {
+	const dateA = Date.parse(a.frontmatter.date)
+	const dateB = Date.parse(b.frontmatter.date)
+	return dateB - dateA
+})
 ---
 
 <MainLayout title="Blog">
 	<section>
 		<h1>Blog Posts</h1>
 		{
-			allPosts.map((post) => (
+			posts.map((post) => (
 				<article class="mb-6 rounded-lg grid grid-cols-2 hover:bg-blue-950 hover:scale-110 hover:translate-x-2 transition-all ease-in-out">
 					<a href={post.url} class="rounded-2xl">
 						<img class="object-cover h-48 w-96 rounded-2xl" src={post.frontmatter.cover} alt="blog cover" />