diff --git a/web/public/blog/replacing-hls-dash/troll.webp b/web/public/blog/replacing-hls-dash/troll.webp new file mode 100644 index 0000000..d400246 Binary files /dev/null and b/web/public/blog/replacing-hls-dash/troll.webp differ diff --git a/web/src/pages/blog/replacing-hls-dash.mdx b/web/src/pages/blog/replacing-hls-dash.mdx index f87e9e7..acff367 100644 --- a/web/src/pages/blog/replacing-hls-dash.mdx +++ b/web/src/pages/blog/replacing-hls-dash.mdx @@ -41,11 +41,11 @@ The next sentence gives a hint as to why: > If your app uses HTTP Live Streaming over cellular networks, you are required to provide at least one stream at 64 Kbps or lower bandwidth. This was back in 2009 when the iPhone 3GS was released and AT&T's network was [struggling to meet the demand](https://www.wired.com/2010/07/ff-att-fail/). -The key feature of HLS is [ABR](https://en.wikipedia.org/wiki/Adaptive_bitrate_streaming): encoding multiple copies of the same content at different bit-rates. -This allowed the Apple-controlled HLS player to reduce the bitrate rather than pummel a poor mega-corp's cellular network. +The key feature of HLS was [ABR](https://en.wikipedia.org/wiki/Adaptive_bitrate_streaming): multiple copies of the same content at different bitrates. +This allowed the Apple-controlled HLS player to reduce the bitrate rather than pummel a poor megacorp's cellular network. -[DASH](https://en.wikipedia.org/wiki/Dynamic_Adaptive_Streaming_over_HTTP) came afterwards in an attempt to standardize HLS... but without the controlled by Apple part. -There's definitely some cool features in DASH but the [core concepts are the same](https://www.cloudflare.com/learning/video/what-is-mpeg-dash/) and now they even share the same [media container](https://www.wowza.com/blog/what-is-cmaf). +[DASH](https://en.wikipedia.org/wiki/Dynamic_Adaptive_Streaming_over_HTTP) came afterwards in an attempt to standardize HLS minus the controlled by Apple part. +There's definitely some cool features in DASH but the [core concepts are the same](https://www.cloudflare.com/learning/video/what-is-mpeg-dash/) and they even share the same [media container](https://www.wowza.com/blog/what-is-cmaf) now. So the two get bundled together as **HLS/DASH**. But I'll focus more on HLS since that's my shit. @@ -64,12 +64,12 @@ New segments are constantly being generated and announced to the player via a "p
Thanks for the filer image, DALLĀ·E
-By using HTTP, a service like Twitch can piggyback on the existing infrastructure of the internet. +Because HLS uses HTTP, a service like Twitch can piggyback on the existing infrastructure of the internet. There's a plethora of optimized CDNs, servers, and clients that all speak HTTP and can be used to transport media. You do have to do some extra work to massage live video into HTTP semantics, but it's worth it. -Crafting individual IP packets might the _correct_ way to send live media (ie. WebRTC), but it's not the most cost effective. -The key is utilizing [economies of scale](https://napkinfinance.com/napkin/what-are-economies-of-scale/) to make it cheap to deliver media when latency is not critical. +The key is utilizing [economies of scale](https://napkinfinance.com/napkin/what-are-economies-of-scale/) to make it cheap to mass distribute live media. +Crafting individual IP packets might the _correct_ way to send live media with minimal latency (ie. WebRTC), but it's not the most cost effective. ## The Bad Stuff @@ -98,10 +98,10 @@ Frames will take longer and longer to reach the player until the buffer is deple A HLS/DASH player can detect queuing and switch to a lower bitrate via ABR. However, it can only do this at infrequent (ex. 2s) segment boundaries, and it can't renege any frames already flushed to the socket. -So if you're watching 1080p video and your network takes a dump, well you still need to download seconds of unsustainable 1080p video, before you can switch down to a reasonable 360p. +So if you're watching 1080p video and your network takes a dump, well you still need to download seconds of unsustainable 1080p video before you can switch down to a reasonable 360p. You can't just put the toothpaste back in the tube if you squeeze out too much. -You gotta commit to that toothpaste, even if it takes longer to brush your teeth. +You gotta use all of the toothpaste, even if it takes much longer to brush your teeth.
![TCP toothpaste](/blog/replacing-webrtc/toothpaste.jpg) @@ -124,20 +124,19 @@ The problem really depends on your perspective. If you control: - **server only**: Life is _pain_. For a service like Twitch, the solution might seem simple: build your own client and server! -And we did, including a bare-metal live CDN designed exclusively for HLS. +And we did, including a baremetal live CDN designed exclusively for HLS. -But [until quite recently](https://bitmovin.com/managed-media-source), we've been forced to use the Apple HLS player on iOS for AirPlay or Safari support. +But [until quite recently](https://bitmovin.com/managed-media-source), we have been forced to use the Apple HLS player on iOS for AirPlay or Safari support. And of course TVs, consoles, casting devices, and others have their own HLS players. -And if you're offering your bare-metal live CDN [to the public](https://aws.amazon.com/ivs/), you can't exactly force customers to use your proprietary player. +And if you're offering your baremetal live CDN [to the public](https://aws.amazon.com/ivs/), you can't exactly force customers to use your proprietary player. So you're stuck with a _dumb_ server and a bunch of _dumb_ clients. These _dumb_ clients make _dumb_ decisions with no cooperation with the server, based on imperfect information. -It's not a huge surprise that different platforms provide wildly different experiences. -### Apple +### Ownership I love the simplicity of HLS compared to DASH. -There's something so satisfying about a text-based playlist that you can actually read, versus a XML monstrosity designed by committee (_gasp_). +There's something so satisfying about a text-based playlist that you can actually read, versus a XML monstrosity designed by committee. ```m3u8 #EXTM3U @@ -153,23 +152,28 @@ http://media.example.com/third.ts ```
-
- [Source](https://datatracker.ietf.org/doc/html/draft-pantos-hls-rfc8216bis/#section-9.1). Orgasmic. -
+
[Orgasmic](https://datatracker.ietf.org/doc/html/draft-pantos-hls-rfc8216bis/#section-9.1).
But unfortunately Apple controls HLS. There's a misalignment of incentives between Apple and the rest of the industry. I'm not even sure how Apple uses HLS, or why they would care about latency, or why they insist on being the sole arbiter of a live streaming protocol. -[Pantos](https://www.crunchbase.com/person/roger-pantos) has done a great and thankless job, but it feels like a stand-off at gunpoint. +[Pantos](https://www.crunchbase.com/person/roger-pantos) has done a great and thankless job, but it feels like a stand-off. For example, LL-HLS originally [required HTTP/2 server push](https://www.theoplayer.com/blog/impact-of-apple-ll-hls-update-2020) and it took nearly the entire industry to convince Apple that this was a bad idea. The upside is that we got [a mailing list](https://lists.apple.com/mailman/listinfo/hls-announce) so they can announce changes to developers first... but don't expect the ability to propose changes any time soon. -DASH is more open protocol but it's controlled by [MPEG](https://en.wikipedia.org/wiki/Moving_Picture_Experts_Group), which is another can of worms. -Why put your specification [behind a paywall](https://www.iso.org/standard/79106.html)? -But DASH doesn't matter while HLS is still required on iOS. +DASH is it's own can of worms as it's controlled by [MPEG](https://en.wikipedia.org/wiki/Moving_Picture_Experts_Group). +The specifications are [behind a paywall](https://www.iso.org/standard/79106.html) or [require patent licensing](https://www.streamingmedia.com/Articles/ReadArticle.aspx?ArticleID=133508)? +I can't even tell if I'm going to [get sued](https://www.mpegla.com/wp-content/uploads/DASHWeb.pdf) for parsing a DASH playlist without paying the troll toll. + +
+ ![troll toll](/blog/replacing-hls-dash/troll.webp) +
+ [Source](https://itsalwayssunny.fandom.com/wiki/The_Nightman_Cometh). You gotta pay the Troll Toll +
+
# What's next? @@ -193,7 +197,6 @@ Yes, you can do real-time latency with TCP (or WebSockets) under ideal condition However, it just won't work well enough on poor networks. Congestion and buffer-bloat will absolutely wreck your protocol on poor networks. A lot of my time spent at Twitch was optimizing for the 90th percentile; the shoddy cellular networks in Brazil or India or Australia. -If all of your customers have a flawless internet connection, or live inside your data center, then TCP is absolutely the great choice. But if you are going to reinvent RTMP, there are [some ways to reduce queuing](https://www.youtube.com/watch?v=cpYhm74zp0U) but they are quite limited. This is _especially_ true in a browser environment when limited to HTTP or [WebSockets](https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API). @@ -205,28 +208,29 @@ See my next blog post about **Replacing RTMP**. Notably absent thus far has been any mention of [LL-HLS](https://www.theoplayer.com/blog/low-latency-hls-lhls) and [LL-DASH](https://www.wowza.com/blog/what-is-low-latency-dash). These two protocols are meant to lower HLS/DASH latency respectively by breaking media segments into smaller chunks. -The the chunks might be smaller, but they're still served sequentially over TCP. -The latency floor is lower, but the latency ceiling is still just as high, and you're still going to buffer during congestion. +The chunks might be smaller, but they're still served sequentially over TCP. +The latency floor is lower but the latency ceiling is still just as high, and you're still going to buffer during congestion.
![buffering](/blog/replacing-hls-dash/buffering.gif)
\> tfw LL-HLS/LL-DASH
-But we're also approaching the limit of what you can do with HTTP semantics. +We're also approaching the limit of what you can do with HTTP semantics. -- **LL-HLS** can be configured to make 20 sequential HTTP requests per second _per track_, all of which are in the latency critical path. And it still manages to add +100ms of latency making it untenable for real-time. -- **LL-DASH** can deliver frame-by-frame with chunked-transfer, but it adds overhead and absolutely wrecks client-side ABR algorithms. [Twitch hosted a challenge](https://blog.twitch.tv/en/2020/01/15/twitch-invites-you-to-take-on-our-acm-mmsys-2020-grand-challenge/) but I'm convinced it's impossible without server feedback. +- **LL-HLS** has configurable latency at the cost of and exponential number of sequential requests in the critical path. For example, 20 HTTP requests a second _per track_ still only gets you +100ms of latency, which is not even viable for real-time latency. +- **LL-DASH** can be configured down to +0ms added latency, delivering frame-by-frame with chunked-transfer. However it absolutely wrecks client-side ABR algorithms. Twitch [hosted a challenge](https://blog.twitch.tv/en/2020/01/15/twitch-invites-you-to-take-on-our-acm-mmsys-2020-grand-challenge/) to improve this but I'm convinced it's impossible without server feedback. -I do want to give a special shout-out to [HESP](https://www.theoplayer.com/solutions/hesp-high-efficiency-streaming). -It works by canceling HTTP requests during congestion and frankensteining the video encoding which is quite clever, but suffers the same fate. +[HESP](https://www.theoplayer.com/solutions/hesp-high-efficiency-streaming) also gets a special shout-out because it's cool. +It works by canceling HTTP requests during congestion and frankensteining the video encoding which is quite ~~hacky~~ clever, but suffers a similar fate. We've hit a wall with HTTP over TCP. ## HTTP/3 -If you're an astute networking aficionado, you might have realized that [HTTP/3](https://www.cloudflare.com/learning/performance/what-is-http3) uses [QUIC](https://www.rfc-editor.org/rfc/rfc9000.html) instead of TCP. -We can replace any mention of ~~TCP~~ with QUIC, problem solved! +If you're an astute hyper text transport protocol aficionado, you might have noticed that I said "HTTP over TCP" above. +But [HTTP/3](https://www.cloudflare.com/learning/performance/what-is-http3) uses [QUIC](https://www.rfc-editor.org/rfc/rfc9000.html) instead of TCP. +Problem solved! We can replace any mention of ~~TCP~~ with QUIC! Well, not quite. To use another complicated topic as a metaphor: @@ -235,20 +239,17 @@ Well, not quite. To use another complicated topic as a metaphor: If you take a single threaded program and run it on a multi-core machine, it will run just as slow, and perhaps even slower. This is the case with HLS/DASH as each segment request is made _sequentially_. +HTTP/3 is not a magic bullet and only has marginal benefits when used with HLS/DASH. -Why not utilize multiple TCP connections? -Well it's a great idea and that's how browsers work with HTTP/1.1. -However, each one involves an expensive TCP/TLS handshake and they all fight for limited bandwidth. - -

The key to using QUIC is to embrace concurrency

+

The key to using QUIC is to embrace concurrency.

This means utilizing multiple, independent streams that share a connection. You can prioritize a stream so it gets more bandwidth during congestion, much like you can use `nice` on Linux to prioritize a process when CPU starved. If a stream is taking too long, you can cancel it much like you can `kill` a process. -For live media, you want to prioritize new media over old media, since it's okay to skip old content. +For live media, you want to prioritize new media over old media in order to skip old content. You also want to prioritize audio over video, so you can hear what someone is saying without necessarily seeing their lips move. -If you can only transmit 50% of media stream, make sure it's the most important 50%. +If you can only transmit half of a media stream in time, make sure it's the most important half. To Apple/Pantos' credit, LL-HLS is exploring [prioritization using HTTP/3](https://mailarchive.ietf.org/arch/msg/hls-interest/RcZ2SG8Sz_zZEcjWnDKzcM_-TJk/). It doesn't go far enough (yet!) and HTTP semantics get in the way, but it's absolutely the right direction. @@ -258,12 +259,12 @@ But of course I'm biased towards... # Media over QUIC -Media over QUIC utilizes WebTransport/QUIC directly to avoid TCP and HTTP. +MoQ utilizes WebTransport/QUIC directly to avoid TCP and HTTP. But what about that whole **economies of scale** stuff? -Well, there are some important differences between Media over QUIC and your standard _not invented here_ protocol: +Well, there are some important differences between Media over QUIC as compared to your standard _not invented here_ protocol: -## Reason 0: Utilize QUIC +## Reason 0: QUIC QUIC is the future of the internet. TCP is a relic of the past. @@ -275,20 +276,20 @@ TCP is a relic of the past. It's a **bold** claim I know. But I struggle to think of a single reason why you would use TCP over QUIC going forward. -There are still some corporate firewalls that block UDP (used by QUIC) but I mean that's about it. +There are still some corporate firewalls that block UDP (used by QUIC) and hardware offload doesn't exist yet, but I mean that's about it. -It will take a while, but every library, server, load balancer, and NIC will be optimized for QUIC delivery. -By utilizing as much of QUIC as possible, we can shove as much complexity into these optimized layers. -We benefit also from any proposed new features, such as [multi-path](https://datatracker.ietf.org/doc/draft-ietf-quic-multipath/), [FEC](https://datatracker.ietf.org/doc/draft-michel-quic-fec/), [congestion control](https://datatracker.ietf.org/doc/rfc9330/), etc. -I don't want any of these network features implemented in my media layer thank you very much (looking at you WebRTC). +It will take a few years, but every library, server, load balancer, and NIC will be optimized for QUIC delivery. +Media over QUIC offloads as much as possible into this powerful layer. +We benefit also from any new features, including proposals such as [multi-path](https://datatracker.ietf.org/doc/draft-ietf-quic-multipath/), [FEC](https://datatracker.ietf.org/doc/draft-michel-quic-fec/), [congestion control](https://datatracker.ietf.org/doc/rfc9330/), etc. +I don't want network features in my media layer _thank you very much_ (looking at you WebRTC). It might not be obvious is that HTTP/3 is actually a thin layer on top of QUIC. Likewise MoQ is also meant to be a thin layer on top of QUIC, effectively just providing pub/sub semantics. -We get all of the benefits of QUIC without the baggage of HTTP, and yet still have web support via [WebTransport](https://developer.mozilla.org/en-US/docs/Web/API/WebTransport_API). +We get all of the benefits of QUIC without the baggage of HTTP, and yet still achieve web support via [WebTransport](https://developer.mozilla.org/en-US/docs/Web/API/WebTransport_API). Instead we can focus on the important stuff instead: **live media**. -## Reason 1: Designed for Relays +## Reason 1: Relay Layer To avoid [the mistakes of WebRTC](/blog/replacing-webrtc), we need to decouple the application from the transport. If a relay (ie. CDN) knows anything about media encoding, we have failed. @@ -300,7 +301,7 @@ The application splits data into "objects", annotated with a header providing si These are generic signals, including stuff like the priority, reliability, grouping, expiration, etc. MoqTransport is designed to be used for arbitrary applications. -For example: +Some examples include: - live chat - end-to-end encryption @@ -310,17 +311,17 @@ For example: This is huge draw for CDN vendors. Instead of building a custom WebRTC CDN that targets one specific niche, you can cast a much wider net with MoqTransport. -Individuals from Akamai, Google, and Cloudflare have been involved in the standardization process thus far. +Akamai, Google, and Cloudflare have been involved in the standardization process thus far and CDN support is inevitable. ## Reason 2: Media Layer There will be at least one media layer on top of MoqTransport. -We're focused on the transport right now, so there's no official "adopted" draft yet. +We're focused on the transport right now so there's no official "adopted" draft yet. However, my proposal is [Warp](https://datatracker.ietf.org/doc/draft-law-moq-warpstreamingformat/). -It uses CMAF so it's backwards compatible with HLS/DASH while still being able to provide real-time latency. +It uses CMAF so it's backwards compatible with HLS/DASH while still capable of real-time latency. I think this is critically important, as any migration has to be done piecewise, client-by-client and user-by-user. -The same media segments can be served for a dual Warp and HLS/DASH roll-out, while also being saved to disk for VoD. +The same media segments can be served for a mixed roll-out and for VoD. This website uses Warp! [Try it out!](/watch) Or watch one of my [presentations](https://www.youtube.com/watch?v=PncdrMPVaNc). @@ -342,7 +343,9 @@ But I think [this part](https://www.ietf.org/about/introduction/) is especially > There is no membership in the IETF. Anyone can participate by signing up to a working group mailing list (more on that below), or registering for an IETF meeting. All IETF participants are considered volunteers and expected to participate as individuals, including those paid to participate. -It's not a protocol controlled by a single company. +It's not a protocol owned by a company. +It's not a protocol owned by lawyers. + [Join the mailing list](https://www.ietf.org/mailman/listinfo/moq). # What's missing? @@ -351,10 +354,10 @@ Okay cool so hopefully I sold you on MoQ. What can't you use it today to replace HLS/DASH? 1. **It's not done yet**: The IETF is many things, but fast is not one of them. -2. **Cost**: QUIC is a new protocol that has yet to be fully optimized to the extent of TCP. -3. **Support**: Your favorite language/library/cdn/cloud might not even provide HTTP/3 support yet, let alone WebTransport or QUIC. +2. **Cost**: QUIC is a new protocol that has yet to be fully optimized to match TCP. It's possible and apparently Google is [near parity](https://conferences.sigcomm.org/sigcomm/2020/files/slides/epiq/0%20QUIC%20and%20HTTP_3%20CPU%20Performance.pdf). +3. **Support**: Your favorite language/library/cdn/cloud/browser might not even provide HTTP/3 support yet, let alone WebTransport or QUIC. 4. **Features**: Somebody has to reimplement all of the annoying HLS/DASH features like DRM and server-side advertisements.... -5. **VoD**: HLS/DASH work great, why replace it? MoQ is currently live only. +5. **VoD**: MoQ is currently live only. HLS/DASH work great, why replace it? We'll get there eventually.