Commit

More improvements.
kixelated committed Nov 14, 2023
1 parent a40d353 commit 9eb7033
Showing 2 changed files with 102 additions and 56 deletions.
Binary file added web/public/blog/replacing-hls-dash/green.jpg
158 changes: 102 additions & 56 deletions web/src/pages/blog/replacing-hls-dash.mdx
@@ -6,7 +6,7 @@ author: kixelated

# Replacing HLS/DASH

Low-latency, high bitrate, mass fanout is hard. Who knew?
Low-latency, high bitrate, mass fan-out is hard. Who knew?

See [Replacing WebRTC](https://quic.video/blog/replacing-webrtc) for the previous post in this series.

@@ -17,7 +17,7 @@ If you're using HLS/DASH and your main priority is...
- **cost**: wait until there are CDN offerings.
- **latency**: you should seriously consider MoQ.
- **features**: it will take a while to implement everything.
- **vod**: you're good!
- **vod**: it works great, why replace it?

## Intro

@@ -41,8 +41,8 @@ The next sentence gives a hint as to why:
> If your app uses HTTP Live Streaming over cellular networks, you are required to provide at least one stream at 64 Kbps or lower bandwidth.
This was back in 2009 when the iPhone 3GS was released and AT&T's network was [struggling to meet the demand](https://www.wired.com/2010/07/ff-att-fail/).
The key feature of HLS is [ABR](https://en.wikipedia.org/wiki/Adaptive_bitrate_streaming): encoding multiple copies of the same content at different bitrates.
This allowed the Apple-controlled HLS player to reduce the bitrate rather than pummel a poor megacorp's cellular network.
The key feature of HLS is [ABR](https://en.wikipedia.org/wiki/Adaptive_bitrate_streaming): encoding multiple copies of the same content at different bit-rates.
This allowed the Apple-controlled HLS player to reduce the bitrate rather than pummel a poor mega-corp's cellular network.

[DASH](https://en.wikipedia.org/wiki/Dynamic_Adaptive_Streaming_over_HTTP) came afterwards in an attempt to standardize HLS... but without the controlled by Apple part.
There's definitely some cool features in DASH but the [core concepts are the same](https://www.cloudflare.com/learning/video/what-is-mpeg-dash/) and now they even share the same [media container](https://www.wowza.com/blog/what-is-cmaf).
@@ -52,7 +52,7 @@ But I'll focus more on HLS since that's my shit.

## The Good Stuff

While we were forced to switch protocols at the tech equivalent of gunpoint, HLS actually has some amazing benfits.
While we were forced to switch protocols at the tech equivalent of gunpoint, HLS actually has some amazing benefits.
The biggest one is that it uses **HTTP**.

HLS/DASH works by breaking media into "segments", each containing a few seconds of media.
@@ -66,9 +66,9 @@ New segments are constantly being generated and announced to the player via a "p

By using HTTP, a service like Twitch can piggyback on the existing infrastructure of the internet.
There's a plethora of optimized CDNs, servers, and clients that all speak HTTP and can be used to transport media.
You do have to do some extra work to mold live video into HTTP semantics, but it's worth it.
You do have to do some extra work to massage live video into HTTP semantics, but it's worth it.

Crafting individual IP packets might be the most _correct_ way to send live media (ie. WebRTC), but it's not the most cost effective.
Crafting individual IP packets might be the _correct_ way to send live media (ie. WebRTC), but it's not the most cost effective.
The key is utilizing [economies of scale](https://napkinfinance.com/napkin/what-are-economies-of-scale/) to make it cheap to deliver media when latency is not critical.

## The Bad Stuff
@@ -93,15 +93,15 @@ Frames will take longer and longer to reach the player until the buffer is deple

<figure>
![buffering](/blog/replacing-hls-dash/buffering.gif)
<figcaption>> tfw HLS/DASH</figcaption>
<figcaption>\> tfw HLS/DASH</figcaption>
</figure>

A HLS/DASH player can detect queuing and switch to a lower bitrate via ABR.
However, it can only do this at infrequent (ex. 2s) segment boundaries, and it can't renege any frames already flushed to the socket.
So if you're watching 1080p video and your network takes a dump, well you still need to download seconds of 1080p video, before you can switch down to 360p.
So if you're watching 1080p video and your network takes a dump, well you still need to download seconds of unsustainable 1080p video, before you can switch down to a reasonable 360p.
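
To put rough numbers on that, here's a back-of-the-envelope sketch. The 2s segment comes from the text above, but the bitrates (6 Mbps for 1080p, 1 Mbps of remaining throughput) are assumptions picked for illustration, and it models the worst case where the network drops right as the segment is requested.

```python
def stall_seconds(segment_seconds, encoded_mbps, network_mbps):
    """Extra wall-clock time spent on one segment that was already requested.

    The segment holds segment_seconds * encoded_mbps megabits. Downloading it
    at network_mbps takes megabits / network_mbps seconds, but it only buys
    segment_seconds of playback.
    """
    megabits = segment_seconds * encoded_mbps
    download_seconds = megabits / network_mbps
    return download_seconds - segment_seconds


# Assumed numbers: 2s segments, 1080p at 6 Mbps, network drops to 1 Mbps.
extra = stall_seconds(segment_seconds=2.0, encoded_mbps=6.0, network_mbps=1.0)
print(f"the player falls {extra:.1f}s further behind before it can switch")
```

With those assumed numbers, a single committed segment costs 10 extra seconds before ABR even gets a chance to react.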

You can't just put the toothpaste back in the tube if you squeeze out too much.
You gotta COMMIT to that toothpaste, even if it takes longer to brush your teeth.
You gotta commit to that toothpaste, even if it takes longer to brush your teeth.

<figure>
![TCP toothpaste](/blog/replacing-webrtc/toothpaste.jpg)
@@ -124,34 +124,81 @@ The problem really depends on your perspective. If you control:
- **server only**: Life is _pain_.

For a service like Twitch, the solution might seem simple: build your own client and server!
And we did, including a baremetal live CDN designed exclusively for HLS.
And we did, including a bare-metal live CDN designed exclusively for HLS.

But [until quite recently](https://bitmovin.com/managed-media-source) you've been forced to use the Apple HLS player on iOS for AirPlay or Safari support.
But [until quite recently](https://bitmovin.com/managed-media-source), we've been forced to use the Apple HLS player on iOS for AirPlay or Safari support.
And of course TVs, consoles, casting devices, and others have their own HLS players.
And if you're offering your baremtal live CDN [to the public](https://aws.amazon.com/ivs/), you can't exactly force customers to use your proprietary player.
And if you're offering your bare-metal live CDN [to the public](https://aws.amazon.com/ivs/), you can't exactly force customers to use your proprietary player.

So you're stuck with a _dumb_ server and a bunch of _dumb_ clients.
These _dumb_ clients make _dumb_ decisions with no cooperation with the server, based on imperfect information.
It's not a huge surprise that different platforms provide wildly different experiences.

### Apple

I love the simplicity of HLS compared to DASH.
There's something so satisfying about a text-based playlist that you can actually read, versus an XML monstrosity designed by committee (_gasp_).

```m3u8
#EXTM3U
#EXT-X-TARGETDURATION:10
#EXT-X-VERSION:3
#EXTINF:9.009,
http://media.example.com/first.ts
#EXTINF:9.009,
http://media.example.com/second.ts
#EXTINF:3.003,
http://media.example.com/third.ts
#EXT-X-ENDLIST
```

<figure>
<figcaption>
[Source](https://datatracker.ietf.org/doc/html/draft-pantos-hls-rfc8216bis/#section-9.1). Orgasmic.
</figcaption>
</figure>
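
That simplicity carries over to code. Here's a minimal sketch of a parser for the playlist above; it only understands the tags in that example, and the `parse_media_playlist` name is mine, not something from a real HLS library.

```python
# The same playlist as the example above.
PLAYLIST = """\
#EXTM3U
#EXT-X-TARGETDURATION:10
#EXT-X-VERSION:3
#EXTINF:9.009,
http://media.example.com/first.ts
#EXTINF:9.009,
http://media.example.com/second.ts
#EXTINF:3.003,
http://media.example.com/third.ts
#EXT-X-ENDLIST
"""


def parse_media_playlist(text):
    """Return (segments, ended) where segments is a list of (duration, uri)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or lines[0] != "#EXTM3U":
        raise ValueError("not an M3U8 playlist")

    segments = []
    ended = False
    pending = None  # duration from the most recent #EXTINF tag

    for line in lines[1:]:
        if line.startswith("#EXTINF:"):
            # Format is "#EXTINF:<duration>,[<title>]"
            pending = float(line[len("#EXTINF:"):].split(",", 1)[0])
        elif line == "#EXT-X-ENDLIST":
            ended = True
        elif line.startswith("#"):
            continue  # a tag or comment this sketch does not interpret
        elif pending is not None:
            segments.append((pending, line))
            pending = None

    return segments, ended


segments, ended = parse_media_playlist(PLAYLIST)
total = sum(duration for duration, _ in segments)
print(f"{len(segments)} segments, {total:.3f}s, ended={ended}")
```

A real player has to handle far more tags than this, but the core loop really is "read a duration, read a URL, repeat".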

But unfortunately Apple controls HLS.

There's a misalignement of incentives between Apple and the rest of the industry.
There's a misalignment of incentives between Apple and the rest of the industry.
I'm not even sure how Apple uses HLS, or why they would care about latency, or why they insist on being the sole arbiter of a live streaming protocol.
Pantos has done a great and thankless job, but improvements are often not good enough and it feels like a stand-off.
[Pantos](https://www.crunchbase.com/person/roger-pantos) has done a great and thankless job, but it feels like a stand-off at gunpoint.

For example, [LL-HLS originally required HTTP/2 server push](https://www.theoplayer.com/blog/impact-of-apple-ll-hls-update-2020) and it took nearly the entire industry to convince Apple that this was a bad idea.
The upside is that we got [a mailing list](https://lists.apple.com/mailman/listinfo/hls-announce) so they can announce changes, but don't expect to propose changes like a standards body.
For example, LL-HLS originally [required HTTP/2 server push](https://www.theoplayer.com/blog/impact-of-apple-ll-hls-update-2020) and it took nearly the entire industry to convince Apple that this was a bad idea.
The upside is that we got [a mailing list](https://lists.apple.com/mailman/listinfo/hls-announce) so they can announce changes to developers first... but don't expect the ability to propose changes any time soon.

DASH is more open protocol but it's controlled by [MPEG](https://en.wikipedia.org/wiki/Moving_Picture_Experts_Group), which is a whole other can of worms.
It doesn't matter though until HLS is no longer required on iOS.
DASH is a more open protocol but it's controlled by [MPEG](https://en.wikipedia.org/wiki/Moving_Picture_Experts_Group), which is another can of worms.
Why put your specification [behind a paywall](https://www.iso.org/standard/79106.html)?
But DASH doesn't matter while HLS is still required on iOS.

# What's next?

Given a blank slate and a green field, what do you do?
You're given a blank canvas and a brush to paint the greenest of fields, what do you make?

<figure>
![green field](/blog/replacing-hls-dash/green.jpg)
<figcaption>
[Source](https://www.freeimageslive.co.uk/free_stock_image/green-field-painting-jpg). Wow. That's quite the
green field.
</figcaption>
</figure>

## TCP

After my [previous blog post](/blog/replacing-webrtc), I had a few people hit up my DMs and claim they can do real-time latency with TCP.
And I'm sure a few more people will too after this post, so you get your own section that muddles the narrative.

Yes, you can do real-time latency with TCP (or WebSockets) under ideal conditions.

However, it just won't work well enough on poor networks.
Congestion and buffer-bloat will absolutely wreck your protocol on poor networks.
A lot of my time spent at Twitch was optimizing for the 90th percentile; the shoddy cellular networks in Brazil or India or Australia.
If all of your customers have a flawless internet connection, or live inside your data center, then TCP is absolutely a great choice.

But if you are going to reinvent RTMP, there are [some ways to reduce queuing](https://www.youtube.com/watch?v=cpYhm74zp0U) but they are quite limited.
This is _especially_ true in a browser environment when limited to HTTP or [WebSockets](https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API).

See my next blog post about **Replacing RTMP**.

## HTTP

@@ -163,67 +210,49 @@ The latency floor is lower, but the latency ceiling is still just as high, and y

<figure>
![buffering](/blog/replacing-hls-dash/buffering.gif)
<figcaption>> tfw LL-HLS/LL-DASH</figcaption>
<figcaption>\> tfw LL-HLS/LL-DASH</figcaption>
</figure>

But we're also approaching the limit of what you can do with HTTP semantics.

- **LL-HLS** can be configured to make 20 sequential HTTP requests per second _per track_, all of which are in the latency critical path. And it still manages to add +100ms of latency making it untenable for real-time.
- **LL-DASH** can deliver frame-by-frame with chunked-transfer, but it adds overhead and absolutely wrecks client-side ABR algorithms. [Twitch hosted a challenge](https://blog.twitch.tv/en/2020/01/15/twitch-invites-you-to-take-on-our-acm-mmsys-2020-grand-challenge/) but I'm convinced it's impossible without server feedback.

I do want to give a special shoutout to [HESP](https://www.theoplayer.com/solutions/hesp-high-efficiency-streaming).
I do want to give a special shout-out to [HESP](https://www.theoplayer.com/solutions/hesp-high-efficiency-streaming).
It works by canceling HTTP requests during congestion and frankensteining the video encoding which is quite clever, but suffers the same fate.

We've hit a wall with HTTP over TCP.

## TCP

After my [previous blog post](/blog/replacing-webrtc), I had a few people hit up my DMs and claim they can do real-time latency with TCP.
And I'm sure a few more people will too after this post.

Yes, you can do real-time latency with TCP (or WebSockets) under ideal conditions.

However, it just won't work well enough on poor networks.
Congestion and buffer-bloat will absolutely wreck your protocol on poor networks.
A lot of my time spent at Twitch was optimizing for the 90th percentile; the shoddy cellular networks in Brazil or India or Australia.
If all of your customers have a flawless internet connection, or live inside your datacenter, then TCP is absolutely the great choice.

But if you are going to reinvent RTMP, there are [some ways to reduce queuing](https://www.youtube.com/watch?v=cpYhm74zp0U) but they are quite limited.
This is _especially_ true in a browser environment when limited to HTTP or [WebSockets](https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API).

See my next blog post about **Replacing RTMP**.

## HTTP/3

If you're an astute networking afficionado, you might have realized that [HTTP/3](https://www.cloudflare.com/learning/performance/what-is-http3) uses [QUIC](https://www.rfc-editor.org/rfc/rfc9000.html) instead of TCP.
If you're an astute networking aficionado, you might have realized that [HTTP/3](https://www.cloudflare.com/learning/performance/what-is-http3) uses [QUIC](https://www.rfc-editor.org/rfc/rfc9000.html) instead of TCP.
We can replace any mention of ~~TCP~~ with QUIC, problem solved!

Well, not quite.

To use another complicated topic as a metaphor:
Well, not quite. To use another complicated topic as a metaphor:

- A TCP connection is a single-core CPU.
- A QUIC connection is a multi-core CPU.

If you take a single threaded program and run it on a multi-core machine, it will run just as slow, and perhaps even slower.
This is the case with HLS/DASH as each segment request is made _sequentially_: one after the other.
This is the case with HLS/DASH as each segment request is made _sequentially_.

Why not utilize multiple TCP connections?
Well it's a great idea and that's how browsers work with HTTP/1.1.
However, each one involves an expensive TCP/TLS handshake and they all fight for limited bandwidth.

<center>The key to using QUIC is to **embrace concurrency**.</center>
<p class="tagline">The key to using QUIC is to embrace concurrency</p>

This means utilizing multiple prioritized but otherwise independent streams over the same connection.
Think of it like using `nice` on Linux to (de)prioritize a process.
This means utilizing multiple, independent streams that share a connection.
You can prioritize a stream so it gets more bandwidth during congestion, much like you can use `nice` on Linux to prioritize a process when CPU starved.
If a stream is taking too long, you can cancel it much like you can `kill` a process.

For live media, you want to prioritize new media over old media.
You also want to prioritize audio over video so you can hear what someone is saying, without necessarily seeing their lips move.
For live media, you want to prioritize new media over old media, since it's okay to skip old content.
You also want to prioritize audio over video, so you can hear what someone is saying without necessarily seeing their lips move.
If you can only transmit 50% of a media stream, make sure it's the most important 50%.
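
As a toy illustration of that ordering, here's a sketch that decides what to send under a byte budget. Everything in it is an assumption: the `Stream` fields, the sizes, and the all-or-nothing sending are made up for this example, and a real QUIC stack schedules partial stream data rather than whole streams.

```python
from dataclasses import dataclass


@dataclass
class Stream:
    name: str
    kind: str       # "audio" or "video"
    sequence: int   # higher means newer media
    size: int       # bytes waiting to be sent


def send_order(streams):
    """Audio before video, then newest first within each kind."""
    return sorted(streams, key=lambda s: (s.kind != "audio", -s.sequence))


def fill_budget(streams, budget):
    """Pick what to send this round; anything that does not fit is starved."""
    sent = []
    for stream in send_order(streams):
        if stream.size <= budget:
            sent.append(stream.name)
            budget -= stream.size
    return sent


pending = [
    Stream("video-1", "video", 1, 600),
    Stream("audio-1", "audio", 1, 50),
    Stream("video-2", "video", 2, 600),
    Stream("audio-2", "audio", 2, 50),
]

# 1300 bytes are waiting; pretend congestion only leaves room for 700.
print(fill_budget(pending, budget=700))
```

With that budget all of the audio and the newest video go out, and the old video is the part that gets starved.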

To Apple/Pantos' credit, LL-HLS is exploring [prioritization using HTTP/3](https://mailarchive.ietf.org/arch/msg/hls-interest/RcZ2SG8Sz_zZEcjWnDKzcM_-TJk/).
It doesn't go far enough (yet!) and HTTP semantics get in the way, but it's absolutely the right direction.
I'm conviced that somebody will make a [HTTP/3 only media protocol](https://mailarchive.ietf.org/arch/msg/moq/S3eOPU5XnvQ4kn1zJyDThG5U4sA/) at some point.
I'm convinced that somebody will make a [HTTP/3 only media protocol](https://mailarchive.ietf.org/arch/msg/moq/S3eOPU5XnvQ4kn1zJyDThG5U4sA/) at some point.

But of course I'm biased towards...

@@ -236,7 +265,13 @@ Well, there are some important differences between Media over QUIC and your stan

## Reason 0: Utilize QUIC

QUIC is the future of the internet and TCP is a relic of the past.
QUIC is the future of the internet.
TCP is a relic of the past.

<figure>
<img src="/home/quic.svg" className="m-4 inline h-24" alt="QUIC Logo" />
<figcaption>You're going to see a lot of this logo, although not crudely traced or green.</figcaption>
</figure>

It's a **bold** claim I know.
But I struggle to think of a single reason why you would use TCP over QUIC going forward.
@@ -251,7 +286,7 @@ It might not be obvious is that HTTP/3 is actually a thin layer on top of QUIC.
Likewise MoQ is also meant to be a thin layer on top of QUIC, effectively just providing pub/sub semantics.
We get all of the benefits of QUIC without the baggage of HTTP, and yet still have web support via [WebTransport](https://developer.mozilla.org/en-US/docs/Web/API/WebTransport_API).

We can focus on the important stuff instead: **live media**.
Instead, we can focus on the important stuff: **live media**.

## Reason 1: Designed for Relays

@@ -265,11 +300,17 @@ The application splits data into "objects", annotated with a header providing si
These are generic signals, including stuff like the priority, reliability, grouping, expiration, etc.
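
To make the idea concrete, here's a sketch of a relay making decisions from headers alone. It is not the MoqTransport wire format: the field names, the "lower priority value goes first" rule, and the `relay_queue` helper are all hypothetical, loosely named after the signals listed above.

```python
from dataclasses import dataclass


@dataclass
class ObjectHeader:
    # Hypothetical fields, not taken from the MoqTransport draft.
    group: int          # objects in a group belong together
    priority: int       # lower value is sent first (an assumption)
    reliable: bool      # False means the relay may drop it
    expires_at: float   # seconds; after this it is not worth sending


@dataclass
class Object:
    header: ObjectHeader
    payload: bytes      # opaque to the relay


def relay_queue(objects, now):
    """Decide what to forward, and in what order, using only the headers."""
    keep = [
        o for o in objects
        if o.header.reliable or o.header.expires_at > now
    ]
    return sorted(keep, key=lambda o: o.header.priority)


queue = relay_queue(
    [
        Object(ObjectHeader(1, 2, False, expires_at=5.0), b"old video"),
        Object(ObjectHeader(2, 1, False, expires_at=9.0), b"new video"),
        Object(ObjectHeader(2, 0, True, expires_at=0.0), b"chat message"),
    ],
    now=6.0,
)
print([o.payload for o in queue])
```

The relay never looks inside `payload`, so the same logic works whether the bytes are video frames or chat messages.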

MoqTransport is designed to be used for arbitrary applications.
For example, end-to-end encrypted chat messages, game state updates, live playlists, or even a clock!
For example:

- live chat
- end-to-end encryption
- game state
- live playlists
- or even a clock!

This is a huge draw for CDN vendors.
Instead of building a custom WebRTC CDN that targets one specific niche, you can cast a much wider net with MoqTransport.
Individuals from Akamai, Google, and Cloudflare have been involved in the standardiziation process thus far.
Individuals from Akamai, Google, and Cloudflare have been involved in the standardization process thus far.

## Reason 2: Media Layer

@@ -289,7 +330,12 @@ There's a lot of cool ideas floating around, such as a [live playlist format](ht

## Reason 3: IETF

[Media over QUIC is an IETF working group](https://datatracker.ietf.org/wg/moq/about/).
Media over QUIC is an [IETF working group](https://datatracker.ietf.org/wg/moq/about/).

<figure>
<img src="/home/ietf.svg" className="m-4 inline h-24" alt="IETF Logo" />
<figcaption>I crudely traced and recolored this logo too.</figcaption>
</figure>

If you know nothing about the IETF, just know that it's the standards body behind favorites such as HTTP, DNS, TLS, QUIC, and even WebRTC.
But I think [this part](https://www.ietf.org/about/introduction/) is especially important:
@@ -312,7 +358,7 @@ What can't you use it today to replace HLS/DASH?

We'll get there eventually.

Feel free to use our [Rust](https://github.com/kixelated/moq-rs) or [Typescript](https://github.com/kixelated/moq-js) implementation is you want to experiement.
Feel free to use our [Rust](https://github.com/kixelated/moq-rs) or [Typescript](https://github.com/kixelated/moq-js) implementation if you want to experiment.
Join the [Discord](https://discord.gg/FCYF3p99mr) if you want to help!

Written by [@kixelated](https://github.com/kixelated).
