
Add RFC: WebRTC Simulcast #55

Open · wants to merge 1 commit into master from webrtc-simulcast
Conversation


@Sean-Der commented Jul 3, 2023

Summary

Add Simulcast support to WebRTC output.

Simulcast is a WebRTC protocol feature that allows an uploader to send multiple layers of one track. Because this is built into the protocol, every WebRTC
ingest already understands it. These layers can be different resolutions, bitrates, and even codecs.

Motivation

Live streaming services offer videos at multiple bitrates and resolutions. This is needed to support the wide variety of connections that users will have.
Today, streaming services decode the incoming video, modify it, and then re-encode it to generate these different quality levels. This has some drawbacks that Simulcast
will fix or improve.

  • Generation Loss - Decoding and re-encoding videos causes generation loss. With Simulcast, encodes come from the source video, which will be higher quality.

  • Higher Quality Encodes - Streamers with dedicated hardware can provide higher quality encodes. Streaming services at scale are optimizing for cost.

  • Lower Latency - Removing the additional encoding/decoding allows video to be delivered to users faster.

  • Reduce server complexity - Users find it difficult to set up RTMP->HLS with transcodes. With Simulcast, setting up a streaming server becomes dramatically easier.


Link to RFC

@murillo128

I think we can improve the basic behavior of simulcast.
By default it would be better to always use the top layer as it is configured in OBS, and then downscale the other two simulcast layers by 1.5x and 2x in dimensions and by 2x and 4x in bitrate. So, for example, if we have 1080p at 6 Mbps, we would have 720p at 3 Mbps and 540p at 1.5 Mbps; and if we are using 720p at 3 Mbps, we would have 480p at 1.5 Mbps and 360p at 750 kbps. This is how Chrome worked for several years, until more advanced APIs became available to control the individual layer encoder parameters.
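
A minimal sketch of that default derivation (the Layer type, the rounding, and the example values are illustrative, not part of the RFC):

interface Layer { width: number; height: number; bitrateKbps: number; }

// Derive the two lower simulcast layers from the top layer as configured in OBS:
// dimensions divided by 1.5 and 2, bitrate divided by 2 and 4.
function deriveLayers(top: Layer): Layer[] {
  const scale = (dim: number, rate: number): Layer => ({
    width: Math.round(top.width / dim),
    height: Math.round(top.height / dim),
    bitrateKbps: Math.round(top.bitrateKbps / rate),
  });
  return [top, scale(1.5, 2), scale(2, 4)];
}

// Example: 1080p at 6000 kbps -> 1280x720 at 3000 kbps and 960x540 at 1500 kbps.
console.log(deriveLayers({ width: 1920, height: 1080, bitrateKbps: 6000 }));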

In the Advanced config, the user should be able to configure the number of simulcast layers, and the width/height/fps and bitrate of the lower simulcast layers.

Given that Simulcast is negotiated in the SDP offer/answer, we could even enable simulcast always and, depending on whether the server supports it, start the lower encodings as needed.
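
A hypothetical sketch of that "always offer, adapt to the answer" idea, using the standard RTCRtpSender.setParameters() API to deactivate the lower encodings when the answer contains no simulcast attribute (function and variable names are my own, not from the RFC):

// Apply the server's answer and fall back to a single encoding when the
// answer does not accept simulcast.
async function adaptToAnswer(pc: RTCPeerConnection, answerSdp: string): Promise<void> {
  await pc.setRemoteDescription({ type: 'answer', sdp: answerSdp });

  if (!/^a=simulcast:recv/m.test(answerSdp)) {
    const sender = pc.getSenders().find((s) => s.track?.kind === 'video');
    if (!sender) return;
    const params = sender.getParameters();
    // Keep only the top (first) encoding active; stop the lower layers.
    params.encodings.forEach((enc, i) => { enc.active = i === 0; });
    await sender.setParameters(params);
  }
}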

Simulcast is a very important feature for us, and if implemented, it would allow us to deprecate our OBS-webrtc fork and focus on contributing to the main OBS instead. So please, just let me know what I can do in order to meet all the requirements, both in the RFC and in the implementation.

@Warchamp7
Member

A few initial questions:

  • What needs to be configurable on a layer for this to be valuable both for users and services? Resolution, bitrate, anything else?

  • Do we need a way to limit or recommend layer settings per user based on the service? Or are services expected to serve whatever layers they receive?

  • Each layer will need an additional encoder spun up to handle a different resolution, bitrate, and whatever else we allow configuration for. This is a significant performance cost we will need to make clear to users, for one thing. More importantly, to my knowledge hardware encoders are limited in the number of sessions they can run, and there is no API or method to derive that information. This is a problem that will need to be solved, potentially with work from NVIDIA/AMD.

@murillo128

Thanks for your feedback @Warchamp7!

Regarding your questions:

  • Resolution and bitrate are enough; fps could be marginally useful.

  • Server support for simulcast is negotiated in the SDP. The number of layers can also be negotiated, although typically the server accepts whatever the client publishes. In WebRTC browsers the maximum number of simulcast layers is 3, so I would expect some issues when sending more than 3 layers, but nothing media servers won't be able to adapt to. FWIW, on dolby.io/Millicast we accept whatever number of simulcast layers the client offers.

  • As said before, even just starting with 3 layers (which I think should be supported by most GPUs) would be a huge success. If the user wants to send more than 3 and the GPU doesn't support them, we could use software encoding for the lower layers instead, which should not consume as much CPU as the higher layers.

@Warchamp7
Member

  • Server support for simulcast is negotiated in the SDP. The number of layers can also be negotiated, although typically the server accepts whatever the client publishes.

As someone a bit less technical, can you elaborate on this? That sounds to me like the info is transmitted upon session start. If that's the case, then are users expected to configure X many layers and simply be hit with an error on session start if it's too many for the selected service?

Ideally a user can select their service/endpoint, and be presented with information on how many layers they can configure, and any restrictions/recommendations the service might have.

I very much don't like the idea of users having to simply set things up and hope it'll work. Worst case scenario, we may have to hardcode limits into services.json alongside our other service recommendations.

  • As said before, even just starting with 3 layers (which I think should be supported by most GPUs) would be a huge success. If the user wants to send more than 3 and the GPU doesn't support them, we could use software encoding for the lower layers instead, which should not consume as much CPU as the higher layers.

My concern is with detecting "doesn't support". Most (all?) NVIDIA consumer cards have a hard limit of 5 simultaneous sessions, but the realistic limit can be less than that, based on the demands of the sessions. Similarly for AMF: while they don't have a session limit, I think you'll struggle to get more than 2 or 3. I do not believe there is a way to detect available sessions or resources; it will simply fail when the user attempts to start their output. Performing a silent fallback to software encoding could lead to an unexpected performance impact every time they begin an output if fewer sessions/resources are available one time versus another.

When we are only spinning up a single encoder, this is a binary problem. It either works or it doesn't. The introduction of layers means that on any given day and system usage, 3 layers might work sometimes but not others. I want to make sure we are adequately able to communicate issues to users and have proper error handling to solve them.

@murillo128

The simulcast negotiation in the SDP is described in detail here:
https://www.rfc-editor.org/rfc/rfc8853.html

TL;DR: the client sends an offer with a simulcast attribute and the rids (encodings) that it wants to send:

a=rid:0 send
a=rid:1 send
a=rid:2 send
a=simulcast:send 0;1;2

and the server accepts them, reversing the send/recv:

a=rid:0 recv
a=rid:1 recv
a=rid:2 recv
a=simulcast:recv 0;1;2

If the server does not accept simulcast, it will not include the simulcast attribute and the client will just send one encoding as normal.

In theory, the client could also specify the video encoding properties in the offer and the server could accept the ones it wants, but in reality the server always accepts everything that is sent by the client.
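
For reference, here is a sketch of how a browser-based WHIP client produces an offer with exactly these a=rid/a=simulcast lines, simply by declaring one encoding per rid; the endpoint URL and bitrates below are placeholders, not values from this RFC:

async function publishSimulcast(track: MediaStreamTrack): Promise<RTCPeerConnection> {
  const pc = new RTCPeerConnection();
  pc.addTransceiver(track, {
    direction: 'sendonly',
    sendEncodings: [
      { rid: '0', maxBitrate: 6_000_000 },                           // full resolution
      { rid: '1', maxBitrate: 3_000_000, scaleResolutionDownBy: 2 }, // half resolution
      { rid: '2', maxBitrate: 1_500_000, scaleResolutionDownBy: 4 }, // quarter resolution
    ],
  });

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  // WHIP: POST the offer SDP and read the answer SDP from the response body.
  const resp = await fetch('https://example.com/whip/endpoint', {
    method: 'POST',
    headers: { 'Content-Type': 'application/sdp' },
    body: offer.sdp,
  });
  await pc.setRemoteDescription({ type: 'answer', sdp: await resp.text() });
  return pc;
}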

Regarding the maximum number of layers, it would not be a problem to not send all the encodings offered by OBS. WebRTC servers are already used to a dynamic number of inputs, as browsers may drop (stop sending) simulcast layers based on CPU/bandwidth use.

@Sean-Der
Author

Sean-Der commented Jul 9, 2023

Hey @Warchamp7 I coded up an implementation of this if you want to try it out! Sean-Der/obs-studio#2

It adds a checkbox to enable/disable Simulcast.

You can use it against Broadcast Box.

You will have a drop-down to switch between your different quality levels. For quicker switching between layers you can set the Custom Encoder Settings to keyint=30 aq-mode=0 subme=0 no-deblock sync-lookahead=3. This should be handled better server-side, but I am trying to keep Broadcast Box as simple as possible.


What needs to be configurable on a layer for this to be valuable both for users and services? Resolution, bitrate, anything else?

I personally think a simple checkbox is enough for a first version. In the future I would like to see an advanced mode where more can be configured. In the vast majority of cases I think streamers want uniformity.

Do we need a way to limit or recommend layer settings per user based on the service? Or are services expected to serve whatever layers they receive?

This is discovered at connect. My plan is to disconnect/reject users who have configured their client incorrectly. In their stream manager view they will get a notification explaining why. I want to handle this the same way as a user sending excessive bitrate.

An open source book on how WebRTC works, WebRTC for the Curious, is available if you are curious about the details! If you have any specific questions I would love to answer them :)

This is a significant performance cost we will need to make clear to users for one thing. This is a problem that will need to be solved, potentially with work from NVIDIA/AMD

Why do you believe this will be a significant performance cost? If you do conferencing in your browser you have used Simulcast (Hangouts, Jitsi...). LiveKit wrote an article about how the industry sees it.

On my local machine my CPU usage goes from 5% -> 8% with Simulcast enabled with x264.

This is a significant performance cost we will need to make clear to users for one thing. When we are only spinning up a single encoder, this is a binary problem.

What does OBS do today when encoding/scaling/compositing costs are high? Do we have any automated tools that adjust configurations/help users debug? I don't think Simulcast is a unique situation. The existing situation isn't binary either. The performance of a single encoder is influenced by what you are encoding, how much you are encoding, and the settings you are using.

@voluntas

voluntas commented Jul 13, 2023

Thanks for the great suggestions. We will be working on supporting this feature in our products. Since you are here, please allow me to join the discussion.

Simulcast is a WebRTC (WHIP) specific feature, so I think some people may feel uncomfortable if it is in the "Streaming" section.

I think it would be easier to understand if a checkbox for Simulcast is provided in the "WHIP" setting section, since I think it is a setting for whether or not a=simulcast is included in the client's Offer.

@Sean-Der
Author

Great suggestion @voluntas!

I have moved it. New builds from my PR now have it on the Stream tab.

@chhofi

chhofi commented Jul 13, 2023

Hey @Sean-Der, saw your post on LinkedIn and wanted to try this great new implementation. OBS seems to stream to the server, but unfortunately I just see a spinning wheel... But somehow the simulcast got identified, because the quality level option gets displayed. Any suggestions on how I can further debug this issue?

@Fenrirthviti
Member

Hey @Sean-Der, saw your post on LinkedIn and wanted to try this great new implementation. OBS seems to stream to the server, but unfortunately I just see a spinning wheel... But somehow the simulcast got identified, because the quality level option gets displayed. Any suggestions on how I can further debug this issue?

This is an RFC, not a place to post for support, nor should this be considered an implementation that is ready for actual testing past the design in OBS at this stage.

Please do not solicit support feedback on this RFC, only design.

@Sean-Der
Author

Hey @chhofi

I would love to help! Mind moving conversation to Sean-Der/obs-studio#2?

@chhofi

chhofi commented Jul 14, 2023

@Fenrirthviti all right. Thanks for the clarification. @Sean-Der Sure, thx :)

@voluntas

@Sean-Der Perfect! Wonderful.

@Sean-Der force-pushed the webrtc-simulcast branch 2 times, most recently from cf8708d to b309076 on January 10, 2024 at 16:51
@murillo128

The UX/UI could be very similar to what Twitch is proposing for its enhanced streaming:

[Screenshot of the proposed UI], where the user sets a maximum bitrate and a number of reserved encoder instances.

The SDP offer will use the number of reserved encoder instances, and the server will be able to restrict that number in the SDP answer, which will be the final number of encoder instances to use.

@murillo128

Btw, the screenshot above was taken without modifications from the code in this repo: https://github.com/amazon-contributing/upstreaming-to-obs-studio/tree/30.0.2-enhanced-broadcasting-v11

@Warchamp7
Member

Hi @Sean-Der apologies for the delay following up here.

Given the work that's happened around Multitrack Video / Twitch Enhanced Broadcasting, I'm more inclined to support this with the more limited UI. With that said, the current implementation of MV / TEB is what I'd consider a 'bare minimum' and a standard I would like to hold simulcast to. In general this feature should mimic a lot of what is in place for Multitrack Video, and whenever feasible share/refactor code for that.

The biggest thing missing in the current proposal then is the configuration of the layer settings. Hardcoding it to 2 layers with specific values is great for testing the functionality but not acceptable for an MVP. There either needs to be UI for configuring everything or information provided by the server that is then used to configure the client. Either as a side-channel HTTP request or via WHIP itself.

MV / TEB is once again a great example for this.

"framerate": {
    "denominator": 1,
    "numerator": 60
},
"gpu_scale_type": "OBS_SCALE_BICUBIC",
"height": 936,
"settings": {
    "bf": 3,
    "bitrate": 6000,
    "keyint_sec": 2,
    "lookahead": true,
    "preset2": "p6",
    "profile": "high",
    "psycho_aq": false,
    "rate_control": "CBR",
    "tune": "hq"
},
"type": "nvenc",
"width": 1664

OBS can be capped to 2 layers as a sane limit, but it should support negotiation of "up to" that amount, not be specifically set to that.

@Sean-Der
Author

Sean-Der commented Aug 1, 2024

@Warchamp7 it doesn’t need to be capped to 2!

WHIP doesn’t allow the server to control the user’s computer. The server is allowed to reject the user’s offer though!

The 50%/25% is the default behavior in JavaScript/Browsers today. When I spoke with users it was ‘least surprising’ to match those APIs.

Can it be a checkbox and also allow JSON input? The JSON can match TEB and would allow anything users want.

Where does the advanced JSON input go?

@Warchamp7
Member

The server is allowed to reject the user’s offer though!

Is there no method by which the server can communicate what it would accept or recommend?

@murillo128

There is a whole RFC for that, but no one has actually implemented it.

From a server point of view, it is more a matter of whether simulcast is supported or not, and not about the encoding config of each of the layers.

Having OBS choose the encoding config for each simulcast layer is the expected behavior for the majority of users. Just being able to set the number of layers to adjust CPU/GPU usage would be good to have.

@Sean-Der
Author

Sean-Der commented Aug 2, 2024

I haven't seen it implemented either. At Twitch/IVS we disconnect the user if they exceed the limits, but they are free to configure whatever they like within those constraints.

I believe I found the relevant RFC here for a server to communicate restrictions. This happens after the client has made the offer, so it isn't an Accept/Recommend you can query ahead of time. You make the Simulcast offer, and the server will then say 'We accept your N layers, and adjust them like so'.

It would be great to support that someday. At this time I don't know a single provider (or Open Source server) that does that yet though.

@Warchamp7
Member

I haven't seen it implemented either. At Twitch/IVS we disconnect the user if they exceed the limits, but they are free to configure whatever they like within those constraints.

I believe I found the relevant RFC here for a server to communicate restrictions. This happens after the client has made the offer, so it isn't an Accept/Recommend you can query ahead of time. You make the Simulcast offer, and the server will then say 'We accept your N layers, and adjust them like so'.

It would be great to support that someday. At this time I don't know a single provider (or Open Source server) that does that yet though.

I'm asking about before the client has made the offer. This shouldn't be part of the SDP negotiation.

There is a strong desire from services I've spoken with in the past to be able to advertise to a client what they are permitted to send, especially when authentication is factored in to allow different capabilities to different users. Either a stateless request as part of WHIP before the SDP offer, or a separate endpoint altogether akin to MV with multitrack_video_configuration_url. The existing work in OBS around the GoLiveAPI / MultitrackVideoAutoConfig support is intentionally not service specific.

I understand that servers will generally accept a firehose of streams but I don't think that's valid justification for making a client send whatever it desires. It's a waste of bandwidth to send layers that may not even be used and an opaque checkbox if turning it on for a particular service doesn't actually do anything. In a more traditional context where it's a web page or dedicated application for a service, this is implicitly the case because those app developers know what their server infrastructure will do with them. As a service agnostic client I don't believe we should be blindly sending additional feeds and hoping the server accepts it and/or will find them useful.

It's in the best interests of both servers and users for there to be some method of communicating capabilities, and that should be considered here. Clients should not send 3 layers for the server to decide if it wants to do anything with them or not; the server should indicate it wants 3 layers because it can provide improved functionality with them.

@Sean-Der
Author

Sean-Der commented Aug 2, 2024

I empathize with 'things should be better', but I can't change things. I am just implementing an existing protocol. I want this output to work with all the WebRTC providers/services that exist today. Adding new things makes it not WebRTC.

services I've spoken with in the past

Is this WebRTC providers or is this conversations for TEB? Simulcast is used for Hangouts, FaceTime Web, Discord etc... and I haven't seen a desire for a 'configuration_url' expressed in the W3C or IETF.

We can communicate capabilities, the roles are just switched. OBS will offer 3 layers, the server can then reject one.

I am not against improving things! I just don't have much flexibility on the protocol side. I will do anything I can to express WebRTC concepts/make it intuitive in OBS though.

@Warchamp7
Member

services I've spoken with in the past

Is this WebRTC providers or is this conversations for TEB?

It's most services I've interacted with over the years. All before TEB was even an idea.

Simulcast is used for Hangouts, FaceTime Web, Discord etc... and I haven't seen a desire for a 'configuration_url' expressed in the W3C or IETF.

All of these examples have the server and client application controlled by the same party. I cannot send a WebRTC feed to Hangouts or Discord using OBS or any other third party client.

In a scenario where that WAS possible, it would be extremely easy to have a misconfiguration. The solution to that is a way to communicate capabilities and requirements so that they can be handled by the app or presented to a user in a sane fashion.

We can communicate capabilities, the roles are just switched. OBS will offer 3 layers, the server can then reject one.

This is fine behavior from the server perspective for handling what are essentially malformed requests. It should not be the behavior of a client to send a request and hope for the best.

I am not against improving things! I just don't have much flexibility on the protocol side. I will do anything I can to express WebRTC concepts/make it intuitive in OBS though.

From my reading and from observing the other discussions, that is indeed the hurdle to be crossed for WebRTC to be viable as a client-facing protocol. Either the protocol needs a better grasp of this exchange or it needs to be handled separately like Multitrack Video. It's my opinion that this is a necessity for an uncontrolled third party client to peacefully interface with an arbitrary server.

No one would ever expect a file upload API to simply reject files above a certain file size and also not provide a way to get restrictions for example.

OBS is a service-neutral third party client, and in the case of WebRTC may very well be the true first of that nature. There are going to be questions to solve as a consequence of that. I understand the frustration, knowing that many servers will simply accept the additional layers or fail gracefully, and it seems like a free win for the folks you're working with, but it's not enough justification for a blind/optimistic hardcoded implementation on the OBS end.

@murillo128

Speaking as a service operator here, I don't impose any restrictions or have any expectations on what customers decide to send to me. We have recommended settings for OBS, but they are just that: recommendations.

Speaking as the WHIP author, I have not received a single piece of feedback or a single request in that regard, which is why the protocol doesn't have any capability for that.

I would prefer to have something simple working and start receiving feedback (or not) about the encoding settings, then iterate later.

@Sean-Der
Author

Sean-Der commented Aug 2, 2024

All of these examples have the server and client application controlled by the same party

Discord/FaceTime don't control the browser. They do an Offer/Answer exchange, and they both implement the protocol. Both sides have error handling for when the negotiations fail, though.

OBS is a service-neutral third party client, and in the case of WebRTC may very well be the true first of that nature.

OBS isn't the first WHIP client.

No one would ever expect a file upload API to simply reject files above a certain file size and also not provide a way to get restrictions for example.

The behavior of WebRTC negotiation isn't like that.

  • The Offer/Answer exchange is instant. It isn't like a file upload, which takes time to fail.
  • The Answer contains exactly what the problem is.

In OBS, I click Publish, and then I could put up a window: 'You offered 5 layers, the server only accepts 3 at most.'
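
A minimal sketch of that check, assuming the answer lists the accepted rids in an a=simulcast:recv line as in the SDP examples earlier in this thread (the message text is illustrative):

// Compare the number of rids we offered against the number the server accepted
// in its answer, and surface the mismatch to the user.
function checkAcceptedLayers(offeredRids: string[], answerSdp: string): void {
  const match = answerSdp.match(/^a=simulcast:recv (\S+)/m);
  const accepted = match ? match[1].split(';').length : 0;
  if (accepted < offeredRids.length) {
    throw new Error(
      `You offered ${offeredRids.length} layers, the server only accepts ${accepted} at most.`
    );
  }
}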

@Sean-Der
Author

Sean-Der commented Aug 2, 2024

Echoing @murillo128

As a service operator (Twitch), this is how I would like it to work for Guest Star. I don’t want any new proprietary APIs. Keeping FFmpeg/Larix compatibility is important to me.

In the PR we have also gotten reviews from most major services (Cloudflare, LiveKit, Red5, and LivePeer), so I feel pretty confident this is the right path.

  • Start simple
  • In the future allow users to configure more as more is understood

@Sean-Der
Author

Sean-Der commented Aug 6, 2024

@Warchamp7 What's your conclusion on this? Do you see any path forward in OBS for protocols that don't have a 'pre-connect configuration' step?

@Sean-Der force-pushed the webrtc-simulcast branch 5 times, most recently from a6065cd to f51a0a2 on August 9, 2024 at 15:55
@Sean-Der
Author

Sean-Der commented Aug 9, 2024

I have updated the RFC to address the following

  • Better describe the motivations. I listed out the reasons I am working on this/why it would benefit users.
  • I improved the UI so it is less of a mystery what is happening. Users can adjust the layer count and they see exactly what will be sent.
  • I have called out that we throw an error on server rejection.
  • I have explicitly called out that no flow exists for 'Server Driven Configuration'

@Warchamp7 mind looking again when you get a chance? Thank you!

@mpisat

mpisat commented Nov 29, 2024

I hope this merge finds its way to the OBS master branch. Simulcast support can liberate streaming in general, allowing anyone to operate a streaming service with ABR, without expensive hardware / GPUs / cloud.
