Add RFC: WebRTC Simulcast #55
base: master
Conversation
I think we can improve this. Given that Simulcast is negotiated in the SDP O/A, we could even enable simulcast always and, depending on whether the server supports it, start the lower encodings as needed. Simulcast is a very important feature for us, and if implemented, it would allow us to deprecate our OBS-webrtc fork and focus on contributing to the main OBS instead. So please, just let me know what I can do in order to meet all the requirements, both in the RFC and in the implementation.
A few initial questions:
Thanks for your feedback @Warchamp7! Regarding your questions:
As someone a bit less technical, can you elaborate on this? That sounds to me like the info is transmitted upon session start. If that's the case, then are users expected to configure X many layers and simply be hit with an error on session start if it's too many for the selected service? Ideally a user can select their service/endpoint, and be presented with information on how many layers they can configure, and any restrictions/recommendations the service might have. I very much don't like the idea of users having to simply set things up and hope it'll work. Worst case scenario we may have to hardcode limits into services.json with our other service recommendations.
My concern is with detecting "doesn't support". Most (all?) NVIDIA consumer cards have a hard limit of 5 simultaneous sessions, but the realistic limit can be less than that, based on the demands of the sessions. Similarly for AMF: while they don't have a session limit, I think you'll struggle to get more than 2 or 3. I do not believe there is a way to detect available sessions or resources; it will simply fail when users attempt to start their output. Performing silent fallback to software encoding could lead to an unexpected performance impact every time they begin an output if fewer sessions/resources are available one time vs another. When we are only spinning up a single encoder, this is a binary problem. It either works or it doesn't. The introduction of layers means that on any given day and system usage, 3 layers might work some times but not others. I want to make sure we are adequately able to communicate issues to users and have proper error handling to solve them.
The simulcast negotiation in the SDP is described in detail here: TL;DR: the client sends an offer with a simulcast attribute and the rids (encodings) that it wants to send, and the server accepts them by reversing the send/recv direction.
If the server does not accept simulcast, it will not include the simulcast attribute and the client will just send one encoding as normal. In theory, the client could also specify the video encoding properties in the offer and the server could accept the ones it wants, but in practice the server always accepts everything that is sent from the client. Regarding the maximum number of layers, there would be no problem not sending all the encodings offered by OBS. WebRTC servers are already used to handling a dynamic number of inputs, as browsers may drop (stop sending) simulcast layers based on CPU/bandwidth use.
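The send/recv reversal described above can be sketched as follows. This is a hedged illustration of the simulcast attributes from RFC 8853/RFC 8851; the helper functions are hypothetical, not part of any real WebRTC library:

```python
# Hypothetical sketch of the simulcast SDP exchange: the client declares
# one 'rid' per encoding and a simulcast 'send' attribute; a server that
# accepts simulcast answers with the direction reversed to 'recv'.
# A server that does not support simulcast would simply omit these lines.

def build_offer_attributes(rids):
    """Client offer: one a=rid line per encoding, plus the simulcast attribute."""
    lines = [f"a=rid:{rid} send" for rid in rids]
    lines.append("a=simulcast:send " + ";".join(rids))
    return lines

def build_answer_attributes(offer_lines):
    """Server answer: accept simulcast by reversing send -> recv."""
    answer = []
    for line in offer_lines:
        if line.startswith("a=rid:"):
            answer.append(line.replace(" send", " recv"))
        elif line.startswith("a=simulcast:send"):
            answer.append(line.replace("simulcast:send", "simulcast:recv"))
    return answer

offer = build_offer_attributes(["hi", "mid", "lo"])
answer = build_answer_attributes(offer)
# offer ends with:  a=simulcast:send hi;mid;lo
# answer ends with: a=simulcast:recv hi;mid;lo
```

If the answer omitted these attributes entirely, the client would fall back to sending a single encoding, matching the fallback behavior described above.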
Hey @Warchamp7 I coded up an implementation of this if you want to try it out! Sean-Der/obs-studio#2 It adds a checkbox to enable/disable Simulcast. You can use it against
You will have a drop down to switch between your different quality levels. For quicker switching between layers you can set the
I personally think a simple checkbox is enough for a first version. In the future I would like to see an advanced mode where more can be configured. In the vast majority of cases I think streamers want uniformity.
This is discovered at connect. My plan is to disconnect/reject users who have configured their client incorrectly. In their stream manager view they will get a notification why. I want to handle this the same way as a user sending excessive bitrate. An open source book on how WebRTC works is available if you are curious about the details: WebRTC for the Curious. If you have any specific questions I would love to answer them :)
Why do you believe this will have a significant performance cost? If you do conferencing in your browser you have used Simulcast (Hangouts, Jitsi...). LiveKit wrote an article about how the industry sees it. On my local machine my CPU usage goes from 5% to 8% with Simulcast enabled with x264.
What does OBS do today when encoding/scaling/compositing costs are high? Do we have any automated tools that adjust configurations/help users debug? I don't think Simulcast is a unique situation. The existing situation isn't binary either. The performance of a single encoder is influenced by what you are encoding, how much you are encoding, and the settings you are using.
Thanks for the great suggestions. We will be working on supporting this feature in our products. Since you are here, please allow me to join the discussion. Simulcast is a WebRTC (WHIP) specific feature, so I think some people may feel uncomfortable if it is in the "Streaming" section. I think it would be easier to understand if a checkbox for Simulcast is provided in the "WHIP" settings section, since I think it is a setting for whether or not a=simulcast is included in the client's Offer.
Great suggestion @voluntas! I have moved it. New builds from my PR now have it on the
Hey @Sean-Der, saw your post on LinkedIn and wanted to try this great new implementation. OBS seems to stream to the server, but unfortunately I just see a spinning wheel... But somehow the simulcast got identified, because the quality level option gets displayed. Any suggestions on how I can further debug this issue?
This is an RFC, not a place to post for support, nor should this be considered an implementation that is ready for actual testing beyond the design in OBS at this stage. Please do not solicit support feedback on this RFC, only design.
Hey @chhofi I would love to help! Mind moving conversation to Sean-Der/obs-studio#2? |
@Fenrirthviti all right. Thanks for the clarification. @Sean-Der Sure, thx :)
@Sean-Der Perfect! Wonderful. |
Force-pushed from cf8708d to b309076
Btw, the screenshot above was taken without modifications from this repo's code: https://github.com/amazon-contributing/upstreaming-to-obs-studio/tree/30.0.2-enhanced-broadcasting-v11
Hi @Sean-Der apologies for the delay following up here. Given the work that's happened around Multitrack Video / Twitch Enhanced Broadcasting, I'm more inclined to support this with the more limited UI. With that said, the current implementation of MV / TEB is what I'd consider a 'bare minimum' and a standard I would like to hold simulcast to. In general this feature should mimic a lot of what is in place for Multitrack Video, and whenever feasible share/refactor code for that. The biggest thing missing in the current proposal then is the configuration of the layer settings. Hardcoding it to 2 layers with specific values is great for testing the functionality but not acceptable for an MVP. There either needs to be UI for configuring everything or information provided by the server that is then used to configure the client. Either as a side-channel HTTP request or via WHIP itself. MV / TEB is once again a great example for this.
OBS can be capped to 2 layers as a sane limit, but it should support negotiation of "up to" that amount, not be specifically set to that.
@Warchamp7 it doesn't need to be capped to 2! WHIP doesn't allow the server to control the user's computer. The server is allowed to reject the user's offer though! The 50%/25% is the default behavior in JavaScript/browsers today. When I spoke with users it was 'least surprising' to match those APIs. Can it be a checkbox and also allow JSON input? The JSON can match TEB and would allow anything users want. Where does the advanced JSON input go?
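The 50%/25% browser-default ladder mentioned above can be sketched like this. This is a hypothetical helper mirroring the browsers' `scaleResolutionDownBy` defaults; the bitrate split is an assumption for illustration, not taken from the OBS implementation:

```python
# Sketch of the browser-default simulcast ladder: each lower layer
# halves resolution (100% / 50% / 25% scale) and, since pixel count
# falls with the square of the scale, roughly quarters the bitrate.
# Function and field names here are illustrative only.

def default_simulcast_layers(width, height, bitrate_kbps, num_layers=3):
    layers = []
    for i in range(num_layers):
        scale = 2 ** i  # scaleResolutionDownBy: 1, 2, 4, ...
        layers.append({
            "rid": f"layer{i}",
            "width": width // scale,
            "height": height // scale,
            # bitrate falls roughly with pixel count (scale^2)
            "max_bitrate_kbps": bitrate_kbps // (scale * scale),
        })
    return layers

for layer in default_simulcast_layers(1920, 1080, 6000):
    print(layer)
# the second layer is 960x540 at 1500 kbps, the third 480x270 at 375 kbps
```

Matching these defaults keeps OBS consistent with what JavaScript developers already see from `RTCRtpSender.setParameters` style APIs.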
Is there no method by which the server can communicate what it would accept or recommend? |
There is a whole RFC for that, but no one has actually implemented it. From a server point of view, it is more a matter of whether simulcast is supported or not, not the encoding config of each of the layers. Having OBS choose the encoding config for each simulcast layer is the expected behavior for the majority of users. Just being able to set the number of layers to adjust the CPU/GPU usage would be good to have.
I haven't seen it implemented either. At Twitch/IVS we disconnect the user if they exceed the limits, but they are free to configure whatever they like within those constraints. I believe I found the relevant RFC here for a server to communicate restrictions. This happens after the client has made the offer, so it isn't an Accept/Recommend you can query ahead of time. You make the Simulcast offer, and the server will then say 'We accept your N layers, and adjust them like so.' It would be great to support that someday. At this time I don't know of a single provider (or open source server) that does that yet, though.
I'm asking about before the client has made the offer. This shouldn't be part of the SDP negotiation. There is a strong desire from services I've spoken with in the past to be able to advertise to a client what it is permitted to send, especially when authentication is factored in to allow different capabilities for different users. Either a stateless request as part of WHIP before the SDP offer, or a separate endpoint altogether akin to MV with I understand that servers will generally accept a firehose of streams, but I don't think that's valid justification for making a client send whatever it desires. It's a waste of bandwidth to send layers that may not even be used, and an opaque checkbox if turning it on for a particular service doesn't actually do anything. In a more traditional context where it's a web page or dedicated application for a service, this is implicitly the case because those app developers know what their server infrastructure will do with the layers. As a service-agnostic client I don't believe we should be blindly sending additional feeds and hoping the server accepts them and/or will find them useful. It's in the best interests of both servers and users for there to be some method of communicating capabilities, and that should be considered here. Clients should not send 3 layers for the server to decide if it wants to do anything with them or not; the server should indicate it wants 3 layers because it can provide improved functionality with them.
I empathize with 'things should be better', but I can't change things. I am just implementing an existing protocol. I want this output to work with all the WebRTC providers/services that exist today. Adding new things makes it not WebRTC.
Is this about WebRTC providers, or is this a conversation about TEB? Simulcast is used for Hangouts, FaceTime Web, Discord etc... and I haven't seen a desire for a 'configuration_url' expressed in the W3C or IETF. We can communicate capabilities; the roles are just switched. OBS will offer 3 layers, and the server can then reject one. I am not against improving things! I just don't have much flexibility on the protocol side. I will do anything I can to express WebRTC concepts and make them intuitive in OBS though.
It's most services I've interacted with over the years. All before TEB was even an idea.
All of these examples have the server and client application controlled by the same party. I cannot send a WebRTC feed to Hangouts or Discord using OBS or any other third party client. In a scenario where that WAS possible, it would be extremely easy to have a misconfiguration. The solution to that is a way to communicate capabilities and requirements so that they can be handled by the app or presented to a user in a sane fashion.
This is fine behavior from the server perspective for handling of what is essentially malformed requests. It should not be the behavior of a client to send a request and hope for the best.
From my reading and from observing the other discussions, that is indeed the hurdle to be crossed for WebRTC to be viable as a client-facing protocol. Either the protocol needs a better grasp of this exchange, or it needs to be handled separately like Multitrack Video. It's my opinion that this is a necessity for an uncontrolled third-party client to peacefully interface with an arbitrary server. No one would ever expect a file upload API to simply reject files above a certain size while also not providing a way to query the restrictions, for example. OBS is a service-neutral third-party client, and in the case of WebRTC may very well be the true first of that nature. There are going to be questions to solve as a consequence of that. I understand the frustration, knowing that many servers will simply accept the additional layers or fail gracefully, and it seems like a free win for the folks you're working with, but it's not enough justification for a blind/optimistic hardcoded implementation on the OBS end.
Speaking as a service operator here, I don't impose any restrictions or have any expectations on what the customers decide to send to me. We have recommended settings for OBS, but they are just that: recommendations. Speaking as the WHIP author, I have not received a single piece of feedback or a single request in that regard, which is why the protocol doesn't have any capability for that. I would prefer to have something simple working and start receiving feedback (or not) about the encoding settings to iterate later.
Discord/FaceTime don't control the browser. They do an Offer/Answer exchange and both sides implement the protocol. Both sides have error handling for when the negotiations fail, though.
OBS isn't the first WHIP client.
The behavior of WebRTC negotiation isn't like that.
In OBS I click
Echoing @murillo128: as a service operator (Twitch) this is how I would like it to work for Guest Star. I don't want any new proprietary APIs. Keeping FFmpeg/Larix compatibility is important to me. In the PR we have also gotten reviews from most major services (Cloudflare, LiveKit, Red5 and Livepeer), so I feel pretty confident this is the right path.
@Warchamp7 What's your conclusion on this? Do you see any path forward for protocols that don't have a 'pre-connect configuration' in the protocol in OBS? |
Force-pushed from a6065cd to f51a0a2
I have updated the RFC to address the following
@Warchamp7 mind looking again when you get a chance? Thank you
Force-pushed from f51a0a2 to 68e7d44
I hope this merge finds its way to the OBS master branch. Simulcast support can liberate streaming in general, allowing anyone to operate a streaming service with ABR, without expensive hardware / GPUs / cloud.
Summary
Add Simulcast support to WebRTC output.
Simulcast is a WebRTC protocol feature that allows an uploader to send multiple layers of one track. This is built into the protocol, so every WebRTC
ingest already understands it. These layers can be different resolutions, bitrates, and even codecs.
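For illustration, a simulcast layer set might look like the following. The field names and values are hypothetical, not actual OBS or WHIP configuration:

```python
# Hypothetical simulcast layer ladder: each layer is an independent
# encoding of the same source track, and layers may differ in
# resolution, bitrate, and even codec. Field names are illustrative.
layers = [
    {"rid": "hi",  "codec": "h264", "width": 1920, "height": 1080, "kbps": 6000},
    {"rid": "mid", "codec": "h264", "width": 1280, "height": 720,  "kbps": 2500},
    {"rid": "lo",  "codec": "vp8",  "width": 640,  "height": 360,  "kbps": 800},
]

# The uploader's total outbound bitrate is the sum across all layers,
# which is the cost the streamer pays for skipping server-side transcodes.
total_kbps = sum(layer["kbps"] for layer in layers)
print(total_kbps)  # 9300
```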
Motivation
Live streaming services offer videos at multiple bitrates and resolutions. This is needed to support the wide variety of connections that users will have.
Today streaming services decode the incoming video, modify it, and then re-encode it to generate these different quality levels. This has some drawbacks that Simulcast
will fix or improve.
Generation Loss - Decoding and re-encoding videos causes generation loss. With Simulcast, encodes come from the source video, which will be higher quality.
Higher Quality Encodes - Streamers with dedicated hardware can provide higher quality encodes. Streaming services at scale are optimizing for cost.
Lower Latency - Removing the additional encoding/decoding allows video to be delivered to users faster.
Reduce server complexity - Users find it difficult to set up RTMP->HLS with transcodes. With Simulcast, setting up a streaming server becomes dramatically easier.
Link to RFC