Revisit server-sent offers #12

Open
englishm-ietf opened this issue Mar 30, 2023 · 8 comments

Comments

@englishm-ietf

englishm-ietf commented Mar 30, 2023

Picking up from discussion started in a video-dev Slack thread.

I think it's worth giving server-sent offers another consideration and maybe making that the one supported WHEP behavior rather than the other way around.

Discussion so far:

englishm
The more I think about this the more I think I prefer server offers… :thinking_face:

englishm
What if we shuffled the HTTP verbs around to match, too?

Sergio Garcia Murillo
server sent offers in WHEP have a problem with h264 profiles

englishm
Because you can’t simultaneously offer all possibilities a client might need to choose from?

Sergio Garcia Murillo
yep

englishm
😞

Sergio Garcia Murillo
same with vp9/av1 profiles

englishm
I haven’t tried this yet, but could you work around it by just offering more tracks with the various permutations of fmtps?

Sergio Garcia Murillo
tracks as "m-lines" ?

englishm
m= lines, yes

Sergio Garcia Murillo
It would just make things worse imo.. 🙂

englishm
Wait, no, you wouldn’t necessarily need more m= lines.. could you use dynamic format numbers and just define more “codecs” with the different format parameters using rtpmap to define them? Is it a problem if you define multiple payload numbers to all mean various versions of H264/90000 (or whatever)?
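The idea in the message above (several dynamic payload types that all map to H264/90000 but differ in their format parameters) can be sketched as follows. This is only an illustration: the helper name `build_h264_section`, the payload type numbers, and the three profiles chosen are assumptions, not something proposed in the thread. The `profile-level-id` values are standard H.264 identifiers for Constrained Baseline, Main, and High at level 3.1.

```python
# Hypothetical helper: build one video m= section that advertises several
# H.264 profiles, each under its own dynamic payload type (96-127 range).
H264_PROFILES = {
    96: "42e01f",  # Constrained Baseline, level 3.1
    97: "4d001f",  # Main, level 3.1
    98: "64001f",  # High, level 3.1
}


def build_h264_section(profiles: dict[int, str]) -> str:
    """Return SDP lines for a sendonly video section, one rtpmap/fmtp pair per profile."""
    payload_types = " ".join(str(pt) for pt in profiles)
    lines = [f"m=video 9 UDP/TLS/RTP/SAVPF {payload_types}", "a=sendonly"]
    for pt, profile_level_id in profiles.items():
        # Every payload type names the same codec; only the fmtp differs.
        lines.append(f"a=rtpmap:{pt} H264/90000")
        lines.append(
            f"a=fmtp:{pt} level-asymmetry-allowed=1;"
            f"packetization-mode=1;profile-level-id={profile_level_id}"
        )
    return "\r\n".join(lines) + "\r\n"


print(build_h264_section(H264_PROFILES))
```

This stays within a single m= line, but the number of payload types grows with every profile, level, and packetization-mode permutation the server wants to advertise.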

englishm
I’m pushing on this partly because I think server offers bring us a lot closer to the types of things DASH is looking for as an upfront description of available media. It seems backwards to have to duplicate that with something else because we have clients making recvonly offers instead.

Sergio Garcia Murillo
my main concern about server sent offers, is that in my case, I allow connecting viewers before the publication is started, so I don't know the actual codec that is going to be used

Sergio Garcia Murillo
so there is a huge risk that the client receiving a server sent offer will "choose" one of the codecs on the sdp offer, and send an answer without the other ones.

englishm
But you’d have to know by the time you answer and actually get ICE all connected, right?

Sergio Garcia Murillo
So it may be choosing h264, when the actual codec is vp9 later on

Sergio Garcia Murillo
no, you don't need to

Sergio Garcia Murillo
in fact, i allow changing codecs mid session. I am even doing different codecs for different ABR layers.

englishm
Hm, so in your case the issue is that allowing the client to answer gives them too much control over the size of the envelope?

Sergio Garcia Murillo
yes, i think it is too risky for me

englishm
What about over-constrained client offers though, is that not the same risk?

Sergio Garcia Murillo
I can't do much in that regard, but I think devs would be more prone to restrict the answer based on the codecs offered by the server than to send a constrained offer that omits codecs which are actually supported

englishm
Re-reading RFC8829 which is how W3C defines normative behavior for createAnswer…
from RFC8829 section-5.3.1:
If codec preferences have been set for the associated transceiver, media formats MUST be generated in the corresponding order, regardless of what was offered, and MUST exclude any codecs not present in the codec preferences.
Otherwise, the media formats on the “m=” line MUST be generated in the same order as those offered in the current remote description, excluding any currently unsupported formats. Any currently available media formats that are not present in the current remote description MUST be added after all existing formats.
In either case, the media formats in the answer MUST include at least one format that is present in the offer but MAY include formats that are locally supported but not present in the offer, as mentioned in [RFC3264], Section 6.1. If no common format exists, the “m=” section is rejected as described above.

englishm
To me, that sounds like browser implementations should be answering with all supported codecs unless specific codec preferences have been configured.
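The reading of RFC 8829 section 5.3.1 described above can be modelled roughly like this. It is a simplified sketch, not browser behavior: codecs are plain name strings, format parameters and payload types are ignored, and the function name `answer_formats` is made up for illustration.

```python
from typing import Optional


def answer_formats(
    offered: list[str],
    supported: list[str],
    preferences: Optional[list[str]] = None,
) -> Optional[list[str]]:
    """Return the answer's media formats, or None if the m= section is rejected."""
    if preferences is not None:
        # Codec preferences win: their order is used, and nothing outside them.
        formats = [c for c in preferences if c in supported]
    else:
        # Offered order first, minus currently unsupported formats...
        formats = [c for c in offered if c in supported]
        # ...then locally available formats that the offer did not list.
        formats += [c for c in supported if c not in offered]
    # The answer must share at least one format with the offer.
    if not any(c in offered for c in formats):
        return None
    return formats


offer = ["H264", "VP9", "AV1"]
local = ["VP8", "VP9", "H264"]
print(answer_formats(offer, local))           # ['H264', 'VP9', 'VP8']
print(answer_formats(offer, local, ["VP9"]))  # ['VP9']
print(answer_formats(["AV1"], ["VP8"]))       # None
```

Under this model an answer only shrinks to a single codec when codec preferences were set explicitly, which is the case the second call shows.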

englishm
And we can provide explicit recommendations in the WHEP text for non-browser implementations to do the same.

englishm
Here’s the gist of what I’m imagining:
https://gist.github.com/englishm-ietf/48cbab582f8a748d8ebab0b2c47c9d5c

Sergio Garcia Murillo
GET requests should be idempotent and not cause state changes on the server side, but anyway it won't solve my issues with server sent offers

englishm
I think we don’t actually need to make state changes on the server until we get the answer though, is the realization I had.

Sergio Garcia Murillo
you have to allocate a new ice username/frag and create the candidates

englishm
Depends a little maybe on how you send ICE candidates, but I think the initial offer could maybe be generic. I guess we’d want to be careful about header caching for the session resource, too.

Sergio Garcia Murillo
but anyway, it is minor, using GET or empty POST is not an issue for me

englishm
You still think client answers will be overconstrained?

Sergio Garcia Murillo
I have already seen it in an early gstreamer implementation.. 🙂

englishm
But speaking of issues, maybe we should copy our discussion so far to a GitHub issue and pick it up there? I just realized that this isn’t the best medium for IETF discussion.

Sergio Garcia Murillo
I think it may be less risky to allow server sent offers on WHIP instead

@redoPop

redoPop commented Mar 30, 2023

From the quoted section of RFC8829:

media formats in the answer MUST include at least one format that is present in the offer

I think the phrasing "at least one" (i.e. not necessarily more than one) supports the overconstrained answer scenario. 😞

Would a smaller, non-SDP response satisfy this use case? e.g. a JSON payload broadly describing tracks and kinds, that the client can then use to create an appropriate offer?

@englishm-ietf

englishm-ietf commented Mar 31, 2023

I could see some implementations choosing to interpret RFC8829 that way, but I think that would be a mistake. In the section I quoted above it seems to me to be saying that every supported media format should be present in the answer, and the order should match what's in the offer, after which additional formats not present in the offer are to be listed.

the media formats on the “m=” line MUST be generated in the same order as those offered in the current remote description, excluding any currently unsupported formats. Any currently available media formats that are not present in the current remote description MUST be added after all existing formats.

The fact that currently available media formats MUST be added after the existing formats implies to me that available media formats listed in the offer should also be comprehensively represented.

Whether all implementations actually follow the spec here is something worth exploring, but this is what the text says, and I think at least browser implementations should be following it correctly.

The exception to these requirements is only if codec preferences have been set for the associated transceiver, but I don't think that should be dependent on the contents of the offer, so clients would have to fail in either case there.

Also, for what it's worth, I don't love the idea of adding a JSON payload and another round trip here. That seems like an unnecessary delay in startup time we can probably avoid.

@redoPop

redoPop commented Mar 31, 2023

I agree that compliance is worth exploring here. If the problematic behavior is non-compliant then that makes a stronger case for reconsidering server-sent offers.

That said, I get what you're saying, but I don't think it's a necessary interpretation of the spec, or the intended one. The text you quoted applies specifically when adding new formats that weren't in the original offer. The broader implication that you're drawing conflicts with the more generally applicable "In either case… at least one format" phrasing in the following bullet.

It does seem compliant for an answer to include only one of the existing available media formats, whether or not codec preferences have been set. If the client were adding formats at the same time then a stronger argument could be made for non-compliance, but that isn't the specific scenario that drew concern.
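To make the concern concrete, here is a small sketch of the "at least one format" rule next to the codec-switch risk raised earlier in the thread. Both function names are hypothetical, and codecs are again reduced to name strings with no format parameters.

```python
def satisfies_at_least_one(offered: list[str], answered: list[str]) -> bool:
    """True if the answer shares at least one format with the offer."""
    return any(fmt in offered for fmt in answered)


def can_receive(answered: list[str], live_codec: str) -> bool:
    """True if the negotiated formats cover the codec the publisher ends up using."""
    return live_codec in answered


offer = ["H264", "VP9", "AV1"]
narrow_answer = ["H264"]

# The single-format answer passes the "at least one" check...
print(satisfies_at_least_one(offer, narrow_answer))  # True
# ...but the viewer is stuck if the publication later starts in VP9.
print(can_receive(narrow_answer, "VP9"))             # False
```

The two checks are independent, which is why an answer can look acceptable at negotiation time and still fail once the server learns the actual codec.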

@danjenkins

Just here to add that I think this 100% needs to be solved, we cannot lose the ability to do whep -> whip - it will open up so many possibilities that weren't available before.

@mondain

mondain commented Apr 19, 2023

Here's my two cents about the way I currently do what I'd refer to as WHEP Mode 1: https://gist.github.com/mondain/5bc8bee11af4b291abe154b39879e822

Obviously it's simplified, and I normally live in the SFU/MCU world where we keep WHIP and WHEP (publishers and subscribers) separate.

I WHIP'd this up real quick as well: https://gist.github.com/mondain/7a3792711c489e97e8cede9e5acbef50

@danjenkins

danjenkins commented Mar 25, 2024

Coming back to an issue that's been open for a year, and we're still in a position where existing WebRTC media servers can't support WHEP without re-architecting their existing WebRTC solution, which wasn't the aim of WHEP. WHEP was meant to be a signalling layer on top of WebRTC, and WebRTC didn't define who offers what and where... yes, a signalling layer can... but I believe in this case it shouldn't stop the scenario of server sent offers.

Ultimately receiving media is a more complicated scenario than sending - the sender knows what is being sent and it agrees that with a server and off you go. With receiving media you have loads of permutations of what is possible; are you receiving media for multiple participants? Are you receiving multiple qualities? The media server knows the state of those things.... the client doesn't (in a very simple example)

Regardless of the above... unless WHEP supports server sent offers we are at risk of not having broad compatibility across existing WebRTC media servers. I understand a large reason for wanting client side offers is to be able to handle codec profiles... but there's no reason I can see as to why server side offers need to disappear... Receiving media is complicated - for something that's built on top of existing solutions... we need flexibility.

Would removing "WHIP -> WHEP" compatibility as a "wish" (see what I did there) help? I appreciate it was a lofty goal. Maybe that's something that could be looked into separately from this.

I would like to see both modes remain - I really don't see the harm in keeping both. Edit: Make the spec allow either mode and make it so you can call an OPTIONS request or something that will tell the client what mode a server supports. Make it so a client can only support one mode if it wants to. Yes you could get into a scenario where a client can't talk to a server, but at least this allows certain clients/products to only have to support their one preferred method.
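The mode-selection step suggested above could look something like the sketch below. Everything here is an assumption for illustration: the thread does not define mode names, a discovery request format, or a function like `pick_mode`, and how the server's supported modes reach the client (an OPTIONS response or otherwise) is left out entirely.

```python
from typing import Optional

# Hypothetical mode labels; the thread does not define names or a wire format.
CLIENT_OFFER = "client-offer"
SERVER_OFFER = "server-offer"


def pick_mode(client_modes: list[str], server_modes: set[str]) -> Optional[str]:
    """Return the first client-preferred mode that the server also supports."""
    for mode in client_modes:
        if mode in server_modes:
            return mode
    # No overlap: the "client can't talk to a server" scenario.
    return None


# A client that implements only server-sent offers.
print(pick_mode([SERVER_OFFER], {CLIENT_OFFER, SERVER_OFFER}))  # server-offer
print(pick_mode([SERVER_OFFER], {CLIENT_OFFER}))                # None
```

A client that supports both modes would list them in preference order, so a single-mode client and a dual-mode client share the same selection logic.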

@seanturner

Before taking this discussion any further, the chairs would like to see a concrete proposal submitted as a PR.

@danjenkins

Jonas Birmé, Lorenzo and I will put a PR together on this and get back to you soon
