Why expose produceCropTarget at MediaDevices level? #11

Closed
youennf opened this issue Jan 25, 2022 · 98 comments

Comments

@youennf
Contributor

youennf commented Jan 25, 2022

Looking at the algorithm, it seems the same CropTarget object will be created if produceCropTarget is called several times on the same element.
That would probably be specified by adding a CropTarget slot on HTMLElement directly.
This would probably also solve the element-cloning cropTarget issue.

The second question is whether the API below might be better suited:

partial interface HTMLElement {
  readonly attribute Promise<CropTarget> cropTarget;
};

I am also wondering why we even need a promise there.
It seems implementations could produce a cropTarget synchronously without any IPC, which would end up with:

partial interface HTMLElement {
  readonly attribute CropTarget cropTarget;
};

@eladalon1983
Member

eladalon1983 commented Jan 25, 2022

That would probably be specified by adding a CropTarget slot on HTMLElement directly.

Exposing the interface on HTMLElement is possible, but I think exposing on something more media-related or RTC-related gives better encapsulation. The vast majority of HTMLElements do not end up serving as a crop-target. It's best if both the controls and their documentation are localized to where they're relevant.

This would probably also solve the element-cloning cropTarget issue.

I don't think so, but possibly I am missing your point.
As I see it, there are two "cloning issues" at play:

  1. Cloning of the target. A clone of an HTMLElement is a distinct HTMLElement, and should have a new CropTarget. This is equally easy to express with either approach (MediaDevices.produceCropTarget vs. HTMLElement.cropTarget). In terms of communicating the selected approach, I think it's neater to express with produceCropTarget than with the alternative, but possibly this is subjective.
  2. Cloning of the track. I believe this is orthogonal.

I am also wondering why we even need a promise there.

The intention is:

  1. Allow produceCropTarget() to execute quickly for applications that don't immediately need the result. (E.g. cache it for later, in case gDM is called.) Allow JS execution to proceed immediately without blocking on IPC message-and-response.
  2. Allow implementations flexibility (explained below).

I'm using the working assumption that we agree on no1 (let's discuss otherwise) and moving to discuss no2. Consider the example of Chrome. Returning a Promise gives Chrome the flexibility to produce the CropTargets' underlying ID in what we call the "browser process". This means that when cropTo(x) is called, we can validate x in the (trusted) browser process, check that it's a valid CropTarget (always necessary), and that the caller is allowed to cropTo it (relevant while we limit the spec to self-capture; orthogonal issue). JS execution proceeds immediately after produceCropTarget(), but an IPC message is sent to the browser process, and the Promise is resolved when the response IPC comes from the browser process.

Using a Promise also makes it easy to convince ourselves that Chrome's implementation suffers no hidden race conditions. Consider if a Promise were not employed in this scenario:

  • Cross-origin documents D1 and D2 cohabitate in a tab. Each document has its own associated process.
  • D1 produces a CropTarget. D1 immediately gets a token before any IPC happens, but some IPC would have to complete before all relevant processes hear of this CropTarget.
  • (For the sake of argument, let's assume that CropTarget is both transferable as well as serializable.)
  • D1 sends the CropTarget to D2. Maybe using postMessage, maybe over the network.
  • D2 tries to use CropTarget.

In this scenario, it is required that the CropTarget D1 minted would be known to the cropTo() code-paths before D2 tries to call cropTo(), or that the cropTo() code-paths be robust to such reordering. This can be hard to reason about. But if we return a Promise, then we get no1 immediately, and we can resolve the Promise at our leisure, when it's patently obvious that the CropTarget has been fully propagated.

Note that the Promise gives flexibility. If Safari's implementation can convince itself there is no race even if the CropTarget is produced immediately, then Safari can return an already-resolved Promise.
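
A minimal TypeScript sketch of this flexibility argument follows. It is an illustration under stated assumptions, not how any browser implements it: `CropTargetToken`, `makeIpcProducer`, `makeLocalProducer` and the string element keys are hypothetical names, and the "IPC" is simulated with a single Promise hop.

```typescript
// Hypothetical stand-in for a CropTarget; the real object is opaque.
interface CropTargetToken {
  readonly id: number;
}

type Producer = (elementKey: string) => Promise<CropTargetToken>;

// Mints at most one token per element key (keys stand in for elements).
function mintOnce(minted: Map<string, CropTargetToken>, key: string): CropTargetToken {
  let token = minted.get(key);
  if (token === undefined) {
    token = { id: minted.size + 1 };
    minted.set(key, token);
  }
  return token;
}

// Implementation A: resolves only after a (simulated) IPC round trip.
function makeIpcProducer(): Producer {
  const minted = new Map<string, CropTargetToken>();
  return async (key) => {
    await Promise.resolve(); // placeholder for IPC message-and-response
    return mintOnce(minted, key);
  };
}

// Implementation B: needs no IPC and returns a pre-resolved Promise.
function makeLocalProducer(): Producer {
  const minted = new Map<string, CropTargetToken>();
  return (key) => Promise.resolve(mintOnce(minted, key));
}

// Caller code is the same for both implementations.
async function producesSameTokenTwice(produce: Producer): Promise<boolean> {
  const first = await produce("video-tile");
  const second = await produce("video-tile");
  return first === second;
}
```

In this sketch the application awaits the result in exactly the same way whether or not the implementation does any asynchronous work behind the scenes.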

Editorial: To make this thread more accessible, could you please edit the original post and add ```webidl before the code snippets and ``` after them?

@youennf
Contributor Author

youennf commented Jan 26, 2022

exposing on something more media-related or RTC-related gives better encapsulation.

Partial interfaces seem sufficient to me.
Tying produceCropTarget to MediaDevices creates some edge cases we should not have to deal with.
For instance, what happens in the case of calling produceCropTarget on a detached iframe for an HTMLElement which is not part of the detached iframe?

Calling produceCropTarget twice on the same element is expected to produce the same CropTarget, which might be surprising given it is a method.
On the other hand, this is by definition what is expected of attributes so it seems best to model this API as an attribute (basically as a slot in HTMLElement).

As for promise attribute vs. sync attribute, from what I read, using promises here is to accommodate a particular browser implementation. In general, we reserve promises for asynchronous operations, like querying hardware, querying centralised resources, querying user input. This is clearly not the case here; CropTarget is nothing more than a unique ID.
It does not seem hard to make it work synchronously without any race condition.

Editorial: To make this thread more accessible, could you please edit the original post and add ```webidl before the code snippets and ``` after them?

Done

@eladalon1983
Member

How common is it for specs to plug additional attributes directly onto HTMLElement? Checking the Chromium codebase just because that's the quickest way for me, it seems like it's exceedingly rare. I expect that the Chromium implementation is a good approximation of what W3C specifications do, but if my hack at getting this information was misguided, please let me know.

As for extending HTMLElement, that's of course quite common - but that's not what you're proposing. :-)

Btw, is it actually possible to expose a new field on all HTMLElements without incurring any cost? I'm not sufficiently familiar with how this is implemented in Chrome, let alone in other browsers, but I'd expect a non-zero cost for this exposure even if the field is unset until it's first read. (Possibly as small as increasing a lookup table, which might only happen when the Nth new field is added. I am not sure. But probably it's not zero-cost...?)

For instance, what happens in the case of calling produceCropTarget on a detached iframe for an HTMLElement which is not part of the detached iframe?

Could you please clarify how this edge-case would be problematic? The way I see it, if a given context can get a reference to the object in order to supply it as a parameter to produceCropTarget(), then that's all that matters.

Calling produceCropTarget twice on the same element is expected to produce the same CropTarget, which might be surprising given it is a method.

I don't think that's surprising. We can also reduce the (IMHO already low) surprise factor by renaming to getAsCropTarget or something similar. But I think "produce" is good enough.

using promises here is to accommodate a particular browser implementation

Politely disagree. It's done to afford all implementations the flexibility to implement efficiently and simply. If some implementations don't require this flexibility, I think it wouldn't harm them either - simply return the Promise pre-resolved.

@youennf
Contributor Author

youennf commented Jan 27, 2022

How common is it for specs to plug additional attributes directly onto HTMLElement?

I don't really know, it probably depends how much HTMLElement is actually used in APIs.
Infrastructure and implementations have prime support for that and some limited specs do that.

is it actually possible to expose a new field on all HTMLElements without incurring any cost?

I am not exactly sure which cost you are talking about. Is it that HTMLElement would get bigger in memory?
If so, implementations have the flexibility to either use a new member in HTMLElement, a separate map<element, region> or any other strategy.

Could you please clarify how this edge-case would be problematic? The way I see it, if a given context can get a reference to the object in order to supply it as a parameter to produceCropTarget(), then that's all that matters.

It would need to be tested, but if the frame is detached, chances are the promise will reject.
It does not make real sense to tie HTMLElement to an unrelated construct.
For instance, requestFullscreen is at Element level, which raises the additional question of whether CropTarget should be tied to Element or HTMLElement. Element might make more sense.
I'll file a separate issue.

It's done to afford all implementations the flexibility to implement efficiently and simply.

There is a tradeoff with promises, we should use them when/if they are needed, which does not seem to apply here.
Taking your previous example about potential race conditions, let's say we have A the capturee, B the capturer, C the entity doing the actual cropping, all living in 3 processes.
Your plan seems to be:

  1. A->C->A to register a surface identifier from a HTMLElement identifier (asynchronous)
  2. A creates a region from surface identifier
  3. A->B to send the capture region
  4. B->C to ask for cropping (asynchronous)
  5. C->B to say cropping is enabled.

Steps 1 & 2 can be replaced by:

  1. A creates a region from HTMLElement identifier (synchronous)

Then step 5 will be replaced by:

  5a. C->A->C to identify surface identifier from HTMLElement identifier (B's request unambiguously identifies A's process).
  5b. C->B to say cropping is ongoing

In practice, C->A->C can be optimised to not be needed.

@eladalon1983
Member

There is a tradeoff with promises, we should use them when/if they are needed, which does not seem to apply here.

It makes it much easier for Chrome to deliver a more efficient implementation, without costing anything for other browser vendors (return a pre-resolved Promise) or Web developers (await cropTo). The tradeoff seems favorable to me.

let's say we have A the capturee, B the capturer, C the entity doing the actual cropping

Who is C in this scenario? If A is the document that holds the crop-target (div, iframe, etc.), then it will produce the CropTarget (the token) and send it to document B, which holds the track. B will call cropTo directly. I don't follow where C fits into the picture, except possibly as a conduit for messages which can be neglected¹. What am I missing?

--
[1] I'm using "neglect" here as one does in physics, e.g. when neglecting air resistance or friction.

@youennf
Contributor Author

youennf commented Jan 27, 2022

without costing anything for other browser

A promise has a cost for web developers and for devices executing them.
We should not overpromisify APIs.

Who is C in this scenario?

C is the process producing the screen frames and is the one that needs to identify the actual display layer the element refers to. My understanding is that this process is separated from A and B.

B will call cropTo directly.

cropTo is already an asynchronous operation and can handle the resolution of CropTarget.

@eladalon1983
Member

(I am sure Youenn knows, but if others want to join the conversation, one line of background: In Chrome, applications run in a "render process", and there is a "browser process" coordinating them. Cross-origin applications run in different processes.)

Back to the produceCropTarget Promise.
I think the implementation in Chrome, and its constraints, might have analogues in other browsers. Let's examine it.

  • A CropTarget is produced in one document (D1), then used by another document (D2) in the same browsing context.
  • There could be many more cross-origin documents in the browsing context, and many processes.
  • When D2 calls cropTo(X), the UA has to validate that X is a valid CropTarget.
  • It is undesirable to query all documents and check if any of them have produced X. Rather, the browser process holds a central repository of valid CropTargets and their associated browsing context. When cropTo() is called, this repository is consulted.
    • If the CropTarget includes the origin, this can be simplified, but it still requires an RTT of IPCs, D1->BROWSER->D2->BROWSER->D1. An insufficient improvement.
  • That means that produceCropTarget inserts into that BP-based repository.
  • That also means that, if we don't impose a cap on number of CropTargets, an application can directly affect the memory consumption of the browser process. That is undesirable.
  • So there is a cap on the number of CropTargets a browsing context could have.
  • That means that when attempting to produce a CropTarget, there has to be IPC with the browser process before we know that we're in the happy case, and a CropTarget can be produced.
    • Side-note: For security's sake, it's also more obvious that things are safe if the underlying token for the CropTarget is assigned by the trusted browser process, rather than communicated to it from the render process. For example, it becomes trivial to use a low-bit-count token behind the scenes, guarantee no collisions, and thus have a simple and efficient implementation in the rendering pipeline.
  • That means that produceCropTarget has to either return a Promise (which is resolved after IPC with the browser process), or it has to block on this IPC.
  • Blocking JS applications on IPC is something Chrome aims to avoid.

As mentioned, Chrome's implementation problems are Chrome's problems, but (a) I suspect other implementers will have similar issues and (b) the cost of resolving these issues is negligible.
Returning a Promise gives implementers flexibility.
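
The chain of reasoning above can be sketched as a small model. This is only an illustration, assuming a registry keyed by a browsing-context string; `CropTargetRegistry`, its cap, and the numeric tokens are hypothetical and not taken from the spec or from Chrome's code. Per-element deduplication is left out for brevity.

```typescript
// Hypothetical model of a browser-process registry with a per-context cap.
class CropTargetRegistry {
  private readonly tokensByContext = new Map<string, Set<number>>();
  private readonly capPerContext: number;
  private nextToken = 1;

  constructor(capPerContext: number) {
    this.capPerContext = capPerContext;
  }

  // Mints a token, or rejects once the browsing context hits its cap.
  async produce(contextId: string): Promise<number> {
    await Promise.resolve(); // stands in for the render -> browser process IPC
    let tokens = this.tokensByContext.get(contextId);
    if (tokens === undefined) {
      tokens = new Set<number>();
      this.tokensByContext.set(contextId, tokens);
    }
    if (tokens.size >= this.capPerContext) {
      throw new Error("CropTarget cap reached for this browsing context");
    }
    const token = this.nextToken++; // assigned by the trusted side, so no collisions
    tokens.add(token);
    return token;
  }

  // Consulted on cropTo(): one lookup instead of querying every document.
  isValid(contextId: string, token: number): boolean {
    const tokens = this.tokensByContext.get(contextId);
    return tokens !== undefined && tokens.has(token);
  }
}
```

Because the cap can only be checked where the registry lives, the caller cannot know it is in the happy case until the round trip completes, which is why `produce` returns a Promise in this sketch.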

@youennf
Contributor Author

youennf commented Jan 27, 2022

  • That means that when attempting to produce a CropTarget, there has to be IPC with the browser process

That is where we differ in the analysis.

  • Rather, the browser process holds a central repository of valid CropTargets and their associated browsing context.

The central repository is an implementation choice that is a potential source of issues (memory for instance as you pointed out), it seems it should be avoided if possible.
If not possible, the spec should be clear about this. Maybe it should describe the central repository and how it is supposed to be used. WebLock is a good example of that and using promises there makes sense.

In our particular case, the spec does not define any such central repository.
This central repository can be avoided by delaying the identification of the display surface to cropTo time.
For that purpose, CropTarget needs to contain enough information, like uniquely identifying D1 (maybe using https://html.spec.whatwg.org/multipage/webappapis.html#concept-environment-id).

D2 will transmit this information to browser process at the time of cropping.
Browser process may want to validate the cropping information by matching it with what it knows from the MediaStreamTrack source (like the list of the documents the source is capturing).
Browser process might instead (or in addition to the previous validation) want to further validate or get additional information directly from D1's process, using asynchronous IPC. This is fine since this is part of cropTo which is an asynchronous operation.
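
A minimal sketch of this alternative, assuming the CropTarget carries an environment id plus a locally minted element id. The field names, `mintCropTarget`, and `validateOnCropTo` are hypothetical; the spec does not define them.

```typescript
// Hypothetical token contents; the spec does not define these fields.
interface LazyCropTarget {
  readonly environmentId: string; // identifies the producing document (D1)
  readonly elementId: number; // local identifier minted synchronously by D1
}

// Synchronous minting in D1: no IPC and no central repository.
function mintCropTarget(environmentId: string, elementId: number): LazyCropTarget {
  return { environmentId, elementId };
}

// Hypothetical browser-process check, run as part of the already asynchronous cropTo.
async function validateOnCropTo(
  target: LazyCropTarget,
  capturedEnvironmentIds: readonly string[]
): Promise<boolean> {
  await Promise.resolve(); // stands in for optional IPC to D1's process
  return capturedEnvironmentIds.indexOf(target.environmentId) !== -1;
}
```

Here the only asynchronous step sits inside cropTo, so minting the target itself stays synchronous.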

The spec is light in how/when cropTo is supposed to validate crop targets, I'll file a separate issue.
Also, this issue was mostly about MediaDevices vs. Element, and it is drifting heavily into Promise or not.
Maybe we should have a specific issue for discussing whether to use a promise or not.

@eladalon1983
Member

A central repository is an implementation detail. The spec should not accidentally disallow reasonable implementations (hence the Promise). The spec is not concerned with such details beyond that point.

It's undesirable to put in the token more information than strictly necessary. By ensuring that CropTargets do not encode information about their source, we can provide better guarantees about the safety of the entire mechanism. (Please note - information not exposed to JS, but potentially available to a malicious application which has taken over the process, is also undesirable. Information not encoded in the render process cannot be leaked to a malicious application.)

Also, this issue was mostly about MediaDevices vs. Element, and it is drifting heavily into Promise or not.
Maybe we should have a specific issue for discussing whether to use a promise or not.

If my current comment has not been sufficient to convince you, you're welcome to continue the discussion in a spin-off issue.


I wonder if anyone else (@jan-ivar, @aboba, @alvestrand) has an opinion to voice about exposing as MediaDevices.produceCropTarget() vs. exposing as Element.cropTarget().

@youennf
Contributor Author

youennf commented Feb 2, 2022

you're welcome to continue the discussion in a spin-off issue.

I have done that in #17.
My understanding is that the issues you are mentioning have been solved for other objects but there may well be something specific to CropTarget, let's find out in the new issue.

@eladalon1983 changed the title from "Why exposing produceCropTarget at MediaDevices level?" to "Why expose produceCropTarget at MediaDevices level?" on Feb 2, 2022

@eladalon1983
Member

Thanks for spinning off.

Would the following be a good summary of the discussion that is remaining in this thread?

  • @eladalon1983 prefers exposing MediaDevices.produceCropTarget(Element), citing:
    • Better encapsulation, as CropTarget production is media/RTC-related, as is MediaDevices; Element is very general.
    • In contexts where MediaDevices is missing (eg HTTP), this API would not be exposed, which is preferable.
    • If we agree (other thread) that we need a Promise, then a method is more natural than an attribute.
  • @youennf prefers exposing Element.cropTarget().
    • (I don't want to write "citing X" as I do not feel I can do justice to your position this time. Please help me out here.)

@youennf
Contributor Author

youennf commented Feb 2, 2022

  • Better encapsulation, as CropTarget production is media/RTC-related, as is MediaDevices; Element is very general.

The requestFullscreen API, attached to Element directly, is well encapsulated through a partial interface.

  • In contexts where MediaDevices is missing (eg HTTP), this API would not be exposed, which is preferable.

SecureContext can be applied to methods as well as interfaces so this seems orthogonal.
With mixed context allowed, one could think of having a non secure context generate CropTarget for its SecureContext parent.

  • If we agree (other thread) that we need a Promise, then a method is more natural than an attribute.

Using promises is largely orthogonal in that case.
The question is more whether for a given element, you get the same CropTarget object (current spec behavior) or not.
With attributes, you get the same CropTarget calling get after get, this is an expected behaviour.
With methods, you would get different promises but the same fulfilled JS object.
Existing APIs that have this model tend, to my knowledge, to prefer attributes in that case (see FetchEvent, WHATWG Streams, and Web Animations for examples).
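
The difference between the two shapes can be sketched as follows. `ElementLike` and `CropTargetLike` are hypothetical stand-ins, not DOM types, and the caching strategy is an assumption made for the illustration.

```typescript
// Hypothetical opaque token.
class CropTargetLike {}

class ElementLike {
  private cachedTarget: CropTargetLike | undefined;

  // Attribute shape: repeated reads return the same object.
  get cropTarget(): CropTargetLike {
    if (this.cachedTarget === undefined) {
      this.cachedTarget = new CropTargetLike();
    }
    return this.cachedTarget;
  }

  // Method shape: each call returns a new Promise, fulfilled with the same object.
  produceCropTarget(): Promise<CropTargetLike> {
    return Promise.resolve(this.cropTarget);
  }
}
```

With the attribute, `el.cropTarget === el.cropTarget` holds. With the method, the two returned promises are distinct objects even though they fulfil with the same token.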

@eladalon1983
Member

Unless we've missed a good argument in either direction, it seems to me like the decision boils down to a matter of preference.

  1. Wdyt of this characterization?
  2. If that is correct, how do you suggest we resolve? Shall we ask the rest of the WG to voice their own preference?

@youennf
Contributor Author

youennf commented Feb 2, 2022

  1. Wdyt of this characterization?

I don't think it is solely a matter of preference.
Using an attribute seems more consistent with existing web APIs, at least that is my current evaluation.
We should try to reach consensus on validating/invalidating this evaluation.
Also, using mediaDevices exhibits some suboptimal behaviour in at least one edge case (the detached iframe case as discussed before).

  2. If that is correct, how do you suggest we resolve? Shall we ask the rest of the WG to voice their own preference?

Getting WG input is always useful, too late for next meeting though probably.

@jan-ivar
Member

jan-ivar commented Feb 3, 2022

@youennf prefers exposing Element.cropTarget().

@eladalon1983 AFAICT Youenn prefers Element.cropTarget, not Element.cropTarget().

I think he makes a good case for why an attribute is most idiomatic given the properties we want. However, there are a LOT of Elements compared to the rarity of needing a cropTarget from one: 0.005% of pageloads use getDisplayMedia and only a fraction of those will likely be interested in cropping. So this gives me pause.

Presumably, the cropTarget would be instantiated in the getter if it doesn't exist, rather than at Element construction time, but this getter might be implicitly called by code libraries or even browser tools interrogating the Element (e.g. dumped to web console).

That said, there's precedent for rarely used features in Element.requestFullscreen() which is a method, so maybe Element.cropTarget() would be the right tradeoff?

cc @annevk Thoughts on this?

@youennf
Contributor Author

youennf commented Feb 4, 2022

That said, there's precedent for rarely used features in Element.requestFullscreen() which is a method

Element.requestFullscreen() is rightfully a method: it returns a different promise every time it is called.
Or are you thinking of some different APIs?

FWIW, a cropTarget when created should be nothing more than a weak reference to its element (additional steps might be executed only when it is transferred or used in cropTo). Hence I do not see accidental CropTarget creation by code libraries as an issue.

@eladalon1983
Member

FWIW, a cropTarget when created should be nothing more than a weak reference to its element (additional steps might be executed only when it is transferred or used in cropTo). Hence I do not see accidental CropTarget creation by code libraries as an issue.

Triggering accidental lazy creation of CropTarget-s for all Elements in the DOM is patently undesirable. First, for the memory it consumes; that much should be self-evident. Second, for the side-effect; elaboration follows.

Let's define as "active" a CropTarget which is being used - that is, there is some track which was cropTo()-ed with that CropTarget. An "active" CropTarget has to be tracked along the rendering pipeline in every frame, which requires some work from the UA. Unless cropTo(X) has been called on some X, it is not strictly necessary to perform this work of tracking X's location in every frame. But avoiding the work until cropTo() is called, means a longer delay until the first cropped frame is delivered (and hence the cropTo Promise resolved). For that reason, it is reasonable (but not required) to implement CropTarget as immediately "active" - as soon as it's produced, before cropTo is called. This makes cropTo resolve its Promise faster, but it also means that minting the token has a non-zero cost. Even if we expose on Element, we should expose as a method, so as to discourage unintended production of a CropTarget for every single Element.

@youennf
Contributor Author

youennf commented Feb 8, 2022

It would be interesting to understand whether libraries actually go through all elements and get all their attributes.
I suspect that this would be a perf issue w/o CropTarget.

Thinking a bit more though, I don't see why the spec mandates to expose a single CropTarget per Element.
It makes implementations a bit harder without any real benefit. Maybe this is a left over from the CropID initial version?
Removing this constraint from the spec and making cropTarget a method (probably renaming it to getXYZ as well) seems like a good idea to me.

@eladalon1983
Member

Thinking a bit more though, I don't see why the spec mandates to expose a single CropTarget per Element. It makes implementations a bit harder without any real benefit. Maybe this is a left over from the CropID initial version? Removing this constraint from the spec and making cropTarget a method (probably renaming it to getXYZ as well) seems like a good idea to me.

Could you please explain why this is a problem for implementers?

@annevk
Member

annevk commented Mar 11, 2022

Generally the "iterate over all members" pattern affects Window, Navigator, and Document. I wouldn't expect it to matter for Element. (Note that the syntax you all are using is rather confusing as it seems you are talking about an instance getter/method whereas it very much appears like a static getter/method.)

@alvestrand

I don't see a big advantage in exposing produceCropTarget() on a plethora of objects. Having the function in one place makes much more sense to me in terms of understandability.

@eladalon1983
Member

eladalon1983 commented Mar 18, 2022

Thinking a bit more though, I don't see why the spec mandates to expose a single CropTarget per Element. It makes implementations a bit harder without any real benefit. Maybe this is a left over from the CropID initial version? Removing this constraint from the spec and making cropTarget a method (probably renaming it to getXYZ as well) seems like a good idea to me.

Could you please explain why this is a problem for implementers?

To document some out-of-band discussions, I am OK with the spec mandating returning an equivalent CropTarget, which is a less stringent requirement than "the same" CropTarget.

@youennf
Contributor Author

youennf commented Mar 21, 2022

I don't see a big advantage in exposing produceCropTarget() on a plethora of objects.

It is not on a plethora of objects, it is only either in MediaDevices prototype or HTMLElement prototype.

We discussed with @eladalon1983 some of the reasons why element was a better location during last editor's meeting.
At that time, I felt there was some consensus towards element.
Some reasons below:

  • MediaDevices is SecureContext, Element is not. It seems ok for a non secure document to be able to create a CropTarget. This might be handy in the future or in mixed contexts today. A context that does not need MediaDevices might still find it useful to create and transfer CropTargets.
  • MediaDevices is tied to its navigator hence to a specific Document. Elements can be transferred from one document to another. If we were to use MediaDevices, we would need to handle the case of creating CropTargets for elements which are not tied to the same document as the MediaDevices instances. There does not seem to be any reason to try going in those edge cases, that can get even weirder if using promises.
  • The MediaDevices instance is not bringing anything to the creation of CropTarget, contrary to getUserMedia et al. A static MediaDevices.produceCropTarget would work equally well. Why try to tie the algorithm to an object that is unused? In programming languages, we usually try to remove unused parameters.
  • Feature detection is easier if tied to the element than with MediaDevices: 'cropTarget' in Element.prototype in one case. With MediaDevices, you would need to actually call the produceCropTarget API to get the same result.
  • There is no real difference between the two versions in terms of documentation/implementation, the separation of concerns is easy to do in both cases. Given we have partial interfaces, we would have Element+CropTarget.idl for instance.

@eladalon1983
Member

It is not on a plethora of objects, it is only either in MediaDevices prototype or HTMLElement prototype.

I think what @alvestrand was referring to is that (i) this would now be exposed on many different sub-types of HTMLElement, as well as (ii) exposed on multiple instances. If that's what he meant, I agree.

MediaDevices is SecureContext, Element is not. It seems ok for a non secure document to be able to create a CropTarget. This might be handy in the future or in mixed contexts today. A context that does not need MediaDevices might still find it useful to create and transfer CropTargets.

That was an interesting case, but unclear to me how important it is to keep supporting non-secure contexts. Would they be able to postMessage the CropTarget, for instance? Would it be possible to trust the message in which they do so?

Elements can be transferred from one document to another.

What's the mechanism for that?

If we were to use MediaDevices, we would need to handle the case of creating CropTargets for elements which are not tied to the same document as the MediaDevices instances.

I don't think there has to be a connection between where the CropTarget-minting-API is exposed, to where the elements are, let alone where the CropTargets are.

A static MediaDevices.produceCropTarget would work equally well.

Could you clarify what you mean here? I couldn't parse this in a manner consistent with the rest of the message. I mean, if there is a way to define produceCropTarget as exposed on the class MediaDevices, but not tied to any object, then that's fine by me...

With MediaDevices, you would need to actually call the produceCropTarget API to get the same result.

Why would I need to call the function? Why can't I just check for its existence using !!navigator.mediaDevices.produceCropId?
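
For illustration, both detection styles can be sketched without calling the API. The stand-ins below (`FakeElement`, `fakeMediaDevices`, and the `supports` helper) are hypothetical and only model Element.prototype and navigator.mediaDevices, so that the sketch runs outside a browser.

```typescript
// Returns true when `name` is reachable on `holder` (own or inherited).
// The `in` operator does not invoke getters, so nothing is produced as a side effect.
function supports(holder: object | undefined, name: string): boolean {
  return holder !== undefined && name in holder;
}

// Stand-in for an Element class exposing a cropTarget attribute (assumption).
class FakeElement {
  get cropTarget(): object {
    return {};
  }
}

// Stand-in for a mediaDevices object exposing a produceCropTarget method (assumption).
const fakeMediaDevices = {
  produceCropTarget(): Promise<object> {
    return Promise.resolve({});
  },
};

const onElement = supports(FakeElement.prototype, "cropTarget");
const onMediaDevices = supports(fakeMediaDevices, "produceCropTarget");
// Models a context where mediaDevices is not exposed at all.
const withoutMediaDevices = supports(undefined, "produceCropTarget");
```

In a page, the equivalent checks would be the two expressions quoted in this thread; neither requires invoking the API, though the second has to guard against mediaDevices itself being absent.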

@lghall

lghall commented Mar 23, 2022

I represent a team at Google that has been using the Region Capture API internally. We prefer mediaDevices.produceCropId (vs Element.produceCropId()) because it's easier to feature-detect whether the API is available without needing an Element to detect the method presence. Hope that helps!

@youennf
Contributor Author

youennf commented May 16, 2022

With regard to point 1:

  • Fullscreen API is not following this pattern and I do not see any confusion there.
  • The documentation issue has been solved with partial interfaces/different specs.
  • There are other media-related locations; why is mediaDevices the right place? For instance, a static MediaStreamTrack.produceCropTarget would allow putting the consumer and generator APIs in the same place, and would avoid some corner cases related to the mediaDevices object.

@alvestrand

alvestrand commented May 16, 2022

What's a "neutered mediaDevices"?

Since CropTarget is (by design) useless without a captured MediaStreamTrack, and getDisplayMedia() is a mediaDevices function, the only case where this matters is when mediaDevices is enabled in the capturer but disabled (how?) in the capturee.

@youennf
Contributor Author

youennf commented May 16, 2022

What's a "neutered mediaDevices"?

https://jsfiddle.net/8a0qsf35/ is a good example (detached iframe).

when mediaDevices is enabled in the capturer but disabled (how?) in the capturee.

SecureContext is one example where MediaDevices might not be available in the capturee.

These two examples show that putting this API in MediaDevices interface or mediaDevices instance creates unnecessary issues. The underlying issue is that this API shape ties Elements with unrelated MediaDevices objects.
We should do better than that.

@jan-ivar
Member

jan-ivar commented May 17, 2022

what cost do users and developers incur, if the API is made asynchronous?

@eladalon1983 I already answered that above in #11 (comment) and #11 (comment).

@eladalon1983
Member

what cost do users and developers incur, if the API is made asynchronous?

@eladalon1983 I already answered that above in #11 (comment) and #11 (comment).

Those comments criticize the current API shape from multiple directions, but do not explain how accommodating implementers here would come at the expense of higher constituencies. Unless you mean to say that developer confusion is your claim of harm to higher constituencies? Is that your argument? If so, is it your only argument with respect to the priority of constituencies, or are there more?

@youennf
Contributor Author

youennf commented May 17, 2022

This thread is about where to put the API, be it sync or async, let's focus on that.
The sync/async discussion should be discussed in #17.

@eladalon1983
Member

eladalon1983 commented May 17, 2022

This thread is about where to put the API, be it sync or async, let's focus on that. The sync/async discussion should be discussed in w3c/mediacapture-screen-share#17.

I am OK migrating the discussion there. But I still want it continued. What can be brought up four times, can be explained once.

@youennf
Contributor Author

youennf commented May 18, 2022

If exposing it on Element is not good for documentation, and given that exposing it on MediaDevices has shortcomings, the best approach might be to add the API directly within CropTarget:

  • sync version: new CropTarget(element)
  • async version: CropTarget.fromElement(element)
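
To make the two shapes concrete, here is a rough sketch using stand-in classes. `DemoElement` and `DemoCropTarget` are hypothetical names, not the real Element or CropTarget interfaces, and the minting logic is a placeholder.

```typescript
// Hypothetical stand-in for an Element.
class DemoElement {
  readonly id: string;
  constructor(id: string) {
    this.id = id;
  }
}

// Hypothetical stand-in for CropTarget, exposing both proposed shapes.
class DemoCropTarget {
  readonly element: DemoElement;

  // Sync shape: `new CropTarget(element)`.
  constructor(element: DemoElement) {
    this.element = element;
  }

  // Async shape: `CropTarget.fromElement(element)`.
  // Returning a promise leaves room for work that cannot finish synchronously.
  static fromElement(element: DemoElement): Promise<DemoCropTarget> {
    return Promise.resolve(new DemoCropTarget(element));
  }
}

const demoElement = new DemoElement("main-content");

// Sync version: the token is available immediately.
const syncTarget = new DemoCropTarget(demoElement);

// Async version: the caller has to wait for the token.
const pendingTarget: Promise<DemoCropTarget> =
  DemoCropTarget.fromElement(demoElement);
```

Either way the call site no longer involves a MediaDevices object; the difference is only whether the caller has to wait for the result.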

@eladalon1983
Member

eladalon1983 commented May 18, 2022

async version: CropTarget.fromElement(element)

This is aesthetically pleasing. Let me check one of my concerns first, right after I finish something unrelated, and I'll be back to discuss this.

But to save us an iteration - I'll probably want this to fail in insecure contexts. Your thoughts?

@beaufortfrancois
Contributor

beaufortfrancois commented May 18, 2022

If I understand correctly, we would get something like this: A static method from an interface that would return the interface.

[
  SecureContext,
  Exposed=(Window,Worker),
  Serializable
] interface CropTarget {
  static Promise<CropTarget> fromElement(Element element);
};

This would be the first time we introduce this pattern to the Web Platform.
And it feels weird from a web developer perspective.

@domenic

domenic commented May 18, 2022

This would be the first time we introduce this pattern to the Web Platform.

I don't think that's accurate. We just specced Response.json(); there are tons of such factory methods in CSS Typed OM (e.g. CSSStyleValue.parse()) and the Geometry spec (e.g. DOMMatrix.fromFloat32Array()), ... This is a pretty standard pattern by now, and a good one IMO.

@beaufortfrancois
Contributor

You're right @domenic. I was looking for static Promise<Foo> bar(), not static Foo bar(). This pattern exists indeed. Sorry for the noise.

@eladalon1983
Member

Shall we go with @youennf's suggestion of async CropTarget.fromElement() then? I can send a PR.

@youennf
Contributor Author

youennf commented May 19, 2022

@jan-ivar, wdyt?

@youennf
Contributor Author

youennf commented May 19, 2022

FWIW, I am OK compromising on regrouping the API under CropTarget as I proposed earlier, as long as we can revisit the exact API shape once the sync/async issue is resolved.

@jan-ivar
Member

@youennf getting the irrelevant MediaDevices object out of the picture is a win. Thanks!

@eladalon1983
Member

eladalon1983 commented May 19, 2022

@youennf getting the irrelevant MediaDevices object out of the picture is a win. Thanks!

@jan-ivar, does that mean that if I send a PR to move the point of exposure to CropTarget.fromElement(Element), you'll approve it? Note that I'll keep it async, which is the status quo, and maintain the same note about it being a contested issue.

@jan-ivar
Member

That seems like a good way to separate these issues.

@youennf
Contributor Author

youennf commented May 20, 2022

[
  SecureContext,
  Exposed=(Window,Worker),
  Serializable
] interface CropTarget {
  static Promise<CropTarget> fromElement(Element element);
};

CropTarget should probably not be SecureContext; AFAIK nothing prevents a CropTarget from being serialized from a secure context to a non-secure context.
The debate is rather about whether we want fromElement to be gated by SecureContext.
The use case to consider is whether a secure capturer would potentially want to crop a non-secure capturee if/when cross-page cropping is supported.
I am curious what the rationale is for marking it as SecureContext (other than the mantra of new API = SecureContext).

@eladalon1983
Member

eladalon1983 commented May 20, 2022

The status quo of the document is to only expose token-minting in secure contexts. This follows directly from the minting-API being exposed on navigator.mediaDevices. I therefore propose that we should concentrate on transitioning to CropTarget.fromElement for now, which would keep SecureContext. We can file a separate issue to debate that issue. (Full disclosure - I am going to strongly oppose removing this.)

@eladalon1983
Member

eladalon1983 commented May 20, 2022

I'm going to send a PR with this shape:

[Exposed=(Window,Worker), Serializable]
interface CropTarget {
  [SecureContext] static Promise<CropTarget> fromElement(Element element);
};

This ensures the delta from the current spec is purely in the point of exposure (from MediaDevices objects to a static method of CropTarget).
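
As a rough sketch of how a page might consume this shape: because `[SecureContext]` sits on the static method, `fromElement` would not be exposed in a non-secure context, so a caller may want to check for it before calling. `requestCropTarget`, `CropTargetStatic`, `ScopeLike`, and the `scope` parameter are hypothetical names used to keep the sketch self-contained; in a page, `scope` would be `globalThis`.

```typescript
// Structural stand-ins so the sketch compiles without DOM typings.
interface CropTargetStatic {
  fromElement?: (element: object) => Promise<object>;
}
interface ScopeLike {
  CropTarget?: CropTargetStatic;
}

// Hypothetical helper: returns the pending CropTarget, or null when
// fromElement is not exposed (an unsupported browser, or a non-secure
// context under the [SecureContext] gating shown above).
function requestCropTarget(
  scope: ScopeLike,
  element: object
): Promise<object> | null {
  const cropTargetInterface = scope.CropTarget;
  if (!cropTargetInterface || typeof cropTargetInterface.fromElement !== "function") {
    return null;
  }
  // Status quo in this proposal: minting stays asynchronous.
  return cropTargetInterface.fromElement(element);
}
```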

@eladalon1983
Member

eladalon1983 commented May 20, 2022

I've pushed PR w3c/mediacapture-screen-share#50 for review. It changes as little as possible from the status quo; literally a copy-paste with s/produceCropTarget/fromElement executed on it. I suggest we land that, close this issue, and proceed with matters of polish (the prose can be improved) and substance (e.g. w3c/mediacapture-screen-share#17) as a follow-up. PTAL.

@alvestrand

I agree with the proposed solution (minimal change).
Relaxing restrictions is always backwards compatible; tightening them is not.

@jan-ivar
Member

@youennf getting the irrelevant MediaDevices object out of the picture is a win. Thanks!

@jan-ivar, does that mean that if I send a PR to move the point of exposure to CropTarget.fromElement(Element), you'll approve it? Note that I'll keep it async, which is the status quo, and maintain the same note about it being a contested issue.

That seems like a good way to separate these issues.

@eladalon1983 the way I interpreted the compromise above was that we agreed to move the point of exposure from MediaDevices (a SecureContext object) to CropTarget (which is not). Adding SecureContext to fromElement would appear to not honor that.

The status quo of the document is to only expose token-minting in secure contexts.

@youennf already mentioned in #11 (comment) that this was part of the issue: "MediaDevices is SecureContext, Element is not. It seems ok for a non secure document to be able to create a CropTarget", so these were known stakes.

Relaxing restrictions is always backwards compatible; tightening them is not.

@alvestrand In the interest of progress, I'll accept the PR with a note on the lack of consensus over SecureContext.

@eladalon1983
Member

@eladalon1983 the way I interpreted the compromise above was that we agreed to move the point of exposure from MediaDevices (a SecureContext object) to CropTarget (which is not). Adding SecureContext to fromElement would appear to not honor that.

Then we have different interpretations. I still suggest we start by merging w3c/mediacapture-screen-share#50, then continue the discussion about SecureContext separately. Should we decide to move away from SecureContext, we'll then have the path for it (which we didn't with mediaDevices.produceCropTarget).

@youennf already mentioned in #11 (comment) that this was part of the issue

I understand Youenn's reasoning and I have addressed it.

I'll accept the PR with a note on the lack of consensus over SecureContext.

That's great.

@youennf
Contributor Author

youennf commented Jun 14, 2022

Closing this issue now that the API moved to CropTarget.
I'll file an issue to keep track of SecureContext or non SecureContext.

@eladalon1983
Member

Closing this issue now that the API moved to CropTarget.

Closing.

I'll file an issue to keep track of SecureContext or non SecureContext.

For posterity - w3c/mediacapture-screen-share#69.
