Explainers should lead with sample code, should not include IDL #14
Thanks! Piotr is out for a couple of weeks, so I'm posting planned explainer changes here as well to help facilitate this conversation while I work on the PR and figure out if there's anything I need to do to merge them while he's out.
Edit: The PR below should have addressed this.
Easy enough to remove. I feel like in the past this had been a required portion of explainers, but maybe just for early explainers?
Edit: The PR below should've addressed this.
This is not being proposed as an extension to getUserMedia because it is actually a goal of the spec to deliver the camera image alongside the rest of the WebXR frame data. Going through getUserMedia() would not allow this synchronization.
I'm not sure I understand the question here? The API shape directly exposes the camera as a WebGL texture?
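To make that concrete, here is a minimal sketch of how a frame callback might read the camera texture. The names `XRView.camera` and `XRWebGLBinding.getCameraImage()` follow the shape proposed in the explainer and may change as the proposal evolves:

```javascript
// Sketch: reading the camera image texture inside an XR frame callback.
// Assumes the session was created with the proposed 'camera-access' feature
// and that `binding` is an XRWebGLBinding for the session's WebGL context.
function getCameraTexture(binding, view) {
  // Under the proposal, views backed by a camera expose an XRCamera;
  // views without one leave `camera` unset.
  if (!view.camera) return null;
  // Returns an opaque WebGLTexture containing the camera image for this
  // frame, synchronized with the view's pose.
  return binding.getCameraImage(view.camera);
}
```

The texture is valid for the duration of the frame callback, which is what ties the image to the rest of the per-frame WebXR data.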
Color space is an unresolved issue for WebXR overall, see for example immersive-web/webxr#1140 and immersive-web/webxr#988
I’ve touched on this a bit in my proposed language for the new non-goals section, but these are out of scope for this particular API. Future alternative APIs may support multiple cameras that aren't aligned with a single view, but this requires applications to handle asynchronous poses and has different privacy concerns, for example the user would need to be aware of the possibility that cameras may be capturing parts of the environment that are not visible in the XR view.
WebXR immersive sessions can use a XRWebGLRenderingContext created from an OffscreenCanvas, see for example this sample and its source. In that case, both the opaque framebuffer used for output and the raw camera image GL texture would correspond to the OffscreenCanvas-based XRWebGLRenderingContext.
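A rough sketch of setting that up (browser-only APIs; `makeXRCompatible()` is the standard WebGL hook for pairing a context with an XR device):

```javascript
// Sketch: obtaining an XR-compatible WebGL context from an OffscreenCanvas.
// Both the session's opaque framebuffer and the raw camera image texture
// would then belong to this context.
async function createXrContext(canvas) {
  // Requesting xrCompatible up front avoids a context switch later.
  const gl = canvas.getContext('webgl2', { xrCompatible: true });
  if (!gl) throw new Error('WebGL 2 is not supported');
  // No-op if the context is already compatible with the XR device.
  await gl.makeXRCompatible();
  return gl;
}
```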
I think a lot of the alternatives mentioned fall into some non-goals that will now explicitly be called out (mainly that this API is explicitly targeting smartphone AR which has a 1:1 Frame:Camera relationship, and that a broader API is needed for Headsets). Please see the discussion at the end of the explainer for more context. Even with plans for this more general API, there’s still developer interest in a simplified API for this specific 1:1 synchronized use case.
These two are simple enough edits to make, though not something I've seen in other specs/explainers in the past. Edit: There didn't appear to be a clear, good place to add the link back to the explainer; I did what I thought was best, but if you have a preferred mechanism, please let me know.
Hey Alex, thanks for the updates to the explainer and the responses here. I'm pretty comfortable with how XR rendering works, as I helped review the original WebVR APIs while serving on the TAG. With that background, I am a bit surprised that the explainer doesn't do a fuller job of explaining why we don't use getUserMedia().

At an architectural level, those solutions should have a uniform API, if not an identical mechanism. E.g., do we need to design a new delegation mechanism for permission requests here? Are these cameras iterable via enumerateDevices()?

My question about the GL texture was badly stated, but boils down to what the internal format is. Is it "raw" (something new)?

All of these questions suggest that we want to keep types, API surface, and other attributes in close alignment now and in the future, so a good design goal will be to at least outline how that would be done. A fuller discussion of considered alternatives would be expected to speak to this, and I'm interested in seeing this doc grow in that way.

On moving IDL out of explainers, this comes back to long-standing guidance, both from my time as Chrome's Standards TL and the TAG's Explainer Explainer. The audience for an explainer is, primarily, a developer that's trying to think about how this design can work in their system, rather than how the design will be expressed in our system. Crosslinks between documents help readers find the details that don't naturally belong in one document or the other. For example, IDL should not be in explainers, and non-normative discussion can often feel out of place in a spec.
I’m not sure I understand the question/ask here. The primary reason is that we want the camera frame to be directly associated and synchronized with the relevant WebXR data. Exposing this through getUserMedia would require massive changes (likely exposing XR data through the getUserMedia streams, which would be a huge developer burden) and would not guarantee this synchronization. I feel the explainer does a pretty good job of describing this, albeit not in explicit detail, and I’m not sure I understand how the approaches you suggest help with that guarantee. We don’t really care about camera toggling, as we can’t change the camera during an AR session, and color space information is an open question for WebXR that we don’t want to expose yet. Adding a simple property to the existing WebXR data is more ergonomic for developers as well.
I wouldn’t say we have a new delegation mechanism for permission requests. We check the same camera permission and prompt the same way as any other request to use the camera. The AR camera is not iterable via enumerateDevices(); while I believe it may be technically possible to do so, the AR session would have to be started in that mode and would have decreased performance. Further, WebXR crops the camera image to match the screen aspect ratio, and we ensure that the frame delivered from this API is cropped the same way. This is not something we can do via gUM. It would be a privacy concern if the application received a larger field of view from the camera than what the user sees on their screen.
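The cropping behavior described above can be illustrated with a small helper. This is purely illustrative (not part of the API): given the camera image dimensions and the screen's aspect ratio, it computes the centered crop a UA could deliver so the page never sees more of the scene than the user does:

```javascript
// Illustration only: compute a centered crop of the camera image matching
// the screen's aspect ratio, so the page receives no wider a field of view
// than what is composited on screen.
function centeredCrop(camWidth, camHeight, screenAspect) {
  const camAspect = camWidth / camHeight;
  if (camAspect > screenAspect) {
    // Camera is wider than the screen: trim the sides equally.
    const w = Math.round(camHeight * screenAspect);
    return { x: Math.floor((camWidth - w) / 2), y: 0, width: w, height: camHeight };
  }
  // Camera is taller than the screen: trim top and bottom equally.
  const h = Math.round(camWidth / screenAspect);
  return { x: 0, y: Math.floor((camHeight - h) / 2), width: camWidth, height: h };
}
```

For example, a 1920×1080 camera image cropped for a square screen would keep a centered 1080×1080 region.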
The "raw" naming isn't related to "raw images" as used in DSLR photography. In this context it refers to the distinction between "raw camera access" in the sense of "here's the camera image as a pixel grid" versus indirectly providing higher-level abstractions such as poses, planes, or hit test results, which are the result of extensive processing done internally on the source camera data. There's nothing special about the image texture; it's equivalent to the kind of data you'd get by taking a snapshot of a video stream. Maybe "direct camera access" would have been a better name in hindsight, to avoid confusion with other concepts such as getting unprocessed sensor data.
I’m not sure I agree. I think this API and the one that would properly expose headset cameras or support devices with multiple cameras solve two different problems and will likely be vastly different APIs. Part of why we’re trying to ship this API separately now is to address the hand-held AR scenarios without needing to solve all of the open questions for the other space. I think a pointer to the relevant open issue for the alternatives is the best way to handle this, since that discussion is ongoing. I think this goal is now clear in the explainer given the new non-goals section.
Ack, I think I may be confused with the early stage of an explainer that likely includes this proposed IDL before a formal spec is written.
Sure, definitely understand that. I just hadn’t seen (and couldn’t find) any other examples of this, so let me know if you think I did the cross-linking in a reasonable way, or if there was established guidance on what to do here.
Sorry for the slow follow-up here, @alcooper91. Let me back up a bit. We should expect AR devices to be all sorts of things - cameras, displays, sensor platforms for other types of input, etc. - which means that developers addressing the same hardware through different API surfaces is more than likely. That argues, at a minimum, for APIs that are designed for long-term alignment between the ways in which we configure, enumerate, and trigger capabilities of those sensors. I'm a little frustrated that the TAG didn't raise this sort of consistency in their review (perhaps @torgo can speak to the discussion there?), but it's the kind of thing we've hit in many other areas of platform development over the years and now understand needs to be handled affirmatively rather than being left as an exercise in further API evolution. To that end, I need to see this explainer improve in a few dimensions:
This might be the ideal design, but from what I've read here, it's hard to make a case either way. Hopefully a fuller explainer will quash doubts.
Thanks Alex. While I agree that there may be many sensors that make up an AR session, I think that doesn’t preclude the ability of an AR session to “lock” the state or capabilities of those sensors and/or require essentially exclusive access. I think the relevant specs for the APIs that access the various hardware separately from the WebXR Device API already have (or should have) algorithms for what to do if no device can be found which satisfies the constraints. In the case of a device that is already in use by the WebXR Device API, I’d imagine that it would have its capabilities, resolution, etc. already locked (or the device marked as in-use/unavailable).

However, as you mention, this doesn’t just apply to the camera but to other pieces of AR hardware as well, and thus I don’t think this explainer/module is the appropriate place to dive into that discussion. After discussing this internally, @klausw has filed immersive-web/webxr-ar-module#87 on the core AR Module, which I think is the appropriate venue for that discussion. I can add some clarifying text that the returned camera image (from this API) will be configured based upon the User Agent’s requirements for compositing the AR session, if that would help clarify this explainer?
Apologies if this is just an elaboration of the point I’ve already addressed, but the second line of the explainer states why getUserMedia() is not being used.
The purpose of this API is explicitly “I am using the WebXR Device API on a device that internally uses camera data to derive poses and environment features, and I want synchronized access to that camera image to integrate/use it with that pose data”. The target audience is developers who have already decided that they want to use the WebXR Device API for the benefits that it provides and explicitly do not want to use getUserMedia().

If you’re still unhappy with the alignment/discussion of the alignment between this API and getUserMedia after this explanation and the proposed changes, I think it may be more beneficial to have a video chat, as I may be misunderstanding your concerns. Unfortunately the regularly scheduled immersive web meeting was yesterday, so the next one will be two weeks from now; please reach out to me via the email I sent to blink-dev to schedule something.
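For that target audience, opting in happens at session creation rather than through a separate media-capture call. A minimal sketch, assuming the `'camera-access'` feature descriptor proposed by the explainer:

```javascript
// Sketch: a developer who has already chosen WebXR opts into camera access
// when requesting the session. 'camera-access' is the feature descriptor
// proposed by the explainer; the camera permission prompt is tied to this
// request, and access ends when the session ends.
function requestArSessionWithCamera(xr) {
  return xr.requestSession('immersive-ar', {
    requiredFeatures: ['camera-access'],
  });
}
```

In a page this would be called as `requestArSessionWithCamera(navigator.xr)`, typically from a user-activation handler.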
What do you think about adding a line to this section of the spec like:
Access is already scoped to the duration of the XR Session (because we’re leveraging many WebXR Device API concepts).
See above discussion for why I think this is not the appropriate place to discuss that.
To my knowledge other names were not considered. I don’t know that any of us that have worked on this specification have a photographic background. Would a note like the following at the start of the explainer be sufficient?
It’s worth noting that “Raw” doesn’t appear in any of the APIs themselves, so if you know the process to rename the module/specification, we’d also be happy to pursue that and update references to “raw” within the explainer and specification.
I think this is addressed in my PR #15; I was waiting to merge it to see if there were further updates I needed to make as a result of our discussion. Let me know if there’s something else you’d like changed with that.
Via the I2S thread, I started reading the explainer, and it seems like several best practices are missing and design directions are not enunciated:
- getUserMedia()