Add background segmentation mask #142

Open
wants to merge 5 commits into
base: main
Changes from 2 commits
54 changes: 54 additions & 0 deletions index.html
@@ -1688,5 +1688,59 @@ <h2>MediaStream in workers</h2>
};</pre>
</div>
</section>
<section>
<h2>Background segmentation mask</h2>
<p>Some platforms or User Agents may provide built-in support for background segmentation of video frames, in particular for camera video streams.
Web applications may want to control whether background segmentation is computed at the source level and to get access to the computed segmentation masks.
This allows the web application, for instance,
to do custom framing, background blurring or background replacement
while leveraging platform-computed background segmentation.
It also allows the web application
to access the original, unmodified frame and
to fine-tune frame modifications to its liking.
For that reason, we extend {{MediaStreamTrack}} with the following properties and {{VideoFrame}} with the following attributes.
</p>
<pre class="idl">
partial dictionary MediaTrackSupportedConstraints {
boolean backgroundSegmentationMask = true;
};

partial dictionary MediaTrackConstraintSet {
ConstrainBoolean backgroundSegmentationMask;
Member

Would it ever be interesting and feasible to tweak the parameters by which segmentation is done?

@riju, May 15, 2024

At least on Windows, the platform model does not allow tweaking segmentation parameters today. Using TensorFlow.js with the BodyPix model for blur, I see there's at least a segmentationThreshold parameter. Maybe it's the same as foregroundThresholdProbability with the MediaPipeSelfieSegmentation model?

Did you have some other parameters in mind?

(image: mediapipe_parameters)
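
To make the idea of a threshold parameter concrete, here is a minimal sketch of how such a value could turn per-pixel foreground probabilities into a binary mask. It is not the BodyPix or MediaPipe implementation; `thresholdMask` is a name invented for this example, and treating values at or above the threshold as foreground is an assumption.

```javascript
// Hypothetical helper, for illustration only.
// Converts per-pixel foreground probabilities (0..1) into a binary mask:
// values at or above `threshold` become 255 (foreground), others 0 (background).
function thresholdMask(probabilities, threshold = 0.5) {
  const mask = new Uint8ClampedArray(probabilities.length);
  for (let i = 0; i < probabilities.length; i++) {
    mask[i] = probabilities[i] >= threshold ? 255 : 0;
  }
  return mask;
}

// Three pixels: unlikely, borderline and likely foreground.
const binary = thresholdMask([0.1, 0.5, 0.9], 0.5);
```

A lower threshold keeps more uncertain pixels in the foreground, while a higher one trims them away.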

Member

> Did you have some other parameters in mind?

I am not knowledgeable enough on what parameters would be best to include. I was mostly wondering if this is something we foresee extending from a boolean to a set of parameters, and if so, whether there was a viable path for such future extensions given the current API shape.

Contributor Author

In the Media Capture API, the parameter space is flat and not hierarchical.

As an example, there is a constrainable property called whiteBalanceMode which can be constrained to manual. If one then wants to manually change the white balance, there is a constrainable property called colorTemperature which can be constrained separately in order to do that.

So if we later would like to add a numeric constrainable property called backgroundSegmentationThreshold (which could pre-process the segmentation mask into a black and white mask, without shades of grey, according to the threshold) or a string constrainable property called backgroundSegmentationModel (to select a particular AI model), we could certainly do that.
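
As a rough illustration of the flat parameter space, the sketch below builds a constraints object in which every setting is a separate top-level constrainable property. `backgroundSegmentationMask` is the property proposed in this PR; `backgroundSegmentationThreshold` and `backgroundSegmentationModel` are only the hypothetical future properties mentioned above and are not part of any specification. `buildVideoConstraints` is a helper name invented for this example.

```javascript
// Hypothetical helper, for illustration only.
// Builds a flat constraints object: each option is its own top-level
// property of the video constraint set, with no nesting.
function buildVideoConstraints({ mask = true, threshold, model } = {}) {
  const video = { backgroundSegmentationMask: mask };
  if (threshold !== undefined) {
    // Hypothetical future property, not specified anywhere.
    video.backgroundSegmentationThreshold = threshold;
  }
  if (model !== undefined) {
    // Hypothetical future property, not specified anywhere.
    video.backgroundSegmentationModel = model;
  }
  return { video };
}

const constraints = buildVideoConstraints({ threshold: 0.5 });
// In a browser, this object would be passed to
// navigator.mediaDevices.getUserMedia(constraints).
```

Because the properties sit side by side, adding one later would not change the shape of the existing boolean.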

};

partial dictionary MediaTrackSettings {
boolean backgroundSegmentationMask;
};

partial dictionary MediaTrackCapabilities {
sequence&lt;boolean&gt; backgroundSegmentationMask;
};</pre>
<section>
<h3>VideoFrame interface extensions</h3>
<pre class="idl">
partial interface VideoFrame {
readonly attribute VideoFrame? backgroundSegmentationMask;
Member

I imagine this isn't going to suffer infinite recursion because the second layer deep will be guaranteed nullable. But it still strikes me as a bit odd to expose a full VideoFrame here, with all its present and future fields, when what we really wish to get is a matrix of integer values of a limited range.

Contributor Author

Yes, recursion is definitely not wanted.

While I by no means insist on VideoFrame, I think that it is beneficial if the background segmentation mask can be passed directly, for instance, to Canvas.drawImage() or similar.

Additionally, because usages of background segmentation masks are manifold (it could be post-processed remotely, locally on CPU or on GPU, etc.) and sources and pre-processing could vary (maybe the source is a boolean matrix or an integer matrix or a GPU texture), it would be good IMHO if the API didn't enforce a particular storage or representation. A VideoFrame is good in that respect.
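
For the local CPU case, a background replacement could be sketched roughly as below. This is only an illustration under stated assumptions: the frame and the replacement background are assumed to be already available as RGBA byte arrays, the mask as one grey-scale byte per pixel (255 for certainly foreground, 0 for certainly background, as described in this PR), and `compositeWithMask` is a name invented for this example. How the pixel data is obtained from a VideoFrame is outside the sketch.

```javascript
// Hypothetical helper, for illustration only.
// Blends a frame over a replacement background, using the mask as per-pixel
// alpha. `frame` and `background` are RGBA byte arrays of equal length;
// `mask` holds one byte per pixel.
function compositeWithMask(frame, background, mask) {
  if (frame.length !== background.length || frame.length !== mask.length * 4) {
    throw new RangeError("frame, background and mask sizes do not match");
  }
  const out = new Uint8ClampedArray(frame.length);
  for (let p = 0; p < mask.length; p++) {
    const alpha = mask[p] / 255; // 1 = foreground, 0 = background
    for (let c = 0; c < 3; c++) {
      const i = p * 4 + c;
      out[i] = frame[i] * alpha + background[i] * (1 - alpha);
    }
    out[p * 4 + 3] = 255; // output is fully opaque
  }
  return out;
}

// Two red frame pixels over a blue background; the first pixel is
// foreground, the second is background.
const frame = new Uint8ClampedArray([255, 0, 0, 255, 255, 0, 0, 255]);
const background = new Uint8ClampedArray([0, 0, 255, 255, 0, 0, 255, 255]);
const composited = compositeWithMask(frame, background, new Uint8ClampedArray([255, 0]));
```

Grey mask values produce a proportional mix of the two sources, which softens the edge between foreground and background.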

Contributor

Why is the attribute readonly? If JS wishes to modify the background segmentation mask of a frame, how can you do it? Create a new video frame with a new segmentation mask member? How is that passed to the video frame constructor?

Member

Note VideoFrame is defined by the Media WG, so I think this needs to be discussed there.

Unless we make backgroundSegmentationMask metadata? Either way, we should involve the Media WG here based on w3c/webcodecs#607 (comment).

> If JS wishes to modify the background segmentation mask of a frame, how can you do it? Create a new video frame with a new segmentation mask member?

These are good questions I suspect the Media WG can answer. They made VideoFrame and its metadata immutable and define its interaction model.

Like @eladalon1983 I find it odd to expose a full VideoFrame for a mask.

Contributor Author

> Why is the attribute readonly? If JS wishes to modify the background segmentation mask of a frame, how can you do it? Create a new video frame with a new segmentation mask member? How is that passed to the video frame constructor?

I made backgroundSegmentationMask metadata. I think that resolves the issue.

};</pre>
<section>
<h4>Attributes</h4>
<dl data-link-for="VideoFrame" data-dfn-for="VideoFrame" class="attributes">
<dt><dfn><code>backgroundSegmentationMask</code></dfn>
of type
<span class="idlAttrType">{{VideoFrame}}</span>,
readonly, nullable</dt>
<dd>
<p>A background segmentation mask with
white denoting certainly foreground,
black denoting certainly background and
grey denoting uncertainty or ambiguity with
light shades of grey denoting likely foreground and
dark shades of grey denoting likely background.</p>
</dd>
</dl>
</section>
</div>
</section>
</section>
</body>
</html>