add --allow_overlaps arg to cli #120
I am supportive of the spirit of this PR, i.e. exposing an opt-in option to export overlapping masks/polygons (or, in the case above, overlapping masks derived from polygons), especially as we already have some underlying API. Amongst other things, this would help with testing, e.g. ome/ome-zarr-py#207, building a complete example of an OME-NGFF plate with labels using …
My primary concern is the implementation choice (not introduced in this PR) to compute the sum of all the overlapping values. A few scenarios:
Under some conditions, this can break the dtype mechanism introduced in #116, as the maximal label value might exceed the computed dtype.
Additionally, there is no guarantee that these values will be unique, e.g. assuming there is an overlap between masks 10 & 13 and between masks 11 & 12, the two sets of intersecting pixels would be assigned the same value (23).
Finally, and probably more importantly, the label value of an overlapping region (e.g. from masks 1 and 2, giving 3) can be identical to the label value of another region (mask 3).
Understanding that part of this API is lossy by essence, another approach would be to set the value of the overlapping regions to zero before calling omero-cli-zarr/src/omero_zarr/masks.py, lines 529 to 530 in 2342a90.
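The three scenarios can be reproduced with a few lines of NumPy. This is a minimal sketch assuming masks are boolean arrays keyed by label value; `merge_by_sum` and `region` are hypothetical helpers that mimic the summing behaviour, not code taken from omero-cli-zarr.

```python
import numpy as np


def merge_by_sum(shape, labelled_masks, dtype=np.int64):
    """Merge boolean masks into a single label image by *summing* the label
    values wherever masks overlap (the behaviour discussed above).

    labelled_masks maps each label value to a boolean mask of `shape`.
    """
    labels = np.zeros(shape, dtype=dtype)
    for value, mask in labelled_masks.items():
        labels[mask] += value
    return labels


def region(shape, *indices):
    """Hypothetical helper: a boolean mask that is True only at `indices`."""
    mask = np.zeros(shape, dtype=bool)
    mask[list(indices)] = True
    return mask


shape = (8,)

# Non-unique sums: overlaps 10 & 13 and 11 & 12 both become 23.
pairs = merge_by_sum(shape, {
    10: region(shape, 0, 1), 13: region(shape, 1),
    11: region(shape, 3, 4), 12: region(shape, 4),
})

# Collision: the overlap of masks 1 and 2 (1 + 2 = 3) looks exactly like mask 3.
collide = merge_by_sum(shape, {
    1: region(shape, 0, 1), 2: region(shape, 1, 2), 3: region(shape, 5),
})

# dtype overflow: in uint8, 200 + 100 does not fit and silently wraps to 44.
overflow = merge_by_sum(
    shape, {200: region(shape, 0, 1), 100: region(shape, 1)}, dtype=np.uint8
)

print(pairs[1], pairs[4])      # 23 23
print(collide[1], collide[5])  # 3 3
print(overflow[1])             # 44
```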
Hmmm, good points. One option is to set all overlapping regions to value of len(shapes) + 2 (one more than the last shape).
Another thought on exporting labels for a Plate: the label values are now … It still allows you to view the labels on a Plate, but when we try to combine the …
README.rst
The default behaviour is to export all masks on the Image to a single 5D
"labeled" zarr array, with a different value for each mask Shape.
An exception will be thrown if any of the masks overlap.
# Allow overlapping masks or polygons (overlap will be sum of each label)
The sum of each label won't necessarily be unique.
Yes, that matches the second and third scenarios I raised in #120 (comment). Any suggestions on the best way to handle overlaps?
What about "set all overlapping regions to value of len(shapes) + 2 (one more than the last shape)"?
Assigning all the overlaps to a unique label value would fix the outstanding issue with the current implementation, where the value of overlapping regions might conflict with existing values. It does mean that all the overlapping regions will be assigned the same label value, though, and I am still missing the use cases for creating a separate label for overlaps.
If we decide to go with this approach, I would suggest renaming the option to be explicit about the behavior, e.g. --split-overlaps rather than --allow-overlaps.
I notice the spec says "Some implementations MAY represent overlapping labels by using a specially assigned value, for example the highest integer available in the pixel range.", which sounds like a good idea.
Re: use case: If I have some labels like this:
I can't tell where the blue has overlapped with the red and green.
But if I assign the overlaps to a unique value (yellow), now I can see the overlaps (apologies for the rough cartoons):
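The same idea can be sketched with NumPy, using 1D arrays as stand-ins for the red, green and blue regions. This assumes each mask is a boolean array and labels are small positive integers; `merge_with_dtype_max` is a hypothetical helper, not the omero-cli-zarr implementation.

```python
import numpy as np


def merge_with_dtype_max(shape, labelled_masks, dtype=np.uint16):
    """Merge boolean masks into one label image. Pixels covered by more than
    one mask get the highest integer representable by `dtype`, as the spec
    suggests; every other pixel keeps the value of its single mask."""
    labels = np.zeros(shape, dtype=dtype)
    coverage = np.zeros(shape, dtype=np.int64)  # number of masks covering each pixel
    for value, mask in labelled_masks.items():
        labels[mask] = value
        coverage[mask] += 1
    labels[coverage > 1] = np.iinfo(dtype).max
    return labels


# Red (1) and green (2) regions; blue (3) overlaps red at index 2 and green at index 5.
x = np.arange(8)
red, green, blue = x <= 2, x >= 5, (x >= 2) & (x <= 5)

labels = merge_with_dtype_max(x.shape, {1: red, 2: green, 3: blue})
print(labels.tolist())  # [1, 1, 65535, 3, 3, 65535, 2, 2]
```

Both overlaps end up with the same value, so the label image alone no longer records which pair of masks overlapped.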
Thanks for the examples. From a visualization perspective, assuming someone wants to ask whether regions overlap and where, using a special value makes complete sense. I think my reluctance rather comes from a computational perspective, as it is non-trivial to establish that a yellow region was either red or blue in one case, or blue or green in the other case. Also, semantically, two overlapping regions with no relationship to each other might be associated with the same group.
Possibly this simply reflects the fact that, independently of our implementation, the generated label image will be lossy to some degree if there is any overlap. At a minimum, I think the command help should use this terminology.
Thanks for quoting this paragraph of the specification, which I had overlooked. In the absence of a better alternative, should we simply implement this suggestion and use max(dtype)-1? This should remain compatible with the dtype computation introduced previously.
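Whichever sentinel is chosen (the spec's highest integer, or max(dtype)-1), the dtype computation has to keep it free of real labels. The sketch below assumes a rule of "smallest unsigned integer dtype that fits the largest label" and reserves the highest integer; the actual mechanism from #116 may differ, and `label_dtype` is a hypothetical name.

```python
import numpy as np

CANDIDATE_DTYPES = (np.uint8, np.uint16, np.uint32, np.uint64)


def label_dtype(max_label, reserve_overlap_value=True):
    """Return the smallest unsigned dtype able to hold every label value.

    If reserve_overlap_value is True, the dtype's maximum is kept free so it
    can mark overlapping pixels without colliding with a real label.
    """
    for dtype in CANDIDATE_DTYPES:
        limit = np.iinfo(dtype).max
        if reserve_overlap_value:
            limit -= 1
        if max_label <= limit:
            return np.dtype(dtype)
    raise ValueError(f"label {max_label} does not fit in any unsigned dtype")


print(label_dtype(254))  # prints uint8: 255 stays free for overlaps
print(label_dtype(255))  # prints uint16: 255 would clash with the overlap value
```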
I'll give it a try...
@sbesson That commit sets overlapping regions to … With that change, viewing the above image in …
This makes me wonder if we wouldn't be safer with: … for extensibility.
Hmmm, yeah, I get the idea of extensibility, but until then you've got …
except at the moment I think you have …
That's exactly the issue I have with the single …
@sbesson Agreed. I guess it comes down to how likely we are to want other strategies. The current one is the only one I can think of that makes sense (and matches the spec), so my preference is for option 1), which still allows for other strategies later if needed. But I'm OK with option 2 if we can already think of other strategies we might want in the future.
So, having thought about this over lunch... I could imagine the possibility of using negative values for overlapping regions.
So, let's go for option 2, e.g. …
src/omero_zarr/cli.py
subcommand.add_argument(
    "--overlaps",
    type=str,
    choices=["dtype_max"],
I would add "error" (or something similarly named) here as the default.
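A sketch of that suggestion with the standard-library argparse module. The plugin itself registers arguments on an OMERO CLI subcommand parser, so this standalone parser, its `prog` name and its help text are assumptions for illustration.

```python
import argparse

# Assumed set of strategies; "error" keeps the old fail-on-overlap behaviour.
OVERLAPS = ("error", "dtype_max")

parser = argparse.ArgumentParser(prog="omero zarr polygons")  # hypothetical standalone parser
parser.add_argument(
    "--overlaps",
    type=str,
    choices=OVERLAPS,
    default="error",
    help=(
        "How to handle overlapping masks/polygons: 'error' (default) raises an "
        "exception; 'dtype_max' sets overlapping pixels to the maximum value of "
        "the label dtype (lossy)."
    ),
)

print(parser.parse_args([]).overlaps)                           # error
print(parser.parse_args(["--overlaps", "dtype_max"]).overlaps)  # dtype_max
```

Any other value, e.g. `--overlaps sum`, is rejected by argparse with a usage error, so an unsupported strategy never reaches the export code.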
API-wise, I have one suggestion about where the list of allowed values for overlaps is defined, and one comment in #120 (comment).
Functionally, I tested the current state of this PR using a plate from idr0001, running omero zarr export followed by omero zarr polygons --overlaps dtype_max. The command completed successfully and the resulting plate was uploaded to a public test bucket; see https://hms-dbmi.github.io/vizarr/?source=https://uk1s3.embassy.ebi.ac.uk/omero-cli-zarr_120/2551.zarr/ for the whole plate and https://hms-dbmi.github.io/vizarr/?source=https://uk1s3.embassy.ebi.ac.uk/omero-cli-zarr_120/2551.zarr/A/1/1/labels/0 for an example of a label image with multiple overlapping regions.
src/omero_zarr/masks.py
@@ -60,9 +60,18 @@ def plate_shapes_to_zarr(
    n_fields = plate.getNumberOfFields()
    total = n_rows * n_cols * (n_fields[1] - n_fields[0] + 1)

    # If overlaps isn't 'dtype_max', an exception is thrown if any overlaps exist
    check_overlaps = args.overlaps != "dtype_max"
Can you forward args.overlaps to MaskSaver as a new keyword argument, deprecate check_overlap, and handle this detection logic only once within MaskSaver?
This would also allow us to declare a public constant, e.g. MaskSaver.OVERLAPS, containing the list of allowed values for overlaps that could be consumed in cli.py.
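A rough sketch of the shape of that API. The class below is a hypothetical, heavily simplified stand-in for `omero_zarr.masks.MaskSaver` (the real class takes other arguments and writes zarr data); only the `overlaps` keyword, the `OVERLAPS` constant and a single hypothetical `resolve_overlaps` method are shown.

```python
import numpy as np


class MaskSaver:
    """Hypothetical, simplified stand-in for omero_zarr.masks.MaskSaver,
    showing only the overlap-handling API discussed in this review."""

    # Public constant listing the allowed strategies; cli.py could pass it
    # straight to add_argument(..., choices=MaskSaver.OVERLAPS).
    OVERLAPS = ("error", "dtype_max")

    def __init__(self, overlaps="error"):
        if overlaps not in self.OVERLAPS:
            raise ValueError(f"overlaps must be one of {self.OVERLAPS}, got {overlaps!r}")
        self.overlaps = overlaps

    def resolve_overlaps(self, labels, overlapping):
        """Apply the configured strategy to the pixels flagged in `overlapping`.

        Keeping this logic in one place means Image and Plate export
        cannot drift apart."""
        if not overlapping.any():
            return labels
        if self.overlaps == "error":
            raise ValueError(f"{int(overlapping.sum())} pixel(s) covered by more than one mask")
        resolved = labels.copy()
        resolved[overlapping] = np.iinfo(labels.dtype).max  # "dtype_max" strategy
        return resolved


labels = np.array([1, 2, 2], dtype=np.uint16)
overlapping = np.array([False, True, False])

merged = MaskSaver(overlaps="dtype_max").resolve_overlaps(labels, overlapping)
print(merged.tolist())  # [1, 65535, 2]

try:
    MaskSaver().resolve_overlaps(labels, overlapping)
except ValueError as exc:
    print(exc)  # 1 pixel(s) covered by more than one mask
```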
Done in 05bce56. No need to deprecate check_overlap, as it was only added in this PR.
Thanks, the new API addition makes sense to me and allows some support for combined overlapping masks/polygons, in agreement with the recommendation of the specification, while retaining the ability to implement alternative strategies in the future.
Unless @joshmoore would like to see other changes, I'd propose releasing this as 0.5.0 and then regenerating and uploading a sample Plate from idr0001 with labels to the public S3 bucket containing the IDR example OME-NGFF datasets.
No objections from my side. 👍
src/omero_zarr/masks.py
@@ -178,6 +190,8 @@ class MaskSaver:
    masks to zarr groups/arrays.
    """

    OVERLAPS = ["error", "dtype_max"]
As a very small aside, unless you want this to be externally extensible, a tuple property would be safer.
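To illustrate the difference: a list stored as a class attribute can be mutated in place by any caller, silently changing the allowed values for everyone, whereas a tuple cannot be modified in place. The class names below are hypothetical.

```python
class WithList:
    OVERLAPS = ["error", "dtype_max"]


class WithTuple:
    OVERLAPS = ("error", "dtype_max")


# Any caller can mutate the shared list on the class...
WithList.OVERLAPS.append("sum")
print(WithList.OVERLAPS)  # ['error', 'dtype_max', 'sum']

# ...whereas the tuple has no append method at all.
try:
    WithTuple.OVERLAPS.append("sum")
except AttributeError as exc:
    print(type(exc).__name__)  # AttributeError
```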
fixed in 2a0c9ce
Add support for --overlaps=dtype_max to the CLI for Image and Plate export of polygons and masks. Also update the README with this (and mention export of polygons).
To test, using IDR (idr0001):
Not tested Plate export yet, or export of masks.
Overlapping regions should be exported with a value of dtype.max.
NB: The masks for the Image above are exported as a single plane with a Z index of 0, so they are only visible after scrolling to the first Z-index (see screenshot).
I wonder if we can use coordinateTransformations to solve this?
Without the --overlaps arg, we get: