Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add --allow_overlaps arg to cli #120

Merged
merged 11 commits into from
Jul 6, 2022

Conversation

will-moore
Copy link
Member

@will-moore will-moore commented Jun 21, 2022

Add support for --overlaps=dtype_max to cli for Image and Plate export of polygons and masks. Also update README with this (and mention export of polygons).
To test, using IDR: (idr0001):

$ omero zarr export Image:1229801
$ omero zarr polygons Image:1229801 --overlaps=dtype_max

$ napari --plugin napari-ome-zarr 1229801.zarr

Not tested Plate export yet, or export of masks.
Overlapping regions should be exported with a value of dtype.max.

NB: The masks for the Image above are exported as a single plane with Z index of 0. Only by scrolling to first Z-index are they visible (See screenshot)
I wonder if we can use coordinateTransformations to solve this?

Without the --overlaps arg, we get:

  File "/Users/wmoore/Desktop/ZARR/omero-cli-zarr/src/omero_zarr/masks.py", line 529, in masks_to_labels
    raise Exception(
Exception: Mask 632801 overlaps with existing labels

Screenshot 2022-06-21 at 13 47 57

@sbesson
Copy link
Member

sbesson commented Jun 21, 2022

I am supportive of the spirit of this PR i.e. exposing some opt-in option to export overlapping masks/polygons (or in the case above overlapping masks derived from polygons) especially as we already have some underlying API. Amongst others, for testing e.g. ome/ome-zarr-py#207, building a complete example of OME-NGFF plate with labels using idr0001 would be very advantageous.

My primary concern is the implementation choice (not introduced in this PR) to compute the sum of all the overlapping values. A few Under some conditions, this can break the dtype mechanism introduced in #116 as the maximal label value might exceed the computed dtype. Additionally, there is no guarantee that these values will be unique e.g. assuming there is an overlap between masks 10 & 13 and 11 & 12, the two sets of intersectings pixels would be assigned with the same value. Finally and probably more importantly, I think this means the label value of an overlapping region (e.g. from masks 1 and 2) can be identical to the label value of another region.

Understanding part of this API is lossy by essence, another approach would be to set the value of the overlapping regions to zero before calling

labels[i_t, i_c, i_z, y : (y + h), x : (x + w)] += (
binim_yx * shape_value
. Effectively, this means the value of the first mask/polygon found for each pixel would take precedence over all following values.

@will-moore
Copy link
Member Author

Hmmm - good points. One option is to set all overlapping regions to value of len(shapes) + 2 (one more than the last shape). In most cases that wouldn't be lossy (except in the cases where 2 overlapping regions overlap!). It would allow you to preserve the shape of the overlapping regions, instead of losing that info.

Another thought on exporting labels for a Plate: The labels values are now 1 -> n for each Image, but when we combine these into a Plate in ome-zarr-py reader, there will be a blob of value 1 for each Image on the Plate.

It still allows you to view the labels on a Plate, but when we try to combine the label-values properties for each Image into a single dict for the Plate, we'd lose that info. Current behaviour in https://github.com/ome/ome-zarr-py/pull/207/files is:

        # combine 'properties' from each image
        # from https://github.com/ome/ome-zarr-py/pull/61/
        properties: Dict[int, Dict[str, Any]] = {}
        for well_path in self.well_paths:
            path = self.get_image_path(well_path) + ".zattrs"
            labels_json = self.zarr.get_json(path).get("image-label", {})
            # NB: assume that 'label_val' is unique across all images
            props_list = labels_json.get("properties", [])
            if props_list:
                for props in props_list:
                    label_val = props["label-value"]
                    properties[label_val] = dict(props)
                    del properties[label_val]["label-value"]
        node.metadata["properties"] = properties

README.rst Outdated
The default behaviour is to export all masks on the Image to a single 5D
"labeled" zarr array, with a different value for each mask Shape.
An exception will be thrown if any of the masks overlap.
# Allow overlapping masks or polygons (overlap will be sum of each label)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sum of each label won't necessarily be unique.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that matches the thrid and second and third scenarios I raised in #120 (comment). Any suggestion on the best way to handle overlaps?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about "set all overlapping regions to value of len(shapes) + 2 (one more than the last shape)"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assigning all the overlaps to a unique label value would fix the outstanding issue with the current implementation where the value of overlapping regions might conflict with existing values. It means that all the overlapping regions will be assigned to the same label value though and I still miss the use cases for creating a separate label for overlaps.

If we decided to go with this approach, I would suggest to rename the option to be explicit about the behavior rather than --allow-overlaps e.g. --split-overlaps.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice the spec says "Some implementations MAY represent overlapping labels by using a specially assigned
value, for example the highest integer available in the pixel range." which sounds like a good idea.

Re: use case: If I have some labels like this:

Screenshot 2022-06-23 at 17 21 06

I can't tell where the blue has overlapped with the red and green.
But if I assign the overlaps to a unique value (yellow), now I can see the overlaps (apologies for the rough cartoons):

Screenshot 2022-06-23 at 20 00 53

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks the examples. From a visualization perspective, assuming someone wants to ask the question of whether regions overlap and where, using a special value makes complete sense. I think my reluctance rather comes from a computational perspective as it is non trivial to establish that a yellow region was either red or blue in one case or blue or green in the other case. Also semantically, two overlapping regions with no relationships with each other might be associated to the same group.

Possibly this simply reflects the fact that independently of our implementation, the generated label image will be lossy to some degree if there is overlap. At minimum, I think the command help should use this terminology.

Thanks for quoting this paragraph of this specification which I had overlooked. In the absence of a better alternative,should we simply implement this suggestion and use max(dtype)-1? This should remain compatible with the dtype computation introduced previously.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll give it a try...

@will-moore
Copy link
Member Author

@sbesson That commit sets overlapping regions to dtype.max as in the spec "the highest integer available in the pixel range" not dtype.max - 1?
I updated the help message for --allow_overlaps but I think the name is OK. --split_overlaps sounds like you're choosing between different ways to handle overlaps, but the choice is really just to allow them or not?

With that change, viewing the above image in napari as before, mousing over the 1 overlap shows it's value of 127:

Screenshot 2022-06-24 at 11 29 04

@joshmoore
Copy link
Member

I updated the help message for --allow_overlaps but I think the name is OK. --split_overlaps sounds like you're choosing between different ways to handle overlaps, but the choice is really just to allow them or not?

This makes me wonder if we wouldn't be safer with:

--overlaps=[error (default) | allow | ...]

for extensibility.

@will-moore
Copy link
Member Author

Hmmm - yeah, I get the idea of extensibility, but until then you've got --overlaps allow and no other option, which is slightly odd. And when (if) you add other options you might have --overlaps [allow | ignore | set-zero] and then the allow option looks odd as it doesn't describe a behaviour.
That's only a minor concern, so I'm happy to go for it, but I wonder that we get sub-optimal situation in both cases to avoid the (small?) chance we might want to change the API later.

@joshmoore
Copy link
Member

but until then you've got --overlaps allow and no other option

except at the moment I think you have --overlaps=error, no? So that stays as the default and you are adding a different "overlap strategy" here.

@sbesson
Copy link
Member

sbesson commented Jun 24, 2022

and then the allow option looks odd as it doesn't describe a behaviour.

That's exactly the issue I have with the single --allow-overlaps flag at the moment as I feel I need more to understand what this does. I can see two options:

  1. an --allow-overlaps flag that determines whether you error or not and eventually an --overlap-strategy= [strategy1 | strategy2] flag that allows to choose different resolution mechanism
  2. a single --overlaps flag as proposed above but in that case, I would use something more specific than allow to cater for future implemmentation

@will-moore
Copy link
Member Author

@sbesson agreed - I guess it comes down to how likely we are to want other strategies. The current one is the only one I can think of that makes sense (and matches the spec), so my preference is for option 1) - which still allows for other strategies later if needed. But I'm OK with option 2 if we can already think of any other strategies we might want in the future?

@will-moore
Copy link
Member Author

So, having thought about this over lunch... I could imagine the possibility of using negative values for overlapping regions.
The use-case being... If I want to combine the labels from HCS images into a Plate, I might want to find the max(label_value) for each image, so as to add an offset to the next set of labels, to keep them unique. In which case, It might be useful for max(label_value) to NOT be dype.max.
As far as I can see from the spec, negative numbers are acceptable for label values, so using negative values for overlapping region could allow you to have a unique value for each overlap?

@will-moore
Copy link
Member Author

So, let's go for option 2. E.g. --overlaps=dtype_max sound good?

subcommand.add_argument(
"--overlaps",
type=str,
choices=["dtype_max"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add "error" (or something similarly named) here as the default.

Copy link
Member

@sbesson sbesson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API-wise, one suggestion in terms of the location defining the list of allowed value sfor overlaps and one comment in #120 (comment).

Functionally, the current state of this PR tested using a plate from idr0001 and running omero zarr export followed by omero zarr polygons --overlaps dtype_max.The command completed successfully and the resulting plate was uploaded to a public test bucket - see https://hms-dbmi.github.io/vizarr/?source=https://uk1s3.embassy.ebi.ac.uk/omero-cli-zarr_120/2551.zarr/ for the whole plate and https://hms-dbmi.github.io/vizarr/?source=https://uk1s3.embassy.ebi.ac.uk/omero-cli-zarr_120/2551.zarr/A/1/1/labels/0 for an example of label image with multiple overlapping regions.

@@ -60,9 +60,18 @@ def plate_shapes_to_zarr(
n_fields = plate.getNumberOfFields()
total = n_rows * n_cols * (n_fields[1] - n_fields[0] + 1)

# If overlaps isn't 'dtype_max', an exception is thrown if any overlaps exist
check_overlaps = args.overlaps != "dtype_max"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you forward args.overlaps to MaskSaver as a new keyword argument, deprecate check_overlap, and handle this detection logic only once within MaskSaver?

This would also allow to declare a public constant e.g. MaskSaver.OVERLAPS containing the list of allowed values for overlaps that could be consumed in cli.py .

Copy link
Member Author

@will-moore will-moore Jun 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 05bce56 - No need to deprecate check_overlap as it was only added in this PR.

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

flake8 and mypy fixes

Fix mypy
src/omero_zarr/masks.py Outdated Show resolved Hide resolved
Copy link
Member

@sbesson sbesson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the new API addition makes sense to me and allows some support for combined overlapping masks/polygonss in agreement with the recommendation of the specification while retaining the ability to implement alternative strategies in the future.

Unless @joshmoore would like to see other changes, I'd propose to get this released as 0.5.0 and then regenerate and upload a sample Plate from idr0001 with labels to the public S3 bucket containing the IDR example OME-NGFF datasets.

@joshmoore
Copy link
Member

No objections from my side. 👍

@@ -178,6 +190,8 @@ class MaskSaver:
masks to zarr groups/arrays.
"""

OVERLAPS = ["error", "dtype_max"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a very small aside, unless you want this to be externally extensible, a tuple property would be safer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in 2a0c9ce

@sbesson sbesson merged commit 8770ec6 into ome:master Jul 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants