Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start a roadmap #58

Merged
merged 4 commits into from
Dec 9, 2024
Merged

Start a roadmap #58

merged 4 commits into from
Dec 9, 2024

Conversation

dstansby
Copy link
Contributor

This is very much a suggestion, apart from I think we should aim for v1 to have OME-zarr 0.4 validation. Open questions:

  • Should v1 include metadata writing?
  • What else should happen after v1?

@imagejan
Copy link

At some point in the future, I think it would be nice if the versions of this package (a reference implementation) would align with the versions of the NGFF spec (such that v0.4 would contain the data models for OME-Zarr 0.4, and v0.5 would contain the data models for both OME-Zarr 0.4 and 0.5, etc.), just to avoid confusion with all the different packages and versions. Or do you think this is asking for too much?

@d-v-b
Copy link
Contributor

d-v-b commented Nov 27, 2024

At some point in the future, I think it would be nice if the versions of this package (a reference implementation) would align with the versions of the NGFF spec (such that v0.4 would contain the data models for OME-Zarr 0.4, and v0.5 would contain the data models for both OME-Zarr 0.4 and 0.5, etc.), just to avoid confusion with all the different packages and versions. Or do you think this is asking for too much?

I think we have to version our package independently from the OME-Zarr versions. If the versions are locked together, then it's hard for us to issue patch releases or bugfixes.

@dstansby
Copy link
Contributor Author

To add, our plan is to have different namespaces for different versions of the OME-zarr spec, so currently we have a ome_zarr_models.v04 namespace, but we'll add v05, v06 etc. to this in time.

@d-v-b
Copy link
Contributor

d-v-b commented Nov 27, 2024

  • Should v1 include metadata writing?

Yes, and in a simple form we have metadata writing today -- any class that inherits from GroupSpec has a to_zarr(store, path) method that will create an OME Zarr group (and zarr arrays) in the provided storage backend. we could make this fancier, e.g. by only writing group attributes (but still checking that the entire zarr group is valid).

I think a MVP for this repo should be sufficient to easily read an OME-Zarr image from storage, make some changes to that metadata, and save a new OME-Zarr image to a different storage backend with those changes. That should cover the spectrum of basic use cases, I think, and I don't think we are too far from that tbh

@dstansby dstansby marked this pull request as ready for review December 9, 2024 22:27
@dstansby
Copy link
Contributor Author

dstansby commented Dec 9, 2024

I'm sticking my neck out a bit here and have decided to put mutating and reading/writing in v2. In the interests of getting a proof of concept done, I think having something that can read, validate, and provide a nice Python interface to the metadata and data is already a big win, so that's what I've put in the roadmap.

This shouldn't discourage anyone from working on mutation and saving metadata back to disk though - in theory it shouldn't be too hard!

@dstansby dstansby merged commit 03c5b6c into main Dec 9, 2024
3 checks passed
@dstansby dstansby deleted the roadmap branch December 9, 2024 22:30
@d-v-b
Copy link
Contributor

d-v-b commented Dec 9, 2024

maybe I just need to write up an example? since we have classes that inherit from GroupSpec, and GroupSpec has a to_zarr method that saves zarr group / array metadata to disk, then our classes that inherit from GroupSpec already support writing.

@dstansby
Copy link
Contributor Author

dstansby commented Dec 9, 2024

In theory it's not tricky to just write out whatever metadata we have, but in practice I think we have to be quite careful, because we want to avoid writing out invalid metadata, which means we have to carefully forbid things like:

  • Creating a multiscales that has a path that doesn't exist in the group
  • Adding more axes than dimensions in a zarr array
  • ... probably many more

@d-v-b
Copy link
Contributor

d-v-b commented Dec 10, 2024

we had validation for these things already, did we remove it?

@d-v-b
Copy link
Contributor

d-v-b commented Dec 10, 2024

ok we still have it, but it's been moved:

class Image(GroupSpec[ImageAttrs, ArraySpec | GroupSpec]):

this class already reads and writes valid ome-ngff metadata, with the validation routines you mentioned. Here's an example:

# /// script
# requires-python = ">=3.13"
# dependencies = [
#     "ome-zarr-models",
#     "pydantic-zarr",
#     "rich"
# ]
#
# [tool.uv.sources]
# ome-zarr-models = { git = "https://github.com/BioImageTools/ome-zarr-models-py.git" }
# ///

from pydantic_zarr.v2 import ArraySpec
from ome_zarr_models.v04._image_old import Image
from ome_zarr_models.v04.coordinate_transformations import _build_transforms
from ome_zarr_models.v04.multiscales import Multiscale, Dataset
from ome_zarr_models.v04.axes import Axis
from zarr import MemoryStore
from rich import print

# define some arrays
names = ('s0', 's1')
shapes = ((10,10), (5,5))
dtypes = ("uint8", "uint8")
scales = ((1,1), (2,2))
translates = ((0,0), (0.5, 0.5))


axes = [Axis(name='a', type='space'), Axis(name='b', type='space')]
datasets = [Dataset(path=k, coordinateTransformations=_build_transforms(scale=s, translation=t)) for k, s, t in zip(names, scales, translates)]
multiscale_meta = Multiscale(
    axes=axes,
    datasets=datasets        
    )

arrays = {name: ArraySpec(shape=shape, dtype=dtype, chunks=shape) for name, shape, dtype in zip(names, shapes, dtypes)}
group = Image(attributes={'multiscales': [multiscale_meta]}, members=arrays)
# a model of the zarr hierarchy we will create
print(group.model_dump())

# create the actual zarr metadata
store = MemoryStore()
zgroup = group.to_zarr(store, path='test')
print(zgroup.attrs.asdict())

# check that the arrays are correct
assert zgroup['s0'].shape == (10, 10)
assert zgroup['s1'].shape == (5, 5)

# create a brand new Image model, based on the one we created
group2 = Image.from_zarr(zgroup)
assert group2.model_dump() == group.model_dump()

@d-v-b
Copy link
Contributor

d-v-b commented Dec 10, 2024

when I run that, I see this:

(ome-zarr-models) ➜  ome-zarr-models-py git:(main) ✗ uv run example.py
Reading inline script metadata from `example.py`
 Updated https://github.com/BioImageTools/ome-zarr-models-py.git (83f2c49)
{
    'zarr_version': 2,
    'attributes': {
        'multiscales': [
            {
                'axes': [{'name': 'a', 'type': 'space', 'unit': None}, {'name': 'b', 'type': 'space', 'unit': None}],
                'datasets': [
                    {
                        'path': 's0',
                        'coordinateTransformations': (
                            {'type': 'scale', 'scale': [1.0, 1.0]},
                            {'type': 'translation', 'translation': [0.0, 0.0]}
                        )
                    },
                    {
                        'path': 's1',
                        'coordinateTransformations': (
                            {'type': 'scale', 'scale': [2.0, 2.0]},
                            {'type': 'translation', 'translation': [0.5, 0.5]}
                        )
                    }
                ],
                'version': None,
                'coordinateTransformations': None,
                'metadata': None,
                'name': None,
                'type': None
            }
        ],
        'omero': None,
        'image_labels': None
    },
    'members': {
        's0': {
            'zarr_version': 2,
            'attributes': {},
            'shape': (10, 10),
            'chunks': (10, 10),
            'dtype': '|u1',
            'fill_value': 0,
            'order': 'C',
            'filters': None,
            'dimension_separator': '/',
            'compressor': None
        },
        's1': {
            'zarr_version': 2,
            'attributes': {},
            'shape': (5, 5),
            'chunks': (5, 5),
            'dtype': '|u1',
            'fill_value': 0,
            'order': 'C',
            'filters': None,
            'dimension_separator': '/',
            'compressor': None
        }
    }
}
{
    'multiscales': [
        {
            'axes': [{'name': 'a', 'type': 'space', 'unit': None}, {'name': 'b', 'type': 'space', 'unit': None}],
            'datasets': [
                {
                    'path': 's0',
                    'coordinateTransformations': (
                        {'type': 'scale', 'scale': [1.0, 1.0]},
                        {'type': 'translation', 'translation': [0.0, 0.0]}
                    )
                },
                {
                    'path': 's1',
                    'coordinateTransformations': (
                        {'type': 'scale', 'scale': [2.0, 2.0]},
                        {'type': 'translation', 'translation': [0.5, 0.5]}
                    )
                }
            ],
            'version': None,
            'coordinateTransformations': None,
            'metadata': None,
            'name': None,
            'type': None
        }
    ],
    'omero': None,
    'image_labels': None
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants