Using parse_url to create a store reads arrays with all zeros #343

Open
joshua-gould opened this issue Dec 15, 2023 · 10 comments

Comments

@joshua-gould
Contributor

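import numpy as np
import zarr
from ome_zarr.io import parse_url

# Write with plain zarr; no dimension_separator is given, so the default is used.
g = zarr.open('test.zarr', 'w')
g['foo'] = np.ones((2, 3))

# Read back with plain zarr.
g2 = zarr.open('test.zarr', mode='r')
a2 = g2['foo'][...]
assert a2.max() == 1 # works

# Read back through the store returned by parse_url.
g3 = zarr.open(parse_url('test.zarr', mode='r').store)
a3 = g3['foo'][...]
assert a3.max() == 1 # fails: the array is read as all zeros
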
@joshmoore
Member

Hi @joshua-gould. parse_url enforces dimension_separator="/". Can you try setting that on all of your calls to pure zarr methods?
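
For context, a minimal sketch of why the mismatch shows up as zeros rather than an error, assuming zarr v2 chunk naming: the chunk at index (0, 0) is stored under the key `foo/0.0` with the "." separator and under `foo/0/0` with "/". A reader that looks under the wrong key finds no chunk and returns the fill value (0 by default). `chunk_key` below is a hypothetical helper for illustration, not a zarr or ome-zarr function, and a plain dict stands in for the store.

```python
def chunk_key(array_path, chunk_index, dimension_separator):
    """Build the store key for one chunk (hypothetical helper, mimics zarr v2 naming)."""
    suffix = dimension_separator.join(str(i) for i in chunk_index)
    return f"{array_path}/{suffix}"


# A plain dict stands in for the store; the array was written with "." keys.
store = {chunk_key("foo", (0, 0), "."): b"chunk bytes"}

written_key = chunk_key("foo", (0, 0), ".")   # key the writer used
expected_key = chunk_key("foo", (0, 0), "/")  # key a "/" reader looks for

found_with_dot = written_key in store
found_with_slash = expected_key in store

print(written_key, found_with_dot)
print(expected_key, found_with_slash)
```

Because the "/" lookup misses, the chunk is treated as absent, which is indistinguishable from a chunk that was never written.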

@joshua-gould
Contributor Author

I've confirmed that an array created with the following code is read correctly. Can we add a check so that values are not silently read incorrectly when a user does not set dimension_separator='/'?

a1 = g.create_dataset('foo', shape=(2, 3), dimension_separator='/')
a1[:] = np.ones((2, 3))

@joshua-gould
Contributor Author

Note that creating an array using ome-zarr and reading in the array using pure zarr works correctly:

import numpy as np
import zarr
from ome_zarr.io import parse_url

g = zarr.open(parse_url('test.zarr', mode='w').store)
g['foo'] = np.ones((2, 3))

g2 = zarr.open('test.zarr', mode='r')
a2 = g2['foo'][...]
assert a2.max() == 1

@will-moore
Member

This is a similar issue to #245.

I think I proposed somewhere that when reading, parse_url should just use whatever dimension separator it finds, but I seem to remember there was an argument against doing that.

@joshmoore
Member

By "find" you mean looking into the directory to see what files are present? On S3, you can't assume that you can list the directories. Combined with the fact that chunks can be missing, this means you will likely need to try more than a handful of paths before knowing for certain whether or not each array uses "." or "/".
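
To illustrate the probing cost, here is a rough sketch that uses a plain dict in place of a store that cannot be listed. `guess_separator` and its arguments are hypothetical names, and the only operation assumed is a key lookup. With sparse chunks, many lookups can miss before one hits, and an array with no chunks at all can never be classified this way.

```python
from itertools import product


def guess_separator(store, array_path, chunk_grid_shape, max_probes=16):
    """Probe chunk keys under both separators (hypothetical helper).

    Returns ".", "/", or None if no chunk was found within max_probes.
    Uses only key lookups, since listing may be unavailable (e.g. on S3).
    """
    probes = 0
    for index in product(*(range(n) for n in chunk_grid_shape)):
        for sep in (".", "/"):
            if probes >= max_probes:
                return None
            probes += 1
            key = f"{array_path}/" + sep.join(str(i) for i in index)
            if key in store:
                return sep
    return None


# Only the last chunk of a 2x3 chunk grid exists; the others are missing.
sparse_store = {"foo/1/2": b"chunk bytes"}
result = guess_separator(sparse_store, "foo", (2, 3))

# With no chunks written at all, probing cannot decide.
empty_result = guess_separator({}, "foo", (2, 3))
```

In this small case the answer is only found on the twelfth lookup, and on a remote store each lookup is a separate request.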

@will-moore
Member

No, I meant looking in .zarray.
It seems wrong to ignore the dimension_separator if it's there.
(I know there's not always one there with earlier versions - I think that was the objection before)

@joshmoore
Member

> (I know there's not always one there with earlier versions - I think that was the objection before)

Exactly.

> No, I meant looking in .zarray.

Interesting. If we add our own .zarray reading logic, then we might be able to do this. The problem is that you currently can't tell from the zarr-python metadata whether dimension_separator is missing or explicitly set to the default.
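
A minimal sketch of what such reading logic could look like, assuming a zarr v2 array in a local directory whose `.zarray` file is plain JSON. `read_dimension_separator` is a hypothetical helper, not part of zarr-python or ome-zarr-py; a remote store would need to fetch the `.zarray` key through the store instead of the filesystem.

```python
import json
import tempfile
from pathlib import Path


def read_dimension_separator(array_dir):
    """Return the dimension_separator recorded in .zarray, or None if the key is absent.

    Hypothetical helper: reads the raw JSON so that "missing" and
    "explicitly set" can be told apart.
    """
    metadata = json.loads((Path(array_dir) / ".zarray").read_text())
    return metadata.get("dimension_separator")


with tempfile.TemporaryDirectory() as tmp:
    explicit = Path(tmp) / "explicit"  # metadata records the separator
    legacy = Path(tmp) / "legacy"      # older-style metadata without the key
    for path, extra in ((explicit, {"dimension_separator": "."}), (legacy, {})):
        path.mkdir()
        meta = {"zarr_format": 2, "shape": [2, 3], "chunks": [2, 3], **extra}
        (path / ".zarray").write_text(json.dumps(meta))

    explicit_sep = read_dimension_separator(explicit)
    legacy_sep = read_dimension_separator(legacy)
```

Returning `None` for the missing case leaves the caller free to decide on a fallback, rather than folding it into the default.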

@dstansby
Contributor

I'm running into this too - it's currently breaking my attempts to read in data using ome-zarr-py 😢

@dstansby
Contributor

I think the frustrating part here is that zarr-python writes with the "." dimension separator by default, which means v2 zarr data written with zarr-python currently doesn't load correctly with ome-zarr-py.

@joshmoore
Member

@dstansby: definitely an issue. The single dimension separator character led to a large number of incompatibilities. But rather than try to change the default in zarr-python v2, I think getting us onto zarr v3 ASAP is a better use of our time.

@will-moore mentioned this issue Nov 27, 2024