object-store-based Store implementation #1661

Open
wants to merge 25 commits into
base: main
Changes from 1 commit
Commits
14be826
Initial object-store implementation
kylebarron Feb 8, 2024
a492bf0
Merge branch 'v3' into kyle/object-store
d-v-b Feb 12, 2024
50b6c47
Merge branch 'v3' into kyle/object-store
jhamman Feb 27, 2024
afa79af
Update src/zarr/v3/store/object_store.py
kylebarron Feb 27, 2024
c466f9f
Merge branch 'main' into kyle/object-store
kylebarron Oct 21, 2024
c3e7296
Merge branch 'main' into kyle/object-store
kylebarron Oct 22, 2024
f5c884b
update
kylebarron Oct 22, 2024
af2a39b
Handle list streams
kylebarron Nov 1, 2024
d7cfbee
Update get
kylebarron Nov 1, 2024
cb40015
wip refactor get_partial_values
kylebarron Nov 1, 2024
619df43
Merge branch 'main' into kyle/object-store
kylebarron Nov 1, 2024
b976450
Fixes to _get_partial_values
kylebarron Nov 7, 2024
cca70d7
Merge branch 'main' into kyle/object-store
kylebarron Nov 7, 2024
f2c827d
Fix constructing prototype from get
kylebarron Nov 7, 2024
5c8903f
lint
kylebarron Nov 7, 2024
50e1dec
Merge branch 'main' into kyle/object-store
kylebarron Nov 18, 2024
8bb252e
Add docstring
kylebarron Nov 18, 2024
559eafd
Make names private
kylebarron Nov 18, 2024
5486e69
Implement eq
kylebarron Nov 18, 2024
9a05c01
Add obstore as a test dep
maxrjones Nov 18, 2024
56b7a0b
Run store tests on ObjectStore
maxrjones Nov 18, 2024
d5d0d4d
Merge pull request #1 from maxrjones/object-store-tests
kylebarron Nov 20, 2024
b38ada1
import or skip
kylebarron Nov 21, 2024
ab00b46
Bump obstore beta version
kylebarron Nov 22, 2024
9c65e4d
bump pre-commit
kylebarron Nov 22, 2024
78 changes: 78 additions & 0 deletions src/zarr/v3/store/object_store.py
@@ -0,0 +1,78 @@
from __future__ import annotations

import asyncio
from typing import List, Optional, Tuple

from object_store import ObjectStore as _ObjectStore
from object_store import Path as ObjectPath

from zarr.v3.abc.store import Store


class ObjectStore(Store):
    supports_writes: bool = True
    supports_partial_writes: bool = False
    supports_listing: bool = True

    store: _ObjectStore

    def __init__(self, store: _ObjectStore):
        self.store = store

    def __str__(self) -> str:
        return f"object://{self.store}"

    def __repr__(self) -> str:
        return f"ObjectStore({repr(str(self))})"

    async def get(
        self, key: str, byte_range: Optional[Tuple[int, Optional[int]]] = None
    ) -> Optional[bytes]:
        if byte_range is None:
            return await self.store.get_async(ObjectPath(key))

        start, end = byte_range
        if end is None:
            # Have to wrap a separate object-store function to support this
            raise NotImplementedError

        return await self.store.get_range_async(ObjectPath(key), start, end - start)

    async def get_partial_values(
        self, key_ranges: List[Tuple[str, Tuple[int, int]]]
    ) -> List[bytes]:
        # TODO: use rust-based concurrency inside object-store
Member commented:

kylebarron (Author) commented on Feb 9, 2024:

object-store has a built-in function for this: get_ranges, with the caveat that it only handles multiple ranges within a single file.

get_ranges also automatically merges requests for nearby ranges in a file.

Member commented:

Yes I know, but mine already did the whole thing, so I am showing how I did that.
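
To make the point about get_ranges concrete, here is a minimal sketch of the two steps it implies for `get_partial_values`: group the requested ranges by key (since get_ranges works on one file at a time), then merge ranges that sit close together. This is plain Python and does not call object-store; the helper names and the `max_gap` threshold are hypothetical, and object-store's own merging rules may differ.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (start, end) byte offsets, with `end` exclusive, matching `end - start` in get().
ByteRange = Tuple[int, int]


def group_ranges_by_key(
    key_ranges: List[Tuple[str, ByteRange]],
) -> Dict[str, List[ByteRange]]:
    """Collect all requested ranges per key, preserving request order."""
    grouped: Dict[str, List[ByteRange]] = defaultdict(list)
    for key, byte_range in key_ranges:
        grouped[key].append(byte_range)
    return dict(grouped)


def coalesce_ranges(ranges: List[ByteRange], max_gap: int = 0) -> List[ByteRange]:
    """Merge ranges that overlap or are separated by at most `max_gap` bytes.

    `max_gap` is a hypothetical tuning knob for this sketch.
    """
    merged: List[ByteRange] = []
    for start, end in sorted(ranges):
        if merged and start - merged[-1][1] <= max_gap:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged
```

With something like this, each key would need one multi-range request instead of one request per range, and the caller would slice the merged buffers back into the originally requested ranges.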

        futs = [self.get(key, byte_range=byte_range) for (key, byte_range) in key_ranges]

        # Seems like a weird type match where `get()` returns `Optional[bytes]` but
        # `get_partial_values` is non-optional?
        return await asyncio.gather(*futs)  # type: ignore

    async def exists(self, key: str) -> bool:
        try:
            _ = await self.store.head_async(ObjectPath(key))
            return True
        except FileNotFoundError:
            return False

    async def set(self, key: str, value: bytes) -> None:
        await self.store.put_async(ObjectPath(key), value)

    async def delete(self, key: str) -> None:
        await self.store.delete_async(ObjectPath(key))

    async def set_partial_values(self, key_start_values: List[Tuple[str, int, bytes]]) -> None:
        raise NotImplementedError

    async def list(self) -> List[str]:
        objects = await self.store.list_async(None)
        return [str(obj.location) for obj in objects]

    async def list_prefix(self, prefix: str) -> List[str]:
        objects = await self.store.list_async(ObjectPath(prefix))
        return [str(obj.location) for obj in objects]

    async def list_dir(self, prefix: str) -> List[str]:
        list_result = await self.store.list_with_delimiter_async(ObjectPath(prefix))
        common_prefixes = set(list_result.common_prefixes)
        return [str(obj.location) for obj in list_result.objects if obj not in common_prefixes]
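
In the diff above, `get` raises `NotImplementedError` when `byte_range` has an open end (`end is None`). One possible way to close that gap, sketched below, is to resolve the open end against the object's total size and then reuse the existing offset/length call. The helper name `to_offset_length` is hypothetical, and it assumes the size can be obtained separately (for example from a head-style request); whether object-store exposes the size that way is not confirmed by this diff.

```python
from typing import Optional, Tuple


def to_offset_length(
    byte_range: Tuple[int, Optional[int]], object_size: int
) -> Tuple[int, int]:
    """Convert a (start, end) range with an optional end into (offset, length).

    `end` is exclusive; `None` means "read to the end of the object".
    `object_size` is assumed to be known by the caller.
    """
    start, end = byte_range
    if start < 0:
        raise ValueError("start must be non-negative")
    if end is None or end > object_size:
        # Open-ended or overshooting ranges are clamped to the object size.
        end = object_size
    if start > end:
        raise ValueError("start lies beyond the end of the range")
    return start, end - start
```

The returned pair lines up with the `(start, end - start)` arguments already passed to `get_range_async` in the diff, so the open-ended case would only add the size lookup.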