Support for reading and writing data as simple "bytes" #570

Draft pull request: wants to merge 2 commits into base branch main

@padraic-shafer commented Sep 5, 2023

Background

From Issue #434: Tiled's data model constrains everything to be one of its recognized structure families (array, dataframe, sparse, node container) or JSON-encodable metadata sitting alongside one of those types. There will be cases where binary (not JSON-encodable) information is relevant and some client programs will know what to do with it.

Proposed Changes

Tiled should support serving assets as a stream of bytes, so that clients can download files (or other data streams) that do not readily fit into Tiled's other structure families. The metadata may include additional hints (such as MIME type) that help the client interpret the payload.

Below are several suggested changes that arose from the discussion in Issue #434 and additional offline discussions between @danielballan, @jmaruland, and me (@padraic-shafer).

The following aspects will need more discussion

Registering a file/object in the catalog

  • If the MIME type of the file is not detected, or an Adapter cannot otherwise be selected, then Tiled should handle this gracefully and give the maintainer of the server useful information.
  • Probably, a warning should be logged and the asset should be registered with the structure family "bytes".
  • If no MIME type is detected, then it should probably fall back to "application/octet-stream".
  • If a MIME type is detected but the data type is not readily coerced to a Tiled data structure family, then the structure family "bytes" should be used and the detected MIME type should be recorded.
  • Perhaps a "strict mode" flag could be used to ignore the asset if it matches one of these "fallback" conditions.
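
To make the fallback rules above concrete, here is a minimal sketch using only the Python standard library. It is not Tiled's registration code: `plan_registration`, `Registration`, `KNOWN_FAMILIES`, and the `strict` parameter are hypothetical names chosen for illustration, and a real server would consult its registered Adapters rather than a hard-coded mapping.

```python
import logging
import mimetypes
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)

DEFAULT_MIMETYPE = "application/octet-stream"

# Hypothetical stand-in for adapter lookup: MIME type -> structure family.
KNOWN_FAMILIES = {
    "text/csv": "dataframe",
    "image/tiff": "array",
}


@dataclass
class Registration:
    """Hypothetical record of how an asset would be registered."""
    path: str
    mimetype: str
    structure_family: str


def plan_registration(path: str, strict: bool = False) -> Optional[Registration]:
    """Decide how to register an asset, falling back to "bytes".

    Returns None when strict mode is on and a fallback would be needed.
    """
    detected, _encoding = mimetypes.guess_type(path)
    # No MIME type detected: fall back to the generic binary type.
    mimetype = detected or DEFAULT_MIMETYPE

    family = KNOWN_FAMILIES.get(mimetype)
    if family is not None:
        return Registration(path, mimetype, family)

    if strict:
        logger.warning("Ignoring %s: no adapter for MIME type %s", path, mimetype)
        return None

    # Keep the detected MIME type so clients can still interpret the payload.
    logger.warning("Registering %s as 'bytes' (MIME type %s)", path, mimetype)
    return Registration(path, mimetype, "bytes")
```

With this sketch, a file with an unrecognized extension is registered as "bytes" with "application/octet-stream", a PDF is registered as "bytes" with its detected MIME type, and either is skipped when `strict=True`.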

Slicing into the byte stream with an HTTP range request

  • The user may want to only download or access a small part of a large file.
  • If they know the exact byte offsets to access, then we could support this with a combination of the Python buffer protocol and HTTP range requests (the "Range" request header and the "Content-Range" response header).
  • See the related issue "Respect range requests" (#521).
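
As a rough illustration of the idea above, the sketch below parses a single-range "Range" header and slices the payload through a `memoryview`, so no bytes are copied until the response body is written. The function names `parse_range` and `slice_payload` are hypothetical, multi-range requests are not handled, and how this would plug into Tiled's server is left open.

```python
import re
from typing import Dict, Optional, Tuple

# Matches a single range such as "bytes=0-99", "bytes=100-", or "bytes=-50".
_RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")


def parse_range(header: str, total: int) -> Optional[Tuple[int, int]]:
    """Return an inclusive (start, end) pair, or None if it cannot be satisfied."""
    match = _RANGE_RE.match(header.strip())
    if match is None or total == 0:
        return None
    first, last = match.groups()
    if not first and not last:
        return None
    if not first:
        # Suffix form: the final `last` bytes of the payload.
        length = int(last)
        if length == 0:
            return None
        return max(total - length, 0), total - 1
    start = int(first)
    end = min(int(last), total - 1) if last else total - 1
    if start > end:
        # Covers start beyond the payload; reversed ranges are also
        # treated as unsatisfiable here for simplicity.
        return None
    return start, end


def slice_payload(data: bytes, range_header: str) -> Tuple[int, Dict[str, str], memoryview]:
    """Build (status, headers, body) for a single-range request."""
    view = memoryview(data)  # buffer protocol: slicing a view does not copy
    total = len(data)
    bounds = parse_range(range_header, total)
    if bounds is None:
        return 416, {"Content-Range": f"bytes */{total}"}, view[0:0]
    start, end = bounds
    headers = {
        "Content-Range": f"bytes {start}-{end}/{total}",
        "Content-Length": str(end - start + 1),
    }
    return 206, headers, view[start : end + 1]
```

For a 100-byte payload, a request with `Range: bytes=10-19` would yield status 206, `Content-Range: bytes 10-19/100`, and a 10-byte body.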

Lazy loading of the "bytes" data

  • For performance, it might be useful for the Python client to return a Dask object representing the underlying bytes of the asset.
  • See, for example, dask.bytes.[core.]read_bytes().

Contents of the metadata's structure field

  • Necessary information like the MIME type and content length can be found in the data_source field.
  • For now it's probably best to keep the structure field empty (null or None).
  • This can be revisited if field testing reveals additional info that would be useful.

@padraic-shafer

In terms of where the code needs to be updated, there are many parallels between this PR and #549. For convenience, here are the diffs.

Labels: client, enhancement, server