Layouts for Bluesky data in Tiled #767

danielballan · 2024-06-29T18:58:33Z

This was worked out through a conversation with @whs92.

Current Status

Here is an example that works today. We start a tiled server with a database (SQLite or Postgres) and a writable directory. For simplicity, here we use a single-user server.

$ tiled catalog init catalog.db
$ tiled catalog serve catalog.db -w data/ --api-key=secret

The experimental TiledWriter, created as part of the recent flyscanning effort, consumes Bluesky documents and makes API calls into Tiled. These calls can:

- Upload metadata
- Upload data (creating tables and appending rows during acquisition)
- Register references to externally-written data, as specified in StreamResource and StreamDatum

from bluesky import RunEngine
from bluesky.callbacks.tiled_writer import TiledWriter
from bluesky.plans import count
from ophyd.sim import det
from tiled.client import from_uri

RE = RunEngine()

# Connect to the single-user server started above.
client = from_uri('http://localhost:8000', api_key='secret')

# Forward every document emitted by the RunEngine to Tiled.
tw = TiledWriter(client)
RE.subscribe(tw)

# Acquire data
RE(count([det]))

The metadata and data can now be accessed via curl or via that Python client object.
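
As a rough sketch of the curl-style access, the request below is built with the standard library only and is not sent. It assumes the server from the commands above is listening on http://localhost:8000, that Tiled's HTTP API serves node metadata under /api/v1/metadata/{path}, and that single-user API keys are sent as "Authorization: Apikey ...". The helper name metadata_request is made up for this illustration.

```python
from urllib.parse import quote
from urllib.request import Request


def metadata_request(base_url, path, api_key):
    """Build (but do not send) a GET request for the metadata of one node.

    Hypothetical helper; the endpoint and header format are assumptions
    about the Tiled HTTP API, as stated in the text above.
    """
    # Percent-encode the path while keeping the slashes between segments.
    url = f"{base_url.rstrip('/')}/api/v1/metadata/{quote(path.strip('/'))}"
    return Request(url, headers={"Authorization": f"Apikey {api_key}"})


# An empty path addresses the root of the catalog.
req = metadata_request("http://localhost:8000", "", "secret")
print(req.full_url)
```

Passing the result to urllib.request.urlopen against a running server should return JSON, roughly what curl with the same URL and header would show.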

Design Goals

We want to represent metadata and data from Bluesky documents in Tiled structures (container, array, table) in a consistent, generic way for all Bluesky runs, so that the process is reversible. That is, we want to be able to "replay" a semantically-equivalent document stream for the purposes of simulating what happened after the fact. This is useful for development and testing of streaming tools on old data.
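
To make "reversible" concrete, here is a minimal sketch of a replay, assuming a simplified in-memory stand-in for the stored layout (plain dicts and lists, not the Tiled client API). The function name replay and the shape of the run dict are hypothetical. Descriptor and event uids are regenerated, on the assumption that the originals may not be stored, so the result is semantically equivalent rather than identical.

```python
import uuid


def replay(run):
    """Yield (name, document) pairs rebuilt from a stored run.

    `run` is a hypothetical plain-dict stand-in for the stored layout:
    {"start": {...}, "stop": {...},
     "streams": {name: {"data_keys": {...}, "config": {...}, "time": [...],
                        "data": {key: [...]}, "timestamps": {key: [...]}}}}
    """
    start = run["start"]
    yield "start", start
    for name, stream in run["streams"].items():
        descriptor = {
            "uid": str(uuid.uuid4()),  # fresh uid (assumption: original not kept)
            "run_start": start["uid"],
            "name": name,
            "time": stream["time"][0] if stream["time"] else start["time"],
            "data_keys": stream["data_keys"],
            "configuration": stream["config"],
        }
        yield "descriptor", descriptor
        for i, t in enumerate(stream["time"]):
            yield "event", {
                "uid": str(uuid.uuid4()),
                "descriptor": descriptor["uid"],
                "seq_num": i + 1,  # Bluesky sequence numbers start at 1
                "time": t,
                "data": {k: col[i] for k, col in stream["data"].items()},
                "timestamps": {k: col[i] for k, col in stream["timestamps"].items()},
            }
    yield "stop", run["stop"]


run = {
    "start": {"uid": "abc", "time": 0.0},
    "stop": {"uid": "def", "run_start": "abc", "time": 2.0, "exit_status": "success"},
    "streams": {
        "primary": {
            "data_keys": {"I0": {"dtype": "number", "shape": [], "source": "sim"}},
            "config": {},
            "time": [1.0, 1.5],
            "data": {"I0": [10.0, 11.0]},
            "timestamps": {"I0": [1.0, 1.5]},
        }
    },
}
docs = list(replay(run))
```

This sketch emits one stream at a time and omits optional descriptor fields; a faithful replay would interleave events from different streams in time order.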

This requirement unavoidably leads to a nested and rather "busy" structure that has to hold data, timestamps, and configuration for all the streams in the BlueskyRun. We end up with URL paths like:

/{uuid}/primary/data/I0
/{uuid}/primary/config/quadem1/quadem1_integration_time

(Nexus has the same problem: this is an unavoidable consequence of collecting and organizing a lot of context.)

In some contexts, we need to present (a subset of) this information in a flatter form. When navigating the data in a UI, it should not take more than one or two clicks to get to the data of interest. Likewise, it should be possible to quickly get to the data in an interactive IPython or Jupyter session.

We also want to be able to present the metadata and data in layouts that adhere to defined standards, such as Nexus application definitions.

Possible Approaches

Client-side

We could arrange the data and metadata in Tiled "the Bluesky way" and use client code (in Python, React, etc.) to fetch the data of interest and "rearrange" it into the desired layout. This has a couple of downsides:

- The rearrangement has to be reimplemented in each client.
- Simple clients like curl cannot do this.
- We have observed anecdotally that user-created custom Tiled clients tend to bake in other features and dependencies and are not generalizable or shareable beyond their specific use case.
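
For a sense of what each client would have to reimplement, here is a minimal sketch of the rearrangement. It assumes the nested tree has already been fetched into plain dicts; the function names leaf_paths and simple_layout are invented for the example.

```python
from collections.abc import Mapping


def leaf_paths(node, prefix=()):
    """Yield (path_tuple, leaf) for every non-mapping value under node."""
    for key, child in node.items():
        path = prefix + (key,)
        if isinstance(child, Mapping):
            yield from leaf_paths(child, path)
        else:
            yield path, child


def simple_layout(run, stream="primary"):
    """Pull the data columns of one stream up to the top level."""
    wanted = (stream, "data")
    return {path[-1]: leaf for path, leaf in leaf_paths(run) if path[:2] == wanted}


# Hypothetical, already-fetched run mirroring the URL paths shown earlier.
run = {
    "primary": {
        "data": {"I0": [1.0, 2.0]},
        "config": {"quadem1": {"quadem1_integration_time": [0.1]}},
    }
}
flat = simple_layout(run)
```

Every client language would need its own version of this logic, which is the first downside listed above.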

Server-side

We could add to the Tiled server a concept of "views", where the metadata and data are stored once but presented in a variety of layouts. It might look something like this:

/{uuid}/streams/primary/data/I0
/{uuid}/streams/primary/config/quadem1/quadem1_integration_time

# direct access to primary stream, which is what people want most of the time
/{uuid}/simple
/{uuid}/simple/I0

# Nexus application definition layout
/{uuid}/NxXAS/{...}

The TiledWriter would create /{uuid}/streams/, a consistent "ground truth" layout generated for all BlueskyRuns. Then, siblings like /{uuid}/simple/ and /{uuid}/NxXAS/ could be registered as "views". This could be done by a separate client or perhaps by extending/configuring TiledWriter.
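
One way to picture a view is as a table of path-prefix rewrites from view paths to ground-truth paths, so that nothing is stored twice. The sketch below is purely illustrative and is not an existing Tiled feature; the VIEWS table and the resolve function are hypothetical names.

```python
# view name -> ground-truth prefix, both relative to /{uuid}/ (hypothetical table)
VIEWS = {
    "simple": "streams/primary/data",
}


def resolve(path, views=VIEWS):
    """Translate a path inside a view into its ground-truth path.

    Paths that are not under a registered view are returned unchanged.
    """
    segments = path.strip("/").split("/")
    uid, rest = segments[0], segments[1:]
    if rest and rest[0] in views:
        # Swap the view name for the ground-truth prefix it stands for.
        rest = views[rest[0]].split("/") + rest[1:]
    return "/" + "/".join([uid] + rest)


target = resolve("/abc/simple/I0")
```

A real implementation would also have to apply access control to the resolved target rather than to the view path, which is part of the scoping concern raised below.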

I am wary of adding this concept to Tiled (something like a "view", "alias", or "soft-link"). It would have to be scoped very carefully, with implications for performance and access control taken into account from the start. But I am coming around to thinking that this is the best way to address these use cases:

- See the 99%-of-the-time interesting parts of my data in a flat list
- curl my data as a Nexus file
- Navigate through my data, in a Nexus layout, and download just parts of interest
- Replay my data as Bluesky documents to test streaming things

danielballan · 2024-06-29T19:04:13Z

I should add that @dylanmcreynolds introduced the suggestion of adding "views" to Tiled in a lengthy PR discussion with @padraic-shafer and me, in March. All of us were generally in favor of it. We set it aside to focus on delivering a TiledWriter prototype. It's time to revisit and decide whether we want to move forward with that.

callumforrester · 2024-07-03T14:57:39Z

This looks interesting! A few comments:

- I think it would be useful to talk more about the storage format of this "ground truth"; apologies if that's covered in another issue.
- I agree that the "views" concept would have to be scoped very carefully. It could also be implemented in a different service that calls Tiled in the backend.
- For Nexus, we've found it hard to get all of the context required to generate the structure into Bluesky documents alone. @DiamondJoseph could comment more, but I believe it is partly to do with the relative looseness of the application definitions.

danielballan · 2024-08-21T19:09:15Z

> I think it would be useful to talk more about the storage format of this "ground truth", apologies if that's covered in another issue

I just wrote up current thinking on that in #778.

danielballan · 2024-08-22T16:33:03Z

In follow-up conversations with @dylanmcreynolds and @padraic-shafer, an idea is taking shape that seems like a good starting point:

CatalogAdapter is extended to accept some optional YAML config and construct alternate view(s) of the content. The config would probably include literals, such as hard-coding a value for the key NX_class, and JSON Path strings that point to nodes nested inside the ground-truth containers, such as pulling primary/data to top-level.
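
As a sketch of what such a config might drive, the example below uses a Python dict standing in for the parsed YAML. The config keys "literals" and "links", the example value "NXentry", and the function names are all invented for illustration, and plain slash-separated paths stand in for real JSON Path expressions to keep the example dependency-free.

```python
# Hypothetical parsed config for one view; not a real Tiled config schema.
VIEW_CONFIG = {
    "literals": {"NX_class": "NXentry"},  # hard-coded key/value pairs
    "links": {"data": "primary/data"},    # view key -> ground-truth path
}


def lookup(tree, path):
    """Follow a slash-separated path through nested dicts."""
    node = tree
    for segment in path.split("/"):
        node = node[segment]
    return node


def build_view(ground_truth, config):
    """Assemble a view from literals plus references into the ground truth."""
    view = dict(config.get("literals", {}))
    for name, path in config.get("links", {}).items():
        # Reference the existing node; nothing is copied.
        view[name] = lookup(ground_truth, path)
    return view


ground_truth = {"primary": {"data": {"I0": [1, 2]}, "config": {}}}
view = build_view(ground_truth, VIEW_CONFIG)
```

The linked entries are references to the same objects as in the ground truth, which mirrors the goal of storing the data once and presenting it in several layouts.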

I think a good initial use case here is a view called simple that takes primary/data with a metadata dict that pulls ~10 select keys from start and stop.
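
A minimal sketch of the metadata half of such a simple view follows. The particular keys selected here are a guess for illustration, not a list anyone in this thread has agreed on, and the function name simple_metadata is hypothetical.

```python
# Illustrative key selections only; the real list is still to be decided.
START_KEYS = ("uid", "scan_id", "time", "plan_name")
STOP_KEYS = ("exit_status", "num_events")


def simple_metadata(start, stop, start_keys=START_KEYS, stop_keys=STOP_KEYS):
    """Collect selected start/stop keys into one flat dict, skipping absent ones."""
    picked = {k: start[k] for k in start_keys if k in start}
    picked.update({k: stop[k] for k in stop_keys if k in stop})
    return picked


start = {"uid": "abc", "scan_id": 1, "time": 0.0, "plan_name": "count", "detectors": ["det"]}
stop = {"uid": "def", "run_start": "abc", "time": 1.0, "exit_status": "success",
        "num_events": {"primary": 1}}
meta = simple_metadata(start, stop)
```

The stop keys are chosen so they do not collide with the start keys; if overlapping keys such as time were wanted from both documents, they would need distinct names in the flat dict.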
