Layouts for Bluesky data in Tiled #767

Open
danielballan opened this issue Jun 29, 2024 · 4 comments

@danielballan
Member

This was worked out through a conversation with @whs92.

Current Status

Here is an example that works today. We start a tiled server with a database (SQLite or Postgres) and a writable directory. For simplicity, here we use a single-user server.

$ tiled catalog init catalog.db
$ tiled catalog serve catalog.db -w data/ --api-key=secret

The experimental TiledWriter, created as part of the recent flyscanning effort, consumes Bluesky documents and makes API calls into Tiled. These calls can:

  • Upload metadata
  • Upload data (creating tables and appending rows during acquisition)
  • Register references to externally-written data, as specified in StreamResource and StreamDatum

from bluesky import RunEngine
from bluesky.callbacks.tiled_writer import TiledWriter
from bluesky.plans import count
from ophyd.sim import det
from tiled.client import from_uri

RE = RunEngine()

# Connect to the single-user server started above.
client = from_uri('http://localhost:8000', api_key='secret')

# Forward every document emitted by the RunEngine to Tiled.
tw = TiledWriter(client)
RE.subscribe(tw)

# Acquire data
RE(count([det]))

The metadata and data can now be accessed via curl or via that Python client object.

Design Goals

We want to represent metadata and data from Bluesky documents in Tiled structures (container, array, table) in a consistent, generic way for all Bluesky runs so that the process is reversible. That is, we want to be able to "replay" a semantically-equivalent document stream for the purposes of simulating what happened after the fact. This is useful for development and testing of streaming tools on old data.
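
To make the reversibility requirement concrete, here is a minimal sketch in plain Python. It does not use Tiled or the TiledWriter; the function names and the column layout are assumptions made up for illustration. It pivots Event-like documents into columns (roughly what a table would hold) and then replays them row by row. The `uid` field of real Event documents is left out, on the assumption that a replay could mint fresh ones.

```python
from typing import Any, Dict, Iterator, List


def events_to_columns(events: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Pivot row-wise Event-like documents into one list per key."""
    columns: Dict[str, Any] = {"seq_num": [], "time": [], "data": {}, "timestamps": {}}
    for event in events:
        columns["seq_num"].append(event["seq_num"])
        columns["time"].append(event["time"])
        for section in ("data", "timestamps"):
            for key, value in event[section].items():
                columns[section].setdefault(key, []).append(value)
    return columns


def columns_to_events(columns: Dict[str, Any], descriptor_uid: str) -> Iterator[Dict[str, Any]]:
    """Replay the stored columns as Event-like documents, one per row."""
    for i, seq_num in enumerate(columns["seq_num"]):
        yield {
            "descriptor": descriptor_uid,
            "seq_num": seq_num,
            "time": columns["time"][i],
            "data": {key: values[i] for key, values in columns["data"].items()},
            "timestamps": {key: values[i] for key, values in columns["timestamps"].items()},
        }


# Hypothetical sample: two readings of a detector named "I0".
original = [
    {"descriptor": "d1", "seq_num": 1, "time": 10.0,
     "data": {"I0": 1.5}, "timestamps": {"I0": 9.9}},
    {"descriptor": "d1", "seq_num": 2, "time": 11.0,
     "data": {"I0": 1.7}, "timestamps": {"I0": 10.9}},
]
stored = events_to_columns(original)
replayed = list(columns_to_events(stored, descriptor_uid="d1"))
```

In this toy case `replayed` equals `original`, which is the property the storage layout needs to preserve: nothing required to rebuild the documents may be dropped on the way in.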

This requirement unavoidably leads to a nested and rather "busy" structure that has to hold data, timestamps, and configuration for all the streams in the BlueskyRun. We end up with URL paths like:

/{uuid}/primary/data/I0
/{uuid}/primary/config/quadem1/quadem1_integration_time

(Nexus has the same problem: this is an unavoidable consequence of collecting and organizing a lot of context.)
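
As a rough illustration of where the nesting comes from, the sketch below derives paths like the ones above from an Event Descriptor-like dict. The helper `layout_paths` is hypothetical, not part of Tiled or Bluesky, and the descriptor contents are invented sample values.

```python
from typing import Any, Dict, List


def layout_paths(run_uid: str, descriptor: Dict[str, Any]) -> List[str]:
    """List the nested paths implied by one stream's descriptor (hypothetical helper)."""
    stream = descriptor["name"]
    # One node per data key under {stream}/data.
    paths = [f"/{run_uid}/{stream}/data/{key}" for key in descriptor["data_keys"]]
    # One node per configuration key, grouped by device, under {stream}/config.
    for device, config in descriptor.get("configuration", {}).items():
        for key in config.get("data", {}):
            paths.append(f"/{run_uid}/{stream}/config/{device}/{key}")
    return paths


# Invented sample descriptor for a stream named "primary".
descriptor = {
    "name": "primary",
    "data_keys": {"I0": {"dtype": "number", "shape": [], "source": "hypothetical"}},
    "configuration": {
        "quadem1": {"data": {"quadem1_integration_time": 0.1}},
    },
}
paths = layout_paths("abc123", descriptor)
for path in paths:
    print(path)
```

Every stream multiplies this structure, which is why the full layout is deep even for a simple run.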

In some contexts, we need to present (a subset of) this information in a flatter form. When navigating the data in a UI, it should not take more than one or two clicks to get to the data of interest. Likewise, it should be possible to quickly get to the data in an interactive IPython or Jupyter session.

We also want to be able to present the metadata and data in layouts that adhere to defined standards, such as Nexus application definitions.

Possible Approaches

Client-side

We could arrange the data and metadata in Tiled "the Bluesky way" and use client code (in Python, React, etc.) to fetch the data of interest and "rearrange" it into the desired layout. This has a couple of downsides:

  • The rearrangement has to be reimplemented in each client.
  • Simple clients like curl cannot do this.
  • We have observed anecdotally that user-created custom Tiled clients tend to bake in other features and dependencies and are not generalizable or shareable beyond their specific use case.

Server-side

We could add to the Tiled server a concept of "views", where the metadata and data are stored once but presented in a variety of layouts. It might look something like this:

/{uuid}/streams/primary/data/I0
/{uuid}/streams/primary/config/quadem1/quadem1_integration_time

# direct access to primary stream, which is what people want most of the time
/{uuid}/simple
/{uuid}/simple/I0

# Nexus application definition layout
/{uuid}/NxXAS/{...}

The TiledWriter would create /{uuid}/streams/, a consistent "ground truth" layout generated for all BlueskyRuns. Then, siblings like /{uuid}/simple/ and /{uuid}/NxXAS/ could be registered as "views". This could be done by a separate client or perhaps by extending/configuring TiledWriter.
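
A toy sketch of the idea, in plain Python: a view is a table of pointers into the ground-truth layout, so the data are stored once and only the path changes. Nothing here is Tiled API; the `VIEWS` registry and the `resolve` function are assumptions made up for illustration, and access control and performance are ignored entirely.

```python
from typing import Dict

# Hypothetical registry: view name -> {key in the view: path under the ground truth}.
VIEWS: Dict[str, Dict[str, str]] = {
    "simple": {"I0": "streams/primary/data/I0"},
}


def resolve(path: str, views: Dict[str, Dict[str, str]] = VIEWS) -> str:
    """Map a request path onto the ground-truth path that actually holds the data."""
    parts = path.strip("/").split("/")
    if len(parts) < 2:
        return path  # only the run itself; nothing to resolve
    run_uid, top, rest = parts[0], parts[1], parts[2:]
    if top not in views:
        return path  # already a ground-truth path such as /{uuid}/streams/...
    if not rest:
        raise KeyError("this sketch resolves leaf nodes only, not the view container")
    # Raises KeyError if the view does not define this key.
    return f"/{run_uid}/{views[top]['/'.join(rest)]}"


via_view = resolve("/abc123/simple/I0")
direct = resolve("/abc123/streams/primary/data/I0")
```

Both calls end up at the same ground-truth node, so a simple client like curl could request either path and receive the same data.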

I am wary of adding this concept to Tiled (something like a "view" or "alias" or "soft-link"). It would have to be scoped very carefully, with implications for performance and access control taken into account from the start. But I am coming around to thinking that this is the best way to address these use cases:

  • See the 99%-of-the-time interesting parts of my data in a flat list
  • curl my data as a Nexus file
  • Navigate through my data, in a Nexus layout, and download just parts of interest
  • Replay my data as Bluesky documents to test streaming things

@danielballan
Member Author

I should add that @dylanmcreynolds introduced the suggestion of adding "views" to tiled in a lengthy PR discussion with @padraic-shafer and me, in March. All of us were generally favorable on it. We set it aside to focus on delivering a TiledWriter prototype. It's time to revisit and decide whether we want to move forward with that.

@callumforrester

This looks interesting! A few comments:

  • I think it would be useful to talk more about the storage format of this "ground truth", apologies if that's covered in another issue
  • I agree that the "views" concept would have to be scoped very carefully. It could also be implemented in a different service that calls Tiled in the backend
  • For nexus, we've found it hard to get all of the required context to generate the structure into bluesky documents alone. @DiamondJoseph could comment more but I believe it is partly to do with the relative looseness of the application definitions.

@danielballan
Member Author

> I think it would be useful to talk more about the storage format of this "ground truth", apologies if that's covered in another issue

I just wrote up current thinking on that here: #778

@danielballan
Member Author

In follow-up conversations with @dylanmcreynolds and @padraic-shafer, an idea is taking shape that seems like a good starting point:

CatalogAdapter is extended to accept some optional YAML config and construct alternate view(s) of the content. The config would probably include literals, such as hard-coding a value for the key NX_class, and JSON Path strings that point to nodes nested inside the ground-truth containers, such as pulling primary/data to top-level.

I think a good initial use case here is a view called simple that takes primary/data with a metadata dict that pulls ~10 select keys from start and stop.
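
To give a feel for what such a config could do, here is a sketch in plain Python. Everything in it is an assumption: the config is shown as the dict a YAML file might parse to, the key names (`literal`, `path`, `contents`) are invented, slash-separated paths stand in for JSON Path, and the value given to NX_class is only a placeholder. It is not based on CatalogAdapter.

```python
from typing import Any, Dict

# What the parsed YAML might look like (all key names hypothetical).
SIMPLE_VIEW_CONFIG: Dict[str, Any] = {
    "metadata": {
        "NX_class": {"literal": "NXentry"},  # hard-coded value, placeholder only
        "plan_name": {"path": "metadata/start/plan_name"},
        "exit_status": {"path": "metadata/stop/exit_status"},
    },
    # Pull primary/data up to the top level of the view.
    "contents": {"path": "primary/data"},
}


def lookup(tree: Dict[str, Any], path: str) -> Any:
    """Walk a slash-separated path through nested dicts (stand-in for JSON Path)."""
    node: Any = tree
    for part in path.split("/"):
        node = node[part]
    return node


def evaluate(spec: Dict[str, Any], run: Dict[str, Any]) -> Any:
    """Return the literal, or the ground-truth node that the path points to."""
    if "literal" in spec:
        return spec["literal"]
    return lookup(run, spec["path"])


def build_view(config: Dict[str, Any], run: Dict[str, Any]) -> Dict[str, Any]:
    """Assemble a view from the config without copying the underlying data."""
    return {
        "metadata": {key: evaluate(spec, run) for key, spec in config["metadata"].items()},
        "contents": evaluate(config["contents"], run),
    }


# Invented ground-truth content for one run.
run = {
    "metadata": {
        "start": {"uid": "abc123", "plan_name": "count"},
        "stop": {"exit_status": "success"},
    },
    "primary": {"data": {"I0": [1.5, 1.7]}, "config": {}},
}
simple = build_view(SIMPLE_VIEW_CONFIG, run)
```

The view's `contents` is the same object as the ground-truth `primary/data` node rather than a copy, which mirrors the "stored once, presented in several layouts" goal.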
