Relax restrictions / guidelines #2

Open
ssomnath opened this issue Dec 8, 2018 · 2 comments

Comments

@ssomnath
Member

ssomnath commented Dec 8, 2018

Per this Google Group conversation, we should relax any restrictions on how data should be stored beyond the core Main Dataset / Ancillary Dataset constructs.

We need to comb through the entire specifications document and relax every restriction that is not absolutely essential. The few examples I can think of at the moment are:

  1. Videos and time series
  2. (mixed precision) compound datasets (cannot be handled in C++, Fortran)
  3. Compound dataset or Channels?
  4. A single HDF5 file need not contain only USID datasets, or only datasets pertaining to a single raw measurement. Some users have expressed the desire to use a single HDF5 file to store all their imaging, spectroscopy, etc. data pertaining to (for example) a day's measurements or a project.

@mpanighel

Hello Suhas. I was developing some Python code to convert scanning tunneling microscopy data from proprietary RHK to HDF5 and, after checking NeXus, I came across USID.

I think its core idea of flattening data to 2D is really good, and it could indeed allow this format to serve as a universal standard for scientific data. This is its strength, and I see that this is why Ancillary datasets are mandatory. On the other hand, for regularly sampled data, as already pointed out, they are redundant and unnecessarily complicated, and they waste quite a lot of space, especially for grayscale images. This is holding me back a bit.

While still keeping the Main + Ancillary construct, so that any data can be represented, do you think the definition of the Ancillary datasets could be relaxed? For example, the ancillary attribute of a channel could link either to a "full length" dataset (as it does now) or to a group/dataset (to be precisely defined) containing start/stop/increment. One could then let pyUSID (or an equivalent) handle this and, if necessary, create the full-length dataset "on the fly".
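To make the proposal concrete, here is a minimal sketch of how a compact start/increment description could be expanded into a full-length ancillary table on demand. It is not pyUSID code: `LinearAxis` and `expand_ancillary` are hypothetical names, the stop value is implied by start, increment and count, and the assumption that the first listed axis varies fastest is mine, not something fixed by the specification.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class LinearAxis:
    """Compact description of one regularly sampled dimension (hypothetical)."""
    name: str
    start: float
    increment: float
    count: int

    def values(self) -> List[float]:
        # Derive each value from its index so float error does not accumulate.
        return [self.start + i * self.increment for i in range(self.count)]


def expand_ancillary(axes: Sequence[LinearAxis]) -> List[Tuple[float, ...]]:
    """Build the full-length table: one row per point, one column per axis.

    Assumption: the first axis in `axes` varies fastest.
    """
    value_lists = [axis.values() for axis in axes]
    # itertools.product varies its LAST argument fastest, so reverse the axes
    # going in and reverse each row coming out.
    return [tuple(reversed(row)) for row in product(*reversed(value_lists))]


# A 3 x 2 grid described by two small records instead of a 6 x 2 table.
x = LinearAxis("X", start=0.0, increment=0.5, count=3)
y = LinearAxis("Y", start=10.0, increment=10.0, count=2)
table = expand_ancillary([x, y])
for row in table:
    print(row)
```

The stored description grows with the number of dimensions, while the expanded table grows with the number of points, which is where the space saving for regularly sampled images would come from.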

@ssomnath
Member Author

@mpanighel Thank you for your interest in USID and for sharing your ideas. You do indeed bring up good points. We realize that the strict Main + Ancillary dataset rules make USID overkill for simple, small datasets such as images or single spectra. We also realize that in the majority of data the parameters have been varied linearly, and we have given some thought to how to avoid being verbose where unnecessary. We would be happy to work with you to incorporate this capability if you are interested. Please feel free to get in touch with us at [email protected] to discuss further (please send us an email at [email protected] with the email address you would like to use for your Slack account).
