data formatting, internal and external #5
Comments
A first pass at this is addressed by #7. I'm sure we can make it a lot better but it "works." Feedback is more than welcome since I'm still very much in "learning" mode for recipes.
A friend and I were working on a data set like this the other day, prompting me to get off my 🍑 a bit on this. Would it make sense to:
Hmm. Is "spectra" too specific?
I like wide vs long for function names. "Spectra" is a bit too specific. It works for a lot of the acronym soup of measurement science (e.g., NMR, MS, IR, UV/VIS) but misses on others (e.g., chromatogram, thermogram). Are
Nope! I'll work on a PR and then another to re-do the data into long and wide formats.
Just some thoughts about data structures...
This will be much more informed when we have examples of more complex experiments and complex instrument results.
"External format"
This is the shape of the data as the user has it.
There are a few ways that the data could be formatted by the user. I'll use the tidyr terminology of "longer" and "wider".
Wider would be where the wavelength values are common across samples and the intensity data are in columns. The number of rows probably represents the number of samples in the data. The meats data in the modeldata package is formatted this way.

A longer version would be where there is a column for the wavelength (or some frequency-type index) and another column for the outcome (e.g., intensity, absorption, etc.).
For the meats data, that would look like
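As a rough illustration of the two shapes, here is a minimal sketch using tidyr. The toy tibble only mimics the layout of a meats-like data set (an id column, one non-measurement column, and a few x_* measurement columns); the values are invented for illustration and the column names `sample`, `index`, and `intensity` are assumptions, not names taken from the package.

```r
library(tibble)
library(tidyr)

# Toy stand-in for a wide, meats-like data set: one row per sample and
# one x_* column per measurement channel. All values are made up.
wide <- tibble(
  sample = 1:2,
  water  = c(60, 45),
  x_001  = c(0.1, 0.4),
  x_002  = c(0.2, 0.5),
  x_003  = c(0.3, 0.6)
)

# Longer format: one row per sample-by-channel combination, with the
# channel number in `index` and the measured value in `intensity`.
long <- pivot_longer(
  wide,
  cols = starts_with("x_"),
  names_to = "index",
  names_prefix = "x_",
  names_transform = list(index = as.integer),
  values_to = "intensity"
)

long
```

The wide table has one row per sample, while the long table has one row per sample and channel, so the non-measurement columns are repeated once per channel.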
We should be able to work with data in either format.
"Internal format"
Internal to the recipe, the longer format is better but we probably want to store the data in a more compact way.
For each combination of the non-measurement columns, we should put the spectroscopy data in a single compact element.
For the meats data (in longer format), that would be
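One way to picture this compact form is a nested tibble. The sketch below assumes tidyr's `nest()` and a toy long-format table with invented values; the column names `sample`, `index`, and `intensity` are illustrative assumptions, and only `.measurements` comes from the discussion here.

```r
library(tibble)
library(tidyr)

# Toy long-format data: two samples, three channels each (made-up values).
long <- tibble(
  sample    = rep(1:2, each = 3),
  water     = rep(c(60, 45), each = 3),
  index     = rep(1:3, times = 2),
  intensity = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6)
)

# Collapse the measurement columns into a list column, leaving one row
# per combination of the remaining (non-measurement) columns.
nested <- nest(long, .measurements = c(index, intensity))

nested
nested$.measurements[[1]]
```

Each element of `.measurements` is itself a small tibble holding the index and intensity values for one sample.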
The rows again reflect the total number of samples, and .measurements is a list column with the assay results.

We could have an initial function that can make this conversion. Something like step_spectra_collect(outcome, index) to make the formatting (I think that we could have step names that start with step_spectra_* or something).

Here's some example code to go between formats for two examples:
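A minimal sketch of one such round trip, from a wide table to the compact internal form and back, might look like the following. The helpers `wide_to_internal()` and `internal_to_wide()` are hypothetical names invented for this illustration; they are not part of any package and are not the proposed step_spectra_collect() implementation. The toy data values are made up.

```r
library(tibble)
library(tidyr)

# Hypothetical helper: wide -> internal (one row per sample, with a
# .measurements list column). `prefix` marks the measurement columns.
wide_to_internal <- function(data, prefix = "x_") {
  long <- pivot_longer(
    data,
    cols = starts_with(prefix),
    names_to = "index",
    names_prefix = prefix,
    values_to = "value"
  )
  nest(long, .measurements = c(index, value))
}

# Hypothetical helper: internal -> wide, restoring the prefixed columns.
internal_to_wide <- function(data, prefix = "x_") {
  long <- unnest(data, cols = .measurements)
  pivot_wider(
    long,
    names_from = index,
    values_from = value,
    names_prefix = prefix
  )
}

# Toy wide data with made-up values.
wide <- tibble(
  sample = 1:2,
  water  = c(60, 45),
  x_001  = c(0.1, 0.4),
  x_002  = c(0.2, 0.5),
  x_003  = c(0.3, 0.6)
)

internal <- wide_to_internal(wide)
back <- internal_to_wide(internal)
```

In this sketch the index is kept as a character string (for example "001") so that the original column names survive the round trip. Note that nesting groups rows by every non-measurement column, so a sample identifier is needed if two samples could otherwise share the same values.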