Organization of CHRIS data #1

jorainer · 2022-02-16T11:05:33Z

A key point is to understand how the CHRIS data will be organized. Data from each data module will reside in its own folder, stored in the new CHRIS textual file format. These modules are then assumed to be organized by "release" (e.g. "baseline") and stored within a release folder. Multiple release folders would then be stored within a base CHRIS data folder. This would result in the following hierarchical structure:

CHRIS data folder:
- CHRIS release folder:
  - CHRIS data module folder.

Interview data for the baseline could thus be found in:

CHRIS/CHRIS-baseline/Interview

Interview data for the CHRIS follow-up could be found in:

CHRIS/CHRIS-followup/Interview
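To make the proposed layout concrete, here is a minimal sketch of how a module folder could be resolved from a release and a module name. It is written in Python purely for illustration (the planned interface is an R package), the function name `module_path` is hypothetical, and the "CHRIS-" prefix on release folders is an assumption inferred from the two example paths above.

```python
from pathlib import PurePosixPath


def module_path(base: str, release: str, module: str) -> PurePosixPath:
    """Build <base>/CHRIS-<release>/<module>.

    Assumption: release folders are named "CHRIS-<release>", as in the
    two example paths of this comment.
    """
    return PurePosixPath(base) / f"CHRIS-{release}" / module


print(module_path("CHRIS", "baseline", "Interview"))  # CHRIS/CHRIS-baseline/Interview
print(module_path("CHRIS", "followup", "Interview"))  # CHRIS/CHRIS-followup/Interview
```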

@clemens-it, can you please confirm or correct? This will be important for the development of the R interface package to the new CHRIS data.


jorainer · 2022-02-18T12:27:20Z

It seems we need a different structure, since there will no longer be releases and each module will get its own version.

For metabolomics data we would e.g. store the baseline p180-kit-based measurements in one module called e.g. "targeted metabolomics p180 CHRIS baseline" and the data for NAFLD in "targeted metabolomics p180 NAFLD". When new data is measured (e.g. in the CHRIS baseline, the follow-up, or CHRIS COVID-19), the old data will need to be re-normalized, which will require a version bump. The versions of the new and old data (when available in different modules) should then also match (e.g. 1.1.0). Ideally, both the old and the new versions of the data should remain available, each under its respective version.

Possible solutions for the folder structure to store such data could be:

A) Folder names consisting of module and version

Advantage: the folder name would already be informative.
Disadvantage: needs consistent naming, i.e. consistency in the separator between <module name> and <version>.
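A rough sketch of what option A would require from an interface: splitting each folder name into module name and version. This is Python for illustration only; the underscore separator, the three-part version pattern, and the function name `parse_folder_name` are all assumptions, since no naming convention has been decided.

```python
import re
from typing import Tuple

SEPARATOR = "_"  # assumption: no separator has been agreed on
VERSION_RE = re.compile(r"\d+\.\d+\.\d+")  # assumption: versions look like 1.1.0


def parse_folder_name(name: str) -> Tuple[str, str]:
    """Split "<module name><SEPARATOR><version>" at the last separator."""
    module, sep, version = name.rpartition(SEPARATOR)
    if not sep or not module or not VERSION_RE.fullmatch(version):
        raise ValueError(f"folder name does not match <module>{SEPARATOR}<version>: {name!r}")
    return module, version


print(parse_folder_name("targeted metabolomics p180 NAFLD_1.1.0"))
# ('targeted metabolomics p180 NAFLD', '1.1.0')
```

Splitting at the last separator keeps module names containing the separator parseable, but any folder that deviates from the pattern is rejected, which illustrates the naming-consistency problem.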

B) Folder name independent of module and version

Instead of encoding the module name and the version in the actual folder name, store this information internally in a general description file of the module.
Folder name can be anything (ideally an integer number?).
Advantage: independent of folder naming; safe against special characters or a wrong folder name format.
Disadvantage: needs a tool (e.g. the chrisr package) to list available modules and versions to make selection more user-friendly.
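To illustrate what such a listing tool would do under option B, here is a minimal sketch that scans a base folder and reads module name and version from a description file in each subfolder. It is Python for illustration only (not the chrisr package); the file name `description.json`, the JSON format, and the keys `module` and `version` are hypothetical, as the actual description file format is not defined here.

```python
import json
from pathlib import Path
from typing import Dict, List

DESCRIPTION_FILE = "description.json"  # hypothetical file name and format


def list_modules(base: str) -> List[Dict[str, str]]:
    """Return folder, module name and version for each module folder in base."""
    modules = []
    for folder in sorted(Path(base).iterdir()):
        description = folder / DESCRIPTION_FILE
        if not description.is_file():
            continue  # skip anything that is not a module folder
        info = json.loads(description.read_text(encoding="utf-8"))
        modules.append(
            {
                "folder": folder.name,  # arbitrary, e.g. an integer
                "module": info["module"],  # hypothetical key
                "version": info["version"],  # hypothetical key
            }
        )
    return modules
```

With this approach the folder name carries no meaning: a user would select a module and version from the listing, and the tool would map that choice back to the folder.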

Discussion/thoughts

Option B) seems to be the cleanest. Access to the data should in any case happen mainly through a dedicated interface (e.g. chrisr) and not manually. For that, it is only important that the data is stored in a standardized, systematic way.
