Organization of CHRIS data #1

Open
jorainer opened this issue Feb 16, 2022 · 1 comment

Comments

@jorainer
Member

A key point is to understand how the CHRIS data will be organized. Each data module will reside in its own folder, with the data stored in the new CHRIS textual file format. These modules are then assumed to be organized by "release" (e.g. "baseline") and stored within a release folder. Multiple release folders would in turn be stored within a base CHRIS data folder. This would result in the following hierarchical structure:

  • CHRIS data folder:
    • CHRIS release folder:
      • CHRIS data module folder.

Interview data for the baseline could thus be found in:

CHRIS/CHRIS-baseline/Interview

Interview data for the CHRIS follow-up would be in:

CHRIS/CHRIS-followup/Interview
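
The layout above can be sketched as a small path helper. This is only an illustration in Python, not part of the planned R interface: the function name `module_path` and the `CHRIS-<release>` naming of the release folder are assumptions inferred from the two example paths.

```python
from pathlib import PurePosixPath


def module_path(base: str, release: str, module: str) -> PurePosixPath:
    """Return <base>/CHRIS-<release>/<module>.

    The "CHRIS-" prefix of the release folder is an assumption taken
    from the example paths in this issue, not a confirmed convention.
    """
    return PurePosixPath(base) / f"CHRIS-{release}" / module


print(module_path("CHRIS", "baseline", "Interview"))  # CHRIS/CHRIS-baseline/Interview
print(module_path("CHRIS", "followup", "Interview"))  # CHRIS/CHRIS-followup/Interview
```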

@clemens-it , can you please confirm or correct? This will be important for the development of the R interface package to the new CHRIS data.

@jorainer
Member Author

It seems we need a different structure, since there will no longer be releases and each module will get its own version.

For metabolomics data we would store e.g. the baseline measurements based on the p180 kit in one module called e.g. "targeted metabolomics p180 CHRIS baseline", and the data for NAFLD in "targeted metabolomics p180 NAFLD". When new data are measured (e.g. in the CHRIS baseline, the follow-up, or CHRIS COVID-19), the old data will need to be re-normalized, which will require a version bump. The versions of the new and old data (when available in different modules) should then also match (e.g. 1.1.0). Ideally, both the old and the new versions of the data should remain available, each under its respective version.
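
The requirement that related modules share one version after re-normalization could be checked along these lines. This is a minimal Python sketch under assumptions: the function `versions_match` and the idea of passing module names with plain version strings are hypothetical, and nothing here reflects how versions will actually be stored.

```python
def versions_match(module_versions: dict[str, str]) -> bool:
    """Return True if all given modules carry the same version string."""
    return len(set(module_versions.values())) <= 1


# Module names are taken from the examples above; the versions are made up.
after_renormalization = {
    "targeted metabolomics p180 CHRIS baseline": "1.1.0",
    "targeted metabolomics p180 NAFLD": "1.1.0",
}
out_of_sync = {
    "targeted metabolomics p180 CHRIS baseline": "1.1.0",
    "targeted metabolomics p180 NAFLD": "1.0.0",
}

print(versions_match(after_renormalization))  # True
print(versions_match(out_of_sync))            # False
```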

Possible solutions for the folder structure to store such data could be:

A) Folder names consisting of module and version

  • Advantage: the folder name would already be informative.
  • Disadvantage: needs consistent naming, i.e. consistency in the separator between <module name> and <version>.
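
To illustrate why the separator matters for option A, here is a minimal Python sketch that splits a folder name into module and version. The separator `_` and the names `ModuleFolder` and `parse_folder_name` are assumptions; the issue does not fix any of them.

```python
from typing import NamedTuple

SEPARATOR = "_"  # assumption: no separator has been agreed on in this issue


class ModuleFolder(NamedTuple):
    module: str
    version: str


def parse_folder_name(name: str, sep: str = SEPARATOR) -> ModuleFolder:
    """Split '<module name><sep><version>' at the LAST separator.

    Splitting at the last occurrence lets the module name itself
    contain the separator, but not the version.
    """
    module, found, version = name.rpartition(sep)
    if not found or not module or not version:
        raise ValueError(f"folder name {name!r} does not match <module>{sep}<version>")
    return ModuleFolder(module, version)


print(parse_folder_name("targeted metabolomics p180 NAFLD_1.1.0"))
```

A folder named without the separator (or with a different one) raises a `ValueError`, which is exactly the fragility listed as the disadvantage of this option.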

B) Folder name independent of module and version

  • Instead of encoding the module name and the version in the actual folder name, store this information internally in a general description file of the module.
  • Folder name can be anything (ideally an integer number?).
  • Advantage: independent of folder naming; safe against special characters or a wrong folder name format.
  • Disadvantage: needs a tool (e.g. the chrisr package) to list available modules and versions to make selection more user-friendly.
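
The listing tool that option B requires could look roughly like the following Python sketch. Everything specific here is hypothetical: the description file name `description.json`, its JSON format, and the `name` and `version` keys are assumptions for illustration only, since the actual description file format is not defined in this issue and the real interface is planned as the R package chrisr.

```python
import json
from pathlib import Path

DESCRIPTION_FILE = "description.json"  # hypothetical file name and format


def list_modules(base: str) -> list[dict[str, str]]:
    """List module name and version for every folder below `base`.

    Folder names are treated as opaque identifiers; the module name and
    version are read from the description file inside each folder.
    Entries without a description file are skipped.
    """
    modules = []
    for folder in sorted(Path(base).iterdir()):
        description = folder / DESCRIPTION_FILE
        if not description.is_file():
            continue
        info = json.loads(description.read_text(encoding="utf-8"))
        modules.append(
            {"folder": folder.name, "name": info["name"], "version": info["version"]}
        )
    return modules
```

With such a listing, a user selects a module by name and version, and the interface resolves the (arbitrary) folder name internally.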

Discussion/thoughts

Option B) seems to be the cleanest. Access to the data should anyway be done mainly through a dedicated interface (e.g. chrisr) and not manually. For that, it only matters that the data is stored in a standardized, systematic way.
