Merge pull request #113 from jku-vds-lab/ginihumer-patch-1
add links to example datasets and generation scripts
ginihumer authored Apr 30, 2024
2 parents 4ad60ce + 9284cc7 commit 7c8fa86
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions src/Readme/Dataset_README.md
@@ -6,9 +6,9 @@
In the “Dataset” tab users can choose to either upload their own dataset (orange) or load datasets that were already uploaded previously (yellow).
The “Select dataset” lists datasets that are already available in the back-end (from any user!) and can be deleted with the delete button next to the filename.
The list can also be manually refreshed with the refresh button next to “Select dataset” (this is only necessary if another user uploads a file during a simultaneous session and the current user needs this exact file).
-If a user wants to upload a custom file, it must be a CSV or a zipped CSV. CIME4R recognizes special naming of columns in the dataset as described in the “Data Format” subsection. We provide a [datafile generation example](TODO: add examples) to get users started with their own datasets.
+If a user wants to upload a custom file, it must be a CSV or a zipped CSV. CIME4R recognizes special naming of columns in the dataset as described in the “Data Format” subsection. We provide [datafile generation examples](https://osf.io/vda72/) to get users started with their own datasets.
Finally, in the advanced settings (green) users can specify a SMILES lookup table and export/import a PSE session.
-With the SMILES lookup table, users can define key-value pairs of SMILES strings and human-readable names for those SMILES, which are then used by CIME4R to show the human-readable names instead of SMILES. The file has to be a CSV file with the columns “smiles” and “shortname”. Check out the [example SMILES lookup table](TODO: add example).
+With the SMILES lookup table, users can define key-value pairs of SMILES strings and human-readable names for those SMILES, which are then used by CIME4R to show the human-readable names instead of SMILES. The file has to be a CSV file with the columns “smiles” and “shortname”. Check out the [example SMILES lookup table](https://osf.io/9kzpc).
The export button allows users to save the current session of CIME4R so that they can later continue or even share the session with a colleague.

![dataset screenshot](https://user-images.githubusercontent.com/45741696/227914723-6b7a48a0-9d41-4519-b4b6-7cbb31f4f525.PNG)
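
As a minimal sketch of the two upload formats described in this hunk, the Python snippet below builds a SMILES lookup table with the columns “smiles” and “shortname” and packs a dataset CSV into a zip archive. Only those two column names and the CSV / zipped-CSV requirement come from the README. The helper names, the file name `dataset.csv`, the example compounds, and the `smiles,yield` dataset columns are hypothetical, and whether CIME4R expects exactly one CSV per archive is an assumption.

```python
import csv
import io
import zipfile


def build_smiles_lookup_csv(pairs):
    """Return CSV text with the header 'smiles,shortname' and one row per pair."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["smiles", "shortname"])  # column names required by the README
    for smiles, shortname in pairs:
        writer.writerow([smiles, shortname])
    return buffer.getvalue()


def zip_csv(csv_text, csv_name="dataset.csv"):
    """Return the bytes of a zip archive holding a single CSV file (assumed layout)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(csv_name, csv_text)
    return buffer.getvalue()


# Hypothetical example data, for illustration only.
lookup_csv = build_smiles_lookup_csv([("CCO", "ethanol"), ("c1ccccc1", "benzene")])
zipped = zip_csv("smiles,yield\nCCO,0.5\n")
```

Writing `lookup_csv` to a file and `zipped` to a `.zip` file would give two artifacts of the kinds the “Dataset” tab accepts, under the assumptions stated above.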
@@ -18,7 +18,7 @@ Data is handed to the system using a [comma separated values (CSV)](https://en.w
New files are first uploaded to the python back-end that runs with Flask (https://palletsprojects.com/p/flask/) and then preprocessed and stored in a [PostgreSQL](https://www.postgresql.org/) database.
For big files, the initial upload and preprocessing can take several minutes. If the files are already uploaded, it is much faster.

-An example dataset can be found in [TODO: add example dataset](TODO). Datasets used in CIME4R's article are available in the data repository: [TODO: add link to OSF](TODO)
+An example dataset and the datasets used in the CIME4R article are available in the [data repository](https://osf.io/vda72/).

### Special Column Names
Users can define arbitrary column names. There are some special column names and column modifiers that are recognized by the system and have special meanings:
@@ -34,4 +34,4 @@ Column modifiers allow the system to give columns semantic meaning. The followin
- “desc”: this modifier specifies that the column contains a descriptor value for a chemical compound. Usually, a descriptor consists of a lot of values (i.e., columns) that create overhead that is not important to explicitly show to users (i.e., they should not be shown in the table view), but are nice to use for projection.
- time-series data: since we deal with cyclic data (i.e., experiments are done in an iterative process), we can specify new values of a cycle as additional columns. The system automatically detects time-series data when they end with an **underscore** followed by a **number** (e.g., test_0, test_1, test_2). All columns that have the same name without the number are recognized to belong together. The number indicates the order.
- “pred”, “predicted”: are special modifiers for time-series data that indicate that these values were predicted by a model. These modifiers tell the system that this time-series data contains tuples. The naming must start with "pred" or “predicted” followed by the name of the variable and end with the timestep number, all separated with underscores (e.g. pred_mean_0, pred_var_0,...)
-- “shap”: another special modifier for time-series data. It tells the system that this column contains diverging values as created by [SHAP (SHapley Additive exPlanations)](https://shap.readthedocs.io/en/latest/). The column names must end with "_shap" followed by the timestep (e.g., temperature_shap_0).
+- “shap”: another special modifier for time-series data. It tells the system that this column contains diverging values as created by [SHAP (SHapley Additive exPlanations)](https://shap.readthedocs.io/en/latest/). The column names must end with "_shap" followed by the timestep (e.g., temperature_shap_0).
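
The time-series naming rules in the list above can be sketched in Python as follows. This is an illustrative reading of the README text, not CIME4R's actual detection code: the function names, the `"series"`/`"pred"`/`"shap"` labels, and the order in which the rules are checked are all assumptions.

```python
import re
from collections import defaultdict

# A time-series column ends with an underscore followed by a number.
TIMESTEP = re.compile(r"^(?P<base>.+)_(?P<step>\d+)$")


def classify_column(name):
    """Return (kind, base, step) for a time-series column, or None otherwise."""
    match = TIMESTEP.match(name)
    if match is None:
        return None
    base, step = match.group("base"), int(match.group("step"))
    if base.endswith("_shap"):  # e.g. temperature_shap_0
        return ("shap", base[: -len("_shap")], step)
    if base.startswith("predicted_"):  # e.g. predicted_mean_0
        return ("pred", base[len("predicted_"):], step)
    if base.startswith("pred_"):  # e.g. pred_mean_0
        return ("pred", base[len("pred_"):], step)
    return ("series", base, step)  # e.g. test_0


def group_time_series(columns):
    """Group columns sharing a name without the number, ordered by timestep."""
    groups = defaultdict(list)
    for name in columns:
        parsed = classify_column(name)
        if parsed is not None:
            kind, base, step = parsed
            groups[(kind, base)].append((step, name))
    return {key: [n for _, n in sorted(items)] for key, items in groups.items()}
```

For example, `group_time_series(["test_1", "test_0", "yield", "test_2"])` collects the three `test_*` columns under the key `("series", "test")` in timestep order and ignores `yield`, which has no numeric suffix.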
