From 096d3d1cff7913ff4236f66ddc046f915ed66dd5 Mon Sep 17 00:00:00 2001
From: Vedran Kasalica <v.kasalica@esciencecenter.nl>
Date: Wed, 13 Mar 2024 00:38:03 +0100
Subject: [PATCH] Reuse tsdf description used in our documentation

---
 README.md | 90 ++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 63 insertions(+), 27 deletions(-)

diff --git a/README.md b/README.md
index e342d90..bfab18a 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,5 @@
 
-# tsdf
+# Welcome to TSDF (Time Series Data Format)
 
 | Badges | |
 |:----:|----|
@@ -9,42 +9,82 @@
 | **License** |  [![GitHub license](https://img.shields.io/github/license/biomarkersParkinson/tsdf)](https://github.com/biomarkersparkinson/tsdf/blob/main/LICENSE) |
 | **Fairness** |  [![fair-software.eu](https://img.shields.io/badge/fair--software.eu-%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F-green)](https://fair-software.eu) [![OpenSSF Best Practices](https://bestpractices.coreinfrastructure.org/projects/8083/badge)](https://www.bestpractices.dev/projects/8083) |
 
+A package to work with TSDF data in Python. This implementation is based on the TSDF format specification, which can be found in this [preprint](https://arxiv.org/abs/2211.11294).
+
+## What is TSDF data?
+
+TSDF provides a unified, user-friendly format for numerical sensor data and its metadata: measurements and timestamps are stored as raw binary data, and metadata as JSON text files. It defines essential metadata fields that make data easier to interpret and exchange, with the aim of improving scientific reproducibility in studies that rely on digital biosensor data as evidence across disease domains.
+
+## How does the TSDF library work?
+
+Usage instructions and worked examples can be found in the [documentation](https://biomarkersparkinson.github.io/tsdf/).
+
+## Example: TSDF Metadata
+
+This example shows a TSDF metadata JSON file, whose structured fields describe how to read and interpret the corresponding binary data. For more intricate examples and the detailed specification, see the [preprint](https://arxiv.org/abs/2211.11294).
+
+```json
+{
+    "study_id": "voicedata",
+    "subject_id": "recruit089",
+    "device_id": "audiotechnica02",
+    "endianness": "little",
+    "metadata_version": "0.1",
+    "start_iso8601": "2016-08-09T10:31:00.000+00:00",
+    "end_iso8601": "2016-08-09T10:31:30.000+00:00",
+    "sampling_rate": 44100,
+    "rows": 1323000,
+    "channels": [
+        "left",
+        "right"
+    ],
+    "units": [
+        "unitless",
+        "unitless"
+    ],
+    "compression": "none",
+    "data_type": "int",
+    "bits": 16,
+    "file_name": "audio_voice_089.raw"
+}
+```
+**Explanation:**
 
+- `study_id`: Identifies the study as "voicedata".
 
-A package ([documentation](https://biomarkersparkinson.github.io/tsdf/)) to load TSDF data ([specification](https://arxiv.org/abs/2211.11294)) into Python.
+- `subject_id`: Specifies the subject as "recruit089".
 
-## Installation
+- `device_id`: Indicates the device used as "audiotechnica02".
 
-### Using `pip`
+- `endianness`: Specifies the byte order as "little".
 
-The package is available in PyPi and requires [Python 3.9](https://www.python.org/downloads/) or higher. It can be installed using:
+- `metadata_version`: Denotes the metadata version as "0.1".
 
-```bash
-$ pip install tsdf
-```
+- `start_iso8601` and `end_iso8601`: Define the start and end timestamps of data collection in ISO 8601 format.
 
-## Usage
+- `sampling_rate`: Represents the data sampling rate as 44,100 samples per second.
 
-See our [extended tutorials](https://biomarkersparkinson.github.io/tsdf/).
+- `rows`: Specifies the number of data rows as 1,323,000.
 
-## Development
+- `channels`: Lists the data channels as "left" and "right".
 
-### Running tests
+- `units`: Specifies the units for each channel as "unitless".
 
-```bash
-poetry install
-poetry run pytest
-```
+- `compression`: Indicates that no compression has been applied to the data.
 
-### Building the documentation
+- `data_type`: Defines the data type as "int".
+
+- `bits`: Specifies the bit length as 16.
+
+- `file_name`: Names the binary file "audio_voice_089.raw" that contains the described data.
 
-We use [mkdocs](https://www.mkdocs.org/) to build the documentation. If you want to build the documentation locally, the following commands will prove useful:
 
-```bash
-mkdocs build       # build the documentation
-mkdocs serve       # serve the documentation on a local server
-mkdocs gh-deploy   # deploy the documentation to GitHub pages
-```
+
+## The Python library - `tsdf`
+This Python library reads, writes, and manipulates Time Series Data Format (TSDF) metadata and binary files through a familiar, structured interface. Binary data is handled as NumPy arrays, so it can be processed with the usual numerical tooling. The library is aimed at researchers and data scientists working with large volumes of physiological sensor data.
+
+
+The package is available on [PyPI](https://pypi.org/project/tsdf/).
 
 ## Contributing
 
@@ -54,10 +94,6 @@ We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.
 
 To ensure a welcoming and respectful community, all contributors and participants are expected to adhere to our [Code of Conduct](CODE_OF_CONDUCT.md). By participating in this project, you agree to abide by its terms.
 
-## License
-
-This package was created by Pablo Rodríguez, Peter Kok and Vedran Kasalica. It is licensed under the terms of the Apache License 2.0 license.
-
 ## Credits
 
 - The [TSDF data format](https://arxiv.org/abs/2211.11294) was created by Kasper Claes, Valentina Ticcinelli, Reham Badawy, Yordan P. Raykov, Luc J.W. Evers, Max A. Little.