# Welcome to the TSDF (Time Series Data Format)

| Badges | |
|:----:|----|
| **License** | [![GitHub license](https://img.shields.io/github/license/biomarkersParkinson/tsdf)](https://github.com/biomarkersparkinson/tsdf/blob/main/LICENSE) |
| **Fairness** | [![fair-software.eu](https://img.shields.io/badge/fair--software.eu-%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F-green)](https://fair-software.eu) [![OpenSSF Best Practices](https://bestpractices.coreinfrastructure.org/projects/8083/badge)](https://www.bestpractices.dev/projects/8083) |

A Python package for working with TSDF data. The implementation follows the TSDF format specification described in this [preprint](https://arxiv.org/abs/2211.11294).

## What is TSDF data?

TSDF provides a unified, user-friendly format for numerical sensor data and its metadata. Measurements and timestamps are stored as raw binary data, and metadata is stored in JSON text files. The format defines essential metadata fields that make data easier to interpret and exchange. Its aim is to improve scientific reproducibility in studies that rely on digital biosensor data as a critical evidence base, across a range of disease domains.

## How does the TSDF library work?

Detailed documentation and examples can be found in the [documentation](https://biomarkersparkinson.github.io/tsdf/).

## Example: TSDF Metadata

This example shows a TSDF metadata JSON file. Its fields describe how to locate, interpret, and read the corresponding binary data file. For more complex examples and the detailed specification, see the [preprint](https://arxiv.org/abs/2211.11294).

```json
{
  "study_id": "voicedata",
  "subject_id": "recruit089",
  "device_id": "audiotechnica02",
  "endianness": "little",
  "metadata_version": "0.1",
  "start_iso8601": "2016-08-09T10:31:00.000+00:00",
  "end_iso8601": "2016-08-09T10:31:30.000+00:00",
  "sampling_rate": 44100,
  "rows": 1323000,
  "channels": ["left", "right"],
  "units": ["unitless", "unitless"],
  "compression": "none",
  "data_type": "int",
  "bits": 16,
  "file_name": "audio_voice_089.raw"
}
```
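In this metadata, `endianness`, `data_type`, and `bits` determine how each value in the binary file is encoded, and `rows` and `channels` determine its shape. The sketch below uses Python's standard `struct` module to show how these fields could be used to decode the file. It is a hypothetical helper and not part of the `tsdf` package. The names `sample_format`, `expected_nbytes`, and `decode_rows` are made up for illustration. The sketch also assumes that `data_type` takes the values `"int"`, `"uint"`, or `"float"`, and that data is stored row by row, with all channel values of one sample next to each other.

```python
import struct

# Hypothetical mapping from TSDF (data_type, bits) to struct format characters.
# The set of allowed data_type values ("int", "uint", "float") is an assumption.
FORMAT_CHARS = {
    ("int", 8): "b", ("int", 16): "h", ("int", 32): "i", ("int", 64): "q",
    ("uint", 8): "B", ("uint", 16): "H", ("uint", 32): "I", ("uint", 64): "Q",
    ("float", 32): "f", ("float", 64): "d",
}
BYTE_ORDER = {"little": "<", "big": ">"}


def sample_format(metadata: dict) -> str:
    """Struct format for one value, e.g. '<h' for a little-endian 16-bit int."""
    prefix = BYTE_ORDER[metadata["endianness"]]
    return prefix + FORMAT_CHARS[(metadata["data_type"], metadata["bits"])]


def expected_nbytes(metadata: dict) -> int:
    """Size of an uncompressed binary file: rows x channels x bytes per value."""
    return metadata["rows"] * len(metadata["channels"]) * (metadata["bits"] // 8)


def decode_rows(metadata: dict, raw: bytes) -> list:
    """Decode raw bytes into one tuple per row, holding one value per channel.

    Assumes uncompressed data stored row by row (all channel values of a
    sample next to each other).
    """
    if metadata.get("compression", "none") != "none":
        raise ValueError("this sketch only handles uncompressed data")
    if len(raw) != expected_nbytes(metadata):
        raise ValueError(f"expected {expected_nbytes(metadata)} bytes, got {len(raw)}")
    fmt = sample_format(metadata)
    # One byte-order prefix, then the value code repeated once per channel.
    row = struct.Struct(fmt[0] + fmt[1] * len(metadata["channels"]))
    return list(row.iter_unpack(raw))


# Usage: three rows of a two-channel, little-endian int16 recording.
demo = {"endianness": "little", "data_type": "int", "bits": 16,
        "rows": 3, "channels": ["left", "right"], "compression": "none"}
raw = struct.pack("<6h", 1, -1, 2, -2, 3, -3)
print(sample_format(demo))     # <h
print(decode_rows(demo, raw))  # [(1, -1), (2, -2), (3, -3)]
```

For the metadata above, `expected_nbytes` gives 1,323,000 × 2 × 2 = 5,292,000 bytes. In numpy, the same element type corresponds to the dtype string `'<i2'`.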
**Explanation:**

- `study_id`: Identifies the study as "voicedata".
- `subject_id`: Identifies the subject as "recruit089".
- `device_id`: Identifies the recording device as "audiotechnica02".
- `endianness`: Specifies the byte order of the binary data as "little" (little-endian).
- `metadata_version`: Denotes the TSDF metadata version as "0.1".
- `start_iso8601` and `end_iso8601`: Define the start and end timestamps of data collection in ISO 8601 format.
- `sampling_rate`: Gives the sampling rate as 44,100 samples per second.
- `rows`: Specifies the number of data rows as 1,323,000, which corresponds to 30 seconds of data at 44,100 samples per second.
- `channels`: Lists the data channels as "left" and "right".
- `units`: Specifies the unit of each channel; both are "unitless".
- `compression`: Indicates that no compression has been applied to the data.
- `data_type`: Defines the data type as "int" (integer).
- `bits`: Specifies that each value is 16 bits wide.
- `file_name`: Names the binary file that contains the described data, "audio_voice_089.raw".

## Installation

### Using `pip`

The package is available on PyPI and requires [Python 3.9](https://www.python.org/downloads/) or higher. It can be installed using:

```bash
pip install tsdf
```

## Usage

See our [extended tutorials](https://biomarkersparkinson.github.io/tsdf/).

## Development

### Running tests

```bash
poetry install
poetry run pytest
```

### Building the documentation

We use [mkdocs](https://www.mkdocs.org/) to build the documentation. To build it locally, use the following commands:

```bash
mkdocs build # build the documentation
mkdocs serve # serve the documentation on a local server
mkdocs gh-deploy # deploy the documentation to GitHub pages
```

## The Python library: `tsdf`
This Python library reads, writes, and manipulates TSDF metadata and binary files through a familiar, structured interface. Binary data is handled as numpy arrays, which makes working with large physiological sensor recordings efficient for researchers and data scientists.

The package is available in [PyPI](https://pypi.org/project/tsdf/).
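The following sketch gives a rough illustration of the numpy-based read/write workflow described above. It stores a 2-D numpy array as a raw binary file next to a JSON metadata file and then loads it back. It is not the `tsdf` package's API. `write_pair` and `read_pair` are hypothetical helpers, and only a subset of the metadata fields is written. The sketch also assumes that `data_type` takes the values `"int"`, `"uint"`, or `"float"`, and that rows are stored one after another (row-major). See the [documentation](https://biomarkersparkinson.github.io/tsdf/) for the library's actual functions.

```python
import json
import tempfile
from pathlib import Path

import numpy as np

# Assumed correspondence between numpy dtype kinds and TSDF data_type values.
KIND_TO_TYPE = {"i": "int", "u": "uint", "f": "float"}
TYPE_TO_KIND = {v: k for k, v in KIND_TO_TYPE.items()}


def write_pair(data: np.ndarray, channels: list, metadata_path: Path, file_name: str) -> dict:
    """Hypothetical helper: write a (rows, channels) array as raw binary plus JSON metadata."""
    if data.ndim != 2 or data.shape[1] != len(channels):
        raise ValueError("data must be 2-D with one column per channel")
    # Store little-endian regardless of the host, so "endianness" is always accurate.
    little = data.astype(data.dtype.newbyteorder("<"))
    # tofile() always writes in C (row-major) order: one row = one sample of every channel.
    little.tofile(metadata_path.parent / file_name)
    metadata = {
        "endianness": "little",
        "data_type": KIND_TO_TYPE[data.dtype.kind],
        "bits": data.dtype.itemsize * 8,
        "rows": int(data.shape[0]),
        "channels": list(channels),
        "compression": "none",
        "file_name": file_name,
    }
    metadata_path.write_text(json.dumps(metadata, indent=2))
    return metadata


def read_pair(metadata_path: Path):
    """Hypothetical helper: load the JSON metadata, then its binary file as a (rows, channels) array."""
    metadata = json.loads(metadata_path.read_text())
    byte_order = {"little": "<", "big": ">"}[metadata["endianness"]]
    kind = TYPE_TO_KIND[metadata["data_type"]]
    dtype = np.dtype(f"{byte_order}{kind}{metadata['bits'] // 8}")
    flat = np.fromfile(metadata_path.parent / metadata["file_name"], dtype=dtype)
    return metadata, flat.reshape(metadata["rows"], len(metadata["channels"]))


# Usage: round-trip a small two-channel int16 array through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    meta_path = Path(tmp) / "audio_meta.json"
    samples = np.array([[1, -1], [2, -2], [3, -3]], dtype=np.int16)
    write_pair(samples, ["left", "right"], meta_path, "audio.raw")
    meta, loaded = read_pair(meta_path)
    print(meta["data_type"], meta["bits"], loaded.shape)  # int 16 (3, 2)
    print(np.array_equal(samples, loaded))                 # True
```

Forcing little-endian output keeps the `endianness` field accurate on any machine. The remaining metadata fields (`study_id`, `sampling_rate`, and so on) could be added to the same dictionary; this sketch leaves them out.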

## Contributing

We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.

To ensure a welcoming and respectful community, all contributors and participants are expected to adhere to our [Code of Conduct](CODE_OF_CONDUCT.md). By participating in this project, you agree to abide by its terms.

## License

This package was created by Pablo Rodríguez, Peter Kok and Vedran Kasalica. It is licensed under the Apache License 2.0.

## Credits

- The [TSDF data format](https://arxiv.org/abs/2211.11294) was created by Kasper Claes, Valentina Ticcinelli, Reham Badawy, Yordan P. Raykov, Luc J.W. Evers, Max A. Little.