Merge pull request #353 from matthiaskoenig/develop
Changes for v0.5.2
matthiaskoenig authored May 7, 2019
2 parents 98c3fff + 0e641f0 commit c183053
Showing 59 changed files with 1,017 additions and 495 deletions.
15 changes: 9 additions & 6 deletions .travis.yml
@@ -6,10 +6,13 @@ services:
python:
- "3.6"
env:
PKDB_DOCKER_COMPOSE_YAML=docker-compose-develop.yml
PKDB_DJANGO_CONFIGURATION=local
PKDB_API_BASE=http://0.0.0.0:8000
PKDB_SECRET_KEY=cgasj6yjpkagcgasj6yjpkagcgasj6yjpkag
PKDB_DEFAULT_PASSWORD=pkdb

PKDB_API_BASE="http://0.0.0.0:8000"

PKDB_SECRET_KEY="cgasj6yjpkagcgasj6yjpkagcgasj6yjpkag"
PKDB_DEFAULT_PASSWORD="pkdb"

PKDB_DB_NAME=postgres
PKDB_DB_PASSWORD=postgres
@@ -21,12 +24,12 @@ env:
PKDB_EMAIL_HOST_USER=
PKDB_EMAIL_HOST_PASSWORD=

before_script:
- docker-compose build

before_script:
- ./docker-purge.sh

script:
- docker-compose run --rm backend bash -c "./manage.py test"
- docker-compose -f $PKDB_DOCKER_COMPOSE_YAML run --rm backend bash -c "/usr/local/bin/python manage.py test pkdb_app.tests"

notifications:
email: false
286 changes: 286 additions & 0 deletions CURATION.md
@@ -0,0 +1,286 @@
# Curation of pharmacokinetics studies for PK-DB

This document provides resources and information on how to curate data and information
from pharmacokinetics studies for the PK-DB database.

The available choices for important fields are listed in the PK-DB frontend at
https://develop.pk-db.com/#/curation

If anything is unclear during curation, or you find incorrect or missing information in
this document, please open an issue at
https://github.com/matthiaskoenig/pkdb/issues
with the label `data curation`.


## Interactive curation

All information for a single study is stored in one folder. The complete information can be uploaded
and iteratively curated using the provided `watch_study` script. The script uploads
the latest data whenever a file changes, which allows rapid iteration between curation and
validation. Information on how to set up the `watch_study` script is provided
at https://github.com/matthiaskoenig/pkdb_data

```
# activate virtualenv with watch_study script
workon pkdb_data
# export environment variables for backend
(pkdb_data) set -a && source .env.local
# run the watch script
(pkdb_data) watch_study -s $STUDYFOLDER
```

## 1. PDF, Reference, Figures & Tables

The upload expects a certain folder structure and organisation of the `$STUDYFOLDER`.
The first step is to create the folder and the basic files in it:

- folder name is `STUDYNAME`, e.g., `Albert1974`
- folder contains the pdf as `STUDYNAME.pdf`, e.g., `Albert1974.pdf`
- folder contains additional files associated with the study, i.e.,
  - tables, named `STUDYNAME_Tab[1-9]*.png`, e.g., `Albert1974_Tab1.png`
  - figures, named `STUDYNAME_Fig[1-9]*.png`, e.g., `Albert1974_Fig2.png`
  - excel file, named `STUDYNAME.xlsx`, e.g., `Albert1974.xlsx`

In addition, the folder can contain data files
named `STUDYNAME_Tab[1-9]*.tsv` or `STUDYNAME_Fig[1-9]*.tsv`.
The information in the excel file is the primary data source, i.e., the `*.tsv`
information is redundant and not guaranteed to be maintained.

Information about the study is stored in the `study.json` which we will create
in the following steps. Information about the reference for the study is stored
in the `reference.json` (this file is created automatically and should not be altered).
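
The naming rules above can be checked before uploading. The following is a minimal sketch and not part of the `watch_study` script: the function name `check_study_folder` is hypothetical, and the regular expression is only one possible reading of the `Tab[1-9]*` and `Fig[1-9]*` patterns (a digit from 1 to 9 followed by further word characters).

```python
import re
import tempfile
from pathlib import Path


def check_study_folder(folder):
    """Return a list of problems found in a study folder (empty if none)."""
    folder = Path(folder)
    name = folder.name  # the folder name is the STUDYNAME, e.g. Albert1974
    problems = []

    # The PDF must be named STUDYNAME.pdf.
    if not (folder / f"{name}.pdf").is_file():
        problems.append(f"missing {name}.pdf")

    # File names that are always allowed, plus the table/figure naming scheme.
    # Assumption: "Tab[1-9]*" means a digit 1-9 followed by word characters.
    fixed = {f"{name}.pdf", f"{name}.xlsx", "study.json", "reference.json"}
    pattern = re.compile(rf"{re.escape(name)}_(Tab|Fig)[1-9]\w*\.(png|tsv)")
    for path in sorted(folder.iterdir()):
        if path.name in fixed or pattern.fullmatch(path.name):
            continue
        problems.append(f"unexpected file name: {path.name}")
    return problems


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    study = Path(tmp) / "Albert1974"
    study.mkdir()
    for filename in ["Albert1974.pdf", "Albert1974_Tab1.png", "fig2.png"]:
        (study / filename).touch()
    print(check_study_folder(study))
```

In the demonstration only `fig2.png` is reported, because it does not carry the `STUDYNAME_` prefix.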


## 2. Initial study information (`study.json`)

JSON files are best edited with the `Atom` editor and its `pretty-json` plugin, with the
settings `Notify on Parse Error` and `Prettify on Save JSON` activated.

If a `study.json` already exists, we will modify it in the following steps; if not,
now is the time to create a new `study.json` in the `$STUDYFOLDER`. The following JSON template
contains all the relevant fields:

```json
{
"sid": 123456789,
"pkdb_version": 1.0,
"name": "Author2007",
"reference": 123456789,
"licence": "open || closed",
"access": "public || private",
"creator": "mkoenig",
"curators": [
"mkoenig",
"janekg"
],
"collaborators": [],
"substances": [],
"keywords": [],
"descriptions": [],
"comments": [],
"groupset": {
"descriptions": [],
"comments": [],
"groups": []
},
"individualset": {
"descriptions": [],
"comments": [],
"individuals": []
},
"interventionset": {
"descriptions": [],
"comments": [],
"interventions": []
},
"outputset": {
"descriptions": [],
"comments": [],
"outputs": [],
"timecourses": []
}
}
```
* Fill in the basic information for the study: set the `name` field to the `$STUDYNAME`; set the `creator`, `curators` and `collaborators` fields to existing users
(`creator` is a single user, whereas `curators` and `collaborators` are lists of users);
and set the `sid` and `reference` fields to the `PubmedId` of the study.
* Substances used in the study should be listed in `substances`. They must be existing substances, which can be looked up at https://develop.pk-db.com/#/curation
* Keywords relevant for the study should be listed in `keywords`. They must be existing keywords, which can be looked up at https://develop.pk-db.com/#/curation
* The `reference` field is optional. If no PubMed entry exists for the publication, a `reference.json` has to be built manually (please ask what to do in such a case).

After this initial information has been entered in the `study.json`, we can start running the `watch_study` script.
```
(pkdb_data) watch_study -s $STUDYFOLDER
```
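
Instead of copying the template by hand, the skeleton can also be generated. The sketch below is only an illustration: `study_template` is a hypothetical helper, and the defaults `"open"` and `"public"` for `licence` and `access` are assumptions that have to be adapted per study.

```python
import json


def study_template(name, pubmed_id, creator, curators):
    """Build the skeleton of a study.json as a plain dict."""

    def empty_set(*keys):
        # Every set carries descriptions and comments plus its own lists.
        return {"descriptions": [], "comments": [], **{key: [] for key in keys}}

    return {
        "sid": pubmed_id,
        "pkdb_version": 1.0,
        "name": name,
        "reference": pubmed_id,
        "licence": "open",   # assumption: choose "open" or "closed"
        "access": "public",  # assumption: choose "public" or "private"
        "creator": creator,
        "curators": list(curators),
        "collaborators": [],
        "substances": [],
        "keywords": [],
        "descriptions": [],
        "comments": [],
        "groupset": empty_set("groups"),
        "individualset": empty_set("individuals"),
        "interventionset": empty_set("interventions"),
        "outputset": empty_set("outputs", "timecourses"),
    }


study = study_template("Author2007", 123456789, "mkoenig", ["mkoenig", "janekg"])
print(json.dumps(study, indent=2))
```

The printed JSON can be saved as `study.json` in the `$STUDYFOLDER` and then filled in step by step.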

### Comments and descriptions
The study and all sets can store additional annotations in the form of either `descriptions` or `comments`.
`descriptions` are quotations from the publication that substantiate and support the curated data.
`comments` allow individual curators to store their own notes, e.g., about incorrect units, missing data or strange conversions of data.


## 3. Curation of groups
The next step is the curation of the `group` information, i.e., which groups were studied. The information is stored in the `groupset`, which has the form shown below.
Retrieve the group information from the publication and copy it into the description of the groupset. The top group containing all subjects must be called `all`.

The main step in defining a group is to set its `name` and `count`: the `name` is `all` for the top-level group and an arbitrary descriptive name
otherwise; the `count` is the number of subjects in the group. In addition to this core information, a group is characterized by `characteristica`.
An overview of the available `characteristica` and the possible choices is available at https://develop.pk-db.com/#/curation
```json
{
"groupset": {
"descriptions": [
"The subjects were six healthy volunteers, three males and three females, aged 21.0 ± 0.9 years (range 20 to 22 years) and weighing 63 ± 11 kg (range 50 to 76 kg). All were nonsmokers. Subjects abstained from all methylxanthine containing foods and beverages during the entire period of the study. Compliance with this requirement was assessed by questioning at each presentation for blood sampling or urine delivery."
],
"comments": [],
"groups": [
{
"count": 20,
"name": "all",
"characteristica": [
{
"category": "species",
"choice": "homo sapiens"
},
{
"category": "healthy",
"choice": "Y"
},
{
"category": "overnight fast",
"choice": "Y"
},
{
"category": "fasted",
"min": "10",
"max": "14",
"unit": "hr"
}
]
}
]
}
}
```
* Groups are defined via `characteristica`; characteristica shared between groups can be defined on a parent group.
* The possible values of the categorials are listed in `pkdb_app/categorials.py`.
* Group names must be unique within the groupset and should be descriptive, e.g., smoker, nonsmoker, contraceptive.
* All groups require a `name`; all groups except `all` also require a `parent` field (`"parent": "string"`).

All available fields for characteristica on group are:
```json
{
"category": "categorial",
"choice": "categorial|string",
"count": "int",
"mean": "double",
"median": "double",
"min": "double",
"max": "double",
"sd": "double",
"se": "double",
"cv": "double",
"unit": "categorial"
}
```

Individuals are curated like groups with the exception that individuals must belong
to a given group, i.e., the `group` attribute must be set. Individuals are most often defined via excel spreadsheets.
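
The naming rules for groups can be checked locally before uploading. This is a minimal sketch, not the validation performed by the backend: `check_groups` is a hypothetical helper, and the check that a `parent` refers to a group defined in the same groupset is an additional assumption.

```python
def check_groups(groupset):
    """Return a list of problems with the group definitions in a groupset."""
    groups = groupset.get("groups", [])
    names = [group.get("name") for group in groups]
    problems = []

    # The top-level group containing all subjects must be called "all".
    if "all" not in names:
        problems.append("no top-level group named 'all'")

    # Group names have to be unique within the groupset.
    duplicates = {n for n in names if n is not None and names.count(n) > 1}
    for name in sorted(duplicates):
        problems.append(f"duplicate group name: {name}")

    for group in groups:
        name = group.get("name")
        if name is None:
            problems.append("group without a name")
        elif name != "all":
            parent = group.get("parent")
            if parent is None:
                problems.append(f"group '{name}' has no parent")
            elif parent not in names:
                # Assumption: parents must be defined in the same groupset.
                problems.append(f"group '{name}' has unknown parent '{parent}'")
    return problems


groupset = {
    "groups": [
        {"name": "all", "count": 6},
        {"name": "smoker", "count": 3, "parent": "all"},
        {"name": "nonsmoker", "count": 3},
    ]
}
print(check_groups(groupset))
```

In this example the only reported problem is the missing `parent` of the `nonsmoker` group.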

## 4. Curation of interventions/interventionset
```json
{
"interventionset": {
"descriptions": [
"All patients and volunteers fasted overnight and, at 0800 hours, were given orally 300 mg caffeine dissolved in 150 ml water; food intake was allowed 3 h after administration of caffeine."
],
"interventions": [
{
"name": "glciv",
"substance": "glucose",
"route": "iv",
"form": "solution",
"application": "single dose",
"category": "dosing",
"value": "0.5",
"unit": "g/kg",
"time": "0",
"time_unit": "hr"
}
]
}
}
```
All available fields for intervention and interventionset are:
```json
{

"name": "string",
"category": "categorial",

"substance": "categorial (substance)",
"route": "categorial {oral, iv}",
"application": "categorial {'single dose', 'multiple doses', 'continuous injection'}",
"form": "categorial {'tablete', 'capsule', ...}",
"time": "double||double||double ...",
"time_unit": "categorial",

"choice": "categorial|string",
"value": "double",
"mean": "double",
"median": "double",
"min": "double",
"max": "double",
"sd": "double",
"se": "double",
"cv": "double",
"unit": "categorial"
}
```
* TODO: document after next iteration

## 5. Curation of output/outputset
Data from figures should be digitized using a plot digitizer. See guidelines in

- Use Excel (LibreOffice/OpenOffice) spreadsheets to store all data, including digitized data.
- Change the language settings to use US number formats (i.e., the ‘.’ separator). Always use a point (‘.’) as the decimal separator, never a comma (‘,’), i.e., 1.234 instead of 1,234.
- Use PlotDigitizer to digitize figures (https://sourceforge.net/projects/plotdigitizer/)

```json

{
"source": "Akinyinka2000_Tab3.csv",
"format": "TSV",
"subset": "substance==paraxanthine",
"figure": "Akinyinka2000_Tab3.png",
"group": "healthy subjects",
"interventions": [
"Dcaf"
],
"substance": "paraxanthine",
"tissue": "col==tissue",
"pktype": "cmax || tmax || auc_inf || thalf",
"mean": "col==cmax || col==tmax || col==aucinf || col==thalf",
"sd": "col==cmax_sd || col==tmax_sd || col==aucinf_sd || col==thalf_sd",
"unit": "\u00b5g/ml || hr || \u00b5g*hr/ml || hr"
}
```
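
The example above packs several outputs into one entry by separating the alternatives with `||`. How the backend resolves this is not described in this document. The sketch below only illustrates one plausible reading, namely that the n-th alternative of every `||` field belongs together, so the entry stands for four outputs (`cmax`, `tmax`, `auc_inf`, `thalf`). The helper `expand_output` is hypothetical.

```python
def expand_output(output, separator="||"):
    """Split every string field containing the separator into one dict per alternative.

    Assumption: the n-th alternative of each field belongs to the n-th output.
    """
    split = {
        key: [part.strip() for part in value.split(separator)]
        for key, value in output.items()
        if isinstance(value, str) and separator in value
    }
    # All fields using the separator must list the same number of alternatives.
    lengths = {len(parts) for parts in split.values()}
    if len(lengths) > 1:
        raise ValueError(f"fields differ in number of alternatives: {sorted(lengths)}")
    count = lengths.pop() if lengths else 1
    # Fields without the separator are copied unchanged into every output.
    return [
        {key: split[key][i] if key in split else value for key, value in output.items()}
        for i in range(count)
    ]


entry = {
    "substance": "paraxanthine",
    "interventions": ["Dcaf"],
    "pktype": "cmax || tmax",
    "mean": "col==cmax || col==tmax",
    "unit": "\u00b5g/ml || hr",
}
for single in expand_output(entry):
    print(single)
```

Fields that list a different number of alternatives raise a `ValueError`, which catches entries where, for example, a unit was forgotten.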


## Units for curation
Pint units:
https://github.com/hgrecco/pint/blob/master/pint/default_en_0.6.txt


## Open issues
- individual set
- time course data
- column data

## Frequently asked questions (FAQ)
### How to encode multiple substances which are given together (e.g., caffeine and acetaminophen)?
- Encode them as individual interventions of caffeine and acetaminophen, i.e., single interventions
with the respective doses. In the outputs, a list of interventions is provided, i.e., in this example the interventions for caffeine and acetaminophen.


2 changes: 1 addition & 1 deletion README.md
@@ -1,4 +1,4 @@
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.1407035.svg)](https://doi.org/10.5281/zenodo.1406979)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.1406979.svg)](https://doi.org/10.5281/zenodo.1406979)
[![Build Status](https://travis-ci.org/matthiaskoenig/pkdb.svg?branch=develop)](https://travis-ci.org/matthiaskoenig/pkdb)

# PKDB - Pharmacokinetics database
2 changes: 1 addition & 1 deletion backend/pkdb_app/_version.py
@@ -1,4 +1,4 @@
"""
Definition of version string.
"""
__version__ = "0.5.1"
__version__ = "0.5.2"
13 changes: 6 additions & 7 deletions backend/pkdb_app/categorials/models.py
@@ -101,9 +101,8 @@ def is_valid_unit(self, unit):

def validate_unit(self, unit):
if not self.is_valid_unit(unit):
msg = f"[{unit}] with dimension [{self.unit_dimension(unit)}] is not allowed. " \
f"For units in the category [{self.key}]"
raise ValueError({"unit": msg,"allowed dimensions":self.valid_dimensions_str, "norm_units":self.n_units})
msg = f"For category `{self.key}` the unit [{unit}] with dimension {self.unit_dimension(unit)} is not allowed."
raise ValueError({"unit": msg, "Only units with the following dimensions are allowed:": self.valid_dimensions_str, "Units are allowed which can be converted to the following normalized units:": self.n_units})

def is_valid_time_unit(self, time_unit):
return self.p_unit(time_unit).dimensionality == '[time]'
@@ -143,7 +142,7 @@ def is_valid_choice(self, choice):
return choice in self.choices.values_list("name",flat=True)

def choices_list(self):
return self.choices.values_list("name",flat=True)
return self.choices.values_list("name", flat=True)

def _asdict(self):
return {
@@ -158,11 +157,11 @@ def validate_choice(self, choice):
if choice:
if (self.dtype == CATEGORIAL_TYPE) or (self.dtype == BOOLEAN_TYPE):
if not self.is_valid_choice(choice):
msg = f"{choice} is not part of {self.choices} for {self.key}"
msg = f"The choice `{choice}` is not a valid choice for category `{self.key}`. Allowed choices are: `{list(self.choices_list())}`."
raise ValueError({"choice": msg})
else:
msg = f"for category: <{self.category}> no choices are allowed. " \
f"If you are trying to insert a numerical value, use keyword value, mean or median"
msg = f"The field `choice` is not allowed for category `{self.category}`. " \
f"For numerical values the fields `value`, `mean` or `median` are used."
raise ValueError({"choice": msg})

class Meta:
4 changes: 0 additions & 4 deletions backend/pkdb_app/categorials/serializers.py
@@ -2,8 +2,6 @@
Serializers for interventions.
"""



# ----------------------------------
# Interventions
# ----------------------------------
@@ -62,7 +60,6 @@ def create(self, validated_data):
return instance

def update(self, instance, validated_data):
print(validated_data)
units_data = validated_data.pop("units", [])
choices_data = validated_data.pop("choices", [])

@@ -77,7 +74,6 @@ def update(self, instance, validated_data):
update_or_create_multiple(instance, units_data, "units")

if choices_data:
print(choices_data)
update_or_create_multiple(instance, choices_data, "choices")

instance.save()