-
Notifications
You must be signed in to change notification settings - Fork 3
/
README.Rmd
226 lines (149 loc) · 7.53 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
---
title: "Standartox"
output: rmarkdown::github_document
editor_options:
chunk_output_type: console
---
[![CRAN](https://www.r-pkg.org/badges/version/standartox)](https://CRAN.R-project.org/package=standartox)
[![Downloads](https://cranlogs.r-pkg.org/badges/standartox)](https://cran.r-project.org/package=standartox)
Standartox is a database and tool facilitating the retrieval of ecotoxicological test data. It is based on the [EPA ECOTOX database](https://cfpub.epa.gov/ecotox/) as well as on data from several other chemical databases and allows users to filter and aggregate ecotoxicological test data in an easy way. It can either be accessed via <http://standartox.uni-landau.de> or this R-package [standartox](https://github.com/andschar/standartox). Ecotoxicological test data is used in environmental risk assessment to calculate effect measures such as [TU - Toxic Units](https://en.wikipedia.org/wiki/Toxic_unit) or [SSD - Species Sensitivity Distributions](https://edild.github.io/ssd/) to asses environmental toxicity of chemicals.
The project lives in two repositories:
- [standartox-build](https://github.com/andschar/standartox-build) - Compiles and serves database
- [standartox](https://github.com/andschar/standartox) - R-package
## Installation
```{r eval=FALSE}
install.packages('standartox')
# remotes::install_github('andschar/standartox') # development version
```
## Functions
Standartox consists of the two functions `stx_catalog()` and `stx_query()`. The former allows you to retrieve a catalog of possible parameters that can be used as an input for `stx_query()`. The latter fetches toxicity values from the database.
### `stx_catalog()`
The function returns a list of all possible arguments that can bes use in `stx_query()`.
```{r message=FALSE}
require(standartox)
catal = stx_catalog()
names(catal)
```
```{r eval=FALSE}
catal$endpoint # access the parameter endpoint
```
```{r echo=FALSE}
endpoint = catal$endpoint
knitr::kable(endpoint)
```
### `stx_query()`
The function allows you to retrieve filtered and aggregated toxicity data according to the parameters below.
```{r echo=FALSE}
df = data.frame(parameter = names(catal),
example = sapply(lapply(lapply(lapply(catal, `[[`, 1), head, 3), na.omit),
paste0, collapse = ', '),
row.names = NULL,
stringsAsFactors = FALSE)
df[ df$parameter == 'duration', ]$example = paste(24, 96, sep = ', ')
df = df[ df$parameter != 'meta', ]
knitr::kable(df)
```
You can type in parameters manually or subset the object returned by `stx_catalog()`:
```{r}
require(standartox)
cas = c(Copper2Sulfate = '7758-98-7',
Permethrin = '52645-53-1',
Imidacloprid = '138261-41-3')
# query
l = stx_query(cas = cas,
endpoint = 'XX50',
taxa = grep('Oncorhynchus', catal$taxa$variable, value = TRUE), # fish genus
exposure = 'aquatic',
duration = c(24, 120))
```
#### Important parameter settings
- __CAS__ (`cas =`) Can be input in the form of 7758-98-7 or 7758987
- __Endpoints__ (`endpoint =`) Only one endpoint per query is allowed:
- `NOEX` summarises [No observed effect concentration/level](https://en.wikipedia.org/wiki/No-observed-adverse-effect_level) (i.e. NOEC, NOEL, NOAEL, etc.)
- `LOEX` summarises Lowest observed effects concentration (i.e. LOEC, LOEL, etc.)
- `XX50` summarises [Half maximal effective concentration](https://en.wikipedia.org/wiki/EC50) (i.e. EC50, LC50, LD50, etc.)
- If you leave a parameter empty Standartox will not filter for it
## Query result
Standartox returns a list object with five entries.
- `l$filtred` and `l$filtered_all` contain the filtered Standartox data set (the former only is a shorter and more concise version of the latter):
```{r echo=FALSE}
cols = c('cas', 'cname', 'concentration', 'concentration_unit', 'effect', 'endpoint')
knitr::kable(l$filtered[1:3, .SD, .SDcols = cols ])
```
- `l$aggregated` contains the several aggregates of the Standartox data:
- `cname`, `cas` - chemical identifiers
- `min` - Minimum
- `tax_min` - Most sensitive taxon
- __`gmn`__ - __Geometric mean__
- `amn` - Arithmetic mean
- `sd` - Standard Deviation of the arithmetic mean
- `max` - Maximum
- `tax_max` - Most insensitive taxon
- `n` - Number of distinct taxa used for the aggregation
- `tax_all` - Concatenated string of all taxa used for the aggregation
```{r echo=FALSE}
knitr::kable(head(l$aggregated[ , .SD, .SDcols = c('cname', 'cas', 'min', 'tax_min', 'gmn', 'max')], 3))
```
- `l$meta` contains meta information on the request:
```{r echo=FALSE}
knitr::kable(l$meta)
```
## Example: _Oncorhynchus_
Let's say, we want to retrieve the 20 most tested chemicals on the genus _[Oncorhynchus](https://en.wikipedia.org/wiki/Oncorhynchus)_. We allow for test durations between 48 and 120 hours and want the tests restricted to active ingredients only. Since we are only interested in the half maximal effective concentration, we choose XX50 as our endpoint. As an aggregation method we choose the geometric mean.
```{r warning=FALSE}
require(standartox)
l2 = stx_query(concentration_type = 'active ingredient',
endpoint = 'XX50',
taxa = grep('Oncorhynchus', catal$taxa$variable, value = TRUE), # fish genus
duration = c(48, 120))
```
We subset the retrieved data to the 20 most tested chemicals and plot the result.
```{r warning=FALSE, message=FALSE}
require(data.table)
dat = merge(l2$filtered, l2$aggregated, by = c('cas', 'cname'))
cas20 = l2$aggregated[ order(-n), cas ][1:20]
dat = dat[ cas %in% cas20 ]
```
```{r warning=FALSE, message=FALSE, fig.width=9, fig.height=6, dpi=300}
require(ggplot2)
ggplot(dat, aes(y = reorder(cname, -gmn))) +
geom_point(aes(x = concentration, col = 'All values'),
pch = 1, alpha = 0.3) +
geom_point(aes(x = gmn, col = 'Standartox value\n(Geometric mean)'),
size = 3) +
scale_x_log10(breaks = c(0.01, 0.1, 1, 10, 100, 1000, 10000),
labels = c(0.01, 0.1, 1, 10, 100, 1000, 10000)) +
scale_color_viridis_d(name = '') +
labs(title = 'Oncorhynchus EC50 values',
subtitle = '20 most tested chemicals',
x = 'Concentration (ppb)') +
theme_minimal() +
theme(axis.title.y = element_blank())
```
## Usage
We ask you to use the API service thoughtfully, which means to run the `stx_query()` only once and to re-run it only when parameters change or you want to query new versions. Here is an example of how to easily store the queried data locally from within R.
```{r eval=FALSE}
run = FALSE # set to TRUE for the first run
if (run) {
l2 = stx_query(concentration_type = 'active ingredient',
endpoint = 'XX50',
taxa = grep('Oncorhynchus', catal$taxa$variable, value = TRUE), # fish genus
duration = c(48, 120))
saveRDS(l2, file.path('path/to/directory', 'data.rds'))
} else {
l2 = readRDS(file.path('path/to/directory', 'data.rds'))
}
# put rest of the script here
# ...
```
## Article
The article on Standartox is published [here](https://www.mdpi.com/2306-5729/5/2/46).
## Information
### Contributors
- [Andreas Scharmüller](https://andschar.github.io)
### Want to contribute?
Check out our [contribution guide here](https://github.com/andschar/standartox/blob/master/CONTRIBUTING.md).
### Meta
- Please report any [issues, bugs or feature requests](https://github.com/andschar/standartox/issues)
- License: MIT
- Get citation information for the standartox package in R doing `citation(package = 'standartox')`