Changes in the vignette with insertion of a paragraph presenting bmdfilter()
Marie-Laure DELIGNETTE-MULLER committed Nov 28, 2023
1 parent a02e36d commit 608a575
Showing 3 changed files with 75 additions and 25 deletions.
6 changes: 3 additions & 3 deletions man/bmdfilter.Rd
@@ -1,10 +1,10 @@
\name{bmdfilter}
\alias{bmdfilter}
\title{Filtering of items in DRomics workflow output}
\title{Filtering BMDs according to estimation quality}

\description{
Filtering of items (e.g. transcripts, metabolites, ...)
to be removed from DRomics workflow output before further biological annotation and interpretation.
Filtering BMDs in DRomics workflow output according to estimation quality,
to retain the best estimated BMDs for subsequent biological annotation and interpretation.
}

\usage{
2 changes: 1 addition & 1 deletion share/todolist.md
@@ -5,7 +5,7 @@
1. [X] Rework the test_that tests (ML)
1. [X] Change the default for log scale in each plot (fit or BMD) AND add a warning (ML and A for setting up the warning). In shiny, calls to bmdplot(BMD_log_transfo = TRUE), bmdplotwithgradient(BMD_log_transfo = TRUE), sensitivityplot(BMD_log_transfo = TRUE), plot.drcfit(dose_log_transfo = TRUE), plotfit2pdf(dose_log_transfo = TRUE), targetplot(dose_log_transfo = TRUE), and a new argument BMD_log_transfo, TRUE by default, to
handle in the calls to plot.bmdcalc() (and plot.bmdboot(), but not in shiny).
HOWEVER ON HOLD because trickier to handle - a default value would be needed for xmin: curvesplot(dose_log_transfo = TRUE)
1. [ ] HOWEVER ON HOLD because trickier to handle - a default value would be needed for xmin: to be able to use the log scale by default also for curvesplot(dose_log_transfo = TRUE)
1. [X] Set the scaling option to TRUE by default in the package (as already done in the shiny app) and mention it in the vignette (ML - still to include in the vignette)
1. [X] Add a message when the package is loaded (startupmessage) to indicate the changed default options (ML - sent to Aurélie - A)
1. [X] Rework the xlab and ylab, in particular use scaled signal or scaled y if scaling in curvesplot, and scaled_signal in the legend of bmdplotwithgradient (ML)
92 changes: 71 additions & 21 deletions vignettes/DRomics_vignette.Rmd
@@ -43,7 +43,8 @@ DRomics is a freely available tool for dose-response (or concentration-response)
After a first step which consists in **importing**, **checking** and if needed **normalizing/transforming** the data ([step 1](#step1)), the aim of the proposed workflow is to **select monotonic and/or biphasic significantly responsive items** (e.g. probes, contigs, metabolites) ([step 2](#step2)), to **choose the best-fit model** among a predefined family of monotonic and biphasic models to describe the response of each selected item ([step 3](#step3)), and to **derive a benchmark dose** or concentration from each fitted curve ([step 4](#step4)). Those steps can be performed in R using DRomics functions, or using the **shiny application named DRomics-shiny**.

In the available version, DRomics supports **single-channel microarray data** (in log2 scale), **RNAseq data** (in raw counts) and
other **continuous omics data** (in log scale), such as metabolomics data calculated from AUC values (area under the curve), or proteomics data when expressed as protein abundance from peak intensity values. Proteomics data expressed in spectral counts can be analyzed as RNAseq data using raw counts. In order to link responses across biological levels based on a common method, DRomics also handles **continuous apical data** as long as they meet the use conditions of **least squares regression** (homoscedastic Gaussian regression, see the [section on least squares](#leastsquares) for a reminder if needed).
other **continuous omics data** (in log scale), such as metabolomics data calculated from AUC values (area under the curve), or proteomics data when expressed as protein abundance from peak intensity values. Proteomics data expressed in spectral counts could be analyzed as RNAseq data using raw counts
after checking the validity of the assumptions made in the treatment of RNAseq data. In order to link responses across biological levels based on a common method, DRomics also handles **continuous apical data** as long as they meet the use conditions of **least squares regression** (homoscedastic Gaussian regression, see the [section on least squares](#leastsquares) for a reminder if needed).

As it was built in the environmental risk assessment context, where omics data are more often collected on non-sequenced species or species communities, DRomics does not provide an annotation pipeline.
The **annotation of items selected by DRomics** may be complex in this context, and **must be done outside DRomics** using databases such as KEGG or Gene Ontology.
@@ -201,7 +202,7 @@ on the is condition).

**Three types of omics data** may be imported in DRomics using the following functions:

+ **RNAseqdata()** should be used to import **RNAseq as counts of reads** (for details look at the example with [RNAseq data](#RNAseqexample)), but also to import proteomics data expressed in spectral counts,
+ **RNAseqdata()** should be used to import **RNAseq as counts of reads** (for details look at the example with [RNAseq data](#RNAseqexample)),
+ **microarraydata()** should be used to import **single-channel microarray data in log2 scale** (for details look at the example with [microarray data](#microarrayexample)),
+ **continuousomicdata()** should be used to import **other continuous omics data** such as metabolomics data, or proteomics data (only when expressed in intensity),..., **in a scale that enables the use of a Gaussian error model** (for details look at the example with [metabolomic omics data](#metabolomicexample)).

@@ -296,7 +297,8 @@ The plot of the output shows the distribution of the signal over all the metabol

The deprecated metabolomicdata() function was renamed continuousomicdata() in the recent versions
of the package (while keeping the first name available)
to **offer its use to other continuous omic data** such as **proteomics data** or **RT-qPCR data**.
to **offer its use to other continuous omic data** such as **proteomics data** (when expressed
in intensity) or **RT-qPCR data**.
As for metabolomic data, the **pre-treatment** of other continuous omic data must be done **before importation**,
and **data must be imported in a scale that enables the use of a
Gaussian error model** as this strong hypothesis is required both for selection of items and for dose-response modeling.
@@ -788,7 +790,8 @@ plot(r, BMDtype = "zSD", plottype = "ecdf") + theme_bw()
Different alternative plots are proposed (see ?bmdcalc for details)
that can be obtained using the argument plottype to choose the type of plot
("ecdf", "hist" or "density") and the argument by to split the
plot for example by "trend".
plot for example by "trend". You can also use the bmdplot() function
to make an ECDF plot of BMDs and personalize it (see ?bmdplot for details).

<!-- Below is an example -->
<!-- of a density plot of BMD-zSD split by trend of dose-response -->
@@ -869,10 +872,47 @@ gives an ECDF plot of the chosen BMD with the confidence interval
of each BMD (see ?bmdcalc for examples). By default BMDs with an infinite
confidence interval bound are not plotted.


<!-- ```{r} -->
<!-- plot(b, BMDtype = "zSD", by = "trend") -->
<!-- ``` -->
### Filtering BMDs according to estimation quality {#bmdfilter}

Using the bmdfilter() function, it is possible to use one of the
three proposed filters to retain
only the items associated with the best estimated BMD values.
By default, only the items for which the BMD and its
confidence interval are defined are retained (using `"definedCI"`),
so excluding items for which the bootstrap procedure failed.
One can be even more restrictive by
retaining items only if the BMD confidence interval is within the range of
tested/observed doses (using `"finiteCI"`), or less restrictive
(using `"definedBMD"`) by requiring only that the BMD
point estimate be defined within the range of tested/observed doses
(recall that in the `bmdcalc()` output,
the BMD is coded `NA` when this is not the case).

Below is an example of application of the different filters based on BMD-xfold values,
chosen just to better illustrate the way the filters work, as there are far more bad BMD-xfold estimations
than bad BMD-zSD estimations.

```{r, fig.height = 3}
# Plot of BMDs with no filtering
subres <- bmdfilter(b$res, BMDfilter = "none")
bmdplot(subres, BMDtype = "xfold", point.size = 2, point.alpha = 0.4,
add.CI = TRUE, line.size = 0.4) + theme_bw()
# Plot of items with defined BMD point estimate
subres <- bmdfilter(b$res, BMDtype = "xfold", BMDfilter = "definedBMD")
bmdplot(subres, BMDtype = "xfold", point.size = 2, point.alpha = 0.4,
add.CI = TRUE, line.size = 0.4) + theme_bw()
# Plot of items with defined BMD point estimate and CI bounds
subres <- bmdfilter(b$res, BMDtype = "xfold", BMDfilter = "definedCI")
bmdplot(subres, BMDtype = "xfold", point.size = 2, point.alpha = 0.4,
add.CI = TRUE, line.size = 0.4) + theme_bw()
# Plot of items with finite BMD point estimate and CI bounds
subres <- bmdfilter(b$res, BMDtype = "xfold", BMDfilter = "finiteCI")
bmdplot(subres, BMDtype = "xfold", point.size = 2, point.alpha = 0.4,
add.CI = TRUE, line.size = 0.4) + theme_bw()
```
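
The nesting of the three filters can also be sketched on a toy table in base R. This is only an illustration under stated assumptions: the column names `BMD`, `BMD.lower` and `BMD.upper` are hypothetical, a failed bootstrap is assumed to give `NA` bounds and a bound outside the tested doses is assumed to be coded as infinite. It is not the code of bmdfilter().

```r
# Toy table with hypothetical column names (not the actual b$res columns)
toy <- data.frame(
  id        = c("item1", "item2", "item3", "item4"),
  BMD       = c(0.8, 2.1, 5.0, NA),
  BMD.lower = c(0.5, 1.0, NA, NA),
  BMD.upper = c(1.2, Inf, NA, NA),
  stringsAsFactors = FALSE
)

# "definedBMD"-like rule: the BMD point estimate is not NA
defined_bmd <- !is.na(toy$BMD)
# "definedCI"-like rule: the point estimate and both CI bounds are not NA
defined_ci <- defined_bmd & !is.na(toy$BMD.lower) & !is.na(toy$BMD.upper)
# "finiteCI"-like rule: in addition, both CI bounds are finite
finite_ci <- defined_ci & is.finite(toy$BMD.lower) & is.finite(toy$BMD.upper)

# Items kept by each rule, from the least to the most restrictive
kept <- list(
  definedBMD = toy$id[defined_bmd],
  definedCI  = toy$id[defined_ci],
  finiteCI   = toy$id[finite_ci]
)
print(kept)
```

Each rule keeps a subset of the items kept by the previous one, which is why the successive plots above show fewer and fewer BMDs.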

### Plot of fitted curves with BMD values and confidence intervals

@@ -900,21 +940,27 @@ figure and to add The use of the curvesplot() function is more extensively descr
next parts and in the corresponding help page. By default in this plot the curves are scaled
to focus on the shape of the dose-response and not on their amplitude (add `scaling = FALSE`
to see the curves without scaling).

In the following plot we also add vertical lines
corresponding to the tested doses and add transparency to visualize the density of curves
when shapes are similar (which is especially the case for linear shapes).

The use of the package plotly to make such a plot interactive can be interesting,
for example to get the identifier of each curve or to choose which group of curves to
eliminate or to focus on (you can try on the interactive version of the previous figure below).


```{r}
tested.doses <- unique(f$omicdata$dose)
g <- curvesplot(r$res, addBMD = TRUE, xmax = max(tested.doses), colorby = "trend",
g <- curvesplot(r$res, addBMD = TRUE, xmax = max(tested.doses), colorby = "trend",
line.size = 0.8, line.alpha = 0.3, point.size = 2, point.alpha = 0.6) +
geom_vline(xintercept = tested.doses, linetype = 2) + theme_bw()
if (require(plotly))
print(g)
```

The use of the package plotly to make such a plot interactive can be interesting,
for example to get the identifier of each curve or to choose which group of curves to
eliminate or to focus on. You can try the following code to get
an interactive version of the previous figure.


```{r, eval = FALSE}
if (require(plotly))
{
ggplotly(g)
}
@@ -924,7 +970,9 @@ tested.doses <- unique(f$omicdata$dose)

### Description of the outputs of the complete DRomics workflow {#outputs}

The **output of the complete DRomics workflow**, given in `b$res` with `b` being the output of bmdboot()
The **output of the complete DRomics workflow**, given in `b$res` with `b` being the output of bmdboot(),
or the output of `bmdfilter(b$res)` (see [previous section](#bmdfilter) for description of
BMD filtering options)
is a **data frame** reporting the **results of the fit and BMD computation on each selected item** sorted in the ascending order of the adjusted p-values
returned by the item selection step.

@@ -1267,10 +1315,12 @@ prop.table() to the table of frequencies `t.pathways`.
<!-- par(original.par) -->
<!-- ``` -->

Here the ggplot2 grammar is used to plot the ECDF of BMD_zSD using different colors for the different molecular levels.
Here the ggplot2 grammar is used to plot the ECDF of BMD_zSD using different colors for the different molecular levels, after removing the redundant rows for items that belong to more
than one pathway.

```{r}
ggplot(extendedres, aes(x = BMD.zSD, color = explevel)) +
unique.items <- unique(extendedres$id)
ggplot(extendedres[match(unique.items, extendedres$id), ], aes(x = BMD.zSD, color = explevel)) +
stat_ecdf(geom = "step") + ylab("ECDF") + theme_bw()
```
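
The subsetting idiom used above, `match(unique(id), id)`, keeps the first row of each item. A minimal sketch on a toy table (the data and the `pathway` column are made up for illustration) shows that it selects the same rows as `!duplicated(id)`:

```r
# Toy annotated table: itemA and itemC each appear twice
# because they are assumed to map to two pathways
toy <- data.frame(
  id      = c("itemA", "itemA", "itemB", "itemC", "itemC"),
  pathway = c("p1", "p2", "p1", "p2", "p3"),
  BMD.zSD = c(1.5, 1.5, 0.7, 3.2, 3.2),
  stringsAsFactors = FALSE
)

# match() returns, for each unique id, the position of its first occurrence
unique.items <- unique(toy$id)
first.rows <- match(unique.items, toy$id)

# One row per item, so each BMD is counted once in the ECDF
dedup <- toy[first.rows, ]
print(dedup)

# The same rows are selected by dropping duplicated ids
same.rows <- identical(first.rows, which(!duplicated(toy$id)))
print(same.rows)
```

Without this step, an item annotated to several pathways would contribute several identical BMD values, which would distort the ECDF.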

@@ -1300,9 +1350,9 @@ by group (here by KEGG pathway class) as below.
as in [the previous section presenting bmdplot()](#bmdplot)).

```{r}
# BMD ECDF plot split by molecular level
bmdplot(extendedres, BMDtype = "zSD",
facetby = "explevel") + theme_bw()
# BMD ECDF plot split by molecular level, after removing redundant items
bmdplot(extendedres[match(unique.items, extendedres$id), ], BMDtype = "zSD",
facetby = "explevel", point.alpha = 0.4) + theme_bw()
# BMD ECDF plot colored by molecular level and split by path class
bmdplot(extendedres, BMDtype = "zSD",