Skip to content

Commit

Permalink
update fox et al 2022 ref
Browse files Browse the repository at this point in the history
  • Loading branch information
joethorley committed Oct 9, 2024
1 parent 3a9b015 commit b7bdd1c
Show file tree
Hide file tree
Showing 7 changed files with 21 additions and 18 deletions.
6 changes: 4 additions & 2 deletions paper/paper.bib
Original file line number Diff line number Diff line change
Expand Up @@ -172,12 +172,14 @@ @article{chapman_2007
year = {2007}
}

@techreport{fox_methodologies_2021,
@techreport{fox_methodologies_2022,
title = {Joint investigation into {statistical} {methodologies} underpinning the {derivation} of {toxicant} {guideline} {values} in {Australia} and {New Zealand}},
institution = {Environmetrics Australia and Australian Institute of Marine Science},
author = {Fox, DR and Fisher, R and Thorley, JL and Schwarz, C},
month = mar,
year = {2021}
year = {2022},
doi = {10.25845/fm9b-7n28},
url = {https://doi.org/10.25845/fm9b-7n28}
}

@techreport{fox_methodologies_2024,
Expand Down
2 changes: 1 addition & 1 deletion paper/paper.md
Original file line number Diff line number Diff line change
Expand Up @@ -98,7 +98,7 @@ Finding the solution to this last equation is referred to as *finding the root(s
## Confidence Intervals

`ssdtools` generates confidence intervals for $\text{HC}_x$ and $\text{HP}_u$ values via bootstrapping.
By default all versions of `ssdtools` use parametric bootstrapping for non-censored data as it has better coverage than the equivalent non parametric approach used in other SSD modelling software such as `Burrlioz` [see @fox_methodologies_2021].
By default all versions of `ssdtools` use parametric bootstrapping for non-censored data as it has better coverage than the equivalent non parametric approach used in other SSD modelling software such as `Burrlioz` [see @fox_methodologies_2022].
The first two versions of `ssdtools` both calculated the model averaged CI from the weighted arithmetic mean of the CIs of the individual distributions (`weighted_arithmetic`).
Unfortunately, this approach has recently been shown to have poor coverage [@fox_methodologies_2024] and is inconsistent with the *inversion principle*.

Expand Down
2 changes: 1 addition & 1 deletion vignettes/additional-technical-details.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -60,7 +60,7 @@ Here we report on a series of simulation studies designed to inform a final deci
We used the example datasets in the [`ssddata`](https://github.com/open-AIMS/ssddata) package in R [@ssddata] to undertake a simulation study to examine bias, coverage and confidence interval (CI) widths using the recommended default set of six distributions (lognormal, log-Gumbel, log-logistic, gamma, Weibull, and the lognormal-log-normal mixture), with model averaged estimates obtained using the multi-method, and confidence intervals estimated using the recommended weighted sample bootstrap method (see @fox_methodologies_2024).
A total of 20 unique datasets were extracted from ssddata and used to define the parameters for the simulation study as follows:

1. Each dataset was extracted from `ssddata` and fit using the default distribution set as recommended in [@fox_methodologies_2021] and [@fox_methodologies_2024].
1. Each dataset was extracted from `ssddata` and fit using the default distribution set as recommended in [@fox_methodologies_2022] and [@fox_methodologies_2024].
2. Of the six default distributions, the parameters for the distribution having the highest weight for each dataset was used to generate new random datasets of varying values of N, including (but not limited to): 5 (current ANZG minimum), 6 (current BC minimum), and 8 (current ANZG preferred).
3. For each randomly generated dataset, `ssdtools` was used to re-fit the data, and model averaged estimates were obtained using the multi-method, with upper and lower confidence limits (CLs) produced using the recommended weighted sample method (see [Confidence Intervals for Hazard Concentrations](https://bcgov.github.io/ssdtools/articles/confidence-intervals.html) vignette).

Expand Down
4 changes: 2 additions & 2 deletions vignettes/articles/confidence-intervals.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ load("confidence_intervals.RData")

Bootstrapping is a resampling technique used to obtain confidence intervals (CIs) for summary statistics.
The team have explored the use of alternative methods for obtaining the CIs of $HC_x$ estimates.
This included using the closed-form expression for the variance-covariance matrix of the parameters of the Burr III distribution, coupled with the delta-method, as well as an alternative bootstrap method for the inverse Pareto distribution based on statistical properties of the parameters [@fox_methodologies_2021].
This included using the closed-form expression for the variance-covariance matrix of the parameters of the Burr III distribution, coupled with the delta-method, as well as an alternative bootstrap method for the inverse Pareto distribution based on statistical properties of the parameters [@fox_methodologies_2022].
In both cases, it appeared that these methods can give results similar to other traditional bootstrapping approaches in much less time, and are therefore potentially worth further investigation.
However, implementation of such methods across all the distributions in `ssdtools` would be a substantial undertaking.

Expand All @@ -43,7 +43,7 @@ Instead of resampling the data, parametric bootstrapping draws a random a set of
Upper and lower 95% bounds are again calculated as the lower 0.025th and upper 0.975th quantiles of the resulting $HC_x$ estimates across all the bootstrap samples (again, typically > 1,000).
This approach attempts to capture the uncertainty in the data for a sample size from a given distribution, but it assumes no uncertainty in that original fit.

Using simulation studies the ssdtools team examined bias and compared the resulting coverage of the parametric and non-parametric bootstrapping methods [@fox_methodologies_2021].
Using simulation studies the ssdtools team examined bias and compared the resulting coverage of the parametric and non-parametric bootstrapping methods [@fox_methodologies_2022].
They found that coverage was better using the parametric bootstrapping method, and this has been retained as the default bootstrapping method in the update to ssdtools although non-parametric bootstrapping is currently the only method available for censored data.

## Bootstrapping model-averaged SSDs
Expand Down
14 changes: 7 additions & 7 deletions vignettes/distributions.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -56,8 +56,8 @@ They were adopted as the default set of three distributions in early updates of
All three distributions show good convergence properties and are retained as part of the default model set in version 2.0 of `ssdtools`.

In addition to the log-normal, log-logistic and Gamma distributions, the original version of `ssdtools` as developed by [Thorley and Schwarz (2018)](https://joss.theoj.org/papers/10.21105/joss.01082.pdf) also included three additional distributions in the candidate model set, including the log-gumbel, Gompertz and Weibull distributions.
Of these, the log-Gumbel (otherwise known as the inverse Weibull, see below) shows relatively good convergence [see Figure 32, @fox_methodologies_2021], and is also one of the limiting distributions of the Burr Type 3 distribution implemented in `ssdtools`, and has been retained in the default model set.
The Gompertz and Weibull distributions, however can exhibit unstable behaviour, sometimes showing poor convergence, and therefore been excluded from the default set [see Figure 32, @fox_methodologies_2021]
Of these, the log-Gumbel (otherwise known as the inverse Weibull, see below) shows relatively good convergence [see Figure 32, @fox_methodologies_2022], and is also one of the limiting distributions of the Burr Type 3 distribution implemented in `ssdtools`, and has been retained in the default model set.
The Gompertz and Weibull distributions, however can exhibit unstable behaviour, sometimes showing poor convergence, and therefore been excluded from the default set [see Figure 32, @fox_methodologies_2022]

### Burr III distribution

Expand Down Expand Up @@ -130,7 +130,7 @@ Similarly, if c is greater than 80 an (American) Pareto distribution is fitted.
ensure numerical stability.

Since the Burr type III, inverse Pareto and inverse Weibull (log Gumbel) distributions are used by the Burrlioz software, these have been implemented in `ssdtools`.
However, we have found there are stability issues with both the Burr type III, as well as the inverse Pareto distributions, which currently precludes their inclusion in the default model set (see @fox_methodologies_2021, and below for more details).
However, we have found there are stability issues with both the Burr type III, as well as the inverse Pareto distributions, which currently precludes their inclusion in the default model set (see @fox_methodologies_2022, and below for more details).

### Bimodal distributions

Expand All @@ -142,7 +142,7 @@ This is a consequence of the high penalty in AICc associated with the increased
The TMB version of `ssdtools` now includes the option of fitting two mixture distributions, individually or as part of a model average set.
These can be fitted using `ssdtools` by supplying the strings "llogis_llogis" and/or "lnorm_lnorm" to the *dists* argument in the *ssd_fit_dists* call.

The underlying code for these mixtures has three components: the likelihood function required for TMB; exported R functions to allow the usual methods for a distribution to be called (p, q and r); and a set of supporting R functions (see @fox_methodologies_2021 Appendix D for more details).
The underlying code for these mixtures has three components: the likelihood function required for TMB; exported R functions to allow the usual methods for a distribution to be called (p, q and r); and a set of supporting R functions (see @fox_methodologies_2022 Appendix D for more details).
Both mixtures have five parameters - two parameters for each of the component distributions and a mixing parameter (pmix) that defines the weighting of the two distributions in the ‘mixture.’

```{r echo=FALSE,fig.align='center',fig.width=9,fig.height=5, fig.cap="Sample lognormal lognormal mixture probability density (A) and cumulative probability (B) functions."}
Expand Down Expand Up @@ -203,7 +203,7 @@ However, the default set should be underpinned by a guiding principle of parsimo
Further, the default set should result in model-averaged estimates of *HCx* values that: 1) minimise bias; 2) have actual coverages of confidence intervals that are close to the nominal level of confidence; 3) estimated *HCx* and confidence intervals of *HCx* are robust to small changes in the data; and 4) represent a positively continuous distribution that has both right and left tails.

The `ssdtools` development team has undertaken extensive simulation studies, as well as some detailed technical examinations of the various candidate distributions to examine issues of bias, coverage and numerical stability.
A detailed account of our findings can be found in our report [@fox_methodologies_2021] and are not repeated in detail here, although some of the issues associated with individual distributions are outlined below.
A detailed account of our findings can be found in our report [@fox_methodologies_2022] and are not repeated in detail here, although some of the issues associated with individual distributions are outlined below.

### Currently recommended default distributions

Expand Down Expand Up @@ -508,7 +508,7 @@ lines(conc, F(conc, 0.01, 5), col = "#1F1CF2")
legend("topleft", "(B)", bty = "n")
```

The Gompertz distribution is available in `ssdtools`, however parameter estimation can be somewhat unstable [@fox_methodologies_2021], and for this reason it is not currently included in the default set.
The Gompertz distribution is available in `ssdtools`, however parameter estimation can be somewhat unstable [@fox_methodologies_2022], and for this reason it is not currently included in the default set.

#### Weibull distribution

Expand Down Expand Up @@ -555,7 +555,7 @@ legend("topleft", "(B)", bty = "n")
The inverse Pareto distribution can be fitted using `ssdtools` by supplying the string `invpareto` to the `dists` argument in the `ssd_fit_dists` call.

While the inverse Pareto distribution is implemented in the `Burrlioz 2.0` software, it is important to understand that it is done so only as one of the limiting Burr distributions (see [technical details](https://bcgov.github.io/ssdtools/articles/E_additional-technical-details.html)).
The inverse Pareto is not offered as a stand-alone option in the `Burrlioz 2.0` software. We have spent considerable time and effort exploring the properties of the inverse Pareto distribution, including deriving bias correction equations and alternative methods for deriving confidence intervals [@fox_methodologies_2021].
The inverse Pareto is not offered as a stand-alone option in the `Burrlioz 2.0` software. We have spent considerable time and effort exploring the properties of the inverse Pareto distribution, including deriving bias correction equations and alternative methods for deriving confidence intervals [@fox_methodologies_2022].
This work has substantial value for improving the current `Burrlioz 2.0` method, and our bias corrections should be adopted when deriving *HCx* estimates from the inverse Pareto where parameters have been estimated using maximum likelihood.

As is the case with the `Burrlioz 2.0` software, we have decided not to include the inverse Pareto distribution in the default candidate set in `ssdtools` although it is offered ass a user-selectable distribution to use in the model-fitting process.
Expand Down
2 changes: 1 addition & 1 deletion vignettes/model-averaging.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -74,7 +74,7 @@ Compounding this lack of clarity about the functional form of the SSD is the omn
The ssdtools R package uses a model averaging procedure to avoid the need to a-priori select a candidate distribution and instead uses a measure of ‘fit’ for each model to compute weights to be applied to an initial set of candidate distributions.
The method, as applied in the SSD context is described in detail in [@fox_recent_2021], and potentially provides a level of flexibility and parsimony that is difficult to achieve with a single SSD distribution.

[@fox_methodologies_2021]
[@fox_methodologies_2022]

## Preliminaries

Expand Down
9 changes: 5 additions & 4 deletions vignettes/references.bib
Original file line number Diff line number Diff line change
Expand Up @@ -198,13 +198,14 @@ @book{wickham_r_2016
keywords = {Big data, Computer programs, Data mining, Databases, Electronic data processing, Information visualization, R (Computer program language), Statistics}
}

@techreport{fox_methodologies_2021,
@techreport{fox_methodologies_2022,
title = {Joint investigation into {statistical} {methodologies} underpinning the {derivation} of {toxicant} {guideline} {values} in {Australia} and {New Zealand}},
institution = {Environmetrics Australia and Australian Institute of Marine Science},
author = {Fox, David R and Fisher, Rebecca and Thorley, Joseph L and Schwarz, Carl},
author = {Fox, DR and Fisher, R and Thorley, JL and Schwarz, C},
month = mar,
year = {2021},
url={https://environmetrics.net/docs/FOX%20and%20FISHER%20Final_final_report_rev2.3.pdf?189db0&189db0}
year = {2022},
doi = {10.25845/fm9b-7n28},
url = {https://doi.org/10.25845/fm9b-7n28}
}

@techreport{fox_methodologies_2024,
Expand Down

0 comments on commit b7bdd1c

Please sign in to comment.