
updates
drizopoulos committed Apr 23, 2024
1 parent 12cb680 commit 6994a43
Showing 9 changed files with 70 additions and 90 deletions.
10 changes: 5 additions & 5 deletions vignettes/Competing_Risks.Rmd
@@ -21,8 +21,8 @@ library("JMbayes2")

# Competing Risks
## Prepare data
The first step in fitting a joint model for competing events in **JMbayes2** is to prepare the data for the event process. If there are $K$ competing events, each subject must have $K$ rows, one for each possible cause. The observed event time $T_i$ of each subject is repeated $K$ times, and there are two indicator variables, namely one identifying the cause and one indicating whether the corresponding event type is the one that occurred. Standard survival datasets that include a single row
per patient can be easily transformed to the competing risks long format using the function `crisk_setup()`. This function accepts as main arguments the survival data in the standard format with a single row per patient, the name of the status variable, and the level in this status variable that corresponds to censoring. We illustrate the use of this function in the PBC data, in which we treat as competing risks transplantation and death:
```{r, "prepare_data"}
pbc2.id[pbc2.id$id %in% c(1, 2, 5), c("id", "years", "status")]
```

@@ -43,7 +43,7 @@

```{r}
CoxFit_CR <- coxph(Surv(years, status2) ~ (age + drug) * strata(CR),
                   data = pbc2.idCR)
```
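The `pbc2.idCR` dataset used in the Cox model above is created from `pbc2.id` with `crisk_setup()`. A minimal sketch of such a call, with the argument names assumed rather than taken from the collapsed part of this hunk:

```{r, eval = FALSE}
# one row per subject and per competing risk; "alive" is assumed to be the
# censoring level of the status variable (argument names are assumptions)
pbc2.idCR <- crisk_setup(pbc2.id, statusVar = "status", censLevel = "alive",
                         nameStrata = "CR")
```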

We include two longitudinal outcomes for the longitudinal process, namely serum bilirubin and the prothrombin time. For the former, we use quadratic orthogonal polynomials in the fixed- and random-effects parts, and for the latter, linear evolutions:
```{r, "mixed models"}
fm1 <- lme(log(serBilir) ~ poly(year, 2) * drug, data = pbc2,
           random = ~ poly(year, 2) | id)
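# (sketches; the prothrombin model and the joint model fit are collapsed in
#  this hunk, so the formulas and arguments below are assumptions)
fm2 <- lme(prothrombin ~ year * drug, data = pbc2, random = ~ year | id)

# cause-specific associations can be obtained by interacting the functional
# forms with the strata indicator CR
jFit_CR <- jm(CoxFit_CR, list(fm1, fm2), time_var = "year",
              functional_forms = ~ value(log(serBilir)):CR + value(prothrombin):CR)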
```

@@ -69,14 +69,14 @@

```{r}
summary(jFit_CR)
```

## Dynamic predictions
Based on the fitted competing risks joint model, we will illustrate how (dynamic) predictions can be calculated for the cause-specific cumulative risk probabilities. We will show these calculations for Patient 81 from the PBC dataset as an example. First, we extract the data on this subject.
```{r, "data_P81"}
ND_long <- pbc2[pbc2$id == 81, ]
ND_event <- pbc2.idCR[pbc2.idCR$id == 81, ]
ND_event$status2 <- 0
ND <- list(newdataL = ND_long, newdataE = ND_event)
```
The first line extracts the longitudinal measurements, and the second line extracts the event times per cause (i.e., death and transplantation). This patient died at 6.95 years, but to make the calculation of cause-specific cumulative risk more relevant, we presume that she did not have the event, and we set the event status variable `status2` to zero. The last line combines the two datasets in a list. *Note:* this last step is a requirement of the `predict()` method for competing risks joint models. That is, the datasets provided in the arguments `newdata` and `newdata2` need to be named lists with two components. The first component needs to be named `newdataL` and contain the dataset with the longitudinal measurements. The second component needs to be named `newdataE` and contain the dataset with the event information.

The predictions are calculated using the `predict()` method. The first call to this function calculates the prediction for the longitudinal outcomes at the times provided in the `times` argument, and the second call calculates the cause-specific cumulative risk probabilities. By setting the argument `return_newdata` to `TRUE` in both calls, we can use the corresponding `plot()` method to depict the predictions:
```{r, "CIFs"}
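# (sketch; the calls in this chunk are collapsed in the diff, so the time grid
#  and argument names below are assumptions)
# predictions for the longitudinal outcomes after the last available visit
predLong <- predict(jFit_CR, newdata = ND,
                    times = seq(7, 15, length.out = 25),
                    return_newdata = TRUE)
# cause-specific cumulative risk probabilities
predEvent <- predict(jFit_CR, newdata = ND, process = "event",
                     return_newdata = TRUE)
# combined display of longitudinal and cumulative risk predictions
plot(predLong, predEvent)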
```

27 changes: 10 additions & 17 deletions vignettes/Dynamic_Predictions.Rmd
@@ -27,10 +27,8 @@

$$\begin{array}{l}
\pi_j(u \mid t) = \mbox{Pr}\{T_j^* \leq u \mid T_j^* > t, \mathcal Y_j(t), \mathcal D_n\}\\\\
= \displaystyle 1 - \int\int \frac{S(u \mid b_j, \theta)}{S(t \mid b_j, \theta)} \; p\{b_j \mid T_j^* > t, \mathcal Y_j(t), \theta\} \; p(\theta \mid \mathcal D_n) \; db_j d\theta,
\end{array}$$
where $S(\cdot)$ denotes the survival function conditional on the random effects and $\mathcal Y_j(t) = \{\mathcal Y_{1j}(t), \ldots, \mathcal Y_{Kj}(t)\}$. Combining the
three terms in the integrand, we can devise a Monte Carlo scheme to obtain estimates of these probabilities, namely,

1. Sample a value $\tilde \theta$ from the posterior of the parameters
$[\theta \mid \mathcal D_n]$.
@@ -50,9 +48,7 @@

and their standard error by calculating the standard deviation across the Monte Carlo samples.
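Written out, the Monte Carlo estimate of the cumulative risk takes the form below (a sketch using the notation of the integral above, with $l = 1, \ldots, L$ indexing the Monte Carlo samples):

$$\hat{\pi}_j(u \mid t) = \frac{1}{L} \sum_{l = 1}^{L} \left\{ 1 - \frac{S(u \mid \tilde{b}_j^{(l)}, \tilde{\theta}^{(l)})}{S(t \mid \tilde{b}_j^{(l)}, \tilde{\theta}^{(l)})} \right\}.$$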

## Example
We will illustrate the calculation of dynamic predictions using package **JMbayes2** from a trivariate joint model fitted to the PBC dataset for the longitudinal outcomes `serBilir` (continuous), `prothrombin` time (continuous), and `ascites` (dichotomous). We start by fitting the univariate mixed models. For the two continuous outcomes, we allow for nonlinear subject-specific time effects using natural cubic splines. For `ascites`, we postulate linear subject-specific profiles for the log odds. The code is:
```{r}
fm1 <- lme(log(serBilir) ~ ns(year, 3) * sex, data = pbc2,
           random = ~ ns(year, 3) | id, control = lmeControl(opt = 'optim'))
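# (sketches; the prothrombin and ascites models are collapsed in this hunk,
#  so the formulas below are assumptions based on the description above)
fm2 <- lme(prothrombin ~ ns(year, 3) * sex, data = pbc2,
           random = ~ ns(year, 3) | id, control = lmeControl(opt = 'optim'))
# mixed-effects logistic regression for the dichotomous ascites outcome,
# with linear subject-specific evolutions for the log odds
fm3 <- mixed_model(ascites ~ year * sex, data = pbc2,
                   random = ~ year | id, family = binomial())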
```

@@ -84,19 +80,19 @@

```{r}
ND$status2 <- 0
ND$years <- t0
```

We will only use the first five years of follow-up (line three) and specify that the patients were event-free up to this point (lines four and five).

We start with predictions for the longitudinal outcomes. These are produced by the `predict()` method for class `jm` objects and follow the same lines as the procedure described above for cumulative risk probabilities. The only difference is in Step 3, where instead of calculating the cumulative risk, we calculate the predicted values for the longitudinal outcomes. There are two options controlled by the `type_pred` argument, namely predictions at the scale of the response/outcome (default) or at the linear predictor level. The `type` argument controls whether the predictions will be for the mean subject (i.e., including only the fixed effects) or subject-specific, including both the fixed and random effects. In the `newdata` argument we provide the available measurements of the two patients. This will be used to sample their random effects in Step 2, presented above. This is done with a Metropolis-Hastings algorithm that runs for `n_mcmc` iterations; all iterations but the last one are discarded as burn-in. Finally, argument `n_samples` corresponds to the value of $L$ defined above and specifies the number of Monte Carlo samples:
```{r}
predLong1 <- predict(jointFit, newdata = ND, return_newdata = TRUE)
```
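A call that makes these options explicit could look like the following sketch (the particular values shown are illustrative assumptions, not taken from the vignette):

```{r, eval = FALSE}
# the same type of prediction, with the options discussed above spelled out;
# the specific values are illustrative
predLong1 <- predict(jointFit, newdata = ND,
                     type_pred = "response",    # outcome scale rather than linear predictor
                     type = "subject_specific", # include both fixed and random effects
                     n_mcmc = 55L,              # Metropolis-Hastings iterations for the random effects
                     n_samples = 200L,          # number of Monte Carlo samples (L)
                     return_newdata = TRUE)
```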

Argument `return_newdata` specifies that the predictions are returned as extra columns of the `newdata` data.frame. By default, the 95\% credible intervals are also included. Using the `plot()` method for objects returned by `predict.jm(..., return_newdata = TRUE)`, we can display the predictions. With the following code, we do that for the first longitudinal outcome:
```{r, fig.align = "center", fig.width = 8.5, fig.height = 7.5}
plot(predLong1)
```

When we want to calculate predictions for other future time points, we can accordingly specify the `times` argument. In the following example, we calculate predictions from time `t0` to time 12:
```{r}
predLong2 <- predict(jointFit, newdata = ND,
                     times = seq(t0, 12, length.out = 51),
                     return_newdata = TRUE)
```

@@ -120,7 +116,7 @@

The predictions are included again as extra columns in the corresponding data.frames:

```{r}
plot(predLong2, predSurv)
```

Again, by default, the plot is for the predictions of the first subject (i.e., Patient 25) and the first longitudinal outcome (i.e., `log(serBilir)`). However, the `plot()` method has a series of arguments that allow users to customize the plot. We illustrate some of these capabilities with the following figure. First, we specify that we want to depict all three outcomes using `outcomes = 1:3` (note: a maximum of three outcomes can be displayed simultaneously). Next, we specify via the `subject` argument that we want to show the predictions of Patient 93. Note that for serum bilirubin, we used the log transformation in the specification of the linear mixed model; hence, we receive predictions on the transformed scale. To show predictions on the original scale, we use the `fun_long` argument. Because we have three outcomes, this needs to be a list of three functions. The first one, corresponding to serum bilirubin, is `exp()`, and for the other two we use `identity()` because we do not wish to transform the predictions. Analogously, the `fun_event` argument transforms the predictions for the event outcome; in the example below, we use it to obtain survival probabilities. Using the arguments `bg`, `col_points`, `col_line_long`, `col_line_event`, `fill_CI_long`, and `fill_CI_event`, we have changed the appearance of the plot to a dark theme. Finally, the `pos_ylab_long` argument specifies the relative position of the y-axis labels for the three longitudinal outcomes.
```{r, eval = FALSE}
cols <- c('#F25C78', '#D973B5', '#F28322')
plot(predLong2, predSurv, outcomes = 1:3, subject = 93,
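     # ... the call continues with the fun_long, fun_event, dark-theme colour,
     # and pos_ylab_long arguments described in the text above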
```

@@ -169,7 +165,7 @@ tvAUC(roc)

This function accepts either an object of class `tvROC` or an object of class `jm`. In the latter case, the user must also provide the `newdata`, `Tstart`, and `Dt` (or `Thoriz`) arguments. Here we have used the same dataset as the one used to fit the model, but, in principle, discrimination could be (better) assessed in another dataset.
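Both call patterns could look like the following sketch (the object name `roc` is illustrative; the prediction window matches the one used above):

```{r, eval = FALSE}
# first form: compute the time-varying ROC curve once and pass it to tvAUC()
roc <- tvROC(jointFit, newdata = pbc2, Tstart = t0, Dt = 3)
tvAUC(roc)

# second form: provide the fitted joint model and the prediction window directly
tvAUC(jointFit, newdata = pbc2, Tstart = t0, Dt = 3)
```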

To assess the accuracy of the predictions, we produce a calibration plot:
```{r, fig.align = "center", fig.width = 8.5, fig.height = 7.5}
calibration_plot(jointFit, newdata = pbc2, Tstart = t0, Dt = 3)
```
@@ -197,8 +193,5 @@ tvBrier(jointFit, newdata = pbc2, Tstart = t0, Dt = 3, integrated = TRUE,

**Notes:**

- To obtain valid estimates of the predictive accuracy measures (i.e., time-varying sensitivity, specificity, and Brier score), we need to account for censoring. A popular method to achieve this is inverse probability of censoring weighting. For this approach to be valid, the model for the weights needs to be correctly specified. In standard survival analysis, this is achieved using either the Kaplan-Meier estimator or a Cox model for the censoring distribution. However, in the settings where joint models are used, the censoring mechanism may depend on the history of the longitudinal outcomes in a complex manner, especially when multiple longitudinal outcomes are considered in the analysis. Also, these outcomes may be recorded at different time points per patient and have missing data. For these reasons, Kaplan-Meier-based or Cox-based censoring weights may be difficult to derive or may be biased in these settings. The functions in **JMbayes2** that calculate the predictive accuracy measures instead use joint-model-based weights to account for censoring. These weights allow censoring to depend in any possible manner on the history of the longitudinal outcomes, but they require that the model is appropriately calibrated.
- The calibration curve, produced by `calibration_plot()`, and the calibration metrics, produced by `calibration_metrics()`, are calculated using the procedure described in [Austin et al., 2020](https://doi.org/10.1002/sim.8570); a sketch of a `calibration_metrics()` call is given below.
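A sketch of how these metrics could be requested for the same prediction window used above (assuming `calibration_metrics()` takes the same arguments as `calibration_plot()`):

```{r, eval = FALSE}
# calibration accuracy metrics over the window from t0 to t0 + 3;
# the arguments are assumed to mirror calibration_plot()
calibration_metrics(jointFit, newdata = pbc2, Tstart = t0, Dt = 3)
```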


