This repository contains all the code necessary to recreate the simulation study published in "A flexible multi-metric Bayesian framework for decision-making in Phase II multi-arm multi-stage studies" by Suzanne Dufault, Katie Rolfe, Angela Crook, and Patrick Phillips, which appeared in Statistics in Medicine (2023). The code has not been optimized, but it should accurately reproduce the results described in the paper.
This folder contains all of the code necessary to simulate the TTP and survival data and perform the Bayesian estimation on the simulated TTP data.
Note: I had to run the following in order to use the `cmdstanr` backend:

    # we recommend running this in a fresh R session or restarting your current session
    install.packages("cmdstanr", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))

- `00_run-file.R`
  - This file sets up the global conditions for the simulation study (e.g., number of simulations, number of arms, number of participants, etc.) and calls the subsequent three scripts:
    - `01_simulate-ttp-data.R`
      - Simulates the time-to-positivity (TTP) data based on estimated parameters from a Bayesian hierarchical linear model applied to the REMoxTB TTP data.
    - `02_add-fixed-effects.R`
      - Adds the fixed effects, including the simulated enrollment day, such that 10 participants are enrolled per week.
    - `03_simulate-survival-outcomes.R`
      - Simulates the survival outcomes for each simulated TTP dataset. We have assumed there is no direct link between TTP and survival, so the simulated survival data rely only on regimen. Because we are interested in 12 survival settings, we append each of those simulated endpoints to each simulated TTP dataset.
- `1*_run-analysis-ttp.R`
  - The Bayesian analysis is computationally expensive, so management of the analysis is split across 10 files such that 100 simulated datasets for each TTP setting and sample size are analyzed at one time. File `10_run-analysis-ttp.R` corresponds to the first 100 simulated datasets and file `19_run-analysis-ttp.R` corresponds to the 901st-1000th simulated datasets.
- `20_perform-ttp-analysis.R`
  - The previous files "manage" the distribution of simulated datasets so that the computational cost can be spread out. This file contains the actual script for applying the `brms` model to the simulated dataset.
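
The split described above can be sketched in a few lines of base R. This is only an illustration of the indexing, assuming the ten files are numbered 10 through 19 and each handles a consecutive block of 100 datasets; `batch_indices()` is a hypothetical helper and is not part of the repository.

```r
# Hypothetical helper (not in the repository): returns the indices of the
# simulated datasets handled by one of the ten 1*_run-analysis-ttp.R files.
# Assumes files are numbered 10..19 and blocks are consecutive.
batch_indices <- function(file_number, n_per_batch = 100) {
  stopifnot(file_number >= 10, file_number <= 19)
  batch <- file_number - 10              # 0 for file 10, ..., 9 for file 19
  start <- batch * n_per_batch + 1       # first dataset index in this block
  seq(start, length.out = n_per_batch)   # consecutive indices for the block
}

first_block <- range(batch_indices(10))  # datasets handled by file 10
last_block  <- range(batch_indices(19))  # datasets handled by file 19
```

Under these assumptions the ten blocks cover datasets 1 through 1000 without overlap.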
- `00_run-performance-analyses.R`: This script sets up the analysis of the survival and TTP data for the 30, 40, 60, and 80 participants-per-arm datasets. In particular, it sets the target product profile (TPP) thresholds, which are used in the following scripts.
- `01_combining-mcmc-chains`: Because the Bayesian analysis of the simulated data was split into 10 segments, this script simply takes those separate MCMC results and combines them into a more convenient object.
  - input: `data/bayes-generated/[DATE]_simulated-lmm_random-slope_lod-25_nk-*_mcmc-*.RData`, or for the null TTP setting, `data/bayes-generated/[DATE]_bayes-mcmc-results_no-winners_*.RData`
  - output: `data/cleaned/[DATE]_mcmc_random-intercept-random-slope_nk-*.RData`
- `02_combining-modresults`: Because the Bayesian analysis of the simulated data was split into 10 segments, this script simply takes those separate model results and combines them into a more convenient object.
  - input: `data/bayes-generated/[DATE]_simulated-lmm_random-slope_lod-25_nk-*_modresults-*.RData`
  - output: `data/cleaned/[DATE]_summary_random-intercept-random-slope_nk-*.RData`
- `03_tpp-decisions.R`: This applies the target product profile framework to the posterior results (captured in the MCMC objects) and returns decisions for each of the arms in each simulated dataset.
  - input: `data/cleaned/[DATE]_mcmc_random-intercept-random-slope_nk-*.RData`
  - output: `data/analyzed/target-product-profile/[DATE]_tpp-decisions.RData`
- `04_ranking-probabilities.R`: This estimates the posterior probability of ranking based on the TTP MCMC results.
  - input: `data/cleaned/[DATE]_mcmc_random-intercept-random-slope_nk-*.RData`
  - output: `data/analyzed/ranking/[DATE]_ranking-probabilities.RData`
- `05_compare-v-control.R`: This estimates the posterior probability that each novel arm is better than the control.
  - input: `data/cleaned/[DATE]_mcmc_random-intercept-random-slope_nk-*.RData`
  - output: `data/analyzed/ranking/[DATE]_compare-v-control.RData`
- `06_relapse-counts.R`: This script counts the number of relapses that have occurred by the first interim analysis.
  - input:
  - output: `analyzed/[DATE]_relapse-counts-by-simulation_[XX].RData`

`simulation-results_full.Rmd` contains the simulation results for the random-intercept, random-slope simulated datasets. It also includes code for generating figures for the manuscript.
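
As a rough illustration of what the comparison-with-control and ranking steps compute, the sketch below works on toy posterior draws in base R. It is not the repository's implementation: the data layout (one row per draw, one column per regimen), the column names, the simulated values, and the convention that rank 1 means the steepest slope are all assumptions made for the example.

```r
set.seed(1)  # toy draws only; the real draws come from the brms fits

# Assumed layout: one row per posterior draw, one column per regimen.
# Column names and slope values are hypothetical.
n_draws <- 4000
mcmc_df <- data.frame(
  control = rnorm(n_draws, mean = 0.020, sd = 0.004),
  arm_1   = rnorm(n_draws, mean = 0.030, sd = 0.004),
  arm_2   = rnorm(n_draws, mean = 0.022, sd = 0.004)
)

# Posterior probability that each novel arm has a steeper slope than control:
# the share of draws in which the arm's slope exceeds the control's slope.
novel <- setdiff(names(mcmc_df), "control")
p_better <- sapply(mcmc_df[novel], function(slope) mean(slope > mcmc_df$control))

# Posterior ranking probabilities: rank the regimens within every draw
# (assumption: rank 1 = steepest slope), then tabulate how often each
# regimen takes each rank.
ranks <- t(apply(-as.matrix(mcmc_df), 1, rank))
p_rank <- sapply(seq_len(ncol(ranks)), function(r) colMeans(ranks == r))
colnames(p_rank) <- paste0("rank_", seq_len(ncol(ranks)))
```

Each row of `p_rank` corresponds to one regimen and sums to one across ranks, since every draw assigns each regimen exactly one rank.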

filename | description | input | output |
---|---|---|---|
`coef-function.R` | This estimates the coefficient values for the parametric survival model based on the inputs | | |
`df_extract-mcmc-slopes-function.R` | This function extracts the MCMC parameter estimates from the brms model output and converts them all into stand-alone slopes. | `modresults` a brms model object | a (`n_chains` x `n_iterations`) x 5 regimens dataframe with credible slope estimates |
`df-sim-function.R` | This function simulates a dataset based on a random intercept model. | `nk` number per regimen<br>`k` number of regimens<br>`betas` main effects<br>`weeks` number of weeks under observation<br>`sd_id` standard deviation of the random intercepts<br>`sd_randomnoise` measurement error<br>`lod` limit of detection | `df_temp` a simulated data frame |
`df-sim-wrapper-function.R` | This function wraps around the previous and generates as many simulated datasets as requested. | `n_sims` number of simulated datasets required<br>`nk` number per regimen<br>`k` number of regimens<br>`betas` main effects<br>`weeks` number of weeks under observation<br>`sd_id` standard deviation of the random intercepts<br>`sd_randomnoise` measurement error<br>`lod` limit of detection | `df_list` a list of simulated dataframes |
`df-sim-function_random-slope.R` | This function simulates a dataset based on a random intercept, random slope model. | `nk` number per regimen<br>`k` number of regimens<br>`betas` main effects<br>`weeks` number of weeks under observation<br>`sd_id` standard deviation of the random intercepts<br>`sd_slope` standard deviation of the random slopes<br>`rho` correlation of random effects<br>`sd_randomnoise` measurement error<br>`lod` limit of detection | `df_temp` a simulated data frame |
`df-sim-wrapper-function_random-slope.R` | This function wraps around the previous and generates as many simulated datasets as requested. | `n_sims` number of simulated datasets required<br>`nk` number per regimen<br>`k` number of regimens<br>`betas` main effects<br>`weeks` number of weeks under observation<br>`sd_id` standard deviation of the random intercepts<br>`sd_slope` standard deviation of the random slopes<br>`rho` correlation of random effects<br>`sd_randomnoise` measurement error<br>`lod` limit of detection | `df_list` a list of simulated dataframes |
`log-hazard-function.R` | This takes the data `x` (dataframe), coefficients `betas` (named vector), and time `t` (numeric) and returns the log-hazard | | log-hazard. Used by the `simsurv` function to simulate survival data. |
`mcmc-compare-v-control.R` | This is used in `analysis/05_compare-v-control.R` to estimate posterior probability that a given arm has steeper TTP slope than control | `mcmc_df` data frame of MCMC credible estimates of TTP slopes | `output` dataframe |
`mcmc-rank-function.R` | This is used in `analysis/04_ranking-probabilities.R` to estimate the posterior probability associated with ranking TTP | `mcmc_df` data frame of MCMC credible estimates of TTP slopes | `output` dataframe |
`target-product-profile-function.R` | This is used in `analysis/03_tpp-decisions.R` to return TPP decisions | `mcmc_list` list of dataframes containing MCMC credible estimates for TTP slope<br>`nk` number per regimen<br>`theta_lrv` minimum acceptable value on TTP slope (%)<br>`theta_tv` target value for TTP slope (%)<br>`tau_lrv` maximum allowable risk that an arm is advanced that does not reach the minimal level of acceptable efficacy<br>`tau_tv` maximum allowable risk that an arm is issued a NO-GO decision when it has an unequivocal improvement in efficacy | `out` dataframe |
`weibull-survival-function.R` | The Weibull survival function (used for root-finding) | `lambda`<br>`p`<br>`surv.prop` proportion survived<br>`t` time t | |
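
To make the "used for root-finding" note concrete, here is a minimal base R sketch. It assumes the parameterisation S(t) = exp(-lambda * t^p) and assumes the function returns the gap between S(t) and the target survival proportion; the repository's function may differ. The function name `weibull_surv_gap()` and all numeric values are illustrative and not taken from the paper.

```r
# Assumed parameterisation: S(t) = exp(-lambda * t^p), scale lambda, shape p.
# Returning S(t) - surv.prop makes the function zero exactly when the target
# survival proportion is reached, which is the form uniroot() expects.
weibull_surv_gap <- function(lambda, p, surv.prop, t) {
  exp(-lambda * t^p) - surv.prop
}

# Illustrative values only: find the scale that leaves 90% of participants
# event-free at t = 12 when the shape is 1.5.
fit <- uniroot(weibull_surv_gap, interval = c(1e-8, 1),
               p = 1.5, surv.prop = 0.90, t = 12, tol = 1e-12)
lambda_hat <- fit$root

# Closed-form check for this parameterisation: lambda = -log(surv.prop) / t^p
lambda_exact <- -log(0.90) / 12^1.5
```

For this simple form the scale has a closed-form solution, so the root-finder is only needed when one parameter cannot be isolated algebraically; the closed form is included here purely as a check on the numerical result.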

`simulation-results_full.Rmd` is the larger, comprehensive file containing all analysis results from all simulated datasets.
The figures included in the manuscript are in the `submission` folder. The rest of this folder holds an assortment of figures generated from (analyses of) the simulated data.
- `analyzed`
  - `2022-03-07_sim-level-ttp-results_nk*_4mo-duration.RData`
    - contains log10(TTP) metric estimates for each arm for each setting
- `bayes-generated`
- `simulated-datasets`
- `2022-03-selecting-from-three.*`: the Rmd and HTML files walking through how we expect things to change if we were using three novel regimens rather than four. These can be moved to the main folders, but will need to be de-identified first.
- Brainstorming of seamless study with collaborators (.docx)
- PPTX walking through the more complex survival scenarios when multiple durations were being considered
- Draft tables for presenting results (.docx)