-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Plotting functions #23
Changes from 10 commits
a4e479a
da15415
d137fb2
b3c8bd8
76ef11a
efa4635
088629c
9c40c2d
61ac60e
cd21e6e
ea4ed85
7b2daec
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,16 +3,17 @@ Title: Biomarker Kinetics Modelling | |
Version: 0.0.0.9000 | ||
Authors@R: c(person("Timothy", "Russell", email = "[email protected]", role = c("aut")), | ||
person("Alex", "Hill", email = "[email protected]", role = c("aut", "cre"))) | ||
Description: Fit kinetic curves to biomarker data, using a Bayesian hierarchical model | ||
Description: Fit kinetic curves to biomarker data, using a Bayesian hierarchical model. | ||
License: GPL (>= 3) | ||
Encoding: UTF-8 | ||
Roxygen: list(markdown = TRUE) | ||
RoxygenNote: 7.3.1 | ||
RoxygenNote: 7.3.2 | ||
Imports: | ||
cmdstanr, | ||
data.table, | ||
forcats, | ||
fs, | ||
ggplot2, | ||
instantiate, | ||
logger, | ||
mosaic, | ||
|
@@ -26,11 +27,11 @@ Additional_repositories: | |
SystemRequirements: CmdStan (https://mc-stan.org/users/interfaces/cmdstan) | ||
Suggests: | ||
dplyr, | ||
ggplot2, | ||
knitr, | ||
lubridate, | ||
rmarkdown, | ||
testthat | ||
testthat, | ||
vdiffr | ||
VignetteBuilder: knitr | ||
LinkingTo: | ||
cpp11 | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -191,7 +191,7 @@ biokinetics <- R6::R6Class( | |
|
||
return(dt) | ||
}, | ||
extract_parameters = function(params, n_draws = 2500) { | ||
extract_parameters = function(params, n_draws) { | ||
private$check_fitted() | ||
params_proc <- rlang::parse_exprs(params) | ||
|
||
|
@@ -265,6 +265,26 @@ biokinetics <- R6::R6Class( | |
package = "epikinetics" | ||
) | ||
}, | ||
#' @description Plot the kinetics trajectory predicted by the model priors. | ||
#' Note that this is on a log scale, regardless of whether the data was provided | ||
#' on a log or a natural scale. | ||
#' @return A ggplot2 object. | ||
#' @param tmax Integer. The number of time points in each simulated trajectory. Default 150. | ||
#' @param n_draws Integer. The number of trajectories to simulate. Default 2000. | ||
plot_prior_predictive = function(tmax = 150, | ||
n_draws = 2000) { | ||
plot(private$priors, | ||
tmax = tmax, | ||
n_draws = n_draws, | ||
data = private$data) | ||
}, | ||
#' @description Plot model input data with a smoothing function. Note that | ||
#' this plot is on a log scale, regardless of whether data was provided on a | ||
#' log or a natural scale. | ||
#' @return A ggplot2 object. | ||
plot_model_inputs = function() { | ||
plot_sero_data(private$data, private$all_formula_vars) | ||
}, | ||
#' @description View the data that is passed to the stan model, for debugging purposes. | ||
#' @return A list of arguments that will be passed to the stan model. | ||
get_stan_data = function() { | ||
|
@@ -286,9 +306,9 @@ biokinetics <- R6::R6Class( | |
}, | ||
#' @description Extract fitted population parameters | ||
#' @return A data.table | ||
#' @param n_draws Numeric | ||
#' @param n_draws Integer. Default 2000. | ||
#' @param human_readable_covariates Logical. Default TRUE. | ||
extract_population_parameters = function(n_draws = 2500, | ||
extract_population_parameters = function(n_draws = 2000, | ||
human_readable_covariates = TRUE) { | ||
private$check_fitted() | ||
has_covariates <- length(private$all_formula_vars) > 0 | ||
|
@@ -322,10 +342,10 @@ biokinetics <- R6::R6Class( | |
}, | ||
#' @description Extract fitted individual parameters | ||
#' @return A data.table | ||
#' @param n_draws Numeric | ||
#' @param n_draws Integer. Default 2000. | ||
#' @param include_variation_params Logical | ||
#' @param human_readable_covariates Logical. Default TRUE. | ||
extract_individual_parameters = function(n_draws = 2500, | ||
extract_individual_parameters = function(n_draws = 2000, | ||
include_variation_params = TRUE, | ||
human_readable_covariates = TRUE) { | ||
private$check_fitted() | ||
|
@@ -367,11 +387,12 @@ biokinetics <- R6::R6Class( | |
#' @param summarise Boolean. Default TRUE. If TRUE, summarises over draws from posterior parameter distributions to | ||
#' return 0.025, 0.5 and 0.975 quantiles, labelled lo, me and hi, respectively. If FALSE returns values for individual | ||
#' draws from posterior parameter distributions. | ||
#' @param n_draws Integer. Maximum number of samples to include. Default 2500. | ||
#' @param n_draws Integer. Maximum number of samples to include. Default 2000. | ||
simulate_population_trajectories = function( | ||
t_max = 150, | ||
summarise = TRUE, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. dont need t_max or summarise here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. they get passed in to should probably refactor this though, as I'm not sure there's much point in having that private method, this is just an artefact of how I copied bits in from Tim's original scripts I think There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sounds good, I wonder how robust these plots are for much longer time frames (likes 1000s of days for something like measles) I can dig up some measles data for testing maybe? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. that'd be good! |
||
n_draws = 2500) { | ||
n_draws = 2000) { | ||
|
||
private$check_fitted() | ||
validate_numeric(t_max) | ||
validate_logical(summarise) | ||
|
@@ -388,25 +409,27 @@ biokinetics <- R6::R6Class( | |
dt_out <- dt_out[ | ||
, lapply(.SD, function(x) if (is.factor(x)) forcats::fct_drop(x) else x)] | ||
|
||
if (private$scale == "log") { | ||
return(dt_out) | ||
if (private$scale == "natural") { | ||
if (summarise) { | ||
dt_out <- convert_log2_scale_inverse( | ||
dt_out, vars_to_transform = c("me", "lo", "hi")) | ||
} else { | ||
dt_out <- convert_log2_scale_inverse( | ||
dt_out, vars_to_transform = "mu") | ||
} | ||
} | ||
|
||
if (summarise) { | ||
dt_out <- convert_log2_scale_inverse( | ||
dt_out, vars_to_transform = c("me", "lo", "hi")) | ||
} else { | ||
dt_out <- convert_log2_scale_inverse( | ||
dt_out, vars_to_transform = "mu") | ||
} | ||
class(dt_out) <- append("biokinetics_population_trajectories", class(dt_out)) | ||
attr(dt_out, "summarised") <- summarise | ||
attr(dt_out, "covariates") <- private$all_formula_vars | ||
dt_out | ||
}, | ||
#' @description Process the stan model results into a data.table. | ||
#' @return A data.table of peak and set titre values. Columns are tire_type, mu_p, mu_s, rel_drop_me, mu_p_me, | ||
#' mu_s_me, and a column for each covariate. See the data vignette for details: | ||
#' \code{vignette("data", package = "epikinetics")} | ||
#' @param n_draws Integer. Maximum number of samples to include. Default 2500. | ||
population_stationary_points = function(n_draws = 2500) { | ||
#' @param n_draws Integer. Maximum number of samples to include. Default 2000. | ||
population_stationary_points = function(n_draws = 2000) { | ||
private$check_fitted() | ||
validate_numeric(n_draws) | ||
|
||
|
@@ -458,11 +481,11 @@ biokinetics <- R6::R6Class( | |
#' @param summarise Boolean. If TRUE, average the individual trajectories to get lo, me and | ||
#' hi values for the population, disaggregated by titre type. If FALSE return the indidivudal trajectories. | ||
#' Default TRUE. | ||
#' @param n_draws Integer. Maximum number of samples to draw. Default 2500. | ||
#' @param n_draws Integer. Maximum number of samples to draw. Default 2000. | ||
#' @param time_shift Integer. Number of days to adjust the exposure day by. Default 0. | ||
simulate_individual_trajectories = function( | ||
summarise = TRUE, | ||
n_draws = 2500, | ||
n_draws = 2000, | ||
time_shift = 0) { | ||
private$check_fitted() | ||
validate_logical(summarise) | ||
|
@@ -534,7 +557,10 @@ biokinetics <- R6::R6Class( | |
by = c("calendar_day", "titre_type")) | ||
} | ||
|
||
dt_out[, time_shift := time_shift] | ||
dt_out <- dt_out[, time_shift := time_shift] | ||
class(dt_out) <- append("biokinetics_individual_trajectories", class(dt_out)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I haven't added a plot function for the individual trajectories yet, since this PR is already quite big. But will use the same approach - making the individual trajectories an S3 class and defining a default plot function for objects of this class. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yeah that seems sensible, good to keep a consistent workflow, lots of other uses for the individual-level trajectories other than plotting too! |
||
attr(dt_out, "summarised") <- summarise | ||
dt_out | ||
} | ||
) | ||
) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,112 @@ | ||
#' @title Simulate biomarker kinetics predicted by the given biokinetics priors | ||
#' and optionally compare to a dataset. | ||
#' @export | ||
#' @description Simulate trajectories by drawing random samples from the given | ||
#' priors for each parameter in the biokinetics model. | ||
#' @return A ggplot2 object. | ||
#' @param x A named list of type 'biokinetics_priors'. | ||
#' @param \dots Further arguments passed to the method. | ||
#' @param tmax Integer. The number of time points in each simulated trajectory. Default 150. | ||
#' @param n_draws Integer. The number of trajectories to simulate. Default 2000. | ||
#' @param data Optional data.frame with columns time_since_last_exp and value. The raw data to compare to. | ||
plot.biokinetics_priors <- function(x, | ||
..., | ||
tmax = 150, | ||
n_draws = 2000, | ||
data = NULL) { | ||
|
||
# Declare variables to suppress notes when compiling package | ||
# https://github.com/Rdatatable/data.table/issues/850#issuecomment-259466153 | ||
t0 <- tp <- ts <- m1 <- m2 <- m3 <- NULL | ||
time_since_last_exp <- me <- lo <- hi <- value <- mu <- NULL | ||
|
||
if (!is.null(data)) { | ||
validate_required_cols(data, c("time_since_last_exp", "value")) | ||
} | ||
params <- data.table( | ||
t0 = stats::rnorm(n_draws, x$mu_t0, x$sigma_t0), # titre value at t0 | ||
tp = stats::rnorm(n_draws, x$mu_tp, x$sigma_tp), # time of peak | ||
ts = stats::rnorm(n_draws, x$mu_ts, x$sigma_ts), # time of set point | ||
m1 = stats::rnorm(n_draws, x$mu_m1, x$sigma_m1), # gradient 1 | ||
m2 = stats::rnorm(n_draws, x$mu_m2, x$sigma_m2), # gradient 2 | ||
m3 = stats::rnorm(n_draws, x$mu_m3, x$sigma_m3) # gradient 3 | ||
) | ||
|
||
times <- data.table(t = 1:tmax) | ||
params_and_times <- times[, as.list(params), by = times] | ||
|
||
params_and_times[, mu := biokinetics_simulate_trajectory(t, t0, tp, ts, m1, m2, m3), | ||
by = c("t", "t0", "tp", "ts", "m1", "m2", "m3")] | ||
|
||
summary <- params_and_times[, .(me = stats::quantile(mu, 0.5, names = FALSE), | ||
lo = stats::quantile(mu, 0.025, names = FALSE), | ||
hi = stats::quantile(mu, 0.975, names = FALSE)), by = t] | ||
|
||
plot <- ggplot(summary) + | ||
geom_line(aes(x = t, y = me)) + | ||
geom_ribbon(aes(x = t, ymin = lo, ymax = hi), alpha = 0.5) | ||
|
||
if (!is.null(data)) { | ||
plot <- plot + geom_point(data = data, aes(x = time_since_last_exp, y = value)) | ||
} | ||
plot | ||
} | ||
|
||
plot_sero_data <- function(data, covariates = character(0)) { | ||
validate_required_cols(data, c("time_since_last_exp", "value", "titre_type")) | ||
# Declare variables to suppress notes when compiling package | ||
# https://github.com/Rdatatable/data.table/issues/850#issuecomment-259466153 | ||
time_since_last_exp <- value <- titre_type <- NULL | ||
|
||
ggplot(data) + | ||
geom_point(aes(x = time_since_last_exp, y = value, colour = titre_type)) + | ||
geom_smooth(aes(x = time_since_last_exp, y = value, colour = titre_type)) + | ||
facet_wrap(eval(parse(text = facet_formula(covariates)))) + | ||
guides(colour = guide_legend(title = "Titre type")) | ||
} | ||
|
||
#' Plot method for "biokinetics_population_trajectories" class | ||
#' | ||
#' @param x An object of class "biokinetics_population_trajectories". These are | ||
#' generated by running biokinetics$simulate_population_trajectories(). See | ||
#' \href{../../epikinetics/html/biokinetics.html#method-biokinetics-simulate_population_trajectories}{\code{biokinetics$simulate_population_trajectories()}} | ||
#' @param \dots Further arguments passed to the method. | ||
#' @param data Optional data.table containing raw data as provided to the biokinetics model. | ||
#' @export | ||
plot.biokinetics_population_trajectories <- function(x, ..., data = NULL) { | ||
covariates <- attr(x, "covariates") | ||
|
||
# Declare variables to suppress notes when compiling package | ||
# https://github.com/Rdatatable/data.table/issues/850#issuecomment-259466153 | ||
time_since_last_exp <- value <- me <- titre_type <- lo <- hi <- NULL | ||
day <- last_exp_day <- NULL | ||
|
||
if (attr(x, "summarised")) { | ||
plot <- ggplot(x) + | ||
geom_line(aes(x = time_since_last_exp, y = me, colour = titre_type)) + | ||
geom_ribbon(aes(x = time_since_last_exp, | ||
ymin = lo, | ||
ymax = hi, | ||
fill = titre_type), alpha = 0.5) | ||
} else { | ||
plot <- ggplot(x) + | ||
geom_line(aes(x = time_since_last_exp, y = mu, | ||
colour = titre_type, group = .draw), alpha = 0.1, linewidth = 0.1) | ||
} | ||
|
||
if (!is.null(data)) { | ||
validate_required_cols(data) | ||
plot <- plot + | ||
geom_point(data = data, | ||
aes(x = as.integer(day - last_exp_day, units = "days"), | ||
y = value), size = 0.5, alpha = 0.5) | ||
} | ||
plot + | ||
scale_y_continuous(trans = "log2") + | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I thought the output of simulate_population_trajectories was already on the log2 scale? why does this need to be transformed here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The trajectories are always returned on the original scale (so log if the input data was on the log scale, but back-transformed to natural scale if the inputs were natural scale). Does this make sense? We could change this behaviour if confusing. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yeah makes sense, I think default behaviour just needs to be explicit (which is assuming that data is on a natural scale right)? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ah good catch this should be conditional on the input data being on a natural scale! |
||
facet_wrap(eval(parse(text = facet_formula(covariates)))) + | ||
guides(fill = guide_legend(title = "Titre type"), colour = "none") | ||
} | ||
|
||
facet_formula <- function(covariates) { | ||
paste("~", paste(c("titre_type", covariates), collapse = "+")) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would keep default n_draws consistent across all functions, default samples in stan is 2000 so would stick to that