Skip to content

Commit

Permalink
prepare wrapper written and documented
Browse files Browse the repository at this point in the history
  • Loading branch information
lgessl committed Nov 15, 2023
1 parent 6ec9050 commit 77285aa
Show file tree
Hide file tree
Showing 2 changed files with 129 additions and 0 deletions.
65 changes: 65 additions & 0 deletions R/prepare.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,65 @@
#' @title Prepare data for model fitting and predicting
#' @description Read data from files, generate the predictor and
#' respondent matirx and control quality.
#' @param directory string. The directory where both expression and
#' pheno data files lie.
#' @param model string. For which model to prepare. One of
#' * `"cox_lasso_zerosum"` (Cox proportional hazards with LASSO reularization
#' and zero-sum constraint)
#' * `"lasso_zerosum"` (ordinary linear regression with LASSO regularization)
#' @param include_from_continuous_pheno vector of strings. The names of the
#' *continuous* variables in the pheno data file to be included in the
#' predictor matrix. The values will be coerced to numeric. Default is `NULL`,
#' which means no continuous pheno variables will be included.
#' @param include_from_discrete_pheno vector of strings. The names of the
#' *discrete* variables in the pheno data file to be included in the predictor
#' matrix. A discrete variable with n levels will be converted to n-1 binary
#' variables. Default is `NULL`, which means no discrete pheno variables will
#' be included.
#' @param expr_fname string. The name of the expression .csv file inside `directory`.
#' Default is `"expr.csv"`.
#' @param pheno_fname string. The name of the pheno data .csv inside `directory`.
#' Default is `"pheno.csv"`.
#' @details The pheno .csv files holds the samples as rows (with the unique sample names
#' in the very first column), the variables as columns. We need at least the columns
#' * progression (integer, 0 for censored, 1 for uncensored),
#' * pfs_yrs (integer, the time in years to progression or censoring),
#' * `ìnclude_from_continuous_pheno`,
#' * `include_from_discrete_pheno`.
#' The expr .csv file holds the genes as rows (with the unique gene names in the very
#' first column), the samples as columns. The sample names in pheno and expression files
#' must be identical and in the same order.
#' @return A list with two elements, `x` and `y`. `x` is the predictor matrix, `y` is
#' the respondent matrix with the samples as rows.
prepare <- function(
directory,
model,
include_from_continuous_pheno = NULL,
include_from_discrete_pheno = NULL,
expr_fname = "expr.csv",
pheno_fname = "pheno.csv"
){
tbls <- read(
directory = directory,
model = model,
include_from_continuous_pheno = include_from_continuous_pheno,
include_from_discrete_pheno = include_from_discrete_pheno
)
x <- generate_predictor(
expr_df = tbls[["expr"]],
pheno_df = tbls[["pheno"]],
include_from_continuous_pheno = include_from_continuous_pheno,
include_from_discrete_pheno = include_from_discrete_pheno
)
prepare_model <- switch(model,
"cox_lasso_zerosum" = prepare_cox_lasso_zerosum,
"lasso_zerosum" = prepare_cox_lasso)
y <- generate_respondent(
tbls[["pheno"]]
)
prepare_qc(
x = x,
y = y
)
return(list("x" = x, "y" = y))
}
64 changes: 64 additions & 0 deletions man/prepare.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

0 comments on commit 77285aa

Please sign in to comment.