Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use of format() in removeFormulaTerms() produces warning when long formulas are used #153

Open
amanda-hi opened this issue Dec 12, 2024 · 1 comment

Comments

@amanda-hi
Copy link

amanda-hi commented Dec 12, 2024

I'm receiving warnings when providing a formula >60 characters (give or take) to predict.psm(). These warnings seem to be stemming from removeFormulaTerms(), specifically the use of format() that was added in the rms 6.8.1 release. When long formulas are provided to predict(), format() breaks them apart into vectors of length >1. This throws the following warning:

Using formula(x) is deprecated when x is a character vector of length > 1.
  Consider formula(paste(x, collapse = " ")) instead.

See below for a reprex.

reprex::reprex({
  # Setup --------------------------------------------------
  suppressPackageStartupMessages({
    library(rms) # Must be version 6.8.1 or greater
    library(withr)
  })

  n <- 1000
  withr::with_seed(17, {
    df <- data.frame(age  = rnorm(n, 50, 10),
                     blood.pressure = rnorm(n, 120, 15),
                     cholesterol    = rnorm(n, 200, 25),
                     sex            = factor(sample(c('female','male'), n, TRUE)),
                     treat          = factor(sample(c('a','b','c'), n, TRUE))
    )
    df$L <- .4*(df$sex=='male') + .045*(df$age-50) +
      (log(df$cholesterol - 10)-5.2)*(-2*(df$sex=='female') + 2*(df$sex=='male')) +
      .3*sqrt(df$blood.pressure-60)-2.3 + 1*(df$treat=='b')
    df$y <- ifelse(runif(n) < plogis(df$L), 1, 0)
    df$h <- .02*exp(.06*(df$age-50)+.8*(df$sex=='Female'))
    df$d.time <- -log(runif(n))/df$h
    df$cens <- 15*runif(n)
    df$death <- ifelse(df$d.time <= df$cens,1,0)
    df$d.time <- pmin(df$d.time, df$cens)
    
    feats <- replicate(5, stats::rnorm(n, mean = 2500, sd = 500))
    colnames(feats) <- paste0("feature", 1:5)
    df <- cbind(df, feats)
  })
  
  # Testing --------------------------------------------
  form_short   <- Surv(d.time, death) ~ feature1 + age
  form_long    <- Surv(d.time, death) ~ feature1 + feature2 + feature3 + feature4 + age
  model_short  <- rms::psm(form_short, df)
  model_long   <- rms::psm(form_long, df)
  
  invisible(predict(model_short, type = "lp"))
  invisible(predict(model_long, type = "lp")) # Throws warning
  
  invisible(
    rms:::removeFormulaTerms(model_short$sformula,
                                  which = "offset", 
                                  delete.response = TRUE))
  
  invisible(
    rms:::removeFormulaTerms(model_long$sformula, 
                                  which = "offset", 
                                  delete.response = TRUE)) # Throws warning
  
  # Dissecting relevant section of the updated removeFormulaTerms()
  # https://github.com/harrelfe/rms/blob/ef8982e237fbbf561fd5b7fe675e786dd1118549/R/rmsMisc.s#L1396
  which <- "offset"
  if('offset' %in% which) {
    test_form_long <- format(model_long$sformula)
    print(length(test_form_long))
    which[which == 'offset'] <- '.off.'
    mod_form_long <- gsub('offset(', '.off.(', test_form_long, fixed=TRUE)
    form <- as.formula(mod_form_long)
  }

  format(model_short$sformula)
  format(model_long$sformula)
})
@amanda-hi
Copy link
Author

I think this is the same problem that's being reported in #148

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant