Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: option to pass vcov to add_global_p(anova_fun = tidy_wald_test) for robust p-values #2076

Open
aghaynes opened this issue Nov 21, 2024 · 4 comments

Comments

@aghaynes
Copy link
Contributor

Is your feature request related to a problem? Please describe.
When adjusting variances via e.g. clustered variances, it's possible to adjust the confidence interval, but not the p-value.

tbl_regression(m, exponentiate = TRUE, 
               tidy_fun = \(x, ...) 
               tidy_robust(
                 x,
                 exponentiate = TRUE,
                 conf.level = 0.95,
                 conf.int = TRUE,
                 vcov = cov
               )
               ) %>% 
  # get the results of the wald test
  add_global_p(anova_fun = tidy_wald_test)

Describe the solution you'd like
It would be great to be able to pass a vcov to (e.g. cov in the above code) to tidy_wald_test so that aod::wald.test can also use it.

e.g.

... |>
add_global_p(anova_fun = tidy_wald_test, vcov = cov)

Describe alternatives you've considered
@mroumet and I managed to backwards engineer a solution that takes a cov object from the global environment. Maybe it's possible to pass it as an option instead though. I indicate the modification we made with an arrow.

tidy_wald_test2 <- 
  function (x, tidy_fun = NULL,...) {
    # assert_package("aod", "tidy_wald_test()")
    tidy_fun <- tidy_fun %||% broom.helpers::tidy_with_broom_or_parameters
    broom.helpers::tidy_and_attach(model = x, tidy_fun = tidy_fun) %>% 
      broom.helpers::tidy_identify_variables() %>% dplyr::select(term = "variable", 
                                                                 model_terms = "term") %>% dplyr::mutate(term_id = dplyr::row_number()) %>% 
      tidyr::nest(data = -"term") %>% dplyr::rowwise() %>% 
      dplyr::mutate(model_terms = unlist(.data$data[["model_terms"]]) %>% 
                      list(), model_terms_id = rlang::set_names(.data$data[["term_id"]]) %>% 
                      list(), wald_test = aod::wald.test(Sigma = cov,       ########## <----------------------------
                                                         b = stats::coef(x), Terms = .data$model_terms_id) %>% 
                      list(), df = .data$wald_test$result$chi2 %>% purrr::pluck("df"), 
                    statistic = .data$wald_test$result$chi2 %>% purrr::pluck("chi2"), 
                    p.value = .data$wald_test$result$chi2 %>% purrr::pluck("P"), 
      ) %>% dplyr::ungroup() %>% dplyr::select("term", 
                                               "df", "statistic", "p.value")
  }

... |>
add_global_p(anova_fun = tidy_wald_test2)

I've just found that the proposed edits (passing it via an option) are very easy and seem to work... I'll make a PR shortly.

Additional context

MRE using a example data from the sandwich package

library(sandwich)
library(gtsummary)
library(rlang)

data("InstInnovation", package = "sandwich")

## replication of one-way clustered standard errors for model 3, Table I
## and model 1, Table II in Berger et al. (2017), see ?InstInnovation

## count regression formula
f1 <- cites ~ institutions + log(capital/employment) + log(sales) + year

## model 3, Table I: Poisson model
## one-way clustered standard errors
m <- glm(f1, data = InstInnovation, family = poisson)

# define var-covar
cov <- vcovCL(m, cluster = ~ company, vcov_type = "HC0")

# Define a function to perform Wald test using robust variance covariance matrix----
# The function is actually a slight modification of tidy_wald_test . In this version we define Sigma = cov (where cov is the robust variance-covariance matrix) insted of Sigma = stats::vcov(x).

tidy_wald_test2 <- 
  function (x, tidy_fun = NULL,...) {
    # assert_package("aod", "tidy_wald_test()")
    tidy_fun <- tidy_fun %||% broom.helpers::tidy_with_broom_or_parameters
    broom.helpers::tidy_and_attach(model = x, tidy_fun = tidy_fun) %>% 
      broom.helpers::tidy_identify_variables() %>% dplyr::select(term = "variable", 
                                                                 model_terms = "term") %>% dplyr::mutate(term_id = dplyr::row_number()) %>% 
      tidyr::nest(data = -"term") %>% dplyr::rowwise() %>% 
      dplyr::mutate(model_terms = unlist(.data$data[["model_terms"]]) %>% 
                      list(), model_terms_id = rlang::set_names(.data$data[["term_id"]]) %>% 
                      list(), wald_test = aod::wald.test(Sigma = cov,      ########## <----------------------------
                                                         b = stats::coef(x), Terms = .data$model_terms_id) %>% 
                      list(), df = .data$wald_test$result$chi2 %>% purrr::pluck("df"), 
                    statistic = .data$wald_test$result$chi2 %>% purrr::pluck("chi2"), 
                    p.value = .data$wald_test$result$chi2 %>% purrr::pluck("P"), 
      ) %>% dplyr::ungroup() %>% dplyr::select("term", 
                                               "df", "statistic", "p.value")
  }

# Get the Robust variance ----
tab <- tbl_regression(m, exponentiate = TRUE, 
               tidy_fun = \(x, ...) 
               tidy_robust(
                 x,
                 exponentiate = TRUE,
                 conf.level = 0.95,
                 conf.int = TRUE,
                 vcov = cov
               )
               ) 
  # get the results of the wald test
tab %>% add_global_p(anova_fun = tidy_wald_test)
tab %>%  add_global_p(anova_fun = tidy_wald_test2)
@ddsjoberg
Copy link
Owner

Added in #2077

@ddsjoberg
Copy link
Owner

Hey @aghaynes ! Thanks again for the contribution to make to make it easier for robust standard errors!

Would you be open to another contribution? I think an example in our FAQ article illustrating how to use robust standard errors using gtsummary::tidy_robust() for the regular p-value from the model and gtsummary::tidy_wald(vcov) for the variable p-values? If possible, do you think we could make this happen using only packages already listed in the DESCRIPTION file?

What do you think?
https://www.danieldsjoberg.com/gtsummary/articles/gallery.html

@aghaynes
Copy link
Contributor Author

aghaynes commented Dec 2, 2024

Hey @ddsjoberg! Great idea. I'll have a look. Maybe there's a suitable dataset in aod. For it to be useful, it would probably be necessary to actually see a change in the p-value (and CI?), which isn't always the case with these adjustments (they're often only in the decimal places somewhere).

@ddsjoberg ddsjoberg reopened this Dec 2, 2024
@ddsjoberg
Copy link
Owner

ddsjoberg commented Dec 2, 2024

Awesome, thank you! If {aod} doesn't have something, I sometimes use this as an example for correlated data

gtsummary::trial |> 
  dplyr::mutate(subject_id = dplyr::row_number(), .by = trt)
#> # A tibble: 200 × 9
#>    trt      age marker stage grade response death ttdeath subject_id
#>    <chr>  <dbl>  <dbl> <fct> <fct>    <int> <int>   <dbl>      <int>
#>  1 Drug A    23  0.16  T1    II           0     0    24            1
#>  2 Drug B     9  1.11  T2    I            1     0    24            1
#>  3 Drug A    31  0.277 T1    II           0     0    24            2
#>  4 Drug A    NA  2.07  T3    III          1     1    17.6          3
#>  5 Drug A    51  2.77  T4    III          1     1    16.4          4
#>  6 Drug B    39  0.613 T4    I            0     1    15.6          2
#>  7 Drug A    37  0.354 T1    II           0     0    24            5
#>  8 Drug A    32  1.74  T1    I            0     1    18.4          6
#>  9 Drug A    31  0.144 T1    II           0     0    24            7
#> 10 Drug B    34  0.205 T3    I            0     1    10.5          3
#> # ℹ 190 more rows

Created on 2024-12-02 with reprex v2.1.1

which gives us a data set where each subject received both Drug A and Drug B (there trt column).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants