Options for another kind of stacking? #2006

Open
ddsjoberg opened this issue Sep 29, 2024 · 2 comments
@ddsjoberg
Owner

Rather than using the row headers in gt (and for the other print engines, just adding a new column to the left), I would like to be able to stack tables and have the individual tables indented with the headers.

I also want this somehow integrated with tbl_strata() for stacking, but haven't thought through all those details yet.

@ddsjoberg
Owner Author

Maybe we could add a new function (something like tbl_nested_stack()), and the option could then easily be added as a combine method in tbl_strata().

@dereksonderegger
Contributor

dereksonderegger commented Oct 10, 2024

In the following, I'm thinking about creating a grid of tables, each already containing a by split...

There are two approaches to merging/stacking:

  1. Independently create a bunch of tables and then merge/stack them. In this case, the function has to check and align column and row information in case the subtables are somehow different, e.g. a by level that was present in one subtable and not the other (possible if by was a character string and not a factor). This is particularly frustrating when both merging and stacking. If one of the subtables didn't have any data, then we are in real trouble because gtsummary::tbl_summary() won't produce a completely empty table suitable to take up that cell space in the subsequent grid.

  2. In tbl_ard_summary(), add merge= and stack= options that each take one or more of the grouping variables. The nice thing about this is that we get to see all of the data and determine which grouping levels are present; then, when the grid of merged/stacked tables is created, any grid cell with no data can be filled in with missing statistics. It would make sense to create a cards::ard_expand() function that just makes sure all combinations of grouping variables are created, inserting n=0, N=0 and NAs for the statistics. Then we could merge and stack knowing that there aren't any missing grid cells.
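To make the second approach concrete, here is a minimal base-R sketch of the "expand to all combinations" step. It works on a plain data frame of per-group statistics, not on a real cards ARD object; `expand_groups()` is a hypothetical helper name and the `median` values are illustrative only. The proposed cards::ard_expand() does not exist, so this only illustrates the idea.

```r
# Hypothetical helper: NOT part of cards or gtsummary.
# Pads a per-group summary so every combination of grouping levels has a row.
expand_groups <- function(stats, groups) {
  # All combinations of the observed levels of each grouping column
  grid <- expand.grid(
    lapply(stats[groups], function(x) sort(unique(x))),
    KEEP.OUT.ATTRS = FALSE,
    stringsAsFactors = FALSE
  )
  # Left join: combinations without data get NA in every statistic column
  out <- merge(grid, stats, by = groups, all.x = TRUE, sort = TRUE)
  # An empty cell has a count of zero rather than a missing count
  out$n[is.na(out$n)] <- 0L
  out
}

# Toy summary with the (Chinstrap, 2007) cell absent; medians are made up
stats <- data.frame(
  species = c("Adelie", "Adelie", "Chinstrap"),
  year    = c(2007L, 2008L, 2008L),
  n       = c(44L, 50L, 18L),
  median  = c(3700, 3700, 3750),
  stringsAsFactors = FALSE
)

expanded <- expand_groups(stats, groups = c("species", "year"))
print(expanded)
```

The padded result has one row per species and year, with the missing cell carrying n = 0 and NA statistics, so a later merge/stack step would never meet a hole in the grid.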

For my own use, I've written a wrapper around tbl_summary(), tbl_merge(), and tbl_stack() to automate the first approach. This function has by, merge, and stack parameters, but unfortunately it bombs out in the edge case where some combination of merge and stack levels has no data. It has been particularly useful for quickly creating tables for subgroup analyses in clinical research (e.g. by=treatment_method, merge=sex, stack=region to look at the treatment effect by sex across different world regions). It would crash if, for some reason, I didn't have any female subjects in some region. This happens a lot early in enrollment for a clinical study because the subject numbers are still small.
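One way to guard a wrapper like this is to find the empty grid cells before calling tbl_summary() on each one. The sketch below uses only base R and a toy data frame (the column names and values are made up for illustration); it relies on split() keeping empty factor combinations, so a zero-row cell can be detected and skipped or replaced with a placeholder.

```r
# Toy stand-in for the study data; values are illustrative only
toy <- data.frame(
  sex    = factor(c("female", "male", "male"), levels = c("female", "male")),
  region = factor(c("EU", "EU", "US"), levels = c("EU", "US")),
  value  = c(1.2, 3.4, 5.6)
)

# split() on factors keeps empty combinations (drop = FALSE is the default),
# so every cell of the merge-by-stack grid is present, possibly with zero rows
cells <- split(toy, list(toy$sex, toy$region), sep = " / ")

# Cells that would need a placeholder instead of a summary table
empty_cells <- names(cells)[vapply(cells, nrow, integer(1)) == 0L]
print(empty_cells)
```

This only works if the merge and stack variables are factors with the full set of levels; with character columns, a level missing from the data never appears in the grid at all.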

Ultimately I can see use cases for both approaches and would love to see both implemented.

As always, gtsummary is amazing and I'm thrilled to see you working on a package that takes the ARD structure and makes it easy to do the formatting and then spread the formatted statistics out into a structured table.

Just in case my discussion of a grid cell with missing data isn't clear enough, I've added an example...

library(tidyverse)

data <- palmerpenguins::penguins |>
  filter(!is.na(sex)) |>
  filter( !(species=='Chinstrap' & year==2007) ) |> # remove a group
  select(body_mass_g, sex, species, island, year)


# Adelie penguins are on all three islands, so there will be 3 columns
A_2007 <- gtsummary::tbl_summary(
  data |> 
    filter(species == 'Adelie', year == 2007) |>
    select(body_mass_g, sex, island), 
  by=island
)
A_2008 <- gtsummary::tbl_summary(
  data |> 
    filter(species == 'Adelie', year == 2008) |>
    select(body_mass_g, sex, island), 
  by=island
)

# Chinstrap penguins are only on Dream island.
# 
# This blows up so in my looping, I have to double check if there is data
# C_2007 <- gtsummary::tbl_summary(
#   data |> 
#     filter(species == 'Chinstrap', year == 2007) |>
#     select(body_mass_g, sex, island), 
#   by=island
# )
C_2008 <- gtsummary::tbl_summary(
  data |>
    filter(species == 'Chinstrap', year == 2008) |>
    select(body_mass_g, sex, island),
  by=island
)


# Gentoos are only on Biscoe island
G_2007 <- gtsummary::tbl_summary(
  data |> 
    filter(species == 'Gentoo', year == 2007) |>
    select(body_mass_g, sex, island), 
  by=island
)
G_2008 <- gtsummary::tbl_summary(
  data |> 
    filter(species == 'Gentoo', year == 2008) |>
    select(body_mass_g, sex, island), 
  by=island
)


R_2007 <- gtsummary::tbl_merge( 
  list(A_2007,G_2007), 
  tab_spanner=c('Adelie','Gentoo') )
R_2008 <- gtsummary::tbl_merge( 
  list(A_2008, C_2008, G_2008), 
  tab_spanner=c('Adelie','Chinstrap','Gentoo') )

# Now the stack is all messed up even ignoring the N counts
gtsummary::tbl_stack( 
  list(R_2007, R_2008),                    
  group_header=c('2007','2008'))
#> Column headers among stacked tables differ. Headers from the first table are
#> used.
#> ℹ Use `quiet = TRUE` to suppress this message.
| Characteristic | Adelie: Biscoe, N = 10¹ | Adelie: Dream, N = 19¹ | Adelie: Torgersen, N = 15¹ | Gentoo: Biscoe, N = 33¹ | Gentoo: Dream, N = 0¹ | Gentoo: Torgersen, N = 0¹ | Biscoe, N = 45¹ | Dream, N = 0¹ | Torgersen, N = 0¹ |
|---|---|---|---|---|---|---|---|---|---|
| 2007 | | | | | | | | | |
| body_mass_g | 3,700 (3,400, 3,800) | 3,550 (3,300, 4,150) | 3,700 (3,450, 4,200) | 5,050 (4,650, 5,550) | NA (NA, NA) | NA (NA, NA) | | | |
| sex | | | | | | | | | |
| female | 5 (50%) | 9 (47%) | 8 (53%) | 16 (48%) | 0 (NA%) | 0 (NA%) | | | |
| male | 5 (50%) | 10 (53%) | 7 (47%) | 17 (52%) | 0 (NA%) | 0 (NA%) | | | |
| 2008 | | | | | | | | | |
| body_mass_g | 3,650 (3,350, 4,050) | 3,650 (3,450, 4,200) | 3,850 (3,575, 4,175) | NA (NA, NA) | 3,750 (3,500, 4,100) | NA (NA, NA) | 5,000 (4,700, 5,400) | NA (NA, NA) | NA (NA, NA) |
| sex | | | | | | | | | |
| female | 9 (50%) | 8 (50%) | 8 (50%) | 0 (NA%) | 9 (50%) | 0 (NA%) | 22 (49%) | 0 (NA%) | 0 (NA%) |
| male | 9 (50%) | 8 (50%) | 8 (50%) | 0 (NA%) | 9 (50%) | 0 (NA%) | 23 (51%) | 0 (NA%) | 0 (NA%) |

¹ Median (Q1, Q3); n (%)

Created on 2024-10-10 with reprex v2.1.1

@ddsjoberg ddsjoberg added this to the v2.1.0 milestone Dec 26, 2024