Using dplyr::group_by() with tbl_json objects changes the type of `..JSON` column #135

ramiromagno · 2021-02-06T03:35:38Z

Here ..JSON is of type character:

> tidyjson::worldbank %>% tidyjson::spread_all()
# A tibble: 500 x 9
   ..JSON       document.id boardapprovaldate  closingdate    countryshortname project_name                   regionname      totalamt `_id.$oid`    
   <chr>              <int> <chr>              <chr>          <chr>            <chr>                          <chr>              <dbl> <chr>         
 1 "{\"_id\":{…           1 2013-11-12T00:00:… 2018-07-07T00… Ethiopia         Ethiopia General Education Qu… Africa            1.30e8 52b213b38594d…
 2 "{\"_id\":{…           2 2013-11-04T00:00:… NA             Tunisia          TN: DTF Social Protection Ref… Middle East an…   0.     52b213b38594d…
 3 "{\"_id\":{…           3 2013-11-01T00:00:… NA             Tuvalu           Tuvalu Aviation Investment Pr… East Asia and …   6.06e6 52b213b38594d…
 4 "{\"_id\":{…           4 2013-10-31T00:00:… NA             Yemen, Republic… Gov't and Civil Society Organ… Middle East an…   0.     52b213b38594d…
 5 "{\"_id\":{…           5 2013-10-31T00:00:… 2019-04-30T00… Lesotho          Second Private Sector Competi… Africa            1.31e7 52b213b38594d…
 6 "{\"_id\":{…           6 2013-10-31T00:00:… NA             Kenya            Additional Financing for Cash… Africa            1.00e7 52b213b38594d…
 7 "{\"_id\":{…           7 2013-10-29T00:00:… 2019-06-30T00… India            National Highways Interconnec… South Asia        5.00e8 52b213b38594d…
 8 "{\"_id\":{…           8 2013-10-29T00:00:… NA             China            China Renewable Energy Scale-… East Asia and …   0.     52b213b38594d…
 9 "{\"_id\":{…           9 2013-10-29T00:00:… 2018-12-31T00… India            Rajasthan Road Sector Moderni… South Asia        1.60e8 52b213b38594d…
10 "{\"_id\":{…          10 2013-10-29T00:00:… 2014-12-31T00… Morocco          MA Accountability and Transpa… Middle East an…   2.00e8 52b213b38594d…
# … with 490 more rows

Now is of type list (scroll to the right):

> tidyjson::worldbank %>% tidyjson::spread_all() %>% dplyr::group_by(countryshortname)
# A tibble: 500 x 9
# Groups:   countryshortname [118]
   document.id boardapprovaldate  closingdate    countryshortname  project_name                     regionname      totalamt `_id.$oid`     ..JSON   
         <int> <chr>              <chr>          <chr>             <chr>                            <chr>              <dbl> <chr>          <list>   
 1           1 2013-11-12T00:00:… 2018-07-07T00… Ethiopia          Ethiopia General Education Qual… Africa            1.30e8 52b213b38594d… <named l…
 2           2 2013-11-04T00:00:… NA             Tunisia           TN: DTF Social Protection Refor… Middle East an…   0.     52b213b38594d… <named l…
 3           3 2013-11-01T00:00:… NA             Tuvalu            Tuvalu Aviation Investment Proj… East Asia and …   6.06e6 52b213b38594d… <named l…
 4           4 2013-10-31T00:00:… NA             Yemen, Republic … Gov't and Civil Society Organiz… Middle East an…   0.     52b213b38594d… <named l…
 5           5 2013-10-31T00:00:… 2019-04-30T00… Lesotho           Second Private Sector Competiti… Africa            1.31e7 52b213b38594d… <named l…
 6           6 2013-10-31T00:00:… NA             Kenya             Additional Financing for Cash T… Africa            1.00e7 52b213b38594d… <named l…
 7           7 2013-10-29T00:00:… 2019-06-30T00… India             National Highways Interconnecti… South Asia        5.00e8 52b213b38594d… <named l…
 8           8 2013-10-29T00:00:… NA             China             China Renewable Energy Scale-Up… East Asia and …   0.     52b213b38594d… <named l…
 9           9 2013-10-29T00:00:… 2018-12-31T00… India             Rajasthan Road Sector Moderniza… South Asia        1.60e8 52b213b38594d… <named l…
10          10 2013-10-29T00:00:… 2014-12-31T00… Morocco           MA Accountability and Transpare… Middle East an…   2.00e8 52b213b38594d… <named l…
# … with 490 more rows

The text was updated successfully, but these errors were encountered:

colearendt · 2021-02-06T11:19:27Z

Thanks for reporting!! That is surprising behavior indeed. I'll take a look and see if this can be improved. To be fair, ..JSON is actually always a list, but it is a "hidden" column, so we just print a character overview of what it looks like in JSON form.

I suspect the problem that is happening here is that the "grouped data frame" is no longer a tbl_json. Do you plan to do anything else with the JSON data after doing the group_by operations? It'd be good to get a sense for the workflow you are shooting for here!

ramiromagno · 2021-02-06T15:28:03Z

Here's some more context:

#' @importFrom rlang .data
collect_samples <- function(tbl_json) {

  samples_variants <- tbl_json %>%
    tidyjson::enter_object('samples_variants') %>%
    tidyjson::gather_array(column.name = 'sample_id') %>%
    dplyr::select(-'sample_id')

  samples_training <- tbl_json %>%
    tidyjson::enter_object('samples_training') %>%
    tidyjson::gather_array(column.name = 'sample_id') %>%
    dplyr::select(-'sample_id')

  all_samples <-
    tidyjson::bind_rows(samples_variants, samples_training) %>%
    dplyr::group_by(.data$..page, .data$array.index) %>%
    dplyr::mutate(., sample_id = seq_len(dplyr::n()), .after = 'array.index') %>%
    dplyr::arrange('sample_id', .by_group = TRUE) %>%
    dplyr::ungroup() %>%
    tidyjson::as.tbl_json(json.column = '..JSON') # Needed because of https://github.com/colearendt/tidyjson/issues/135.

  return(all_samples)
}

I found that using tidyjson::as.tbl_json(json.column = '..JSON') works as a workaround.

colearendt · 2021-02-06T15:31:30Z

That makes a lot of sense! Thank you for the context! Grouped mutates are perfect. One of the concerns we have for supporting grouped tibbles is "summarize" operations, which will necessarily destroy the JSON data and make future tidyjson operations largely meaningless. It'd be great if we could find a nice middle way that supports grouped tibbles for certain operations, or at least makes the confusing state here more clear / understandable.

…

On Feb 6 2021, at 10:28 am, Ramiro Magno ***@***.***> wrote: Here's some more context: #' @importFrom rlang .datacollect_samples <- function(tbl_json) { samples_variants <- tbl_json %>% tidyjson::enter_object('samples_variants') %>% tidyjson::gather_array(column.name = 'sample_id') %>% dplyr::select(-'sample_id') samples_training <- tbl_json %>% tidyjson::enter_object('samples_training') %>% tidyjson::gather_array(column.name = 'sample_id') %>% dplyr::select(-'sample_id') all_samples <- tidyjson::bind_rows(samples_variants, samples_training) %>% dplyr::group_by(.data$..page, .data$array.index) %>% dplyr::mutate(., sample_id = seq_len(dplyr::n()), .after = 'array.index') %>% dplyr::arrange('sample_id', .by_group = TRUE) %>% dplyr::ungroup() %>% tidyjson::as.tbl_json(json.column = '..JSON') # Needed because of #135. return(all_samples) } I found that using tidyjson::as.tbl_json(json.column = '..JSON') works as a workaround. — You are receiving this because you commented. Reply to this email directly, view it on GitHub (#135 (comment)), or unsubscribe (https://github.com/notifications/unsubscribe-auth/AFQBVVWYAPFBVYSJ25RY6HDS5VNZHANCNFSM4XF2JK5A).

ramiromagno · 2021-02-06T15:40:29Z

btw, I should probably create a separate issue, but something I see doing myself too often is drop a recently created index column with gather_array, like in the example above:

    tidyjson::gather_array(column.name = 'sample_id') %>%
    dplyr::select(-'sample_id')

you think it would be possible to allow that argument column.name to accept a special value that would automatically drop the index column?

ramiromagno · 2021-02-06T15:41:21Z

Perhaps?

    tidyjson::gather_array(column.name = NULL)

colearendt · 2021-02-06T16:18:32Z

Yeah, I like that idea!

…

Sent from my iPhone On Feb 6, 2021, at 10:41 AM, Ramiro Magno <[email protected]> wrote: Perhaps? tidyjson::gather_array(column.name = NULL) — You are receiving this because you commented. Reply to this email directly, view it on GitHub<#135 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AFQBVVW5LATS4FY65GXZHMDS5VPLBANCNFSM4XF2JK5A>.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Using dplyr::group_by() with tbl_json objects changes the type of `..JSON` column #135

Using dplyr::group_by() with tbl_json objects changes the type of `..JSON` column #135

ramiromagno commented Feb 6, 2021

colearendt commented Feb 6, 2021 •

edited

Loading

ramiromagno commented Feb 6, 2021

colearendt commented Feb 6, 2021 via email

ramiromagno commented Feb 6, 2021

ramiromagno commented Feb 6, 2021

colearendt commented Feb 6, 2021 via email

Using dplyr::group_by() with tbl_json objects changes the type of ..JSON column #135

Using dplyr::group_by() with tbl_json objects changes the type of ..JSON column #135

Comments

ramiromagno commented Feb 6, 2021

colearendt commented Feb 6, 2021 • edited Loading

ramiromagno commented Feb 6, 2021

colearendt commented Feb 6, 2021 via email

ramiromagno commented Feb 6, 2021

ramiromagno commented Feb 6, 2021

colearendt commented Feb 6, 2021 via email

Using dplyr::group_by() with tbl_json objects changes the type of `..JSON` column #135

Using dplyr::group_by() with tbl_json objects changes the type of `..JSON` column #135

colearendt commented Feb 6, 2021 •

edited

Loading