Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failing Tests for print.tbl_json #99

Open
colearendt opened this issue May 22, 2017 · 11 comments
Open

Failing Tests for print.tbl_json #99

colearendt opened this issue May 22, 2017 · 11 comments

Comments

@colearendt
Copy link
Owner

At present, tests are failing for print.tbl_json due to tidyverse/tibble#259 since the new tibble release, I presume.

print(as.tbl_json('"a"'))         
#> # A tbl_json: 1 x 1 tibble with a "JSON" attribute
#>   `attr(., "JSON")` document.id
#>               <chr>       <int>
#> 1           "\"a\""           1
@krlmlr
Copy link

krlmlr commented May 22, 2017

I'm not familiar with this package, but I wonder why as.tbl_json() keeps the quotes. Would it be an option to use a list to store data of varying type?

@colearendt
Copy link
Owner Author

colearendt commented May 22, 2017

Oh yes, the JSON data is stored in a list at attr(., 'JSON'), but the print.tbl_json method presents the quotes in a column because of JSON's syntax that heavily uses double quotes (for strings, keys, etc.). The column is manufactured in the print method for easy viewing of the raw JSON along with how it relates to the tbl_df. Perhaps a more illustrative example of the problem in this context:

json <- '{"first": "frodo", "title": "hobbit"}'
                                               
tj <- as.tbl_json(json)                        
                                               
print(tj)                                      
#> # A tbl_json: 1 x 1 tibble with a "JSON" attribute
#>         `attr(., "JSON")` document.id
#>                     <chr>       <int>
#> 1 "{\"first\":\"frodo..."           1
                                               
print(attr(tj,'JSON'))                         
#> [[1]]
#> [[1]]$first
#> [1] "frodo"
#> 
#> [[1]]$title
#> [1] "hobbit"

The issue I opened with tibble is really a question of whether tidyjson should change it's print method to accommodate the new parsing behavior, or if the legacy behavior is preferable throughout. The previous print method was much cleaner. Also included an example of the use of the package to build a simple tibble:

library(tidyjson)

json <- "{\"first\": \"frodo\", \"title\": \"hobbit\"}"

tj <- as.tbl_json(json)

print(tj)
#> # A tbl_json: 1 x 1 tibble with a "JSON" attribute
#>    `attr(., "JSON")` document.id
#>                <chr>       <int>
#> 1 {"first":"frodo...           1

## Using the package
tj %>% spread_all()
#> # A tbl_json: 1 x 3 tibble with a "JSON" attribute
#>    `attr(., "JSON")` document.id first  title
#>                <chr>       <int> <chr>  <chr>
#> 1 {"first":"frodo...           1 frodo hobbit

tj %>% gather_object %>% append_values_string
#> # A tbl_json: 2 x 3 tibble with a "JSON" attribute
#>   `attr(., "JSON")` document.id  name string
#>               <chr>       <int> <chr>  <chr>
#> 1           "frodo"           1 first  frodo
#> 2          "hobbit"           1 title hobbit

@krlmlr
Copy link

krlmlr commented May 22, 2017

Maybe define a simple json_str class with a custom format() method?

@colearendt
Copy link
Owner Author

Ahh interesting - that makes sense. So I expect that would override the print.tbl_df's handling of the column, were it of that class?

@krlmlr
Copy link

krlmlr commented May 22, 2017

I guess so. With the upcoming colformat you could even add color.

@colearendt
Copy link
Owner Author

Wow. Very nice. I will explore - much appreciated! I like that solution better than parsing the output of print.tbl_df after using capture.output. Of course, I am not the package maintainer, so I cannot speak for him.

@colearendt
Copy link
Owner Author

@krlmlr Any thoughts/direction on the proposed implementation? I read through utils-format.R for tibble and it does not seem that the format method of any column is called, so I am not quite sure how it would be possible to do as you suggested. For list-columns, the list is printed with type_sum or obj_sum. I had a bit of luck defining type_sum.json_str, but double quotes are still escaped.

j <- structure('{"a":2, "b":4}', class='json_str')
format.json_str <- function(x, ...) {             
return(stringr::str_replace_all(x,'"','Z'))       
}                                                 
                                                  
## works... not in the tibble                     
format(j)                                         
#> [1] "{ZaZ:2, ZbZ:4}"
data_frame(a=j)                                   
#> # A tibble: 1 x 1
#>                      a
#>                  <chr>
#> 1 "{\"a\":2, \"b\":4}"

## more promising
type_sum.json_str <- function(x) { return(stringr::str_replace_all(x,'"','Z'))}
data_frame(a=list(j))
# # A tibble: 1 x 1
# a
# <list>
#   1 <{ZaZ:2, ZbZ:4}>

## still does not work with quotes
type_sum.json_str <- function(x) { return(stringr::str_replace_all(x,'\\"','z"z'))}
data_frame(a=list(j))
# # A tibble: 1 x 1
# a
# <list>
#   1 "<{z\"zaz\"z:2, z\"zbz\"z:4}>"

I attempted defining a method for is.character to see if I could pretend a json_str was not a character, but no luck there either. As far as I can tell, I don't see a way to cleverly dispatch a custom format method.

Error in genericForBasic(name) : 
  methods may not be defined for primitive function ‘is.character’ in this version of R

Parsing the capture.output may be the cleaner solution. Just need a better replace strategy to get the spaces back.

capture.output(print(data_frame(a='{"a":2, "b":4}'))) %>%
stringr::str_replace_all('\\\\+"','"') %>%           
writeLines()                                             
#> # A tibble: 1 x 1
#>                      a
#>                  <chr>
#> 1 "{"a":2, "b":4}"

@krlmlr
Copy link

krlmlr commented May 23, 2017

format.data.frame() will call format() on the individual columns. This is how hms and many other classes work.

Would you like to document the steps that were needed to create custom output for a tibble column in a tibble vignette?

@krlmlr
Copy link

krlmlr commented May 23, 2017

Note that you also need [.json_str, otherwise the head() call in trunc_mat() strips off the class. The following will do:

`[.json_str` <- function(x, ...) { structure(NextMethod(), class = "json_str") }

But even then the tibble formatting routine will reformat your data (even if you wrap it in a list), and this was the main point of your post. We'll need to find a better way here, sorry I led you to the wrong path. The new colformat is a way towards it, but it might take some time to implement.

I'm happy to add an experimental S3 method (subject to change) for formatting a column, or to review a PR that adds one. Eventually this method may move to the colformat package, or colformat may be integrated into tibble. Users of tidyjson will see escaped output in the CRAN version but better output in the development versions.

Thanks for your input.

@colearendt
Copy link
Owner Author

My pleasure, and thank you for your input as well! That is quite helpful direction - I will take a look and see if/how I am able to help. I am happy to contribute a vignette to describe the functionality as well. It sounds like a very promising feature for tibble to be extensible in this way.

@colearendt
Copy link
Owner Author

@krlmlr Pretty naive implementation on this branch. I attempted to use colformat, although I can see that the API is not really ready for that type of use. Though probably not worth a PR, I thought it might be worth sharing in case the functionality is worth exploring as colformat is developed.

The implementation is explored and some pain points are mentioned in the colformat vignette. Not sure whether there is anything worth opening as an issue / enhancement request to ensure that the thoughts are retained.

Would be interested to hear your feedback! Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants