Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cells spanning multiple columns or rows #289

Open
tzakharko opened this issue May 25, 2023 · 5 comments
Open

Cells spanning multiple columns or rows #289

tzakharko opened this issue May 25, 2023 · 5 comments
Labels
feature a feature request or enhancement

Comments

@tzakharko
Copy link

We would like to access information about the cells spanning multiple columns or rows. From what I understand, this is implemented by reading the merges field from the Sheet object (https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/sheets). I don't think this information is currently imported.

I might be interested in submitting a PR for this feature, maybe as an attribute or additional column (n_cols/n_rows) on the tibble returned by read_cells?

@jennybc
Copy link
Member

jennybc commented May 25, 2023

Yes, this is something I would like to deal with in googlesheets4, but have not gotten around to it.

Yes, range_read_cells() is definitely a major player here.

Another relevant function is new_googlesheets4_spreadsheet(). I suspect it will need to learn about merged cells.

As for a PR, I should pre-emptively state that this might be a feature that is big enough that I would want to do it myself. So you should just enter into the effort knowing that it might not get merged.

But a working PR is always useful. It would give you the capability, sooner rather than later, and would certainly surface the relevant questions and tricky bits. So if you want to have a go, you should (with the caveat above).

@jennybc jennybc added the feature a feature request or enhancement label May 25, 2023
@tzakharko
Copy link
Author

Thanks, that sounds fair. I just had a brief look and I think that the basic functionality can be achieved fairly easily via vctrs::vec_locate_matches. I'll try to sketch the basic implementation later when I have some time, then you can still decide how you want to proceed.

@jennybc
Copy link
Member

jennybc commented May 26, 2023

I think one of the harder questions is deciding how to expose merged cells in the user-facing functions. So deciding how the new awareness of merged cells shows up in the user interface.

@tzakharko
Copy link
Author

Attached is a basic patch for reading merged cell ranges in range_read_cells(). This adds two columns: n_cols and n_rows to the resulting cell tibble. I haven't touched location formatting or the documentation for this patch. If you are interested, I can make this into a full PR, with documentation updates and tests.

I think one of the harder questions is deciding how to expose merged cells in the user-facing functions. So deciding how the new awareness of merged cells shows up in the user interface.

Yes, absolutely. To be honest, I am unsure whether this should even be exposed at the higher interface level. If one wants to do it properly, this would need to be supported in the upstream tibbles/pillar, which means a massive overhaul across the entire tidyverse. Not to mention more than a few tough choices, as merged cells don't play well with the basic design of R data frames. In fact, I doubt that this kind of effort will pay off anyway, as any design is likely to introduce tradeoffs that will harm practical use cases. It's probably best to accept that spreadsheets and R use different data models, provide the basic mapping interface for the common case (as is done now), and give the advanced user the space to do advanced stuff if they need to. People who read complex sheets will need to process them anyway before they are useable in R (for example, in our project we also need to consider cell color and notes).

Thus, for googlesheets4, I would suggest to start by exposing merged cells on a lower API level. The default behavior which puts a value in the starting cell and leaves the remaining cells as NA is already sane. For advanced use range_read_cells() is already a great API entry. Probably would be also interesting to have a symmetrical range_write_cells() that uploads a range of cells (using some format as what the read cells returns). In fact, I think my team will need this kind of functionality anyway, so I'll have to implement it this way or another.

Looking forward to hearing your thoughts!

merged_cells.patch

@jennybc
Copy link
Member

jennybc commented May 28, 2023

Thanks for starting the experiments! Can you make this as a pull request (as opposed to a patch file)? That's a lot more natural for my workflow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature a feature request or enhancement
Projects
None yet
Development

No branches or pull requests

2 participants