-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Access to layer information during build stages #3175
Comments
I definitely don't like adding the Also, it's not entirely clear to me where you would need this info. Can you provide some example scenarios? E.g., how would you like to implement #3062? It may help with thinking things through. |
Though I too want to see some example scenario. |
Ok, the prime example that provoked the issue is the idea of allowing layers to state whether they are fixed (should appear on every panel). The current situation where we must modify the data to not contain the faceting variable is often cumbersome (see #3062 for more information). In order to achieve this we'd need to pass a The next step is to properly assign I'm well aware that this could be solved for the faceting system quite easily with some changes, and the number of facet extensions are low so we could quite easily mitigate any breakage, but the issue of layer specific information has now come up twice in a very short time and I feel we should align on a general approach rather than keep duct taping it onto the current code. |
Thanks, I agree that it's very likely for us to face the need for making the same change as #3170 on another function, and your example seems the case. But, I'm just not sure if we can generalise it to all functions that uses layer data (probably because I don't understand the whole picture of the building stages). |
I think we need to be systematic and enumerate all the places where possibly layer parameters could be needed. I believe it's going to be geoms, facets, and stats. We have a solution for geoms that works and doesn't break any existing packages. I think the same solution could be implemented for stats. (And for stats, the I prefer the solution where we hand If we ever implement layer-specific color scales, I think they should be managed by the layer, so color scales wouldn't need to know about layers. Anything I'm forgetting? |
Just a quick comment... @thomasp85 's solution number two is not too bad to implement ( see master...paleolimbot:Issue-3175-layer-params-data-attr ), although it doesn't help with the facet issue (and it feels a little like duct tape). |
For completeness,
Does this make sense...? Probably, I'm still failing to see the scope of this discussion. If the above ggprotos are not the ones we need to consider here, please point out. |
However, all these objects are stateless, so I'm afraid we'd have to drill down to specific methods that will require layer_param. |
I'm afraid so too... |
To weight in on this discussion that has been dormant for the past 4 years; there is already a data structure that is excellent for keeping related data together without resorting to attributes or inventing a new class of objects. That structure is the good old dataframe. You could use list-columns for any non-atomic information you might want to put in there and have 1 row per layer. library(ggplot2)
# tibble for nice printing, vctrs::data_frame works just as well
tibble::tibble(
index = 1:2,
data = list(mtcars, mpg),
layer = list(geom_point(), geom_line()),
fixed = c(TRUE, FALSE)
)
#> # A tibble: 2 × 4
#> index data layer fixed
#> <int> <list> <list> <lgl>
#> 1 1 <df [32 × 11]> <LyrInstn> TRUE
#> 2 2 <tibble [234 × 11]> <LyrInstn> FALSE Created on 2023-11-13 with reprex v2.0.2 In theory, you could even make |
After some discussion with Thomas, we agreed that this issue is essentially a request for attribute stability of the (internal) data. AFAIK attributes are dropped/lost in the following cases:
|
I'm opening this issue as a more general discussion ground for the discussions going on in #3170 (which fixes #3116). The reason for this is that in my attempt to implement #3062 I've run into more or less the same issues but from the facet side, and I think the time is up for dealing without it in a more general way.
The problem: Throughout ggplot2's code base the layer data is the sole representative of the layer. It is not assumed that the layer have any other information of interest to share. This makes for clean code. We don't need to pass the layer objects around so we get nice separation of responsibility. But it also means that the layer cannot tell the outside world about any special property it has. Generally this has not been a problem as any specifics has been relevant to the geom and stat exclusively (part of the internal of the layer).
One solution: Take the approach given in #3170 and generalise it, so that any function receiving layer data also receives a
layer_param
object, as well as formalising setting up this object (probably withinsetup_layer()
). This will require sweeping changes to many methods across many classes (internally and in extension packages) so It's certainly less attractive because of that. On the other hand it will result in an architecture that is consistent (using a params object). We may be able to modify the ggproto dispatch to quietly ignore unknown arguments in order to avoid breaking extension packages but it would remove the safety of early failure when misspelling arguments in code.Another solution: Attenuate the data object with layer param attributes - this would allow everything to continue to work as it has done before, but will require internal changes to make sure the data keeps the information as it is passed around. A
layer_param
attribute could keep all relevant information that we would otherwise pass around as alayer_param
argument. This will undoubtedly result in some more ugly code though helper functions could probably tidy most of it up. Worst is probably that we now divert into two different ways of passing around information between objects.A kind of middle ground: Each layer data could also simply get an attribute (or an additional column) telling the index of the layer in the layer stack, and e.g. the facets could seep out layer information from all the layers during setup and then take care of matching that information with the data itself if needed. I'm not sure there's any benefit to this other than avoiding to put arbitrary amount of information in the data attributes. It would certainly result in even more housekeeping code than option two
I'm sure there are solutions I haven't considered so I'd be happy to take a broad discussion about how we want to solve it. I'm certain this is something we want to solve in general in order for development to move forward. (as an aside (can't find the issues right now), it will also allow for the layers to take additional params needed by extensions without triggering the
unknown argument
warning—I remember ggplotly requesting this at least.@hadley, @karawoo, @clauswilke, @yutannihilation looking forward to your input
The text was updated successfully, but these errors were encountered: