-
-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
visualisation_recipe design #110
Comments
I guess I don't really see this as meaningfully more accessible than writing the ggplot code originally. If anything, I obscuring the variable names probably makes writing the final plotting code more difficult for new coders. (As an aside, I think most of the problems new ggplot2 users have stems from approaching it as a copy-paste recipe task versus taking the time to understand the layer-building logic underlying the syntax. I've found my students have found ggplot fairly intutive once I give a 15 minute walkthrough of the principles underlying it.) I guess my preference here would be to follow the general idea of see in that just passing the result of a modelbased function to |
Copied from the other issue: I think a basic approach for
Then, we apply nice see defaults. If a user wants more customization, they can take the resulting ggplot and add more layers. |
Somehow I didn't convey the fact that my "recipe approach" to plotting was first and foremost to facilitate the implementation of the plots, so that it is more robust, readable and debuggable. That is its first advantage beyond others like flexibility, didactic value and customizability. You will find below are a more complex example of how it could look like for (Note: currently the plotting only does lines (which might not be appropriate when only factors / factor is the first predictor) - but this we can address once we have the plot for estimate_means which we could call in some cases to plot the output of estimate_predicted) Note2: the call to library(modelbased)
library(see)
# Simple ------------------------------------------------------------------
# Default
x <- estimate_relation(lm(mpg ~ wt, data = mtcars))
layers <- visualisation_recipe(x)
layers
#> Layer 1
#> --------
#> Geom type: point
#> data, aes, stroke, shape
#>
#> Layer 2
#> --------
#> Geom type: ribbon
#> data, aes, alpha
#>
#> Layer 3
#> --------
#> Geom type: line
#> data, aes
#>
#> Layer 4
#> --------
#> Geom type: labs
#> x, y, title
plot(layers) # Customize aesthetics
x <- estimate_relation(lm(mpg ~ wt, data = mtcars))
layers <- visualisation_recipe(x,
points = list(color = "red", alpha = 0.6, size = 3),
line = list(color = "blue", size = 4),
ribbon = list(fill = "green", alpha = 0.7),
labs = list(subtitle = "Oh yeah!"))
layers
#> Layer 1
#> --------
#> Geom type: point
#> data, aes, stroke, shape, color, alpha, size
#>
#> Layer 2
#> --------
#> Geom type: ribbon
#> data, aes, alpha, fill
#>
#> Layer 3
#> --------
#> Geom type: line
#> data, aes, color, size
#>
#> Layer 4
#> --------
#> Geom type: labs
#> x, y, title, subtitle
plot(layers) # 2-ways interaction ------------------------------------------------------------
# Numeric * numeric
x <- estimate_relation(lm(mpg ~ wt * qsec, data = mtcars))
layers <- visualisation_recipe(x)
plot(layers) # Factor * numeric
x <- estimate_relation(lm(Sepal.Width ~ Species * Sepal.Length, data = iris))
plot(x) # 3-ways interaction ------------------------------------------------------------
data <- mtcars
data$vs <- as.factor(data$vs)
data$cyl <- as.factor(data$cyl)
data$new_factor <- as.factor(rep(c("A", "B"), length.out = nrow(mtcars)))
# Numeric * numeric * numeric
x <- estimate_relation(lm(mpg ~ wt * qsec * hp, data = data))
plot(x) # Numeric * numeric * factor
x <- estimate_relation(lm(mpg ~ wt * am * vs, data = data))
plot(x) # Numeric * factor * factor
x <- estimate_relation(lm(mpg ~ wt * cyl * new_factor, data = data))
plot(x) Created on 2021-05-31 by the reprex package (v1.0.0) |
I am worried about two things now:
|
library(modelbased)
library(see)
plot(estimate_relation(lm(mpg ~ wt, data = mtcars))) +
see::theme_modern() Created on 2021-05-31 by the reprex package (v1.0.0) do you have an example of something that could not work? If the issue is the missing "main" data (and "main" aesthetics?), it could be easily added I think
|
I think the recipe approach is on the contrary much more safer than previous ones, because all the elements remain as data (specifically, lists) until the very end, til it uses |
I think I misread something earlier. Didn't realize you were using aes_string(). |
Essentially the other approach (used in other places in |
Regading further extension by users, what about: What about x <- estimate_relation(lm(mpg ~ wt * cyl * new_factor, data = data))
plot(x) + scale_color_viridis_d() and x <- estimate_relation(lm(mpg ~ wt * cyl * new_factor, data = data))
plot(x) + geom_ribbon(aes(ymin = CI_low, ymax = CI_high)) or x <- estimate_relation(lm(mpg ~ wt * cyl * new_factor, data = data))
plot(x) + geom_ribbon(aes(ymin = CI_low, ymax = CI_high)) (the last one what I want to know is can a user add another geom without issue?) |
I'm not really following. If you Reduce(`+`, list_of_geoms) |
library(modelbased)
library(ggplot2)
data <- mtcars
data$vs <- as.factor(data$vs)
data$cyl <- as.factor(data$cyl)
data$new_factor <- as.factor(rep(c("A", "B"), length.out = nrow(mtcars)))
x <- estimate_relation(lm(mpg ~ wt * cyl * new_factor, data = data))
plot(x) + scale_color_viridis_d() x <- estimate_relation(lm(mpg ~ wt * cyl * new_factor, data = data))
plot(x) + geom_ribbon(aes(ymin = CI_low, ymax = CI_high))
#> Error in FUN(X[[i]], ...): object 'CI_low' not found
x <- estimate_relation(lm(mpg ~ wt * cyl * new_factor, data = data))
plot(x) + geom_ribbon(aes(ymin = CI_low, ymax = CI_high))
#> Error in FUN(X[[i]], ...): object 'CI_low' not found Created on 2021-05-31 by the reprex package (v1.0.0) The two last don't work as expected. We could indeed add the main object as main data, let me try |
library(modelbased)
library(ggplot2)
data <- mtcars
data$vs <- as.factor(data$vs)
data$cyl <- as.factor(data$cyl)
data$new_factor <- as.factor(rep(c("A", "B"), length.out = nrow(mtcars)))
x <- estimate_relation(lm(mpg ~ wt * cyl * new_factor, data = data))
plot(x) + geom_ribbon(aes(x = wt, y = Predicted, ymin = CI_low, ymax = CI_high, fill = cyl)) Created on 2021-05-31 by the reprex package (v1.0.0) It works "in principle" now that the main data is passed to |
I think it's okay so long as the main plot has data. |
Side note: Instead of x_name <- "varX"
ggplot2::aes(x = .data[[x_name]])
#> Aesthetic mapping:
#> * `x` -> `.data[["varX"]]` Created on 2021-05-31 by the reprex package (v2.0.0) |
I like I made a first draft for estimate_means: library(magrittr)
lm(Sepal.Width ~ Species, data = iris) %>%
modelbased::estimate_means() %>%
plot() lm(Sepal.Width ~ Species, data = iris) %>%
modelbased::estimate_means() %>%
plot(jitter = list(width = 0.03, color = "red")) Created on 2021-05-31 by the reprex package (v1.0.0) |
# Fully custom using layers
library(modelbased)
library(ggplot2)
library(see)
means <- modelbased::estimate_means(lm(Sepal.Width ~ Species, data = iris))
d <- modelbased::visualisation_recipe(means)
d
#> Layer 1
#> --------
#> Geom type: jitter
#> data, aes, stroke, shape, width
#>
#> Layer 2
#> --------
#> Geom type: line
#> data, aes
#>
#> Layer 3
#> --------
#> Geom type: pointrange
#> data, aes
#>
#> Layer 4
#> --------
#> Geom type: labs
#> x, y, title
ggplot() +
# Collect elements from layer 1 and do something else
geom_violin(data = d$l1$data, aes_string(x = d$l1$aes$x, y = d$l1$aes$y)) +
# Only plot a subset of layers
see::geoms_from_list(d[c("l3", "l4")]) Created on 2021-05-31 by the reprex package (v1.0.0) I think it's a really flexible and easy-to-program-with solution |
I added support for two factors and its modulation by a numeric: library(modelbased)
library(see)
data <- mtcars
data$cyl <- as.factor(data$cyl)
data$new_factor <- as.factor(rep(c("A", "B"), length.out = nrow(mtcars)))
# Two factors ---------------
model <- lm(mpg ~ new_factor * cyl * wt, data = data)
x <- estimate_means(model, levels = c("new_factor", "cyl"))
#> NOTE: Results may be misleading due to involvement in interactions
plot(visualisation_recipe(x)) # Modulations --------------
x <- estimate_means(model, levels = c("new_factor"), modulate = "wt")
#> NOTE: Results may be misleading due to involvement in interactions
plot(visualisation_recipe(x)) x <- estimate_means(model, levels = c("new_factor", "cyl"), modulate = "wt")
plot(visualisation_recipe(x)) Created on 2021-05-31 by the reprex package (v1.0.0) There's a slight problem with the dodging of the lines / pointrange, not sure what's going on, feel free to check out the And in general, feel free to comment / roast / improve; the code is pretty straightforward so it should be easy to add options and all! |
I added the possibility of changing the display of raw data: library(modelbased)
x <- estimate_means(lm(Sepal.Width ~ Species, data = iris))
layers <- visualisation_recipe(x, show_data = c("violin", "boxplot", "points"))
plot(layers) Created on 2021-06-01 by the reprex package (v1.0.0) But it would be even more awesome if we had a super raincloud-like geom (easystats/see#135) |
Similarly, for large data in which having points might not be the best representation, now is added the possibility of 2D density geoms instead (or in combination with): library(modelbased)
x <- estimate_relation(lm(mpg ~ wt, data = mtcars))
plot(x, show_data = c("density_2d", "points")) plot(x, show_data = "density_2d_filled") plot(x, show_data = "density_2d_polygon") plot(x, show_data = "density_2d_raster", line = list(color = "white"), ribbon = list(fill = "white")) Created on 2021-06-01 by the reprex package (v1.0.0) |
I must admit I'm quite happy about how the visualisation for modelbased is shaping up 😁 , it's quite convenient! The problem is that now modelbased depends on the latest |
I'm a little confused. Does this plot code live in see or in modelbased? |
because the creation of the visualisation ingredients (i.e., the layers) is independent from rendering and only necessitates some code logic, it is present in modelbased, close to the main functions so that it's always up-to-date. (That also means that, in principle, users can use modelbased to generate the plotting scheme, and then render it with whatever alternative option they want - like some Python GG module because reasons). That said, the default plotting that most users will go with requires So the code to design the plot lives in modelbased, but the code to render it in see |
I think this new update with all of its changes, especially the ones regarding visualization, require a blog post... I, for one, have no idea how to use any of it! |
the blogpost's whole content could be: "you can now run |
Does modelbased depend on see, or is it only required for plotting? Else, I would include that code and it starts working once see is updated |
required only (conditional) - you mean add the geoms_from_list functions to modelbased? The only call to see/ggplot2 is here: modelbased/R/visualisation_recipe.R Lines 31 to 37 in 1410133
|
Sorry, confused. You're using a new function in see, that doesn't work of course. I would disable that code for now |
We could add minimum version argument to check_if_installed() |
Yes, but I think that calling a non-existent function like |
I think if the call is inside a conditional, it should work? |
For now cannot use the new check_if_installed with the dependency arg because otherwise it would need me to require the newest version of insight 😅 anyway, since to works on winbuilders as is, I'll submit as soon as the tests (#115) are dealt with, and we'll see |
Closing this, will start adding vignettes/ examples / posts to showcase all this once see is updated |
Follow up of easystats/see#38 (note that the scope of that proposition is broader than just the plot for estimate_predicted).
To summarize, plotting is one of the fundamental skill of data science, often overlooked in favour of results values (them sweetsweet p-values 🤤). The grammar of graphics offers an incredible amount of freedom of expression, creativity, and composability and for me is the best approach to plotting, of the ones that I tried.
That said, understanding how to think in terms of GG is a hard process, as (especially in psychology) we are usually taught to think in terms of types of plots and one-to-one correspondence with some statistical models (e.g., correlation - scatterplot, anovas - bar chart 🤮, t-test - boxplot etc.) rather than in terms of layers of geometries coupled with data.
On top of that inherent steep path to having the GG-mindset, its main implementation in R (ggplot2) adds its own layer (no pun intended) of complexity with a lot of new different argument names and idiosyncratic logic.
This double complexity makes it hard for students and beginners, and it could explain the creation (and popularity) of pre-baked plots creators, such as
see
orggstatsplot
. Don't get me wrong, these pre-baked plots are absolutely necessary, not just to help beginners to get stuff done but also for advanced users: I don't see myself re-writing the ggplot code forcheck_model
plots at each model 😁But the limitation that I see from that landscape is that there are two opposed paths to plots creation: either you build everything from the ground-up using geoms (which really makes you think about how and what to put into), or you get a plot done for you (which really doesn't make you think and doesn't necessarily show you how and what to put into).
Having been immersed within a Buddhist environment, it is not surprising that I'm longing for a 3rd noble path that would achieve some form balance 😅, for instance by exposing and getting access to the elements directly underlying a given plot.
One of the early attempts at something like that was the
data_plot()
andhow_to_plot()
methods insee
. The goal of the former was to transform the input into the data that would be plotted, and the second was to expose the code used to make the plot, so that users can copy and edit and this way learn how we make our plots. But this didn't really go anywhere, in part because:how_to_plot()
function that would print its internal code.estimate_predicted()
, the plot has a line and a ribbon, and eventually some points on top of that, not necessarily coming from the same data. So that was problematic fordata_plot()
A plot, within the GG framework, is made of layers. Each layers has information pertaining to the form (the geom), aesthetics, data etc. So I thought, why not having that as data, instead of as a plot.
In other words, why not have an ensemble of lists (layers) that would each contain the relevant stuff. That would make it easy for people to get a sense of how a plot is made (and how to make it, customize it etc), and at the same time it would make it easier for a third-party such as
see
to simply "render" this information (e.g., loop through the layers and programmatically create them and add them).Here's a first example of
visualisation_recipe()
:Created on 2021-05-30 by the reprex package (v1.0.0)
The text was updated successfully, but these errors were encountered: