-
-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
move visualisation_matrix lower to make it more re-usable #44
Comments
Interesting idea. I’m not super familiar with FWIW, I already have a similar function called library(marginaleffects)
dat <- mtcars[, c("mpg", "gear", "hp", "am", "vs")]
dat$gear <- as.factor(dat$gear)
mod <- lm(mpg ~ hp + am, mtcars)
typical(model = mod, hp = seq(111, 120, by = 5))
typical(newdata = head(dat, 3))
AFAICT, the main differences are:
On the one hand, since I am already committed to All this to say that, from a conceptual perspective, I agree it would Thanks for the ping! |
Would it be worth it to allow passing variables via
Maybe in insight then? Though this function really manipulates data, it is meant to be used with models most of the time so it really falls between datawizard and insight |
Looks like the
I agree it’s often a good idea to make sure the functions are in the “right” package. But I’m also not sure if there’s enough immediate demand such that it would be worth much trouble to move things around. FWIW, I just added support for library(modelbased)
library(marginaleffects)
mod <- lm(mpg ~ factor(cyl) + hp, mtcars)
predictions(mod, newdata = visualisation_matrix(at = "cyl"))
#> type predicted std.error conf.low conf.high cyl mpg hp
#> 1 response 25.12392 1.368888 22.31988 27.92796 4 20.09062 146.6875
#> 2 response 19.15627 1.247190 16.60151 21.71102 6 20.09062 146.6875
#> 3 response 16.60307 1.278754 13.98366 19.22248 8 20.09062 146.6875
marginaleffects(mod, newdata = visualisation_matrix(at = "cyl"))
#> rowid type term contrast dydx std.error cyl mpg hp
#> 1 1 response cyl 6 - 4 -5.96765508 1.63927762 4 20.09062 146.6875
#> 2 2 response cyl 6 - 4 -5.96765508 1.63927762 6 20.09062 146.6875
#> 3 3 response cyl 6 - 4 -5.96765508 1.63927762 8 20.09062 146.6875
#> 4 1 response cyl 8 - 4 -8.52085075 2.32607494 4 20.09062 146.6875
#> 5 2 response cyl 8 - 4 -8.52085075 2.32607494 6 20.09062 146.6875
#> 6 3 response cyl 8 - 4 -8.52085075 2.32607494 8 20.09062 146.6875
#> 7 1 response hp -0.02403883 0.01540789 4 20.09062 146.6875
#> 8 2 response hp -0.02403883 0.01540790 6 20.09062 146.6875
#> 9 3 response hp -0.02403883 0.01540789 8 20.09062 146.6875 |
@vincentarelbundock when you look at
|
well, we could query the ellipsis it and if the user passed something that corresponds to the name of a variable in the dataframe, then use it. The only edgecase would be if someone has a variable called I guess it boils down to wether anyone (@strengejacke @mattansb ?) can think of any other usecase for a data-gridding beyond modelbased? wouldn't it be useful in insight I vaguely remember in the prediction's CIs computations we had to obtain the grid to get the covariance or something? perhaps it's a false memory I can blame that on long-covid |
btw, |
Can you write some example code or similar? I don't get the discussion about the ellipses, and why we should not move vis-mat to datawizard? |
Absolutely. In the dev version I now attach all the attributes of the library(modelbased)
library(marginaleffects)
m <- lm(mpg ~ hp + factor(cyl), mtcars)
p <- predictions(m, newdata = visualisation_matrix(at = "cyl"))
attributes(p) |> names()
#> [1] "names" "class" "row.names"
#> [4] "model" "type" "model_type"
#> [7] "variables" "newdata_adjusted_for" "newdata_at_specs"
#> [10] "newdata_at" "newdata_preserve_range" "newdata_table_title"
#> [13] "newdata_table_footer" |
@strengejacke, would you have an easy example on hand? Here’s what I got when I tried: library(brms)
library(insight)
library(modelbased)
library(marginaleffects)
mod <- insight::download_model("brms_1")
predictions(mod, newdata = visualisation_matrix(at = "cyl = c(2, 4, 6)"))
#> rowid type predicted cyl mpg wt conf.low conf.high
#> 1 1 response 26.32771 2 20.09062 3.21725 22.44373 29.81315
#> 2 2 response 23.35533 4 20.09062 3.21725 21.28254 25.37644
#> 3 3 response 20.35782 6 20.09062 3.21725 19.38795 21.27495 |
library(modelbased)
library(marginaleffects)
mod <- brms::brm(mpg ~ factor(cyl) + hp, data = mtcars, refresh = 0, iter = 200)
predictions(mod, newdata = visualisation_matrix(at = "cyl"))
#> Error: Variable 'cyl' was originally numeric but is not in 'newdata'. Created on 2022-01-23 by the reprex package (v2.0.1) |
I think Dominique brought up the ellipses in response to my code with
In terms of functionality, they're the same, so I am not advocating a change here, just noting the difference.
IMHO, it would make more sense in So the question is: is there enough demand to justify the effort? As explained above, I'm not sure I'll use it, since I already implemented something very similar, and people can now use |
I think this is most likely due to this here: easystats/modelbased#161 (comment) If |
If the discussion was about moving it to either insight or datawizard, the effort is the same. So I'd say it should go into datawizard. Furthermore, I suggest either having named vectors like in your |
Yes, the lazy choice would be to leave it in
One approach would be to avoid coercing data types (which feels unsafe in general) by checking the new
On aesthetic grounds, I'm not a big fan of this either, but backward compatibility should probably be maintained. |
Scratch my comment about |
Here I try to summarize the main problem in library(insight)
library(modelbased)
m1 <- lm(mpg ~ hp + factor(cyl), mtcars)
m2 <- lm(mpg ~ hp + cyl, mtcars)
# numeric, marked as factor
get_data(m1) |> attr("factors")
#> [1] "cyl"
class(get_data(m1)$cyl)
#> [1] "numeric"
# numeric, not marked as factor
get_data(m2) |> attr("factors")
#> NULL
class(get_data(m2)$cyl)
#> [1] "numeric"
# if cyl is numeric, but used as factor inside formula
visualisation_matrix(m1, at = "cyl")
#> Visualisation Grid
#>
#> cyl | hp
#> ------------
#> 4 | 146.69
#> 6 | 146.69
#> 8 | 146.69
#>
#> Maintained constant: hp
str(visualisation_matrix(m1, at = "cyl")$cyl)
#> Factor w/ 3 levels "4","6","8": 1 2 3
# if cyl is numeric
visualisation_matrix(m2, at = "cyl")
#> Visualisation Grid
#>
#> cyl | hp
#> -------------
#> 4.00 | 146.69
#> 4.44 | 146.69
#> 4.89 | 146.69
#> 5.33 | 146.69
#> 5.78 | 146.69
#> 6.22 | 146.69
#> 6.67 | 146.69
#> 7.11 | 146.69
#> 7.56 | 146.69
#> 8.00 | 146.69
#>
#> Maintained constant: hp
str(visualisation_matrix(m2, at = "cyl")$cyl)
#> num [1:10] 4 4.44 4.89 5.33 5.78 ...
# if "cyl" is treated as numeric in "visualisation_matrix()"
# it fails due to invalid levels
predict(m1, newdata = visualisation_matrix(m2, at = "cyl"))
#> Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels): factor factor(cyl) has new levels 4.44444444444444, 4.88888888888889, 5.33333333333333, 5.77777777777778, 6.22222222222222, 6.66666666666667, 7.11111111111111, 7.55555555555556
m3 <- brms::brm(mpg ~ hp + factor(cyl), mtcars, refresh = 0, iter = 100, chains = 2)
# if "cyl" is treated as factor in "visualisation_matrix()"
# it fails for brms because original cyl was numeric
predict(m3, newdata = visualisation_matrix(m3, at = "cyl"))
#> Error: Variable 'cyl' was originally numeric but is not in 'newdata'.
# --- solution ----
# if we create data grid with factor (to have "proper" at-values for cyl)
# and then convert "cyl" back to original numeric, it works for all models
vm_factor <- visualisation_matrix(m1, at = "cyl")
vm_factor$cyl <- insight:::.factor_to_numeric(vm_factor$cyl)
predict(m1, newdata = vm_factor)
#> 1 2 3
#> 25.12392 19.15627 16.60307
predict(m2, newdata = vm_factor)
#> 1 2 3
#> 25.04464 20.51526 15.98587
predict(m3, newdata = vm_factor)
#> Estimate Est.Error Q2.5 Q97.5
#> [1,] 25.96428 3.613558 19.36254 33.29545
#> [2,] 19.10267 3.321300 13.97888 25.33048
#> [3,] 16.15320 2.849919 10.43424 20.63559 Created on 2022-01-23 by the reprex package (v2.0.1) |
Regarding datawizard-insight-modelbased location, here are some of the arguments
I agree that evaluating as list is more intuitive, the reason for strings was to do some elaborate stuff like specifying ranges etc. (e.g., |
@ list-argument: this is implemented, additionally to current behaviour, so no breaking changes. |
That's what i had in mind too. Maybe we can think of adding some features? Are there have any cool and useful features in ggeffects & marginaleffect that we could try reproducing so that it's on par feature-wise? |
I think what could be useful are handy shortcuts to define the at-values, see https://strengejacke.github.io/ggeffects/articles/introduction_effectsatvalues.html. @mattansb mentioned what he uses often is something like |
I think relevant code that could be re-used is: and some of the functions used in that code: |
btw, this fancy code is protected by patent from me. :-D (it calculates the perfect range for prettifying values) |
A robust solution to generate the data / visualisation grid could be useful to other developers/packages as well (@vincentarelbundock), so we might want to see whether to move it to datawizard or insight. We could rename it something like
data_grid()
orget_datagrid()
(if in insight), and reexport it under its current name (visualization_matrix) in modelbased to avoid any breaking changes.What do you think?
The text was updated successfully, but these errors were encountered: