-
-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
marginaleffects::marginal_means
deprecation
#246
marginaleffects::marginal_means
deprecation
#246
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #246 +/- ##
==========================================
- Coverage 35.10% 34.00% -1.11%
==========================================
Files 25 24 -1
Lines 1168 1147 -21
==========================================
- Hits 410 390 -20
+ Misses 758 757 -1 ☔ View full report in Codecov by Sentry. |
Thanks Vincent, I've been finally experimenting a bit with the marginaleffects API, so I think I'm in a better position now to push do the overhaul that modelbased needs to depend on the latest marginaleffects |
Sounds fantastic! Don't hesitate to ping me if you need input, review, or code snippets. |
Hi @vincentarelbundock I'm giving this a shot, I'd like to make marginaleffects::predictions() work with insight::get_datagrid(), could you let me know if it's possible to, in the following example, somehow specify to model <- lm(Petal.Length ~ Sepal.Width * Species, data = iris)
grid <- insight::get_datagrid(model, at="Species")
grid$Sepal.Width <- NULL
grid
#> Species
#> 1 setosa
#> 2 versicolor
#> 3 virginica
marginaleffects::predictions(model,
newdata=grid,
by="Species") Created on 2024-02-11 with reprex v2.0.2 tldr; what is the best way to obtain marginal means at species levels marginalized over Sepal.width? |
What do you mean specifically by "marginalized over Sepal.Width"? |
In addition, you get the For example, you can marginalize over a balanced grid like this: library(marginaleffects)
model <- lm(Petal.Length ~ Sepal.Width * Species, data = iris)
grid <- expand.grid(
Sepal.Width = unique(iris$Sepal.Width),
Species = unique(iris$Species)
)
predictions(model,
newdata = grid,
by = "Species")
#
# Species Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
# setosa 1.44 0.0638 22.6 <0.001 374.6 1.32 1.57
# versicolor 4.62 0.0930 49.7 <0.001 Inf 4.44 4.80
# virginica 5.71 0.0668 85.5 <0.001 Inf 5.58 5.84
#
# Columns: Species, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
# Type: response Or you can marginalized over the empirical distribution of the actual data like the following (this is equivalent to the “cells” weights in predictions(model,
newdata = iris,
by = "Species")
#
# Species Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
# setosa 1.46 0.0545 26.8 <0.001 524.4 1.36 1.57
# versicolor 4.26 0.0545 78.2 <0.001 Inf 4.15 4.37
# virginica 5.55 0.0545 101.9 <0.001 Inf 5.45 5.66
#
# Columns: Species, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
# Type: response |
Say I want to reproduce the model <- lm(Petal.Length ~ Sepal.Width * Species, data = iris)
emmeans::emmeans(model, "Species")
#> NOTE: Results may be misleading due to involvement in interactions
#> Species emmean SE df lower.CL upper.CL
#> setosa 1.43 0.0766 144 1.28 1.58
#> versicolor 4.50 0.0742 144 4.35 4.65
#> virginica 5.61 0.0563 144 5.50 5.72
#>
#> Confidence level used: 0.95 Is it correct that the "optimal" way to do it in marginaleffects is the first option below: marginaleffects::predictions(model,
newdata=marginaleffects::datagrid(grid_type = "balanced"),
variables="Species",
by="Species")
#>
#> Species Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> setosa 1.43 0.0766 18.7 <0.001 256.7 1.28 1.58
#> versicolor 4.50 0.0742 60.6 <0.001 Inf 4.36 4.65
#> virginica 5.61 0.0563 99.6 <0.001 Inf 5.50 5.72
#>
#> Columns: Species, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
#> Type: response
marginaleffects::predictions(model,
by="Species")
#>
#> Species Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> setosa 1.46 0.0545 26.8 <0.001 524.4 1.36 1.57
#> versicolor 4.26 0.0545 78.2 <0.001 Inf 4.15 4.37
#> virginica 5.55 0.0545 101.9 <0.001 Inf 5.45 5.66
#>
#> Columns: Species, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
#> Type: response Created on 2024-02-11 with reprex v2.0.2 I would intuitively expect the second to give the same results... |
Oh I see, after reading your comment I think that basically marginaleffects requires to explicitly state what to marginalize over. If I understand, there are roughly 3 types of "means" you could get from the model: model <- lm(Petal.Length ~ Sepal.Width * Species, data = iris)
# Marginalized over range
marginaleffects::predictions(model,
newdata = insight::get_datagrid(model),
by = "Species")
#>
#> Species Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> setosa 1.47 0.0548 26.7 <0.001 521.2 1.36 1.57
#> versicolor 4.17 0.0574 72.7 <0.001 Inf 4.06 4.29
#> virginica 5.52 0.0549 100.6 <0.001 Inf 5.42 5.63
#>
#> Columns: Species, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
#> Type: response
# Marginalize over empirical distribution
marginaleffects::predictions(model,
newdata = insight::get_data(model),
by = "Species")
#>
#> Species Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> setosa 1.46 0.0545 26.8 <0.001 524.4 1.36 1.57
#> versicolor 4.26 0.0545 78.2 <0.001 Inf 4.15 4.37
#> virginica 5.55 0.0545 101.9 <0.001 Inf 5.45 5.66
#>
#> Columns: Species, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
#> Type: response
# Predictions at fixed mean
marginaleffects::predictions(model,
newdata = insight::get_datagrid(model, at="Species"))
#>
#> Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % Species Sepal.Width
#> 1.43 0.0766 18.7 <0.001 256.7 1.28 1.58 setosa 3.06
#> 4.50 0.0742 60.6 <0.001 Inf 4.36 4.65 versicolor 3.06
#> 5.61 0.0563 99.6 <0.001 Inf 5.50 5.72 virginica 3.06
#>
#> Columns: rowid, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, Species, Sepal.Width, Petal.Length
#> Type: response Created on 2024-02-11 with reprex v2.0.2 Am I on the right track? Sorry for the dumb question I'm trying to wrap my head around the newest API |
Yes, that's exactly right! And if you're interested, here's a notebook where I show how to get equivalent results to |
Thanks for your help @vincentarelbundock, I think I managed to find a fix for now to have estimate_means() working (I pushed directly on the main so I'll close this but thanks!) |
I am preparing for release 1.0.0 of
marginaleffects
and deprecating functions that will not be part of the “final / stable” user interface.The
marginal_means
function now raises a deprecation warning and will be removed eventually.The recommended strategy is now to use:
predictions()
with thedatagrid()
function to create a balanced grid, and theby
argument to specify the variables for marginalization.IMHO, this is more explicit and in line with the rest of the
marginaleffects
function design: