`marginaleffects::marginal_means` deprecation #246

vincentarelbundock · 2024-02-05T12:30:48Z

I am preparing for release 1.0.0 of marginaleffects and deprecating functions that will not be part of the “final / stable” user interface.

The marginal_means function now raises a deprecation warning and will be removed eventually.

The recommended strategy is now to use: predictions() with the datagrid() function to create a balanced grid, and the by argument to specify the variables for marginalization.

IMHO, this is more explicit and in line with the rest of the marginaleffects function design:

library(emmeans)
library(marginaleffects)

pigs.lm <- lm(log(conc) ~ source + factor(percent), data = pigs)

emm.src <- emmeans(pigs.lm, specs = "source")
summary(emm.src, infer = TRUE, null = log(35))
#  source emmean     SE df lower.CL upper.CL null t.ratio p.value
#  fish     3.39 0.0367 23     3.32     3.47 3.56  -4.385  0.0002
#  soy      3.67 0.0374 23     3.59     3.74 3.56   2.988  0.0066
#  skim     3.80 0.0394 23     3.72     3.88 3.56   6.130  <.0001
# 
# Results are averaged over the levels of: percent 
# Results are given on the log (not the response) scale. 
# Confidence level used: 0.95


predictions(pigs.lm,                          # predictions using the `pigs.lm` model
  newdata = datagrid(grid_type = "balanced"), # on a balanced grid
  by = "source",                              # averaged across the `source` variable
  hypothesis = log(35),                       # null hypothesis
  df = 23)
# 
#  source Estimate Std. Error     t Pr(>|t|)    S 2.5 % 97.5 % Df
#    fish     3.39     0.0367 -4.39  < 0.001 12.2  3.32   3.47 23
#    soy      3.67     0.0374  2.99  0.00657  7.3  3.59   3.74 23
#    skim     3.80     0.0394  6.13  < 0.001 18.4  3.72   3.88 23
# 
# Columns: source, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, df 
# Type:  response

codecov · 2024-02-05T12:38:24Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (9e2bb40) 35.10% compared to head (3e666dc) 34.00%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #246      +/-   ##
==========================================
- Coverage   35.10%   34.00%   -1.11%     
==========================================
  Files          25       24       -1     
  Lines        1168     1147      -21     
==========================================
- Hits          410      390      -20     
+ Misses        758      757       -1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

DominiqueMakowski · 2024-02-05T13:26:03Z

Thanks Vincent, I've been finally experimenting a bit with the marginaleffects API, so I think I'm in a better position now to push do the overhaul that modelbased needs to depend on the latest marginaleffects

vincentarelbundock · 2024-02-05T13:27:21Z

Sounds fantastic!

Don't hesitate to ping me if you need input, review, or code snippets.

DominiqueMakowski · 2024-02-11T12:57:15Z

Hi @vincentarelbundock I'm giving this a shot,

I'd like to make marginaleffects::predictions() work with insight::get_datagrid(), could you let me know if it's possible to, in the following example, somehow specify to predictions() that I want to marginalize over the continuous predictor Sepal.Width. I thought that providing a newdata grid with "missing" values would work, but no, so tried adding the by argument so specify, but it doesn't seem to do what I have in mind.

model <- lm(Petal.Length ~ Sepal.Width * Species, data = iris)
grid <- insight::get_datagrid(model, at="Species")
grid$Sepal.Width <- NULL
grid
#>      Species
#> 1     setosa
#> 2 versicolor
#> 3  virginica

marginaleffects::predictions(model,
                                        newdata=grid,
                                        by="Species")

^{Created on 2024-02-11 with reprex v2.0.2}

tldr; what is the best way to obtain marginal means at species levels marginalized over Sepal.width?

vincentarelbundock · 2024-02-11T14:48:23Z

What do you mean specifically by "marginalized over Sepal.Width"?

vincentarelbundock · 2024-02-11T14:55:45Z

predictions() always requires a “full” grid, with all the variables used in the model. You should really think about it as enhanced version of the Base R function predict(). That’s why I used the same argument name newdata.

In addition, you get the by argument which specifies the categorical variables with respect to which we are averaging.

For example, you can marginalize over a balanced grid like this:

library(marginaleffects)

model <- lm(Petal.Length ~ Sepal.Width * Species, data = iris)

grid <- expand.grid(
  Sepal.Width = unique(iris$Sepal.Width),
  Species = unique(iris$Species)
)

predictions(model,
  newdata = grid,
  by = "Species")
# 
#     Species Estimate Std. Error    z Pr(>|z|)     S 2.5 % 97.5 %
#  setosa         1.44     0.0638 22.6   <0.001 374.6  1.32   1.57
#  versicolor     4.62     0.0930 49.7   <0.001   Inf  4.44   4.80
#  virginica      5.71     0.0668 85.5   <0.001   Inf  5.58   5.84
# 
# Columns: Species, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 
# Type:  response

Or you can marginalized over the empirical distribution of the actual data like the following (this is equivalent to the “cells” weights in emmeans):

predictions(model,
  newdata = iris,
  by = "Species")
# 
#     Species Estimate Std. Error     z Pr(>|z|)     S 2.5 % 97.5 %
#  setosa         1.46     0.0545  26.8   <0.001 524.4  1.36   1.57
#  versicolor     4.26     0.0545  78.2   <0.001   Inf  4.15   4.37
#  virginica      5.55     0.0545 101.9   <0.001   Inf  5.45   5.66
# 
# Columns: Species, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 
# Type:  response

DominiqueMakowski · 2024-02-11T15:07:47Z

Say I want to reproduce the emmeans results:

model <- lm(Petal.Length ~ Sepal.Width * Species, data = iris)
emmeans::emmeans(model, "Species")
#> NOTE: Results may be misleading due to involvement in interactions
#>  Species    emmean     SE  df lower.CL upper.CL
#>  setosa       1.43 0.0766 144     1.28     1.58
#>  versicolor   4.50 0.0742 144     4.35     4.65
#>  virginica    5.61 0.0563 144     5.50     5.72
#> 
#> Confidence level used: 0.95

Is it correct that the "optimal" way to do it in marginaleffects is the first option below:

marginaleffects::predictions(model,
                             newdata=marginaleffects::datagrid(grid_type = "balanced"),
                             variables="Species",
                             by="Species")
#> 
#>     Species Estimate Std. Error    z Pr(>|z|)     S 2.5 % 97.5 %
#>  setosa         1.43     0.0766 18.7   <0.001 256.7  1.28   1.58
#>  versicolor     4.50     0.0742 60.6   <0.001   Inf  4.36   4.65
#>  virginica      5.61     0.0563 99.6   <0.001   Inf  5.50   5.72
#> 
#> Columns: Species, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 
#> Type:  response

marginaleffects::predictions(model,
                             by="Species")
#> 
#>     Species Estimate Std. Error     z Pr(>|z|)     S 2.5 % 97.5 %
#>  setosa         1.46     0.0545  26.8   <0.001 524.4  1.36   1.57
#>  versicolor     4.26     0.0545  78.2   <0.001   Inf  4.15   4.37
#>  virginica      5.55     0.0545 101.9   <0.001   Inf  5.45   5.66
#> 
#> Columns: Species, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 
#> Type:  response

^{Created on 2024-02-11 with reprex v2.0.2}

I would intuitively expect the second to give the same results...

DominiqueMakowski · 2024-02-11T15:13:09Z

Oh I see, after reading your comment I think that basically marginaleffects requires to explicitly state what to marginalize over.

If I understand, there are roughly 3 types of "means" you could get from the model:

model <- lm(Petal.Length ~ Sepal.Width * Species, data = iris)


# Marginalized over range
marginaleffects::predictions(model,
                             newdata = insight::get_datagrid(model),
                             by = "Species")
#> 
#>     Species Estimate Std. Error     z Pr(>|z|)     S 2.5 % 97.5 %
#>  setosa         1.47     0.0548  26.7   <0.001 521.2  1.36   1.57
#>  versicolor     4.17     0.0574  72.7   <0.001   Inf  4.06   4.29
#>  virginica      5.52     0.0549 100.6   <0.001   Inf  5.42   5.63
#> 
#> Columns: Species, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 
#> Type:  response

# Marginalize over empirical distribution
marginaleffects::predictions(model,
                             newdata = insight::get_data(model),
                             by = "Species")
#> 
#>     Species Estimate Std. Error     z Pr(>|z|)     S 2.5 % 97.5 %
#>  setosa         1.46     0.0545  26.8   <0.001 524.4  1.36   1.57
#>  versicolor     4.26     0.0545  78.2   <0.001   Inf  4.15   4.37
#>  virginica      5.55     0.0545 101.9   <0.001   Inf  5.45   5.66
#> 
#> Columns: Species, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 
#> Type:  response

# Predictions at fixed mean
marginaleffects::predictions(model,
                             newdata = insight::get_datagrid(model, at="Species"))
#> 
#>  Estimate Std. Error    z Pr(>|z|)     S 2.5 % 97.5 %    Species Sepal.Width
#>      1.43     0.0766 18.7   <0.001 256.7  1.28   1.58 setosa            3.06
#>      4.50     0.0742 60.6   <0.001   Inf  4.36   4.65 versicolor        3.06
#>      5.61     0.0563 99.6   <0.001   Inf  5.50   5.72 virginica         3.06
#> 
#> Columns: rowid, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, Species, Sepal.Width, Petal.Length 
#> Type:  response

^{Created on 2024-02-11 with reprex v2.0.2}

Am I on the right track? Sorry for the dumb question I'm trying to wrap my head around the newest API

vincentarelbundock · 2024-02-11T15:18:02Z

Yes, that's exactly right!

And if you're interested, here's a notebook where I show how to get equivalent results to emmeans with "re-gridding" and transformations:

https://arelbundock.com/emmeans_and_marginaleffects.html

DominiqueMakowski · 2024-02-11T18:17:32Z

Thanks for your help @vincentarelbundock, I think I managed to find a fix for now to have estimate_means() working (I pushed directly on the main so I'll close this but thanks!)

marginaleffects::marginal_means deprecation

3e666dc

strengejacke mentioned this pull request Feb 6, 2024

CRAN submission (deadline 20/02) #247

Closed

IndrajeetPatil requested a review from DominiqueMakowski February 7, 2024 07:51

DominiqueMakowski closed this Feb 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`marginaleffects::marginal_means` deprecation #246

`marginaleffects::marginal_means` deprecation #246

vincentarelbundock commented Feb 5, 2024

codecov bot commented Feb 5, 2024

DominiqueMakowski commented Feb 5, 2024

vincentarelbundock commented Feb 5, 2024

DominiqueMakowski commented Feb 11, 2024 •

edited

Loading

vincentarelbundock commented Feb 11, 2024

vincentarelbundock commented Feb 11, 2024 •

edited

Loading

DominiqueMakowski commented Feb 11, 2024

DominiqueMakowski commented Feb 11, 2024

vincentarelbundock commented Feb 11, 2024

DominiqueMakowski commented Feb 11, 2024

marginaleffects::marginal_means deprecation #246

marginaleffects::marginal_means deprecation #246

Conversation

vincentarelbundock commented Feb 5, 2024

codecov bot commented Feb 5, 2024

Codecov Report

DominiqueMakowski commented Feb 5, 2024

vincentarelbundock commented Feb 5, 2024

DominiqueMakowski commented Feb 11, 2024 • edited Loading

vincentarelbundock commented Feb 11, 2024

vincentarelbundock commented Feb 11, 2024 • edited Loading

DominiqueMakowski commented Feb 11, 2024

DominiqueMakowski commented Feb 11, 2024

vincentarelbundock commented Feb 11, 2024

DominiqueMakowski commented Feb 11, 2024

`marginaleffects::marginal_means` deprecation #246

`marginaleffects::marginal_means` deprecation #246

DominiqueMakowski commented Feb 11, 2024 •

edited

Loading

vincentarelbundock commented Feb 11, 2024 •

edited

Loading