Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error for R2 for beta binomial glmmTMB model #787

Closed
sschooler opened this issue Dec 19, 2024 · 9 comments · Fixed by #788
Closed

Error for R2 for beta binomial glmmTMB model #787

sschooler opened this issue Dec 19, 2024 · 9 comments · Fixed by #788
Assignees
Labels
Bug 🐛 Something isn't working

Comments

@sschooler
Copy link

sschooler commented Dec 19, 2024

Hello, I am attempting to get an R2 value for a proportional beta binomial model created using glmmTMB. When I run r2(model), I get the error "Can't calculate accurate R2 for binomial models that are not Bernoulli models."

The model that I'm working with is an n/k proportional model, so I tried that first, using the notation cbind(successes, failures), but then also tried a proportional model, using the response variable as successes/total. Both get the error.

Here's a reprex. The model doesn't converge (because it's nonsense) but that shouldn't matter for the function I don't think. Using my actual data, the model does converge and I still get the same error.

require(tidyr)
require(glmmTMB)
require(performance)
# data
data("AirPassengers"); AirPassengers

# need to do some work on the data b/c it's a timeseries
AirPassengers <- matrix(AirPassengers, ncol = 12, nrow = 12, byrow = T)
AP.df <- data.frame(AirPassengers, "Year" = c(1949:1960))
colnames(AP.df) <- c("Jan", "Feb", "Mar", "Apr", "May", 
                 "Jun", "Jul", "Aug", "Sep", "Oct",
                 "Nov", "Dec", "Year")
AP.df$Total <- rowSums(AP.df[,-13])

long.df <- AP.df %>% pivot_longer(Jan:Dec, names_to = "Month", values_to = "Passengers")
# failures in this model are passengers that traveled during the year but not this month
long.df$NonPassengers <- long.df$Total-long.df$Passengers
long.df$proportion <- long.df$Passengers/long.df$Total

# specify the models
nk.betabinom.model <- glmmTMB(data = long.df, cbind(Passengers, NonPassengers) ~ Year + Month,
                           betabinomial(link = "logit"))

betabinom.model <- glmmTMB(data = long.df, proportion ~ Year + Month,
                           betabinomial(link = "logit"))


# get error
r2(nk.betabinom.model)

# also error
r2(betabinom.model)
@bbolker
Copy link

bbolker commented Dec 19, 2024

Just to clarify: this is a warning, not an error (and a NULL) result (these are technically different things, although I can appreciate that you're not getting the answer you want). It's not clear how much theoretical work would be required to get this answer ... for a non-mixed model, in principle you could compute one of the flavours of pseudo R^2 value,

I think this should work to get you a variety of pseudo-R^2 values if your full model actually converges (taken from the guts of pscl:::pR2.default). If you want to do this for a model with random effects then it's a lot more trouble ...

object <- nk.betabinom.model
 llh <- logLik(object)
objectNull <- update(object, .~1, data = long.df)
llhNull <- logLik(objectNull)
 pscl:::pR2Work(llh, llhNull)

@strengejacke
Copy link
Member

It seems like pR2Work() returns the McFadden R2, so you could check if performance::r2_mcfadden() works for your model (and if so, we could internally calculate McFadden R2 when r2() is called for a beta-binomial model).

@strengejacke
Copy link
Member

Just for the records, for BBreg we default to Cox & Snell R2.

@strengejacke
Copy link
Member

All the approaches return no vali R2 value.

require(tidyr)
#> Loading required package: tidyr
require(glmmTMB)
#> Loading required package: glmmTMB
require(performance)
#> Loading required package: performance
# data
data("AirPassengers")

# need to do some work on the data b/c it's a timeseries
AirPassengers <- matrix(AirPassengers, ncol = 12, nrow = 12, byrow = T)
AP.df <- data.frame(AirPassengers, "Year" = c(1949:1960))
colnames(AP.df) <- c("Jan", "Feb", "Mar", "Apr", "May", 
                 "Jun", "Jul", "Aug", "Sep", "Oct",
                 "Nov", "Dec", "Year")
AP.df$Total <- rowSums(AP.df[,-13])

long.df <- AP.df %>% pivot_longer(Jan:Dec, names_to = "Month", values_to = "Passengers")
# failures in this model are passengers that traveled during the year but not this month
long.df$NonPassengers <- long.df$Total-long.df$Passengers
long.df$proportion <- long.df$Passengers/long.df$Total

# specify the models
nk.betabinom.model <- glmmTMB(data = long.df, cbind(Passengers, NonPassengers) ~ Year + Month,
                           betabinomial(link = "logit"))
#> Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
#> problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
#> Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
#> problem; false convergence (8). See vignette('troubleshooting'),
#> help('diagnose')

betabinom.model <- glmmTMB(data = long.df, proportion ~ Year + Month,
                           betabinomial(link = "logit"))
#> Warning in eval(family$initialize): non-integer #successes in a binomial glm!
#> Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
#> problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')


r2(nk.betabinom.model)
#> # R2 for Generalized Linear Regression
#>        R2: NA
#>   adj. R2: NA

r2(betabinom.model)
#> Warning in eval(family$initialize): non-integer #successes in a binomial glm!
#> Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
#> problem; false convergence (8). See vignette('troubleshooting'),
#> help('diagnose')
#> # R2 for Generalized Linear Regression
#>        R2: NA
#>   adj. R2: NA

object <- nk.betabinom.model
llh <- logLik(object)
objectNull <- update(object, .~1, data = long.df)
llhNull <- logLik(objectNull)
pscl:::pR2Work(llh, llhNull)
#>       llh   llhNull        G2  McFadden      r2ML      r2CU 
#>        NA -715.1137        NA        NA        NA        NA

r2_mcfadden(betabinom.model)
#> Warning in eval(family$initialize): non-integer #successes in a binomial glm!
#> Warning in eval(family$initialize): Model convergence problem; false
#> convergence (8). See vignette('troubleshooting'), help('diagnose')
#> # R2 for Generalized Linear Regression
#>        R2: NA
#>   adj. R2: NA

r2_mcfadden(nk.betabinom.model)
#> # R2 for Generalized Linear Regression
#>        R2: NA
#>   adj. R2: NA

r2_coxsnell(betabinom.model)
#> Warning: deviance residuals not defined for family 'betabinomial': returning NA
#> Warning in eval(family$initialize): non-integer #successes in a binomial glm!
#> Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
#> problem; false convergence (8). See vignette('troubleshooting'),
#> help('diagnose')
#> Warning: deviance residuals not defined for family 'betabinomial': returning NA
#> Cox & Snell's R2 
#>               NA

r2_coxsnell(nk.betabinom.model)
#> Warning: deviance residuals not defined for family 'betabinomial': returning NA
#> Warning: deviance residuals not defined for family 'betabinomial': returning NA
#> Cox & Snell's R2 
#>               NA

Created on 2024-12-20 with reprex v2.1.1

@strengejacke
Copy link
Member

I'm not sure, but I think it's because glmmTMB doesn't calculate a log-likelihood value @bbolker ?:

require(tidyr)
#> Loading required package: tidyr
require(glmmTMB)
#> Loading required package: glmmTMB
require(performance)
#> Loading required package: performance
# data
data("AirPassengers")

# need to do some work on the data b/c it's a timeseries
AirPassengers <- matrix(AirPassengers, ncol = 12, nrow = 12, byrow = T)
AP.df <- data.frame(AirPassengers, "Year" = c(1949:1960))
colnames(AP.df) <- c("Jan", "Feb", "Mar", "Apr", "May", 
                 "Jun", "Jul", "Aug", "Sep", "Oct",
                 "Nov", "Dec", "Year")
AP.df$Total <- rowSums(AP.df[,-13])

long.df <- AP.df %>% pivot_longer(Jan:Dec, names_to = "Month", values_to = "Passengers")
# failures in this model are passengers that traveled during the year but not this month
long.df$NonPassengers <- long.df$Total-long.df$Passengers
long.df$proportion <- long.df$Passengers/long.df$Total

# specify the models
nk.betabinom.model <- glmmTMB(data = long.df, cbind(Passengers, NonPassengers) ~ Year + Month,
                           betabinomial(link = "logit"))
#> Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
#> problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
#> Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
#> problem; false convergence (8). See vignette('troubleshooting'),
#> help('diagnose')

betabinom.model <- glmmTMB(data = long.df, proportion ~ Year + Month,
                           betabinomial(link = "logit"))
#> Warning in eval(family$initialize): non-integer #successes in a binomial glm!
#> Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
#> problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')

logLik(betabinom.model)
#> 'log Lik.' NA (df=14)
logLik(nk.betabinom.model)
#> 'log Lik.' NA (df=14)

Created on 2024-12-20 with reprex v2.1.1

@bwiernik
Copy link
Contributor

Maybe we just return the "generic" R2 value based on y and yhat?

@bbolker
Copy link

bbolker commented Dec 20, 2024

I think this example will work fine with a beta-binomial model that actually converges/returns a log-likelihood value (glmmTMB returns NA for the log-likelihood if the sampling covariance matrix is not positive-definite, as these are considered unreliable fits ...)

@bbolker
Copy link

bbolker commented Dec 20, 2024

library(glmmTMB)
set.seed(101)
dd <- data.frame(x = rnorm(200))
dd$y <- simulate_new( ~ 1 + x, newdata = dd, newparams = list(beta = c(0,1), betadisp = -1), 
                                        weights = rep(10, nrow(dd)), family = "betabinomial")[[1]]
m <- glmmTMB(y/10 ~ 1 + x, data = dd, weights = rep(10, nrow(dd)), family = betabinomial)
llh <- logLik(m)
objectNull <- update(m, .~1, data =dd)
llhNull <- logLik(objectNull)
pscl:::pR2Work(llh, llhNull)
##           llh       llhNull            G2      McFadden          r2ML 
## -305.89477098 -328.54016882   45.29079567    0.06892733    0.20264396 
##          r2CU 
##    0.21052290 

But there could still be problems with (1) methods that require deviance residuals (not yet defined for betabinomial -- presumably we could figure this out? (2) models where the response is specified as a two-column matrix (I think ...)

@strengejacke
Copy link
Member

ok, thanks, issue should be fixed in #788
what are r2ML and r2CU in the pRWork output?

@strengejacke strengejacke added the Bug 🐛 Something isn't working label Dec 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug 🐛 Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants