is there a gllvm way to do the equivalent of mvabund::anova.manyglm()? #109

dougwyu · 2023-05-24T12:04:31Z

Hi again,

I'm curious whether there is a gllvm way to test for a covariate effect on differences in composition, equivalent to mvabund::anova.manyglm(), that produces a p-value. I've been trying to use mvabund for this, but there are some limitations given my study design.

thanks,
doug

Answered by BertvanderVeen

BertvanderVeen · 2023-05-24T13:15:44Z

Collaborator

Not equivalent, no. There is gllvm::anova, but it is not very helpful for comparing models whose numbers of parameters differ drastically. mvabund relies on resampling, and that is just not really doable with gllvm.

This is primarily for computational reasons, but also because the fitting algorithm is sensitive to starting values. You might want to have a look at summary(model) as an alternative, which produces Wald statistics with accompanying p-values for each species and predictor.
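A minimal sketch of both options mentioned above, assuming the antTraits data that ships with gllvm and treating Bare.ground as the covariate of interest (both are illustrative choices, not part of the original answer). `anova()` compares a null model containing only latent variables against one that adds species-specific Bare.ground effects, and `summary()` gives the per-species Wald tests. Note that the alternative adds one parameter per species, which is the kind of large difference in parameter count for which anova is cautioned to be of limited help.

```r
library(gllvm)

data(antTraits)                          # example data shipped with gllvm
y <- as.matrix(antTraits$abund)
X <- scale(as.matrix(antTraits$env))

# Null model: two unconstrained latent variables, no covariates
fit0 <- gllvm(y, family = "negative.binomial", num.lv = 2, seed = 1)

# Alternative: adds a species-specific Bare.ground effect
fit1 <- gllvm(y, X, family = "negative.binomial", num.lv = 2,
              formula = ~ Bare.ground, seed = 1)

anova(fit0, fit1)   # approximate likelihood-ratio test between the nested fits
summary(fit1)       # Wald z-tests per species for Bare.ground
```

Results depend on how well each fit converged, so in practice each model is worth refitting with a few different seeds.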

5 replies

gerverska Jun 8, 2024

Out of curiosity, is an mvabund::anova()-style function on the horizon, given some of the recent parallelization prep that seems to be going on?

BertvanderVeen Jun 8, 2024
Collaborator

I don't have anything in the works. A better method for hypothesis testing is often on my mind, but refitting of gllvms is IMO not feasible and way too prone to convergence issues.

gerverska Jun 8, 2024

I guess it does seem that providing a dummy trait matrix (as suggested by @tanharri, down below) is the easiest way to get some kind of "overall"/main effect test across taxa, but this is still at the coefficient level. This seems fine, but I know I'll probably encounter reviewer opposition on this point (most fungal microbiome work has historically been routine, dogmatic, non-parametric, distance-based stuff that might still provide overall p-values and R2).

BertvanderVeen Jun 9, 2024
Collaborator

Also have a look at https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0181790. @dwarton implemented the method in mvabund I think, so stare long enough at the implementation there and you might be able to work something out for use in gllvm. I just have not had the opportunity to look at it in great detail yet, despite it being published in 2017.

gerverska Jun 10, 2024

Thanks Bert! My stats and coding chops aren't anywhere near yours, but I might be able to take a stab at this with brute force (likely with yours, Jenni's, and the community's guidance).

Thanks to both you and @JenniNiku for all the work y'all have done on gllvm. I've been recommending this package (in addition to mvabund, boral, and ecoCopula) to everyone I know who wants to escape non-parametric/distance-based/under-powered hell and I will continue to do so!

dougwyu · 2023-05-30T10:10:39Z

Author

thanks for that. Do you mean something like the following? (based on your example)

```r
library(gllvm)

data(antTraits)
y <- as.matrix(antTraits$abund)
X <- scale(as.matrix(antTraits$env))
TR <- antTraits$traits

# with trait argument defined
fit_1 <- gllvm(y, X, TR, family = "negative.binomial",
               formula = y ~ Bare.ground)

# without trait argument defined
fit_2 <- gllvm(y, X, family = "negative.binomial",
               formula = y ~ Bare.ground)

summary(fit_1)
summary(fit_2)
```

and by not including an interaction term in the formula, I get a main effect p-value. How should I think about this?

```
> summary(fit_1)

Call:
gllvm(y = y, X = X, TR = TR, formula = y ~ Bare.ground, family = "negative.binomial")

Family:  negative.binomial 

AIC:  4048.171 AICc:  4098.988 BIC:  4886.993 LL:  -1860.1 df:  164 

Informed LVs:  0 
Constrained LVs:  0 
Unconstrained LVs:  2 

Formula:  ~Bare.ground 
LV formula:  ~ 0 

Coefficients predictors:
            Estimate Std. Error z value Pr(>|z|)    
Bare.ground   0.1840     0.0559   3.291    0.001 ***
```

BertvanderVeen · 2023-05-30T12:21:41Z

Collaborator

General information on the Wald test can be found here: https://en.wikipedia.org/wiki/Wald_test. In short, you can use the p-value to assess whether there is evidence for an effect of a given predictor (in your example, the amount of bare ground) on the response variable.
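To make the arithmetic behind that table row concrete, here is the Wald z-test for a single coefficient in base R, plugging in the rounded estimate and standard error printed for Bare.ground in summary(fit_1) above. `wald_test` is an illustrative helper, not a gllvm function.

```r
# Wald z-test for one coefficient: z = estimate / SE,
# two-sided p-value from the standard normal distribution
wald_test <- function(estimate, std_error) {
  z <- estimate / std_error
  p <- 2 * pnorm(-abs(z))
  c(z = z, p = p)
}

# Rounded values from the Bare.ground row of summary(fit_1)
wald_test(0.1840, 0.0559)
```

This gives z of about 3.29 and a two-sided p of about 0.001, in line with the printed row up to rounding of the inputs.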

tanharri · 2023-05-30T13:10:57Z

Hi @BertvanderVeen. In the above example, why does defining the trait matrix TR result in a different model when the formula specified is the same?

BertvanderVeen · 2023-05-30T13:29:31Z

Collaborator

Trait models in gllvm work quite differently from models without traits. Even when no traits appear in the model formula, supplying the trait matrix makes gllvm take the "trait route". The trait model with species-specific predictor effects is, IIRC, unidentifiable, so on the trait route the predictor effect is estimated for the whole community, whereas without traits it can be species-specific.
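A simplified way to write the two routes, using notation assumed here rather than copied from the package documentation: g is the link function, μ_ij the mean abundance of species j at site i, x_i the covariates at site i, t_j the traits of species j, and u_i the latent variables with species loadings θ_j. Without TR each species gets its own slope vector β_j; on the trait route the covariate main effect β_x is shared by all species, and species differ in their response only through covariate-by-trait interactions.

```math
\begin{aligned}
\text{no traits:}\quad & g(\mu_{ij}) = \beta_{0j} + \mathbf{x}_i^\top \boldsymbol{\beta}_j + \mathbf{u}_i^\top \boldsymbol{\theta}_j \\
\text{trait route:}\quad & g(\mu_{ij}) = \beta_{0j} + \mathbf{x}_i^\top \boldsymbol{\beta}_x + (\mathbf{x}_i \otimes \mathbf{t}_j)^\top \boldsymbol{\beta}_{xt} + \mathbf{u}_i^\top \boldsymbol{\theta}_j
\end{aligned}
```

With formula = y ~ Bare.ground and no trait terms, the interaction term drops out and only the single community-level slope β_x remains, which matches the single Bare.ground row in summary(fit_1). gllvm also has a randomX argument for species-specific random slopes on the trait route (see ?gllvm), which may be one way to let species deviate from β_x.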

tanharri · 2023-05-30T13:35:31Z

I can populate the field with a dummy matrix to get this, but is there a syntax to retrieve the whole community effect without defining a TR trait matrix?

BertvanderVeen · 2023-05-30T13:55:05Z

Collaborator

Not that I know of, but perhaps @JenniNiku.

BertvanderVeen · 2023-06-01T07:04:08Z

Collaborator

@hjftan-nm I had a quick think about this. Here is an example to do what you ask:

```r
library(gllvm)

data(spider)
model <- gllvm(spider$abund, spider$x, spider$trait,
               formula = y ~ as.factor(species):bare.sand, family = "poisson")
```

dwarton · 2024-06-11T01:33:23Z

We've done some simulation work looking at this, and despite a bunch of potential issues (theoretical and computational), the anova function on gllvm tends to do OK if you don't have too many response variables (fewer than 100, say). Well, we didn't actually test gllvm; we were using glmmTMB with the rr covariance structure, but that should behave similarly. As Bert mentioned, you should worry about convergence and about whether the model fits you are using are good ones, since the log-likelihood can jump around a little sometimes. anova.gllvm requires at least two models to be specified, so you have to worry about this for each of them: for each model it would be a good idea to do multiple runs and keep the one with the biggest logL (or, better still, use a decent null model fit to provide starting values for the alternative, although this can require some thought to get right). And, as the warnings in the output say, these results are approximate.
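For intuition on what an approximate likelihood-ratio test like the one in anova.gllvm boils down to, here is the textbook calculation in base R: twice the gain in log-likelihood, compared with a chi-squared distribution whose degrees of freedom equal the number of extra parameters. This sketch assumes anova.gllvm uses this standard chi-squared approximation; the `lrt` helper and the numbers in the usage line are made up for illustration.

```r
# Approximate likelihood-ratio test between a null and an alternative model,
# given their log-likelihoods and numbers of estimated parameters (df)
lrt <- function(ll_null, df_null, ll_alt, df_alt) {
  stat <- 2 * (ll_alt - ll_null)   # twice the log-likelihood gain
  df <- df_alt - df_null           # extra parameters in the alternative
  p <- pchisq(stat, df = df, lower.tail = FALSE)
  c(stat = stat, df = df, p = p)
}

# Made-up numbers: one extra parameter and a log-likelihood gain of 2
lrt(ll_null = -100, df_null = 10, ll_alt = -98, df_alt = 11)
```

This also shows why a log-likelihood that jumps around between runs matters: a shift of one unit in either fit moves the test statistic by two.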

3 replies

BertvanderVeen Jun 11, 2024
Collaborator

Thanks for pitching in David.

gerverska Jun 20, 2024

Thanks @dwarton, I'm intrigued by the mention of using a decent null model fit to provide starting values for the alternative. Which considerations did you have in mind when you said this can require some thought to get right?

dwarton Jun 20, 2024

I'm suggesting that for hypothesis testing you first fit the null model a few times to make sure you have a good fit (large logL), and then use that fit to provide starting values for your fit under the alternative model. If you run the null model a few times and get similar logLs, you can be reasonably confident in your fit; if it jumps around a fair bit, you would need to run it more times (and over a broader range of starting values) and keep the best one.
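A minimal sketch of the "fit the null a few times and keep the best" step described above. `best_of_runs` is a hypothetical helper, not part of gllvm, and the commented gllvm call assumes gllvm's `seed` argument and the antTraits data. Passing the best null fit on as starting values for the alternative is not shown: recent gllvm versions appear to offer a `start.fit` entry in `control.start` for this, but check `?gllvm` for your version, and whether it suits a model with more parameters is exactly the part that needs thought.

```r
# Fit the same model with several seeds and keep the run with the largest
# log-likelihood. `fit_fun` is a user-supplied function of one argument
# (a seed) that returns a fitted model with a logLik() method.
best_of_runs <- function(fit_fun, seeds) {
  fits <- lapply(seeds, fit_fun)
  ll <- vapply(fits, function(f) as.numeric(logLik(f)), numeric(1))
  list(best = fits[[which.max(ll)]],   # run with the highest logLik
       logLik = ll)                    # all values, to judge stability
}

# Hypothetical usage with gllvm (not run here):
# library(gllvm)
# data(antTraits)
# y <- as.matrix(antTraits$abund)
# null_runs <- best_of_runs(
#   function(s) gllvm(y, family = "negative.binomial", num.lv = 2, seed = s),
#   seeds = 1:5)
# null_runs$logLik  # similar values suggest a stable fit; a big spread means run more
```

Returning all log-likelihoods, not just the best fit, makes it easy to see whether the runs agree before trusting any test built on them.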

gerverska · 2024-06-21T21:11:57Z

Ah, gotcha. I was wondering if there were any peculiarities regarding control.start arguments, etc. Thanks!
