Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow support for Multivariate Normal / T, Tweedie, Multinomial and Dirichlet in Stan #52

Open
nicholasjclark opened this issue May 23, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@nicholasjclark
Copy link
Owner

nicholasjclark commented May 23, 2024

Tweedie shouldn't be too much work, just involves:
https://discourse.mc-stan.org/t/tweedie-likelihood-compound-poisson-gamma-in-stan/14636/9;

Dirichlet requires getting estimate of $\mu$ for each series at each timepoint, so would have to enforce that there cannot be any partially observed timepoints in data. Would also require a reference category be chosen for identifiability, and that some sort of ordering is maintained to ensure that relevant rows of the design matrix are properly attributed to the correct series. A potentially useful way would be to use cbind(y1, y2, y3, et.....) as in brms:

df <- data.frame(y1 = runif(1000))
df$y2 <- runif(1000, 0, 1 - df$y1)
df$y3 <- 1 - df$y1 - df$y2
df$g <- c("a", "b", "c", "d")
make_stancode(
   cbind(y1,y2,y3) ~ g, 
   data = df, 
   family = dirichlet(refcat = "y3"))

A key restriction would have to be that the same linear predictors must be used for each response / category (in contrast to what can be done in brms. Could then ensure that the design matrix is appropriately ordered so that the vector of linear predictors is supplied to the dirichlet distribution, as in brms:

 // initialize linear predictor term
  vector[N] muy1 = rep_vector(0.0, N);
  // initialize linear predictor term
  vector[N] muy2 = rep_vector(0.0, N);
  // linear predictor matrix
  array[N] vector[ncat] mu;
  muy1 += Intercept_muy1 + Xc_muy1 * b_muy1;
  muy2 += Intercept_muy2 + Xc_muy2 * b_muy2;
  for (n in 1:N) {
     mu[n] = transpose([muy1[n], muy2[n], 0]);
   }
  for (n in 1:N) {
    target += dirichlet_logit_lpdf(Y[n] | mu[n], phi);
  }

This same construct of using cbind for the outcomes should also make it straightforward to implement multinomial, multivariate Gaussian and multivariate T observation models.

Will just have to work out how to ensure outcome-scale posterior predictions can be done efficiently, as these won't be able to be vectorized (or can they?; a multivariate normal can be calculated as y = mu + L_Sigma * y_std;, where L_Sigma is the Cholesky decomposition of Sigma and y_std is Standard Normal. A multivariate T just replaces y_std with draws from a Standard T with appropriate degrees of freedom).

This could be an extremely useful model for many types of multivariate and compositional time series. A very useful model would then be a VAR for the means, which are on the logit scale.
https://discourse.mc-stan.org/t/understanding-the-parameters-of-a-hierarchical-dirichlet-regression/27342/2

When adding these, must ensure multithreading is also supported

@nicholasjclark nicholasjclark added the enhancement New feature or request label Jun 13, 2024
@nicholasjclark nicholasjclark changed the title Allow support for Tweedie in Stan Allow support for Tweedie and Dirichlet in Stan Oct 10, 2024
@nicholasjclark nicholasjclark changed the title Allow support for Tweedie and Dirichlet in Stan Allow support for Multivariate Normal / T, Tweedie and Dirichlet in Stan Oct 31, 2024
@nicholasjclark nicholasjclark changed the title Allow support for Multivariate Normal / T, Tweedie and Dirichlet in Stan Allow support for Multivariate Normal / T, Tweedie, Multinomial and Dirichlet in Stan Nov 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant