Specifying priors for categorical variables in regression does not work #387

Open
eort opened this issue Apr 5, 2024 · 10 comments
Labels: bug (Something isn't working), upstream (Related to upstream packages)

@eort

eort commented Apr 5, 2024

Describe the bug

When I run a (hierarchical) regression model and try to specify priors for categorical variables, I receive the following error:
TypeError: Wrong number of dimensions: expected 1, got 0 with shape (). (Full stack trace below.) I have only tried this with the angle model.

HSSM version

0.2

To Reproduce

import hssm
from pandas import read_csv

data = read_csv('cavanagh_theta_nn.csv')
model = hssm.HSSM(
    data=data,
    model='angle',
    loglik='angle.onnx',
    loglik_kind='approx_differentiable',
    include=[
        {
            "name": "v",
            "formula": "v ~ (1 | participant_id) + conf",
            "prior": {
                "Intercept": {"name": "Uniform", "lower": -1, "upper": 1, "initval": 0},
                "conf": {"name": "Normal", "mu": 0, "sigma": 1, "initval": 0},
            },
        }
    ],
    hierarchical=True,
    link_settings='log_logit',
)

Full stack trace:
Traceback (most recent call last):

  File "/gpfs/project/ort/lorasick/ddm/code/hssm_reg.py", line 50, in <module>
    model = hssm.HSSM(data=data,
            ^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/project/projects/bpsydm/tools/pyEnvs/hssm/lib/python3.11/site-packages/hssm/hssm.py", line 334, in __init__
    self.set_alias(self._aliases)
  File "/gpfs/project/projects/bpsydm/tools/pyEnvs/hssm/lib/python3.11/site-packages/hssm/hssm.py", line 594, in set_alias
    self.model.build()
  File "/gpfs/project/projects/bpsydm/tools/pyEnvs/hssm/lib/python3.11/site-packages/bambi/models.py", line 350, in build
    self.backend.build(self)
  File "/gpfs/project/projects/bpsydm/tools/pyEnvs/hssm/lib/python3.11/site-packages/bambi/backend/pymc.py", line 70, in build
    self.components[name].build(self, spec)
  File "/gpfs/project/projects/bpsydm/tools/pyEnvs/hssm/lib/python3.11/site-packages/bambi/backend/model_components.py", line 60, in build
    self.build_common_terms(pymc_backend, bmb_model)
  File "/gpfs/project/projects/bpsydm/tools/pyEnvs/hssm/lib/python3.11/site-packages/bambi/backend/model_components.py", line 98, in build_common_terms
    coef, data = common_term.build(bmb_model)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/project/projects/bpsydm/tools/pyEnvs/hssm/lib/python3.11/site-packages/bambi/backend/terms.py", line 58, in build
    coef = distribution(label, dims=dims, **args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/project/projects/bpsydm/tools/pyEnvs/hssm/lib/python3.11/site-packages/pymc/distributions/distribution.py", line 316, in __new__
    rv_out = model.register_rv(
             ^^^^^^^^^^^^^^^^^^
  File "/gpfs/project/projects/bpsydm/tools/pyEnvs/hssm/lib/python3.11/site-packages/pymc/model/core.py", line 1294, in register_rv
    self.set_initval(rv_var, initval)
  File "/gpfs/project/projects/bpsydm/tools/pyEnvs/hssm/lib/python3.11/site-packages/pymc/model/core.py", line 1110, in set_initval
    initval = rv_var.type.filter(initval)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/project/projects/bpsydm/tools/pyEnvs/hssm/lib/python3.11/site-packages/pytensor/tensor/type.py", line 241, in filter
    raise TypeError(
TypeError: Wrong number of dimensions: expected 1, got 0 with shape ().
@digicosmos86
Collaborator

Hi @eort,

What version of cavanagh_theta_nn.csv are you using, so that I can reproduce this error? Please note that the version that comes from HDDM is not compatible with HSSM, due to the coding of response and the naming of participant_id. You can use hssm.load_data("cavanagh_theta") to load a version that's compatible with HSSM.

Without actually running the code, it seems that the error is with setting initial values. Does the error still happen when initvals are removed from your model specification?

Thanks!
Paul

@eort
Author

eort commented Apr 8, 2024

Hi @digicosmos86,

What version of cavanagh_theta_nn.csv are you using

The one in hssm/datasets. The problem originally occurred with my own data, so I tried to reproduce with the standard dataset so you can have a look yourself.

Does the error still happen when initvals are removed from your model specification?

No, it doesn't! After I removed the initvals from the model specification, the model compiled fine.
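
As a rough sketch of the workaround described above, the "initval" entries can be stripped from the prior dictionaries before the include list is handed to hssm.HSSM. The helper name strip_initvals is made up for illustration and is not part of HSSM or Bambi; the include structure is copied from the reproduction code in this issue.

```python
from copy import deepcopy


def strip_initvals(include):
    """Return a copy of an HSSM-style `include` list without any "initval" keys.

    Hypothetical helper for illustration only; not an HSSM or Bambi function.
    """
    cleaned = deepcopy(include)  # leave the caller's specification untouched
    for param in cleaned:
        for prior in param.get("prior", {}).values():
            if isinstance(prior, dict):
                prior.pop("initval", None)
    return cleaned


include = [
    {
        "name": "v",
        "formula": "v ~ (1 | participant_id) + conf",
        "prior": {
            "Intercept": {"name": "Uniform", "lower": -1, "upper": 1, "initval": 0},
            "conf": {"name": "Normal", "mu": 0, "sigma": 1, "initval": 0},
        },
    }
]

include_no_initvals = strip_initvals(include)
# include_no_initvals can then be passed as include=... to hssm.HSSM
```

This only avoids the error by dropping the initial values; it does not address why a scalar initval is rejected in the first place.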

@digicosmos86
Collaborator

Hi @eort,

It seems that after removing the parentheses around 1|participant_id, the model compiled fine for me. Perhaps parentheses have a special meaning in Bambi. Moving conf before (1|participant_id) also helped. So for me, the formula looks like this: v ~ conf + 1|participant_id.

@tomicapretto Could you share some insights?

Thanks!
Paul

@eort
Author

eort commented Apr 12, 2024

Hi @digicosmos86,

Thanks for looking into this!

It seems that after removing the parentheses around 1|participant_id, the model compiled fine for me

I'm not 100% sure, but when I try this (parentheses on/off), I get different models. Without parentheses, the random effects are not included in the model: when I look at the inference data, there are no v_1 | participant_id nodes, and printing the model directly shows the same thing.

See here with parentheses (output shortened):

In [1]: model
Out[1]: 
(Hierarchical Sequential Sampling Model
 Model: angle
 v:
     Formula: v ~ coh + (1|participant_id)
     Priors:
         v_Intercept ~ Uniform(lower: -1.0, upper: 1.0, initval: 0.0)
         v_coh ~ Normal(mu: 0.0, sigma: 1.0, initval: 0.0)
         v_1|participant_id ~ Normal(mu: 0.0, sigma: Weibull(alpha: 1.5, beta: 0.3))
     Link: Generalized logit link function with bounds (-3.0, 3.0)
     Explicit bounds: (-3.0, 3.0)

And here without:

Hierarchical Sequential Sampling Model
Model: angle

v:
    Formula: v ~ coh + 1|participant_id
    Priors:
        v_Intercept ~ Uniform(lower: -1.0, upper: 1.0, initval: 0.0)
        v_coh ~ Normal(mu: 0.0, sigma: 1.0, initval: 0.0)
    Link: Generalized logit link function with bounds (-3.0, 3.0)
    Explicit bounds: (-3.0, 3.0)

@digicosmos86
Collaborator

Hi @eort,

Thanks for catching that! I think for now the best way to specify the formulas is parameter ~ fixed effects + (random effects). The parentheses do change how the models are built, and whether you have spaces around the | notation also changes things. We recommend that you keep the parentheses around the random effects. We have updated our documentation to highlight that.

Thanks for bringing this to our attention!
Paul
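
To make the recommendation above easier to follow, here is a small sketch that flags formulas in which a | appears outside any parentheses, which is the pattern that silently dropped the random effects earlier in this thread. The function name has_bare_group_term is hypothetical, and the check is a plain string scan; it does not use or reproduce Bambi's actual formula parser.

```python
def has_bare_group_term(formula):
    """Return True if a `|` occurs outside all parentheses in `formula`.

    Illustrative string check only; it does not parse the formula the way
    Bambi does.
    """
    depth = 0
    for ch in formula:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "|" and depth == 0:
            return True
    return False


formulas = [
    "v ~ conf + 1|participant_id",      # group term not wrapped
    "v ~ conf + (1|participant_id)",    # recommended form
    "v ~ C(conf) + (1 | participant_id)",
]
flags = [has_bare_group_term(f) for f in formulas]
```

Running a check like this before building the model catches the unwrapped form early instead of discovering missing v_1|participant_id nodes after sampling.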

@eort
Author

eort commented Apr 12, 2024

Sounds good. But to be clear, like you said in your first post, one should also not specify the initval, right? At least that is the one thing that consistently throws the error I reported initially.

@digicosmos86
Collaborator

Sounds good. But to be clear, like you said in your first post, one should also not specify the initval, right? At least that is the one thing that consistently throws the error I reported initially.

It is unclear to me right now how Bambi deals with initval, but you can definitely still specify initial values if that helps with sampling. I had the most luck with this specification: v ~ C(conf) + (1|participant_id). C() (upper-case C) is an operator that explicitly marks conf as a categorical variable. I then added a prior with initval, and the model also compiled fine. I will need to check with the Bambi team to figure out why initval causes an error when C() is left out.

@tomicapretto
Collaborator

tomicapretto commented Apr 16, 2024

@eort @digicosmos86 sorry for the delay. I think the problem is related to Bambi not correctly handling the shapes of the prior's parameters and the initval. I'll run a quick example now.

Update: I have to run now, but regarding the parentheses discussion, you have to include the parentheses for group-specific terms.

@digicosmos86
Collaborator

Thank you @tomicapretto for looking into this!

@eort
Author

eort commented Apr 17, 2024

but you can definitely still specify initial values if that helps with sampling.

Perhaps this is due to the way I specify these initvals, but I can't specify them without seeing the error above. The problem might be that I not only specify a categorical regressor but also the order in which its levels should be considered (I have a clear baseline condition that should be treated as the baseline). So on my own dataset I do this:

    "include": [
        {
            "name": "a",
            "formula": "a ~ C(drug, levels=lvl) + (1 | participant_id)",
            "prior": {
                "Intercept": {"name": "Normal", "mu": 0.5, "sigma": 0.5, "initval": 0.5},
                "C(drug, levels=lvl)": {"name": "Normal", "mu": 0, "sigma": 1, "initval": 0.5}
            }
        }
    ]

For the cavanagh dataset that should be something like:

                    {
                        "name": "v",
                        "formula":"v ~ (1 | participant_id) +  C(conf, levels=lvl)",
                        "prior": {
                            "Intercept": {"name": "Uniform", "lower": -1, "upper": 1, "initval":0},
                            "conf": {"name": "Normal", "mu": 0, "sigma": 1, "initval":0}
                        }
                    }

where lvl then should be provided in the model via extra_namespace={"lvl": <levels of conf>}
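
A minimal sketch of how such a specification could be assembled so that the formula term and the prior key stay in sync. The helper categorical_spec and the level names in lvl are hypothetical; keying the prior by the full term string follows the first snippet above and is an assumption, not confirmed Bambi behaviour. The initval entries are left out here because they trigger the error reported in this issue.

```python
def categorical_spec(param, factor, levels_name, intercept_prior, slope_prior):
    """Build one `include` entry with an ordered categorical regressor.

    Hypothetical helper; the prior is keyed by the same term string that
    appears in the formula (assumption based on the snippet in this thread).
    """
    term = f"C({factor}, levels={levels_name})"
    return {
        "name": param,
        "formula": f"{param} ~ {term} + (1 | participant_id)",
        "prior": {"Intercept": intercept_prior, term: slope_prior},
    }


lvl = ["placebo", "drug"]  # hypothetical level names, baseline first

spec = categorical_spec(
    "a",
    "drug",
    "lvl",
    {"name": "Normal", "mu": 0.5, "sigma": 0.5},
    {"name": "Normal", "mu": 0, "sigma": 1},
)
extra_namespace = {"lvl": lvl}
# Intended use, as described above:
# hssm.HSSM(data=..., include=[spec], extra_namespace=extra_namespace, ...)
```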

digicosmos86 added the bug and upstream labels May 8, 2024
digicosmos86 self-assigned this May 8, 2024
3 participants