
Output of hierarchical model unnecessarily contains fit for every single trial of parameters with regression formula. #497

Closed
SaschaFroelich opened this issue Jul 18, 2024 · 5 comments

SaschaFroelich commented Jul 18, 2024

When fitting a hierarchical model like the one from the tutorial:

import hssm
import jax

# dataset_reg_v_hier is the simulated hierarchical dataset from the HSSM tutorial
model_reg_v_angle_hier = hssm.HSSM(
    data=dataset_reg_v_hier,
    model="angle",
    include=[
        {
            "name": "v",
            "prior": {
                "Intercept": {
                    "name": "Uniform",
                    "lower": -3.0,
                    "upper": 3.0,
                    "initval": 0.0,
                },
                "x": {"name": "Uniform", "lower": -1.0, "upper": 1.0, "initval": 0.0},
                "y": {"name": "Uniform", "lower": -1.0, "upper": 1.0, "initval": 0.0},
            },
            "formula": "v ~ 1 + (1|subject) + x + y",
            "link": "identity",
        }
    ],
)

jax.config.update("jax_enable_x64", False)
out = model_reg_v_angle_hier.sample(
    sampler="nuts_numpyro", chains=2, cores=1, draws=30, tune=30
)

out will contain one row for the fitted v of every trial of every subject. This quickly grows to an enormous size when fitting the model to actual experimental data. For instance, I have a total of 33k trials across 60 participants and would like to specify a regression formula for v and for z. The resulting output is several GB in size, and simply printing the summary of the inference object to check whether the chains converged takes forever.

This also happens when I change the regression formula to something like v ~ 1 + (1|subject), so that v does not change from trial to trial within each subject.

Is there a way to suppress the output of the fitting results for each individual trial and only get output at the group level or the subject level?
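
For illustration, this is roughly how the trial-wise entries show up in the returned InferenceData (the exact variable and dimension names may differ between HSSM versions):

# The posterior group carries a dimension with one entry per trial,
# which is what makes the object so large
print(out.posterior.sizes)
print(list(out.posterior.data_vars))  # the trial-wise deterministic (e.g. "v") is listed here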

Furthermore, in the tutorial, az.plot_forest(model_reg_v_angle_hier.traces) plots the values of v for the individual subjects. But when I run the tutorial code as-is, it tries to plot one row of v per trial, across all subjects (>3k trials).

I use HSSM version 0.2.1 on Ubuntu 22.04 with Python 3.10.12.

digicosmos86 (Collaborator) commented:

Hi,

Can you try specifying include_mean=False when you are calling .sample()? This will not include the means in the InferenceData object. If the arviz plots are not clean enough, you can also specify var_names when you are calling az.plot_forest to filter those variables out.
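
A sketch of what that suggestion would look like, assuming .sample() accepts include_mean as described above; the coefficient names passed to var_names (e.g. "v_Intercept", "v_x", "v_y") are illustrative and may differ in your model:

import arviz as az

out = model_reg_v_angle_hier.sample(
    sampler="nuts_numpyro", chains=2, cores=1, draws=30, tune=30,
    include_mean=False,  # suggested above: do not store the trial-wise means
)

# Limit the forest plot to the regression coefficients of interest
az.plot_forest(out, var_names=["v_Intercept", "v_x", "v_y"])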

SaschaFroelich (Author) commented Jul 23, 2024

> Hi,
>
> Can you try specifying include_mean=False when you are calling .sample()? This will not include the means in the InferenceData object. If the arviz plots are not clean enough, you can also specify var_names when you are calling az.plot_forest to filter those variables out.

Hi,

thanks for your reply! Unfortunately, include_mean=False does not make any difference. What is it supposed to do? Yes, I can use var_names for the forest plots, but before I do that I would like to check whether my chains converged, which I do by checking the r_hat values with az.summary(out) (unless there is a better way). That takes extremely long (~12 minutes on my machine with 16 cores and 32 GB of memory). I simply don't think it is necessary to store the result for every single trial (amounting to an InferenceData object of 2 GB in my case), especially since the results are the same for most trials (I only distinguish between 3 trial types). Wouldn't it make sense to include a keyword argument that disables storing the fit result for every individual trial?
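
As a workaround for the slow convergence check, az.summary can be restricted to a subset of variables so that r_hat is only computed for the group- and subject-level parameters; the variable names below are illustrative:

import arviz as az

# With filter_vars="like", the given strings are matched as substrings of variable names,
# so the trial-wise deterministics are skipped entirely
summary = az.summary(
    out,
    var_names=["v_Intercept", "v_x", "v_y", "1|subject"],
    filter_vars="like",
)
print(summary["r_hat"])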

digicosmos86 (Collaborator) commented:

Yes, we are working on that. In the meantime, can you check whether you can add a var_names argument to model.sample() with only the variables that you want to include? According to the PyMC documentation, you can override the default behavior, which is to include all free variables and deterministics.
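
A minimal sketch of that suggestion, assuming model.sample() forwards var_names to the underlying sampler; the listed parameter names are illustrative:

out = model_reg_v_angle_hier.sample(
    sampler="nuts_numpyro", chains=2, cores=1, draws=30, tune=30,
    # keep only these variables in the trace instead of all free and deterministic variables
    var_names=["v_Intercept", "v_x", "v_y", "1|subject"],
)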

frankmj (Collaborator) commented Jul 23, 2024 via email

digicosmos86 (Collaborator) commented:

It seems that there is no way to exclude variables from the InferenceData, though, which can blow up when the model is complex. I have opened an issue on Bambi: bambinos/bambi#828
