Add save_model and load_model functions (or something similar) #259

Open
tomicapretto opened this issue Oct 28, 2020 · 8 comments

Comments

@tomicapretto
Collaborator

PyMC3 lets you save traces with pm.save_trace() and load them back with pm.load_trace().
I think that functionality to save and load model objects would make working with Bambi models interactively much easier.
Right now, every time I restart my session I have to run the samplers again, which is annoying.

@tomicapretto
Collaborator Author

@aloctavodia pointed out that we can already save the results of a fitted model (the InferenceData) with arviz.to_netcdf and load them with arviz.from_netcdf.

@tomicapretto
Collaborator Author

I would like to add this feature soon, and I've been thinking that dill is a good candidate. However, a model has two "independent" objects associated with it: the Model instance itself and the InferenceData object that Model.fit() returns. I think this functionality would make more sense if it could be used like save_model(object, path) and load_model(path). I'm opening a new issue with some ideas about how we could achieve this.
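
A minimal sketch of the save_model(object, path) / load_model(path) shape described above. It uses the standard-library pickle module as a stand-in for dill (dill.dump and dill.load take the same arguments), and it assumes the Model and the InferenceData are bundled into a single dict. Whether a real Bambi model survives this round trip is not verified here; the bundle keys are hypothetical.

```python
import pickle
from pathlib import Path


def save_model(obj, path):
    """Serialize `obj` (e.g. a dict bundling model and results) to `path`."""
    path = Path(path)
    with path.open("wb") as f:
        # Stand-in for dill: dill.dump(obj, f) has the same call shape.
        pickle.dump(obj, f)
    return path


def load_model(path):
    """Load whatever `save_model` wrote. Only use on files you trust."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


# Hypothetical usage: keep the two "independent" objects in one file.
#   save_model({"model": model, "idata": idata}, "fit.pkl")
#   bundle = load_model("fit.pkl")
```

Bundling both objects in one container keeps the call signature at a single object and a single path, at the cost of tying the file to whatever serializer wrote it.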

@canyon289
Collaborator

canyon289 commented Apr 9, 2021 via email

@tomicapretto
Collaborator Author

That makes a lot of sense! Much better than my proposal! I'll try to write something.

@tomicapretto
Collaborator Author

I'm realizing that we have more than a formula and the prior description. We also have a pandas DataFrame, the Bambi Term instances, and a formulae.DesignMatrices object. The design matrices and the terms could be reconstructed from the formula, the prior description, and the pandas DataFrame, but maybe problems can arise? For example, what if a model is built and saved with one version of Bambi, and the description is then loaded with another version of Bambi where something has changed? I don't know, just thinking out loud here.
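
One way to make the version concern above visible is to stamp the saved description with the library version and compare it on load. This is only a sketch: the function names, the JSON field names, and the idea of storing priors as plain strings are all hypothetical and not part of Bambi.

```python
import json
import warnings


def dump_description(formula, priors, library_version):
    """Return a JSON string with what is needed to rebuild the model."""
    return json.dumps(
        {
            "formula": formula,
            "priors": priors,  # hypothetical: priors stored as plain strings
            "library_version": library_version,
        }
    )


def load_description(text, current_version):
    """Parse a description and warn if another version wrote it."""
    description = json.loads(text)
    saved_version = description["library_version"]
    if saved_version != current_version:
        warnings.warn(
            f"Description was saved with version {saved_version}, "
            f"but version {current_version} is installed; "
            "the rebuilt model may differ."
        )
    return description
```

A warning rather than an error keeps old descriptions loadable while still flagging that the rebuilt terms and design matrices may not match the original ones.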

@jankaWIS

jankaWIS commented Sep 5, 2024

Hi, I don't have much to contribute to this topic, but I do have a question. Is there a (recommended) way to save and load models after fitting them? Say you want to run the model, save it, and later come back to play with visualisations or inspections without rerunning it.
Thanks!

@ColCarroll
Collaborator

Things have sort of changed for the better here -- I think you could save and load the inference results using .to_netcdf and .from_netcdf, respectively, on the inference data object.

A nice thing about xarray/netCDF/arviz (they're all kind of the same thing when it comes to InferenceData) is that they can carry JSON metadata. It might make sense for bambi to write some of that when running inference. @tomicapretto might have a better idea about how to do that, but it feels like it might be a fairly easy contribution (not trying to pressure you! 😁 )

@tomicapretto
Collaborator Author

We're not offering anything at the moment, unfortunately. But I think I can help you with some ideas.

When we talk about "saving and loading the model" we need to keep in mind there are two things we usually work with:

  • The Bambi model
  • The InferenceData object. This usually contains draws from the posterior, but it can contain other things (draws from prior, prior predictive, posterior predictive, log likelihood, etc.)

These two objects differ in some respects regarding saving/loading.

  • The Bambi model object "contains" a lot of objects and data of varying complexity, and it doesn't offer a method to store it on disk or load it back. However, it's relatively cheap to instantiate a Bambi model multiple times.
  • The InferenceData object is much better structured, and it offers methods to save to and load from disk (the ones Colin mentioned above). On the other hand, it's expensive to regenerate every time when it holds draws from the posterior (because one needs to sample from the posterior and that takes time!).

You could write a small program that creates a Bambi model and then checks if a specific .nc file already exists to determine if the model has been previously fitted. If the file exists, it loads the .nc file and moves on; if not, it samples the posterior and saves the results in the .nc file and moves on. This way, the posterior is sampled only once.

Is that the ideal approach? I don't think so. If you need things that require interacting with the underlying PyMC model after getting posterior draws, the PyMC model will be recompiled each time. However, that is usually much cheaper than getting draws from the posterior all the time.
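
The check-for-the-.nc-file idea described above can be sketched as a small helper. load_or_fit and its parameters are hypothetical names; the fitting, saving, and loading steps are passed in as callables so the sketch runs without Bambi installed. The commented wiring assumes an already-built Bambi model named model and ArviZ imported as az.

```python
from pathlib import Path


def load_or_fit(path, fit, save, load):
    """Return cached results from `path`, or fit once and cache them.

    fit:  callable with no arguments that runs the sampler.
    save: callable (results, path) that writes results to disk.
    load: callable (path) that reads results back.
    """
    path = Path(path)
    if path.exists():
        # Previously fitted: skip sampling entirely.
        return load(path)
    results = fit()
    save(results, path)
    return results


# Possible wiring with Bambi and ArviZ (assumed, not run here):
#   idata = load_or_fit(
#       "model.nc",
#       fit=lambda: model.fit(),
#       save=lambda idata, p: idata.to_netcdf(str(p)),
#       load=lambda p: az.from_netcdf(str(p)),
#   )
```

Note that this cache is keyed only on the file path: if the formula, priors, or data change, the old .nc file has to be deleted (or a new path used) for the posterior to be sampled again.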

4 participants