PYMC3 backend missing data imputation #362

Open
zwelitunyiswa opened this issue Jun 22, 2021 · 9 comments
Comments

@zwelitunyiswa

I know that PyMC3 automatically imputes missing data when the null values are masked. With Bambi, we have to drop any rows that have null data if those null cells are included in the model. Is it possible to use PyMC3's data imputation by applying a mask to null values, or by some other method?

Thanks for your kind consideration.
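For reference, a minimal sketch of the masking behaviour described above, assuming PyMC3 3.x. The data, the variable names, and the helper `fit_with_imputation` are hypothetical and only illustrate the idea: entries of `observed` that are masked are treated as unknowns and sampled along with the parameters.

```python
import numpy as np

# Hypothetical response vector with two missing entries (NaN)
y_raw = np.array([1.2, np.nan, 0.7, np.nan, -0.3])

# Mask the NaNs; PyMC3 treats masked entries of `observed` as unknowns
y_masked = np.ma.masked_invalid(y_raw)


def fit_with_imputation(y_obs):
    """Fit a simple normal model; masked entries of y_obs are imputed."""
    import pymc3 as pm  # imported lazily so the masking step runs without PyMC3

    with pm.Model():
        mu = pm.Normal("mu", mu=0, sigma=1)
        sigma = pm.HalfNormal("sigma", sigma=1)
        # PyMC3 adds a `y_missing` variable for the masked entries
        pm.Normal("y", mu=mu, sigma=sigma, observed=y_obs)
        return pm.sample(return_inferencedata=True)
```

Calling `fit_with_imputation(y_masked)` should return posterior draws for `mu`, `sigma`, and the imputed `y_missing` values.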

@tomicapretto
Collaborator

Hi! Thanks for opening this issue!

Unfortunately, this is not available right now and I think it wouldn't be very straightforward to implement it.

Maybe you can take the PyMC model from within the Bambi model, add/remove/modify random variables, and then sample using the PyMC model instead of Bambi. But I'm not sure if this can be done. Maybe @aloctavodia knows whether this is possible.

@zwelitunyiswa
Author

Interesting suggestion. I am not great at the PyMC syntax, which is why Bambi is so amazing: the R-style formula syntax is much easier to learn and understand. Does Bambi have a way to pull out the translation it makes to PyMC3, so that one could build on it and run it within PyMC3 for these pocket cases? Probably the answer is no, but I thought I would ask anyways.

@tomicapretto
Collaborator

For example

# setup
import bambi as bmb
import numpy as np
import pandas as pd
import pymc3 as pm

data = pd.DataFrame({
    "y" : np.random.normal(size=100),
    "x" : np.random.normal(size=100)
})

And if you do

model = bmb.Model("y ~ x", data)
model.build() # this is an intermediate step when you call model.fit()

you get a PyMC model in model.backend.model. This can be used as any other PyMC model to do things like

with model.backend.model:
    idata = pm.sample(return_inferencedata=True)

But yes, it requires one to be familiar with PyMC, unfortunately. Let's wait for Osvaldo's input on this issue; he is much more familiar with PyMC3 than I am.

@zwelitunyiswa
Author

zwelitunyiswa commented Jun 22, 2021 via email

@aloctavodia
Collaborator

We may be able to implement this without changing too much code inside Bambi: we record the missing observations before removing them, proceed as usual with all the Bambi machinery, and then, after inference, automatically compute and return the posterior predictive distribution for the missing observations.

@zwelitunyiswa yes, that means you can do something like this. Notice also that upcoming versions of PyMC3 (v4 and later) will natively, and by default, use samplers with speedups similar to those shown in the example notebook. Eventually we will run Bambi on top of that.
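The proposal above can be sketched roughly as follows. This is not Bambi's implementation. The helper `predict_missing` is hypothetical, the posterior draws are stand-in placeholders rather than real inference output, and the sketch assumes the model `y ~ x` with a normal likelihood where only the response `y` has missing values (missing predictors would need a different approach).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic data like the example in this thread, with some responses missing
data = pd.DataFrame({"y": rng.normal(size=100), "x": rng.normal(size=100)})
data.loc[[3, 17, 42], "y"] = np.nan

# 1. Record the rows with a missing response before removing them
missing_rows = data[data["y"].isna()]
complete_rows = data.dropna(subset=["y"])

# 2. Fit as usual on `complete_rows`, e.g. bmb.Model("y ~ x", complete_rows).fit()


def predict_missing(intercept, slope, sigma, x_missing, rng):
    """Posterior predictive draws, shape (n_draws, n_missing), for y ~ x."""
    mean = intercept[:, None] + slope[:, None] * x_missing[None, :]
    return rng.normal(loc=mean, scale=sigma[:, None])


# 3. Stand-in posterior draws (placeholders, not real inference output)
n_draws = 500
intercept = rng.normal(0.0, 0.1, size=n_draws)
slope = rng.normal(0.0, 0.1, size=n_draws)
sigma = np.abs(rng.normal(1.0, 0.05, size=n_draws))

# One column of predictive draws per missing observation
y_imputed = predict_missing(
    intercept, slope, sigma, missing_rows["x"].to_numpy(), rng
)
```

In practice the `intercept`, `slope`, and `sigma` arrays would be taken from the posterior of the fitted model rather than simulated, and each column of `y_imputed` would then summarize the plausible values for one missing observation.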

@zwelitunyiswa
Author

@aloctavodia That would be great. I got the JAX sampling to work, but then JAX/NumPyro made a change and I did not get around to downgrading to get things working again.

However, it's good news that PyMC3 v4 will take care of it natively. That's amazing. I was getting 8-10x speedups with JAX on my MacBook. You guys on Bambi/PyMC3 are doing some amazing work. For business guys like myself, using Bayesian methods was painful, but these tools make it so much more convenient and accessible.

@tomicapretto Thanks again.

@zwelitunyiswa
Author

I am not sure if I should close this, or if there will be an attempt to implement @aloctavodia's solution. Let me know if you want me to close out this issue.

@aloctavodia
Collaborator

@zwelitunyiswa we should leave this open. Thanks for opening this issue.

@skulshreshtha

> We may be able to implement this without changing too much code inside Bambi: we record the missing observations before removing them, proceed as usual with all the Bambi machinery, and then, after inference, automatically compute and return the posterior predictive distribution for the missing observations.
>
> @zwelitunyiswa yes, that means you can do something like this. Notice also that upcoming versions of PyMC3 (v4 and later) will natively, and by default, use samplers with speedups similar to those shown in the example notebook. Eventually we will run Bambi on top of that.

@aloctavodia The link you shared does not work anymore. Can you please share it again?
@zwelitunyiswa Can you share an example of how you used JAX sampling with PyMC3?
Thanks in advance!
