
Different performance when reproducing your example #1

millengustavo opened this issue May 25, 2020 · 4 comments
@millengustavo

Hello, first of all, thanks for making the code available!

I am having trouble reproducing your example. As you can see in this notebook https://github.com/millengustavo/causality/blob/master/examples/causal_diagrams.ipynb, I copied the class definitions from `__init__.py` in your repository and tried to apply them to the same dataset generated by fklearn. However, the results on the "observational" data were quite different: the performance was noticeably worse.

Could you point me to the reason? I tried different sample sizes without success.

@gdmarmerola (Owner)

Cool! Thanks for sharing!

Sorry for the delay. I just did a major update on the lib, so I would advise actually installing it to get the most up-to-date methods. There is now a way to check whether inference inside a leaf is biased, by testing whether W is random within it.
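
Not the lib's actual API, but the idea can be sketched roughly like this, assuming X, W, y arrays and using plain sklearn/scipy (all names here are illustrative):

```python
# Rough sketch of the leaf-bias check: fit a tree on (X, y), then test whether
# treatment assignment W inside each leaf deviates from the global treatment
# rate. Not the lib's implementation -- just an illustration of the idea.
import numpy as np
import pandas as pd
from scipy.stats import binomtest  # scipy >= 1.7
from sklearn.tree import DecisionTreeRegressor

def leaf_bias_check(X, W, y, min_samples_leaf=100, alpha=0.05):
    tree = DecisionTreeRegressor(min_samples_leaf=min_samples_leaf).fit(X, y)
    leaves = tree.apply(X)          # leaf index for each sample
    p_overall = float(np.mean(W))   # global treatment rate as the "random" baseline
    rows = []
    for leaf in np.unique(leaves):
        mask = leaves == leaf
        n, n_treated = int(mask.sum()), int(np.sum(W[mask]))
        # binomial test: does the leaf's treatment rate differ from the global rate?
        p_value = binomtest(n_treated, n, p_overall).pvalue
        rows.append({"leaf": leaf, "n": n, "treat_rate": n_treated / n,
                     "p_value": p_value, "biased": p_value < alpha})
    return pd.DataFrame(rows)
```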

Nevertheless, this alone should not explain the performance drop. What I noticed is that ForestEmbeddingsCounterfactual's performance dropped overnight (the current example shows that). I think they updated fklearn and changed how they generate their toy dataset.

DecisionTreeCounterfactual is working fine, however. Did you try other hyperparameters? If the df is small, decreasing min_samples_leaf might be a good idea.

@gdmarmerola (Owner)

So, in summary:

  1. try with the updated code
  2. use DecisionTreeCounterfactual, as it is simpler and more stable right now
  3. try other tree hyperparameters (mainly min_samples_leaf). In theory, improving CV should improve the causal estimate as well (a rough sketch of such a search follows this list)
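
A minimal sketch of that search, using a plain sklearn tree on the features plus the treatment rather than the repo's internal model (names and grid values are illustrative):

```python
# Pick min_samples_leaf by cross-validated R^2 of a tree that predicts the
# outcome y from the features X plus the treatment W. Illustrative sketch only.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

def pick_min_samples_leaf(X, W, y, grid=(25, 50, 100, 200, 400), cv=5):
    Xw = np.column_stack([X, W])  # treatment enters as an extra feature
    scores = {
        leaf_size: cross_val_score(
            DecisionTreeRegressor(min_samples_leaf=leaf_size, random_state=0),
            Xw, y, cv=cv,
        ).mean()
        for leaf_size in grid
    }
    best = max(scores, key=scores.get)
    return best, scores
```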

@gdmarmerola (Owner)

I noticed an interesting result in your notebook: causalnex did not find the correct structure for this problem. I got a similar result as well!

@millengustavo (Author)

Hi @gdmarmerola, thanks for the answer!

fklearn has indeed changed how they generate their toy dataset. It was a bug I found, and they merged my PR nubank/fklearn#131: the outcome signal was positive instead of negative, which flipped the results 180 degrees!
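
For reference, a quick way to sanity-check the sign on the randomized data (this assumes fklearn's make_confounded_data and its medication/recovery columns; worth verifying against the installed version, since the generator is exactly what changed):

```python
# Sanity check: on the randomized dataframe the naive difference in means is
# unbiased, so its sign shows whether the fix (negative effect) is in place.
# Column names below are the ones I assume the toy dataset uses.
from fklearn.data.datasets import make_confounded_data

df_rnd, df_obs, df_ctf = make_confounded_data(10000)

treated = df_rnd.loc[df_rnd["medication"] == 1, "recovery"].mean()
control = df_rnd.loc[df_rnd["medication"] == 0, "recovery"].mean()
print(f"naive effect on randomized data: {treated - control:.2f}")  # expect < 0
```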

I will try your updated code, play with different parameters, and report back to you soon.
