
Different performance when reproducing your example #1

millengustavo opened this issue May 25, 2020 · 4 comments
@millengustavo

Hello, first of all, thanks for making the code available!

I am having trouble reproducing your example. As you can see in this notebook https://github.com/millengustavo/causality/blob/master/examples/causal_diagrams.ipynb, I copied the class definitions from `__init__.py` in your repository and tried to apply them to the same dataset generated by fklearn. However, the results on the "observational" data were quite different: the performance was noticeably worse.

Could you point me to the reason? I tried different sample sizes without success.

@gdmarmerola (Owner)

Cool! Thanks for sharing!

Sorry for the delay. I just did a major update on the lib, so I would advise actually installing it to get the most up-to-date methods. There is now a way to check whether inference inside a leaf is biased, by testing whether W is random within it.
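
Not the lib's actual API, but the idea can be sketched roughly like this, assuming X, W, y arrays and using plain sklearn/scipy (all names here are illustrative):

```python
# Rough sketch of the leaf-bias check: fit a tree on (X, y), then test whether
# treatment assignment W inside each leaf deviates from the global treatment
# rate. Not the lib's implementation -- just an illustration of the idea.
import numpy as np
import pandas as pd
from scipy.stats import binomtest  # scipy >= 1.7
from sklearn.tree import DecisionTreeRegressor

def leaf_bias_check(X, W, y, min_samples_leaf=100, alpha=0.05):
    tree = DecisionTreeRegressor(min_samples_leaf=min_samples_leaf).fit(X, y)
    leaves = tree.apply(X)          # leaf index for each sample
    p_overall = float(np.mean(W))   # global treatment rate as the "random" baseline
    rows = []
    for leaf in np.unique(leaves):
        mask = leaves == leaf
        n, n_treated = int(mask.sum()), int(np.sum(W[mask]))
        # binomial test: does the leaf's treatment rate differ from the global rate?
        p_value = binomtest(n_treated, n, p_overall).pvalue
        rows.append({"leaf": leaf, "n": n, "treat_rate": n_treated / n,
                     "p_value": p_value, "biased": p_value < alpha})
    return pd.DataFrame(rows)
```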

Nevertheless, this alone should not explain the performance drop. What I noticed is that ForestEmbeddingsCounterfactual's performance dropped overnight (the current example shows that). I think they updated fklearn and changed how they generate their toy dataset.

DecisionTreeCounterfactual is working fine, however. Did you try other hyperparameters? If the df is small, decreasing min_samples_leaf might be a good idea.

@gdmarmerola (Owner)

So, in summary:

  1. try with the updated code
  2. use DecisionTreeCounterfactual, as it is simpler and more stable right now
  3. try other tree hyperparameters (mainly min_samples_leaf). In theory, improving CV should improve the causal estimate as well (a rough sketch of such a search follows this list)
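
A minimal sketch of that search, using a plain sklearn tree on the features plus the treatment rather than the repo's internal model (names and grid values are illustrative):

```python
# Pick min_samples_leaf by cross-validated R^2 of a tree that predicts the
# outcome y from the features X plus the treatment W. Illustrative sketch only.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

def pick_min_samples_leaf(X, W, y, grid=(25, 50, 100, 200, 400), cv=5):
    Xw = np.column_stack([X, W])  # treatment enters as an extra feature
    scores = {
        leaf_size: cross_val_score(
            DecisionTreeRegressor(min_samples_leaf=leaf_size, random_state=0),
            Xw, y, cv=cv,
        ).mean()
        for leaf_size in grid
    }
    best = max(scores, key=scores.get)
    return best, scores
```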

@gdmarmerola (Owner)

I noticed an interesting result in your notebook: causalnex did not find the correct structure for this problem. I got a similar result as well!

@millengustavo (Author)

Hi @gdmarmerola, thanks for the answer!

fklearn has indeed changed how they generate their toy dataset. It was a bug I found, and they merged my PR nubank/fklearn#131: the outcome signal was positive instead of negative, which flipped the results 180 degrees!
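
For reference, a quick way to sanity-check the sign on the randomized data (this assumes fklearn's make_confounded_data and its medication/recovery columns; worth verifying against the installed version, since the generator is exactly what changed):

```python
# Sanity check: on the randomized dataframe the naive difference in means is
# unbiased, so its sign shows whether the fix (negative effect) is in place.
# Column names below are the ones I assume the toy dataset uses.
from fklearn.data.datasets import make_confounded_data

df_rnd, df_obs, df_ctf = make_confounded_data(10000)

treated = df_rnd.loc[df_rnd["medication"] == 1, "recovery"].mean()
control = df_rnd.loc[df_rnd["medication"] == 0, "recovery"].mean()
print(f"naive effect on randomized data: {treated - control:.2f}")  # expect < 0
```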

I will try your updated code, play with different parameters, and report back to you soon.
