Reproducibility of Table 3 #5
Comments
I can confirm @SilvioGiancola's observation: using the suggested hyperparameters on the D&D dataset, the variance is much larger than the results on the Google sheet, and the mean accuracy does not seem that high.
Hi @SilvioGiancola, they said in their paper that the evaluation should be done as follows: run CV (the main.py file) 10 times, take the mean of those runs as one result, and repeat this procedure 20 times.
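In other words, roughly the following (a minimal sketch; `run_once` is a hypothetical stand-in for a single execution of main.py, and the accuracy it returns here is only a placeholder):

```python
import numpy as np

def run_once() -> float:
    # Hypothetical stand-in for one execution of main.py:
    # train on a random split and return the test accuracy.
    # The value below is a placeholder for illustration only.
    return float(np.random.normal(loc=0.75, scale=0.05))

# One "result" = the mean over 10 runs; repeat 20 times
# to estimate the reported mean and variance.
results = []
for _ in range(20):
    accs = [run_once() for _ in range(10)]
    results.append(np.mean(accs))

print(f"accuracy: {np.mean(results):.4f} +/- {np.std(results):.4f}")
```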
Hi @ThyrixYang, thank you for sharing this detail! In that case, it's not exactly the same as running the code 10 times, since main.py performs random splitting rather than 10-fold cross-validation. With this averaging over 10 runs, did you get a variance similar to the one in Table 3?
@SilvioGiancola Yes, the variance is similar, although the mean is a bit lower.
@ThyrixYang I solved the variance issue with 10-fold cross-validation. However, when I reproduce their results, I get about 10% lower than what they claim on the D&D dataset with the global pooling model. Are you also seeing such a big difference in your results? I wish the authors would provide code to reproduce their results; it is impossible to build upon them otherwise...
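Concretely, my 10-fold split looks roughly like this (a sketch; `dataset` and `train_and_eval` are assumed placeholders, with `dataset` an indexable set of graphs and `train_and_eval` training on one fold and returning its test accuracy):

```python
import numpy as np
from sklearn.model_selection import KFold

# `dataset` and `train_and_eval` are hypothetical placeholders:
# `dataset` is an indexable graph dataset (e.g. the D&D graphs), and
# `train_and_eval(train_idx, test_idx)` trains the model on one fold
# and returns its test accuracy.
kf = KFold(n_splits=10, shuffle=True, random_state=0)
fold_accs = [
    train_and_eval(train_idx, test_idx)
    for train_idx, test_idx in kf.split(np.arange(len(dataset)))
]
print(f"10-fold CV: {np.mean(fold_accs):.4f} +/- {np.std(fold_accs):.4f}")
```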
@SilvioGiancola Are you doing exactly 10-fold CV? |
@ThyrixYang I'll be more than happy to share some results with you on this baseline.
I get the following results: … Are you doing anything different for the 10-fold CV? Have you tried the same dataset or a different one?
Hi, I tried to reproduce the experiment too.
Hi, each time I run this code I get different results. I have set the seeds but still get different results.
What can I do to get the same results each time I run this code?
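For reference, the seeds I set follow the usual PyTorch recipe, roughly like this (a sketch, not this repo's exact code):

```python
import random
import numpy as np
import torch

def set_seed(seed: int = 0) -> None:
    # Seed every RNG the training loop may touch.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Force deterministic cuDNN kernels (may slow training down).
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```

Even with all of these set, some GPU ops (for example scatter-based aggregations, which graph pooling layers often rely on) are non-deterministic on CUDA, so runs may still differ slightly.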
Hi,
I have run your code several times on the D&D dataset and obtain different results from the ones you present in Table 3 of your paper.
In particular, I ran this experiment 20 times and estimated the average and standard deviation: the training is very random, with the std rising up to ±10%. Note that I used the same hyperparameters you provide in your paper and your Google sheet (see issue #2).
I also tried the ReduceLROnPlateau scheduler for the learning rate (roughly as sketched below), but still get an std of up to 5%.
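A minimal sketch of how I use it (`model`, `epochs`, and `train_one_epoch` are placeholders, and the hyperparameter values are illustrative, not the ones from your paper):

```python
import torch
from torch.optim.lr_scheduler import ReduceLROnPlateau

# `model`, `epochs`, and `train_one_epoch` are placeholders;
# the hyperparameter values are illustrative only.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=10)

for epoch in range(epochs):
    val_loss = train_one_epoch(model, optimizer)  # returns validation loss
    scheduler.step(val_loss)  # reduce the LR when val loss plateaus
```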
How did you select your seed, and how come there is such variation?
Thank you for your support,
Best,