Reproducibility of Table 3 #5
Comments
I can confirm @SilvioGiancola's observation: using the suggested hyperparameters on the D&D dataset, the variance is much larger than the results on the Google sheet, and the mean accuracy does not seem that high.
Hi @SilvioGiancola, they said in their paper that the evaluation should be done as follows: run CV (the main.py file) 10 times, take the mean of those runs as one result, and repeat this procedure 20 times.
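In other words, roughly the following (a minimal sketch; `run_once` is a hypothetical stand-in for a single execution of main.py, and the accuracy it returns here is only a placeholder):

```python
import numpy as np

def run_once() -> float:
    # Hypothetical stand-in for one execution of main.py:
    # train on a random split and return the test accuracy.
    # The value below is a placeholder for illustration only.
    return float(np.random.normal(loc=0.75, scale=0.05))

# One "result" = the mean over 10 runs; repeat 20 times
# to estimate the reported mean and variance.
results = []
for _ in range(20):
    accs = [run_once() for _ in range(10)]
    results.append(np.mean(accs))

print(f"accuracy: {np.mean(results):.4f} +/- {np.std(results):.4f}")
```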
Hi @ThyrixYang, thank you for sharing this detail! In that case, it's not exactly the same as running the code 10 times, since main.py performs random splitting rather than 10-fold cross-validation. With this averaging over 10 runs, did you get a variance similar to the one in Table 3?
@SilvioGiancola Yes, the variance is similar, although the mean is a bit lower.
@ThyrixYang I solved the variance issue with 10-fold cross-validation. However, when I reproduce their results, I get about 10% lower than what they claim on the D&D dataset with the global pooling model. Are you also seeing such a big difference in your results? I wish the authors would provide code to reproduce their results; it is impossible to build upon them otherwise...
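Concretely, my 10-fold split looks roughly like this (a sketch; `dataset` and `train_and_eval` are assumed placeholders, with `dataset` an indexable set of graphs and `train_and_eval` training on one fold and returning its test accuracy):

```python
import numpy as np
from sklearn.model_selection import KFold

# `dataset` and `train_and_eval` are hypothetical placeholders:
# `dataset` is an indexable graph dataset (e.g. the D&D graphs), and
# `train_and_eval(train_idx, test_idx)` trains the model on one fold
# and returns its test accuracy.
kf = KFold(n_splits=10, shuffle=True, random_state=0)
fold_accs = [
    train_and_eval(train_idx, test_idx)
    for train_idx, test_idx in kf.split(np.arange(len(dataset)))
]
print(f"10-fold CV: {np.mean(fold_accs):.4f} +/- {np.std(fold_accs):.4f}")
```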
@SilvioGiancola Are you doing exactly 10-fold CV? |
@ThyrixYang I'll be more than happy to share some results with you on this baseline.
I get the following results: … Are you doing anything different for the 10-fold CV? Have you tried the same dataset or a different one?
Hi, I tried to reproduce the experiment too.
Hi, each time I run this code I get different results. I have set the seeds but still get different results.
What can I do to get the same results each time I run this code?
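For reference, the seeds I set follow the usual PyTorch recipe, roughly like this (a sketch, not this repo's exact code):

```python
import random
import numpy as np
import torch

def set_seed(seed: int = 0) -> None:
    # Seed every RNG the training loop may touch.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Force deterministic cuDNN kernels (may slow training down).
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```

Even with all of these set, some GPU ops (for example scatter-based aggregations, which graph pooling layers often rely on) are non-deterministic on CUDA, so runs may still differ slightly.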
Hi,
I have run your code several times on the D&D dataset and obtain different results from the ones you present in Table 3 of your paper.
In particular, I ran this experiment 20 times and estimated the average and standard deviation: the training is very random, with the std rising up to ±10%. Note that I used the same hyperparameters you provide in your paper and your Google sheet (see issue #2).
I also tried the ReduceLROnPlateau scheduler for the learning rate (roughly as sketched below), but still get an std of up to 5%.
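A minimal sketch of how I use it (`model`, `epochs`, and `train_one_epoch` are placeholders, and the hyperparameter values are illustrative, not the ones from your paper):

```python
import torch
from torch.optim.lr_scheduler import ReduceLROnPlateau

# `model`, `epochs`, and `train_one_epoch` are placeholders;
# the hyperparameter values are illustrative only.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=10)

for epoch in range(epochs):
    val_loss = train_one_epoch(model, optimizer)  # returns validation loss
    scheduler.step(val_loss)  # reduce the LR when val loss plateaus
```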
How did you select your seed, and how come there is such variation?
Thank you for your support,
Best,