the new version #2

Shijie-Xia · 2024-01-19T12:25:58Z

Hi! When will the dataset become stable? My paper, submitted to an upcoming AI conference, used the dataset. However, today I learned that it has been renamed and cleaned. Could you also update the results of the GPT-4 evaluation? I'm too lazy to test it myself. LOL

Randolph-zeng · 2024-01-20T10:35:23Z

Hi, thanks for the interest in the dataset! The new version mainly introduced a customized metric called MR-Score to unify the three metrics from the three sub-tasks (solution correctness, error step, error reason). I have also cleaned up the evaluation results a little. The update should be reflected on arXiv soon. However, there has not been much change in the dataset except for the renaming (the renaming is mainly for consistency with the future expansion to more difficult datasets). Therefore, please rest assured that there won't be any major updates to the dataset (like renaming, big cleaning, etc.) in the near future.

I will leave this issue open and update the README about this at the same time in case there is any potential confusion or concerns. Thanks!

Shijie-Xia · 2024-01-22T09:39:10Z

Thank you for your patience! I am confident that the MR-GSM8K will play a crucial role in advancing AGI. I have a question regarding the results mentioned in the README file: does GPT3.5 refer to gpt-3.5-turbo-1106, and GPT4 refer to gpt-4-1106-preview? I want to ensure accurate version descriptions when using your data. See the OpenAI website.

Randolph-zeng · 2024-01-23T01:53:25Z

Hi Shijie, thanks for your kind words! Regarding your question, the model versions are outlined in Section 4.1 of the paper. The APIs I am using are GPT-3.5-turbo-0613, Claude 2.0, and GPT4-0613, since the experiment was conducted before November and turbo-1106 was not available at that time. However, the auto eval is indeed utilizing the latest turbo-1106 version.

Btw, the paper has already been updated on arXiv in case you would like to check it out; the auto eval is discussed more extensively in Appendix B : )
https://arxiv.org/pdf/2312.17080.pdf

Shijie-Xia · 2024-04-08T12:45:15Z

Thank you for your reminder! I want to cite your paper and use the BibTeX format you provided, but I noticed that it doesn't show the arXiv identifier in the reference. Perhaps this is because you uploaded two versions and changed the name? Regardless, I used the reference provided in your GitHub repository.

Randolph-zeng · 2024-04-08T15:22:02Z

Hi Shijie:
Thank you for your kind feedback! It is really kind of you to raise this issue with us. I double-checked my BibTeX and consulted GPT4 regarding the renaming issue. It seems to me that as long as the eprint field is correctly set (e.g. arXiv:2312.17080), you should be able to reference the paper just fine. However, it might depend on the citation style you are using. I just updated the BibTeX in the README and it seems to work fine under the ACL template. Do you mind taking a second look to see if the latest update works for you?
Thanks a lot again for your kind support, and have a nice day!

Shijie-Xia · 2024-04-08T16:30:44Z

Wow, thanks for your timely feedback. The new BibTeX works with my template! I will update it.

Shijie-Xia · 2024-04-08T16:43:59Z

Oh! I've noticed one thing. Did you miss an author in the new version? There are four authors for the new version but five for the previous one. I've added the missing author directly because I must submit the preprint version of my paper before 2:00 AM Beijing time for it to be published today!

Shijie-Xia · 2024-04-08T16:52:18Z

This is the version in my paper. I have checked all the references twice to ensure there are no mistakes. I hope this is correct for you.

Randolph-zeng · 2024-04-09T05:48:55Z

Yes, this looks perfect to me! Wishing you the best for your paper. May you have a smooth review at any conference you submit to. Good luck and thanks for your feedback : )
