the new version #2
Comments
Hi, thanks for your interest in the dataset! The new version mainly introduces a custom metric called MR-Score, which unifies the metrics of the three sub-tasks (solution correctness, error step, error reason). I have also cleaned up the evaluation results a little. The update should be reflected on arXiv soon. However, the dataset itself has not changed much apart from the renaming (the renaming is mainly for consistency with a future expansion to more difficult datasets). Therefore, please rest assured that there won't be any major updates to the dataset (renaming, big cleanups, etc.) in the near future. I will leave this issue open and update the README about this at the same time, in case there is any confusion or concern. Thanks!
Thank you for your patience! I am confident that MR-GSM8K will play a crucial role in advancing AGI. I have a question regarding the results mentioned in the README file: does GPT3.5 refer to gpt-3.5-turbo-1106, and GPT4 to gpt-4-1106-preview? I want to ensure accurate version descriptions when using your data. See the OpenAI website.
Hi Shijie, thanks for your kind words! Regarding your question, the models are listed in Section 4.1 of the paper. The APIs I used are GPT3-5-turbo0613, Claude2.0, and GPT4-0613, since the experiments were conducted before November and turbo-1106 was not available at that time. However, the auto eval does use the latest turbo-1106 version. Btw, the paper has already been updated on arXiv in case you would like to check it out; the auto eval is discussed more extensively in Appendix B : )
Thank you for your reminder! I want to cite your paper and use the BibTeX format you provided, but I noticed that it doesn't show the arXiv identifier in the reference. Perhaps this is because you uploaded two versions and changed the name? Regardless, I used the reference provided in your GitHub repository.
Hi Shijie:
Wow, thanks for your timely feedback. The new BibTeX works with my template! I will update it.
Oh! I've noticed one thing. Did you miss an author in the new version? There are four authors for the new version but five for the previous one. I've added the missing author directly because I must submit the preprint version of my paper before 2:00 AM Beijing time for it to be published today!
Yes, this looks perfect to me! Best wishes for your paper. May you have a smooth review process at whichever conference you submit to. Good luck and thanks for your feedback : )
Hi! When will the dataset become stable? My paper, submitted to an upcoming AI conference, used the dataset. However, today I learned that it has been renamed and cleaned. Could you also update the results of the GPT-4 evaluation? I'm too lazy to test it myself. LOL