the new version #2

Open
Shijie-Xia opened this issue Jan 19, 2024 · 9 comments

Comments

@Shijie-Xia

Hi! When will the dataset become stable? My paper, submitted to an upcoming AI conference, used the dataset. However, today I learned that it has been renamed and cleaned. Could you also update the results of the GPT-4 evaluation? I'm too lazy to test it myself. LOL

@Randolph-zeng
Member

Hi, thanks for your interest in the dataset! The new version mainly introduces a customized metric called MR-Score, which unifies the three metrics of the three sub-tasks (solution correctness, error step, error reason). I have also cleaned up the evaluation results a little. The update should be reflected on arXiv soon. However, the dataset itself has not changed much apart from the renaming (which is mostly for consistency, in view of future expansion to more difficult datasets). Therefore, please rest assured that there won't be any major updates to the dataset (such as renaming or big cleanups) in the near future.

I will leave this issue open and update the README about this at the same time, in case there is any potential confusion or concerns. Thanks!

@Shijie-Xia
Author

Thank you for your patience! I am confident that the MR-GSM8K will play a crucial role in advancing AGI. I have a question regarding the results mentioned in the README file: does GPT3.5 refer to gpt-3.5-turbo-1106, and GPT4 refer to gpt-4-1106-preview? I want to ensure accurate version descriptions when using your data. See the OpenAI website.

@Randolph-zeng
Member

Hi Shijie, thanks for your kind words! Regarding your question, the model versions are outlined in Section 4.1 of the paper. The APIs I used are GPT3.5-turbo-0613, Claude2.0, and GPT4-0613, since the experiments were conducted before November and turbo-1106 was not available at that time. However, the auto eval does use the latest turbo-1106 version.

Btw, the paper has already been updated on arXiv in case you would like to check it out; the auto eval is discussed more extensively in Appendix B : )
https://arxiv.org/pdf/2312.17080.pdf

@Shijie-Xia
Author

Thank you for your reminder! I want to cite your paper and use the BibTeX format you provided, but I noticed that it doesn't show the arXiv identifier in the reference. Perhaps this is because you uploaded two versions and changed the name? Regardless, I used the reference provided in your GitHub repository.

@Randolph-zeng
Member

Hi Shijie:
Thank you for your feedback! It is really kind of you to raise this issue with us. I double-checked my BibTeX and consulted GPT4 regarding the renaming issue. It seems that as long as the eprint field is set correctly (e.g. arXiv:2312.17080), you should be able to reference the paper just fine. However, it may depend on the citation style you are using. I just updated the BibTeX in the README, and it seems to work fine under the ACL template. Do you mind taking a second look to see if the latest update works for you?
Thanks a lot again for your kind support, and I wish you a nice day!

@Shijie-Xia
Author

Wow, thanks for your timely feedback. The new BibTeX works with my template! I will update it.

@Shijie-Xia
Author

Oh! I've noticed one thing. Did you miss an author in the new version? There are four authors for the new version but five for the previous one. I've added the missing author directly because I must submit the preprint version of my paper before 2:00 AM Beijing time for it to be published today!

@Shijie-Xia
Author

[Screenshot 2024-04-09 004802] This is the version in my paper. I have checked all the references twice to ensure there are no mistakes. I hope this is correct for you.

@Randolph-zeng
Member

Yes, this looks perfect to me! Best wishes for your paper. May you have a smooth review process at whichever conference you submit to. Good luck and thanks for your feedback : )


2 participants