Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plan for including LLMs' Zero-shot performance? #132

Open
Leezekun opened this issue Jun 30, 2024 · 1 comment
Open

Plan for including LLMs' Zero-shot performance? #132

Leezekun opened this issue Jun 30, 2024 · 1 comment

Comments

@Leezekun
Copy link

Hi,

Thank you for the great work!

Given the current prevalence of Large Language Models (LLMs), are there any plans to include more LLM-based approaches in performance evaluations, especially focusing on zero-shot performance?

Here are a few relevant papers and approaches:

  • Hu, Yushi, et al. "In-context learning for few-shot dialogue state tracking." arXiv preprint arXiv:2203.08568 (2022).
  • Hudeček, Vojtěch, and Ondřej Dušek. "Are LLMs all you need for task-oriented dialogue?" arXiv preprint arXiv:2304.06556 (2023).
  • Heck, Michael, et al. "ChatGPT for zero-shot dialogue state tracking: A solution or an opportunity?" arXiv preprint arXiv:2306.01386 (2023).
  • Chung, Willy, et al. "Instructtods: Large language models for end-to-end task-oriented dialogue systems." arXiv preprint arXiv:2310.08885 (2023).
  • Li, Zekun, et al. "Large Language Models as Zero-shot Dialogue State Trackers through Function Calling." arXiv preprint arXiv:2402.10466 (2024).

Are there any plans to benchmark the performance of LLMs in zero-shot settings? I would be happy to assist with this if needed.

@budzianowski
Copy link
Owner

Hi @Leezekun - thanks for posting this. A simple answer is - absolutely! There are numbers of efforts to work in a zero-shot manner. If you are happy to update the benchmarks that would be very helpful!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants