Single-Answer Questions? #10

Open
nbalepur opened this issue Oct 7, 2024 · 3 comments

Comments

@nbalepur
nbalepur commented Oct 7, 2024

All of the examples shown in the paper imply that each question has a single answer, but most of the questions are actually several QA pairs combined into one (i.e., you need to provide several answers to be correct).

Is this right? I can't find many examples like "What is the total number of employees in the five largest banks in the world?" that require synthesizing multiple sub-answers into a final answer.

@zhudotexe
Owner

Yes, that's right - not all of the questions are single-answer (in fact, I think most of them are not). In Appendix A of our paper, most of the examples we provide are questions whose answers are key-value pairs.

@nbalepur
Author

nbalepur commented Oct 7, 2024

I wish this was clearer in the paper: Figure 1, Sec 4.2, and the use of "top-level answer" made me believe that the goal was to synthesize subanswers into a single answer (versus just string concatenation). If you're curious, only 24/334 of the devset questions have a single answer. Maybe this could be made clearer in the repo?

Thanks anyway, the work is interesting, and it would be cool to have a version that does require combining and reasoning over these subanswers into a single answer.
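
For anyone who wants to reproduce a count like the 24/334 above, here is a minimal sketch. It assumes each devset record is a dict with an `answer` field that is either a scalar (single answer) or a dict/list (several answers); that field name and the sample records are assumptions for illustration, not the repo's confirmed schema.

```python
from typing import Any, Dict, List


def is_single_answer(answer: Any) -> bool:
    """Treat a scalar as a single answer; a dict or list means several answers."""
    return not isinstance(answer, (dict, list))


def count_single_answer(records: List[Dict[str, Any]]) -> int:
    """Count records whose (assumed) `answer` field holds one scalar value."""
    return sum(1 for record in records if is_single_answer(record["answer"]))


# Hypothetical records with placeholder values, only to exercise the functions.
sample_records = [
    {"question": "q1", "answer": {"Entity A": 1, "Entity B": 2}},  # key-value pairs
    {"question": "q2", "answer": 42},                               # single answer
    {"question": "q3", "answer": ["x", "y"]},                       # list of answers
]

single = count_single_answer(sample_records)
print(f"{single}/{len(sample_records)} questions have a single answer")
```

A stricter definition (for example, counting a one-entry dict as single-answer) would change the tally, so the exact figure depends on the rule chosen.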

@zhudotexe
Owner

Gotcha. The main thing we wanted to focus on was the fan-out operation itself and how that requires multi-turn retrieval (in the Open Book setting, at least). In another current work, we're using this dataset to compare single-agent vs. multi-agent systems, which influenced some of the wording there. So there is indeed a stronger focus on the fan-out part than on the fan-in/aggregation task (IMO, in a multi-agent system where, in theory, each subquery is handled by an individual agent, this reduces to a fairly standard reading comprehension/reasoning task à la GSM8K, etc.). This lets us isolate the challenges in the task more easily, but I agree that it would be more challenging if each question also involved an aggregation step.
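
To make the fan-out vs. fan-in distinction concrete, here is a rough sketch, not the dataset's or the paper's actual pipeline. The function names, the lookup table, and its numbers are all hypothetical placeholders. Fan-out answers one sub-question per entity and yields key-value pairs; fan-in is the extra aggregation step that would turn those into one top-level answer.

```python
from typing import Callable, Dict, List


def fan_out(entities: List[str], lookup: Callable[[str], int]) -> Dict[str, int]:
    """Answer one sub-question per entity; the result is a key-value map."""
    return {entity: lookup(entity) for entity in entities}


def fan_in(sub_answers: Dict[str, int]) -> int:
    """Aggregate the sub-answers into one value (here, a simple sum)."""
    return sum(sub_answers.values())


# Placeholder values standing in for retrieved facts; NOT real data.
HYPOTHETICAL_EMPLOYEES = {"Bank A": 100, "Bank B": 250, "Bank C": 50}

# Fan-out only: the answer is the set of key-value pairs.
sub_answers = fan_out(list(HYPOTHETICAL_EMPLOYEES), HYPOTHETICAL_EMPLOYEES.__getitem__)

# Fan-out plus fan-in: the answer is one aggregated value.
total = fan_in(sub_answers)

print(sub_answers)
print(total)
```

In this framing, each `lookup` call is the part that needs its own retrieval, while `fan_in` is the comparatively standard reasoning step once the sub-answers are in hand.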
