fix: Make deduplication in PairClassificationEvaluator stable #1315
base: v2.0.0
task_name=self.task_name,
prompt_type=PromptType.query,
PairClassification shouldn't have prompt_type
changed it!
I think it should be removed from the evaluator entirely. Also, can you run some tasks to check the scores?
18e0fdd to d4c8e72
Thanks for the PR, I looked it over. I agree with @Samoed's suggestions here. Otherwise, not much to add.
@tsirif would love to get this merged in. Do you have the time to work on it? (Otherwise it might be worth closing it or allowing others to finish it up.)
I will do it tomorrow!
On Mon, Nov 11, 2024 at 4:21 AM, Kenneth Enevoldsen <***@***.***> wrote:
… @tsirif <https://github.com/tsirif> would love to get this merged in do you have the time to work on it?
What: Change the way sentence deduplication is done in PairClassificationEvaluator.
How: Use the same staticmethod as in RerankingEvaluator: _encode_unique_texts.
Why: Compared to list(set(sentences1 + sentences2)), the function _encode_unique_texts is stable in the order in which the deduplicated sentences are generated. This is crucial if the implementation of model.encode parallelizes the work via DDP, in which case the splitting of sentences into chunks across DDP workers expects that the exact same list of sentences (order too!) is given to each call of model.encode.
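To illustrate the ordering concern, here is a minimal sketch of order-preserving deduplication. It is not the code of _encode_unique_texts: the helpers dedupe_stable, encode_unique, shard_for_rank and toy_encode are hypothetical names made up for this example, and the strided split is only an assumed stand-in for however a DDP-aware model.encode divides its input. The underlying issue is that Python randomizes str hashes per process unless PYTHONHASHSEED is fixed, so list(set(...)) may yield a different order in each worker process, while dict.fromkeys keeps first-occurrence order.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def dedupe_stable(texts: Sequence[str]) -> List[str]:
    """Drop duplicates while keeping first-occurrence order."""
    # dict preserves insertion order (a language guarantee since Python 3.7),
    # so the result does not depend on per-process string hashing.
    return list(dict.fromkeys(texts))


def encode_unique(
    texts: Sequence[str],
    encode: Callable[[List[str]], List[Vector]],
) -> List[Vector]:
    """Encode each distinct text once, then expand back to the input order."""
    unique = dedupe_stable(texts)
    vectors = encode(unique)  # every process passes the same ordered list
    by_text: Dict[str, Vector] = dict(zip(unique, vectors))
    return [by_text[t] for t in texts]


def shard_for_rank(texts: Sequence[str], rank: int, world_size: int) -> List[str]:
    """Hypothetical strided split, standing in for a DDP-style chunking."""
    # The shards only partition the data correctly if every rank
    # starts from an identically ordered list.
    return list(texts[rank::world_size])


def toy_encode(batch: List[str]) -> List[Vector]:
    """Stand-in encoder (assumption): one number per text, its length."""
    return [[float(len(t))] for t in batch]


sentences1 = ["a cat", "a dog", "a cat"]
sentences2 = ["a dog", "a bird"]

unique = dedupe_stable(sentences1 + sentences2)
embeddings = encode_unique(sentences1 + sentences2, toy_encode)
```

With a set-based deduplication, two ranks could compute different orderings of the same sentences, so their shards could overlap or miss items. With the order-preserving version, every rank derives its shard from the same list.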
...
Checklist
make test
make lint