Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Make deduplication in PairClassificationEvaluator stable #1315

Open
wants to merge 1 commit into
base: v2.0.0
Choose a base branch
from

Conversation

tsirif
Copy link

@tsirif tsirif commented Oct 24, 2024

What: Change the way sentence deduplication is made in PairClassificationEvaluator
How: Use the same staticmethod as in RerankingEvaluator: _encode_unique_texts
Why: Compared to list(set(sentences1 + sentences2)), the function _encode_unique_texts is definitely stable in the order that the deduplicated sentences are generated. This is crucial if the implementation of model.encode is parallelizing the work via DDP, in which case the splitting of sentences to chunks across DDP workers expects that the same exact list of sentences (order too!) is given to each call of model.encode...

Checklist

  • Run tests locally to make sure nothing is broken using make test.
  • [x ] Run the formatter to format the code using make lint.

task_name=self.task_name,
prompt_type=PromptType.query,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PairClassification shouldn't have prompt_type

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed it!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it should be removed totally from evaluator. Also can you run some tasks to check scores?

Copy link
Contributor

@KennethEnevoldsen KennethEnevoldsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR - looked it over. I Agree with @Samoed suggestions here. Otherwise, not much to add.

@KennethEnevoldsen
Copy link
Contributor

KennethEnevoldsen commented Nov 11, 2024

@tsirif would love to get this merged in do you have the time to work on it (otherwise it might be worth closing it or allow others to finish it up)

@KennethEnevoldsen KennethEnevoldsen changed the base branch from main to v2.0.0 November 11, 2024 09:29
@tsirif
Copy link
Author

tsirif commented Nov 11, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants