refactor: remove unnecessary tokenizer options from to_svector functions #20

Merged on Oct 11, 2024 (1 commit)


jwnz
Contributor

@jwnz jwnz commented Oct 10, 2024

bm25_create already stores the tokenizer options when it builds the BM25 matrix. The bm25_document_to_svector and bm25_query_to_svector functions can look up those stored options by matrix name instead of requiring the caller to pass them again, which makes the calls less verbose.

as is:

SELECT bm25_create('documents', 'passage', 'documents_passage_bm25', 'hf', 'google-bert/bert-base-uncased', 0.75, 1.2);

SELECT bm25_document_to_svector('documents_passage_bm25', 'requiring error for due process claim', 'hf', 'google-bert/bert-base-uncased');
SELECT bm25_query_to_svector('documents_passage_bm25', 'requiring error for due process claim', 'hf', 'google-bert/bert-base-uncased');

to be:

SELECT bm25_create('documents', 'passage', 'documents_passage_bm25', 'hf', 'google-bert/bert-base-uncased', 0.75, 1.2);

SELECT bm25_document_to_svector('documents_passage_bm25', 'requiring error for due process claim');
SELECT bm25_query_to_svector('documents_passage_bm25', 'requiring error for due process claim');
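
The idea behind the shorter signatures can be sketched outside the database. The Python below is only an illustration, not the extension's implementation: the in-memory `_registry`, the `TokenizerOptions` class, and the functions `create`, `resolve_tokenizer`, and `to_svector_args` are all hypothetical names. It shows options being recorded once at creation time and resolved by matrix name afterwards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenizerOptions:
    """Hypothetical record of the options passed at creation time."""
    tokenizer: str  # e.g. 'hf'
    model: str      # e.g. 'google-bert/bert-base-uncased'


# Hypothetical in-memory stand-in for the metadata stored with each BM25 matrix.
_registry = {}


def create(matrix_name, tokenizer, model):
    """Record the tokenizer options once, when the matrix is built."""
    _registry[matrix_name] = TokenizerOptions(tokenizer, model)


def resolve_tokenizer(matrix_name):
    """Look up the stored options, failing clearly for an unknown matrix."""
    try:
        return _registry[matrix_name]
    except KeyError:
        raise LookupError(f"no BM25 matrix named {matrix_name!r}") from None


def to_svector_args(matrix_name, text):
    """Expand the short call into the full argument list of the old signature."""
    opts = resolve_tokenizer(matrix_name)
    return (matrix_name, text, opts.tokenizer, opts.model)


create("documents_passage_bm25", "hf", "google-bert/bert-base-uncased")
args = to_svector_args("documents_passage_bm25",
                       "requiring error for due process claim")
```

In this sketch the caller supplies only the matrix name and the text, and the stored options can never disagree with the ones used to build the matrix.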

@VoVAllen VoVAllen requested a review from usamoi October 11, 2024 11:34
@VoVAllen
Member

@usamoi PTAL

@usamoi usamoi merged commit 2924f67 into tensorchord:main Oct 11, 2024
7 checks passed