Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drop_query_vec #24

Merged
merged 6 commits into from
Nov 23, 2024
Merged

drop_query_vec #24

merged 6 commits into from
Nov 23, 2024

Conversation

seanmacavaney
Copy link
Collaborator

@seanmacavaney seanmacavaney commented Nov 22, 2024

fixes #23

WIP:

  • np_retriever
  • faiss_flat_retriever
  • faiss_hnsw_retriever
  • faiss_ivf_retriever
  • scann_retriever
  • torch_retriever
  • voyager_retriever
  • gar
  • ladr

@seanmacavaney
Copy link
Collaborator Author

Verified that it copies the pointer not the value.

(Pdb) id(res.query_vec[0])
140657462180720
(Pdb) id(res.query_vec[1])
140657462180720
(Pdb) id(res.query_vec[999])
140657462180720
(Pdb) id(res.query_vec[1000])
140657462180816

@seanmacavaney
Copy link
Collaborator Author

R -> R ones, like gar and ladr, are tricker to pull all input columns through because we need to figure out which columns are the same across the query (e.g., 'query_vec') and which are specific to the retrieval result (e.g., 'text'). Something like:

qcols = {group[k].iloc[0] for k in group if group[k].nunique() == 1}

I wonder if this is something that DataFrameBuilder could support directly somehow. (Or perhaps the use case is too niche.)

@seanmacavaney
Copy link
Collaborator Author

@cmacdonald do the implementations so far meet your needs?

@cmacdonald cmacdonald marked this pull request as ready for review November 22, 2024 16:32
@cmacdonald
Copy link
Collaborator

LGTM

@cmacdonald
Copy link
Collaborator

cmacdonald commented Nov 22, 2024

R -> R ones, like gar and ladr, are tricker to pull all input columns through because we need to figure out which columns are the same across the query (e.g., 'query_vec') and which are specific to the retrieval result (e.g., 'text'). Something like:

qcols = {group[k].iloc[0] for k in group if group[k].nunique() == 1}

I wonder if this is something that DataFrameBuilder could support directly somehow. (Or perhaps the use case is too niche.)

The "adhoc rule" is that query columns (i.e. which should be common across a query) begin with a q.

There are some methods for detecting query columns (no these dont seem to implement the rule, per se):
https://github.com/terrier-org/pyterrier/blob/master/pyterrier/model.py#L50-L60

Other pieces of code have weird joins, e.g. https://github.com/terrier-org/pyterrier/blob/master/pyterrier/terrier/retriever.py#L440-L441 - at least some used pt.model.query_columnms: https://github.com/search?q=repo%3Aterrier-org%2Fpyterrier%20pt.model.query_columns&type=code

@cmacdonald
Copy link
Collaborator

I'd be tempted to merge this for now.

@seanmacavaney
Copy link
Collaborator Author

I'd rather be able to sort it completely instead of leaving it half-done. Implementations for the remaining ones are in progress.

@seanmacavaney seanmacavaney merged commit e4c6e81 into master Nov 23, 2024
2 checks passed
@seanmacavaney seanmacavaney deleted the drop_query_vec branch November 23, 2024 21:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

retrievers consume the query_vec
2 participants