Original IDs of the retrieved documents #13
Comments
The |
Both, |
docid is internal information: it's 0..N-1 for an index of N documents. Why not add a post-retrieval transformer that gets the additional metadata you need from IRDS again?
I understand that you can use |
Keep the mapping you want in a dataframe and join it for each query?
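A minimal sketch of that join, assuming pandas and a results frame that carries a "docno" column. The column name `orig_id`, the helper `add_original_ids`, and every id except "MARCO_D820886" are placeholders invented for illustration, not part of any library:

```python
import pandas as pd

# Hypothetical mapping built at indexing time:
# the docno handed to the indexer -> the id in the original collection.
mapping = pd.DataFrame({
    "docno": ["0", "1", "2"],
    "orig_id": ["MARCO_D820886", "COLL_B_0001", "COLL_C_0001"],  # last two are placeholders
})

# Hypothetical retrieval output for one query (scores are made up).
results = pd.DataFrame({
    "qid": ["q1", "q1"],
    "docno": ["2", "0"],
    "score": [12.5, 11.0],
})

def add_original_ids(res: pd.DataFrame, id_map: pd.DataFrame) -> pd.DataFrame:
    """Left-join the mapping onto the results, keeping the ranking order."""
    return res.merge(id_map, on="docno", how="left")

with_ids = add_original_ids(results, mapping)
print(with_ids[["qid", "docno", "orig_id", "score"]])
```

A function of this shape takes a results dataframe and returns one, so it could probably be wrapped as a post-retrieval step (for example with PyTerrier's `pt.apply.generic`), but that wiring is an assumption and is not shown here.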
Hi,
I am trying to use ANCEIndexer to index several datasets at once. I created a single iterable of dicts for three collections using itertools.chain:
where wapo_generator is my own iterable of dicts that have the keys "docno", "docid" (the original id of the document in its collection), and "text". The index is created and I am able to perform a search. Now I would like to get the original ids of the retrieved documents (the ones from the original collections, e.g. "MARCO_D820886"). Is there any way to do that?
@cmacdonald @seanmacavaney @tonellotto @Xiao0728
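One possible way to keep the original ids recoverable, sketched with the standard library only: record the docno to original id pair while the chained iterable is consumed. The three lists below are hypothetical stand-ins for the real collection iterables, `record_ids` is an invented helper, and the loop at the end only imitates an indexer reading the stream:

```python
from itertools import chain

def record_ids(docs, mapping):
    """Yield each document unchanged while storing docno -> original id."""
    for doc in docs:
        mapping[doc["docno"]] = doc["docid"]
        yield doc

# Hypothetical stand-ins for the three collections (ids other than
# MARCO_D820886 are placeholders).
marco_docs = [{"docno": "0", "docid": "MARCO_D820886", "text": "first text"}]
other_docs = [{"docno": "1", "docid": "COLL_B_0001", "text": "second text"}]
wapo_docs = [{"docno": "2", "docid": "COLL_C_0001", "text": "third text"}]

docno_to_original = {}
all_docs = record_ids(chain(marco_docs, other_docs, wapo_docs), docno_to_original)

# An indexer would consume all_docs here; a plain loop stands in for it.
indexed = [doc["docno"] for doc in all_docs]

print(docno_to_original["0"])
```

The mapping is filled lazily, so it is complete only after the whole iterable has been consumed.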