Hello BLINK team,
I have tested the code and it works wonderfully. However, I notice that the tutorial only shows the large model together with a massive entity catalogue:
```python
config = {
    "test_entities": None,
    "test_mentions": None,
    "interactive": False,
    "top_k": 10,
    "biencoder_model": models_path + "biencoder_wiki_large.bin",
    "biencoder_config": models_path + "biencoder_wiki_large.json",
    "entity_catalogue": models_path + "entity.jsonl",
    "entity_encoding": models_path + "all_entities_large.t7",
    "crossencoder_model": models_path + "crossencoder_wiki_large.bin",
    "crossencoder_config": models_path + "crossencoder_wiki_large.json",
    "fast": True,  # set this to True if speed is a concern
    "output_path": "logs/",  # logging directory
}
```
Is there a smaller pre-trained entity-encoding model that I can use to speed up prediction? I'm OK with sacrificing some performance. If one isn't available, is there anything else I could do to speed this up?
Thank you
I have created a repository for data generation and training of bi-encoder models (so far, only for entity linking) based on the BLINK model. In it, you can choose which BERT base model to use to make your evaluation faster :). As I recall, using BERT-mini I could get R@64 of 84% on the Zeshel dataset.
However, no cross-encoder is implemented, so only the bi-encoder part can be sped up.
If you have your own training data, it's not hard to modify the code slightly to train the bi-encoder with smaller HuggingFace BERT models (such as BERT-mini or google/bert_uncased_L-8_H-512_A-8).
You'll need to change how the base model is loaded (use HuggingFace's AutoModel and AutoTokenizer classes) and how the tokenized input is fed to the model (input_ids, token_type_ids and attention_mask); see the sketch below.
I tried training the Zeshel model after making these changes, and training seemed to go fine.
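For reference, here is a minimal sketch of that change, assuming the google/bert_uncased_L-8_H-512_A-8 checkpoint mentioned above. This is not the actual BLINK patch, only an illustration of loading a smaller model with AutoModel/AutoTokenizer and passing input_ids, token_type_ids and attention_mask; the example sentence and max_length are arbitrary.

```python
# Minimal sketch (not the actual BLINK code change): load a smaller
# HuggingFace BERT with AutoModel/AutoTokenizer and feed it the three
# tensors mentioned above: input_ids, token_type_ids, attention_mask.
import torch
from transformers import AutoModel, AutoTokenizer

# Smaller checkpoint mentioned in this thread; any BERT variant works the same way.
model_name = "google/bert_uncased_L-8_H-512_A-8"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

# Tokenize a mention context (or an entity description) as the bi-encoder would.
encoded = tokenizer(
    "Cristiano Ronaldo scored twice for Manchester United last night.",
    padding="max_length",
    truncation=True,
    max_length=128,
    return_tensors="pt",
)

with torch.no_grad():
    outputs = model(
        input_ids=encoded["input_ids"],
        token_type_ids=encoded["token_type_ids"],
        attention_mask=encoded["attention_mask"],
    )

# Use the [CLS] token's hidden state as the dense vector for scoring
# mention-entity pairs (hidden size 512 here, vs. 1024 for the released
# wiki_large checkpoints, which is where the speedup comes from).
vector = outputs.last_hidden_state[:, 0, :]
print(vector.shape)  # torch.Size([1, 512])
```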