Hi Kien, I am currently developing the same chatbot as catesearch, and while reading your source I noticed a risk in your context-trimming code.
You check the context length by splitting on words, but word count is not the same as token count: GPT-3 uses byte-pair encoding (BPE) to tokenize text, so a single word can expand into many tokens. Your algorithm may therefore overflow the context limit for languages such as Vietnamese if such text appears in list_paragraph[:2]. I recommend using GPT2TokenizerFast (https://huggingface.co/docs/transformers/model_doc/gpt2) to count tokens instead of splitting on whitespace.