[RISK] Trim Context Algorithm Issue #1

Open
toreleon opened this issue Feb 21, 2023 · 0 comments

Hi Kien, I am currently developing the same kind of chatbot as catesearch, and while reading your source I noticed a risk in your context-trimming code.
(screenshot of the context-trimming code)
You are checking the context length by splitting on whitespace, but the number of words is not the same as the number of tokens. Because GPT-3 encodes text with byte-pair encoding, a single word can map to many tokens, so the trimmed context can still exceed the model's token limit for some languages, such as Vietnamese, if such text appears in list_paragraph[:2]. I recommend using GPT2TokenizerFast (https://huggingface.co/docs/transformers/model_doc/gpt2) to measure the length in tokens instead of splitting on words.
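For example, a minimal sketch of what I mean (not your actual code; the names trim_context, list_paragraph, and MAX_CONTEXT_TOKENS are just illustrative, and the budget value is an assumption):

```python
from transformers import GPT2TokenizerFast

# GPT-2 and GPT-3 share the same BPE vocabulary, so this tokenizer gives a
# faithful token count for a GPT-3 prompt.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

MAX_CONTEXT_TOKENS = 2000  # assumed budget; set this to your model's actual limit


def trim_context(list_paragraph):
    """Keep whole paragraphs while the running token count stays under budget."""
    kept, total_tokens = [], 0
    for paragraph in list_paragraph:
        n_tokens = len(tokenizer.encode(paragraph))
        if total_tokens + n_tokens > MAX_CONTEXT_TOKENS:
            break
        kept.append(paragraph)
        total_tokens += n_tokens
    return "\n\n".join(kept)
```

Counting tokens this way works the same for English and Vietnamese, whereas a word count underestimates the token count badly for languages where BPE splits words into many pieces.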
