Move beam search in case of chat scenario to sampler.cpp #1215
base: master
Conversation
The tests pass, but it still doesn't work for a bigger max_token size with TinyLlama-1.1B-Chat-v1.0. The tokenizer converts the first token from |
- spread length penalty to multinomial
}
m_templated_chat_history = new_templated_chat_history;
m_tokenized_chat_history = new_chat_tokens;
// TODO: Forbid LoRA config change if we are in the chat mode, because it requires regenerating the history with LoRA applied
} else {
encoded_input = m_tokenizer.encode(prompt);
I'm not sure that m_templated_chat_history.append(answer);
is valid for all chat templates, because between the current history and the assistant answer the template can insert additional tokens / words (e.g. ' ' in the example below):
{% set content = message['content'] %}
{% endif %}
{% if message['role'] == 'user' %}{{ bos_token + '[INST] ' + content.strip() + ' [/INST]' }}
{% elif message['role'] == 'assistant' %}{{ ' ' + content.strip() + ' ' + eos_token }}
Let's discuss it at the GenAI meeting.
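To illustrate the concern, here is a minimal standalone sketch (not the pipeline code; the strings below just mimic the Llama-2 style template quoted above). Plain string append misses the ' ' and trailing ' ' + eos_token that the template renders around the assistant answer, so the appended history and a re-rendered history diverge.

#include <iostream>
#include <string>

int main() {
    const std::string eos_token = "</s>";
    const std::string history   = "<s>[INST] Hi [/INST]";  // templated prompt so far
    const std::string answer    = "Hello!";                 // raw decoded assistant answer

    // Naive update, as in m_templated_chat_history.append(answer):
    const std::string appended = history + answer;

    // What re-rendering the template over the full message history would give
    // for the assistant turn in the example template above:
    const std::string retemplated = history + " " + answer + " " + eos_token;

    std::cout << (appended == retemplated ? "same" : "different") << "\n";  // prints "different"
}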
815016b to f68e9db
if (is_chat_conversation) {
m_history_available = true;
m_last_disappeared_token = result.tokens[0].back();
I suppose EOS could also be added here, but it's OK for such a token to disappear.
Can we extract the reason the sequence finished from the sampler and add the last token only if the sequence did not end on EOS?
Task CVS-156578
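A rough sketch of that suggestion, assuming the sampler could report a per-sequence finish reason. The FinishReason enum, SequenceResult struct, and update_chat_state helper are hypothetical names for illustration, not the actual sampler API: the trailing token is remembered only when generation stopped for a reason other than EOS.

#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical finish reason reported by the sampler per sequence.
enum class FinishReason { NOT_FINISHED, EOS, MAX_LENGTH, STOP_STRING };

// Hypothetical per-sequence result carrying the generated tokens and the finish reason.
struct SequenceResult {
    std::vector<int64_t> tokens;
    FinishReason finish_reason = FinishReason::NOT_FINISHED;
};

void update_chat_state(const SequenceResult& result,
                       bool is_chat_conversation,
                       bool& history_available,
                       std::optional<int64_t>& last_disappeared_token) {
    if (!is_chat_conversation)
        return;
    history_available = true;
    // Only remember the trailing token when the sequence did not stop on EOS;
    // an EOS token is expected to be dropped from the chat history anyway.
    if (result.finish_reason != FinishReason::EOS)
        last_disappeared_token = result.tokens.back();
    else
        last_disappeared_token = std::nullopt;
}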