Move beam search in case of chat scenario to sampler.cpp #1215

Open
sbalandi wants to merge 3 commits into master
Conversation

@sbalandi sbalandi (Contributor) commented Nov 14, 2024

Task CVS-156578

@github-actions github-actions bot added labels on Nov 14, 2024: category: visual language, category: continuous batching, category: LLM, category: sampling
@sbalandi sbalandi (Contributor, Author) commented:

The tests pass, but it still doesn't work for larger max_token sizes with TinyLlama-1.1B-Chat-v1.0: the tokenizer converts the first token from symbol to _symbol. So please do not review yet.

}
m_templated_chat_history = new_templated_chat_history;
m_tokenized_chat_history = new_chat_tokens;
// TODO: Forbid LoRA config change if we are in the chat mode, because it requires regenerating the history with LoRA applied
} else {
encoded_input = m_tokenizer.encode(prompt);
A Contributor commented on the diff above:
I'm not sure that m_templated_chat_history.append(answer); is valid for all chat templates, because between the current history and the assistant answer there can be extra tokens / words (e.g. the ' ' in the example below):

{% set content = message['content'] %}
{% endif %}
{% if message['role'] == 'user' %}{{ bos_token + '[INST] ' + content.strip() + ' [/INST]' }}
{% elif message['role'] == 'assistant' %}{{ ' '  + content.strip() + ' ' + eos_token }}

Let's discuss it at the GenAI meeting.
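To make the concern concrete, here is a minimal standalone C++ sketch (hand-rolled strings only, not the GenAI API) showing how a plain string append of the assistant answer can diverge from what the Llama-2-style template quoted above would actually render:

```cpp
#include <iostream>
#include <string>

int main() {
    // History as rendered by the Llama-2-style template after the user turn.
    const std::string bos = "<s>";
    const std::string eos = "</s>";
    const std::string templated_history = bos + "[INST] What is 2 + 2? [/INST]";

    const std::string answer = "2 + 2 equals 4.";

    // Naive approach: append the raw answer to the already-templated history.
    const std::string naive = templated_history + answer;

    // What the template's assistant branch actually produces:
    //   {{ ' ' + content.strip() + ' ' + eos_token }}
    const std::string rendered = templated_history + " " + answer + " " + eos;

    std::cout << "naive   : " << naive << "\n";
    std::cout << "rendered: " << rendered << "\n";
    // The two strings differ by the separating space and the trailing " </s>",
    // so tokenizing them can yield different token histories.
    return 0;
}
```

In other words, re-applying the chat template to the full message list (or at least accounting for the template's separators) avoids the mismatch that a raw append can introduce.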


if (is_chat_conversation) {
m_history_available = true;
m_last_disappeared_token = result.tokens[0].back();
@ilya-lavrenov ilya-lavrenov (Contributor) commented Nov 27, 2024 on the diff above:
I suppose EOS can also be added here, but it's OK for such a token to disappear.

Can we extract from the sampler the reason the sequence finished and add the last token only if it ended by a condition other than EOS?
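A minimal sketch of what that could look like. The FinishReason enum, SequenceResult struct, and last_disappeared_token helper are hypothetical names standing in for whatever the sampler actually reports, not the GenAI API:

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical finish reason reported by the sampler (not the actual GenAI enum).
enum class FinishReason { NONE, STOPPED_BY_EOS, MAX_LENGTH_REACHED };

struct SequenceResult {
    std::vector<int64_t> tokens;
    FinishReason finish_reason;
};

// Remember the token that will "disappear" from the next prompt only when the
// sequence did NOT finish on EOS; an EOS token is fine to drop from history.
std::optional<int64_t> last_disappeared_token(const SequenceResult& result) {
    if (result.finish_reason == FinishReason::STOPPED_BY_EOS || result.tokens.empty())
        return std::nullopt;
    return result.tokens.back();
}

int main() {
    SequenceResult r{{101, 202, 303}, FinishReason::MAX_LENGTH_REACHED};
    auto tok = last_disappeared_token(r);
    // tok holds 303 here; it would be empty if the sequence had stopped on EOS.
    return tok ? 0 : 1;
}
```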

3 participants