Move beam search in case of chat scenario to sampler.cpp #1215

Open
sbalandi wants to merge 3 commits into master
Conversation

@sbalandi sbalandi (Contributor) commented Nov 14, 2024

Task CVS-156578

@github-actions github-actions bot added labels on Nov 14, 2024: category: visual language, category: continuous batching, category: LLM, category: sampling
@sbalandi sbalandi (Contributor, Author) commented:

The tests pass, but it still doesn't work for larger max_token sizes with TinyLlama-1.1B-Chat-v1.0: the tokenizer converts the first token from symbol to _symbol. So please do not review yet.

}
m_templated_chat_history = new_templated_chat_history;
m_tokenized_chat_history = new_chat_tokens;
// TODO: Forbid LoRA config change if we are in the chat mode, because it requires regenerating the history with LoRA applied
} else {
encoded_input = m_tokenizer.encode(prompt);
A Contributor commented on the diff above:
I'm not sure that m_templated_chat_history.append(answer); is valid for all chat templates, because between the current history and the assistant answer there can be extra tokens / words (e.g. the ' ' in the example below):

{% set content = message['content'] %}
{% endif %}
{% if message['role'] == 'user' %}{{ bos_token + '[INST] ' + content.strip() + ' [/INST]' }}
{% elif message['role'] == 'assistant' %}{{ ' '  + content.strip() + ' ' + eos_token }}

Let's discuss it at the GenAI meeting.
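To make the concern concrete, here is a minimal standalone C++ sketch (hand-rolled strings only, not the GenAI API) showing how a plain string append of the assistant answer can diverge from what the Llama-2-style template quoted above would actually render:

```cpp
#include <iostream>
#include <string>

int main() {
    // History as rendered by the Llama-2-style template after the user turn.
    const std::string bos = "<s>";
    const std::string eos = "</s>";
    const std::string templated_history = bos + "[INST] What is 2 + 2? [/INST]";

    const std::string answer = "2 + 2 equals 4.";

    // Naive approach: append the raw answer to the already-templated history.
    const std::string naive = templated_history + answer;

    // What the template's assistant branch actually produces:
    //   {{ ' ' + content.strip() + ' ' + eos_token }}
    const std::string rendered = templated_history + " " + answer + " " + eos;

    std::cout << "naive   : " << naive << "\n";
    std::cout << "rendered: " << rendered << "\n";
    // The two strings differ by the separating space and the trailing " </s>",
    // so tokenizing them can yield different token histories.
    return 0;
}
```

In other words, re-applying the chat template to the full message list (or at least accounting for the template's separators) avoids the mismatch that a raw append can introduce.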


if (is_chat_conversation) {
m_history_available = true;
m_last_disappeared_token = result.tokens[0].back();
@ilya-lavrenov ilya-lavrenov (Contributor) commented Nov 27, 2024 on the diff above:
I suppose EOS can also be added here, but it's OK for such a token to disappear.

Can we extract from the sampler the reason the sequence finished and add the last token only if it ended by a condition other than EOS?
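A minimal sketch of what that could look like. The FinishReason enum, SequenceResult struct, and last_disappeared_token helper are hypothetical names standing in for whatever the sampler actually reports, not the GenAI API:

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical finish reason reported by the sampler (not the actual GenAI enum).
enum class FinishReason { NONE, STOPPED_BY_EOS, MAX_LENGTH_REACHED };

struct SequenceResult {
    std::vector<int64_t> tokens;
    FinishReason finish_reason;
};

// Remember the token that will "disappear" from the next prompt only when the
// sequence did NOT finish on EOS; an EOS token is fine to drop from history.
std::optional<int64_t> last_disappeared_token(const SequenceResult& result) {
    if (result.finish_reason == FinishReason::STOPPED_BY_EOS || result.tokens.empty())
        return std::nullopt;
    return result.tokens.back();
}

int main() {
    SequenceResult r{{101, 202, 303}, FinishReason::MAX_LENGTH_REACHED};
    auto tok = last_disappeared_token(r);
    // tok holds 303 here; it would be empty if the sequence had stopped on EOS.
    return tok ? 0 : 1;
}
```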

3 participants