StaticLLMPipeline: Use set_tensor for kvcache model #1106

Draft

TolyaTalamanov wants to merge 11 commits into master
Conversation

TolyaTalamanov (Collaborator)

No description provided.

@@ -704,6 +704,17 @@ EncodedResults StaticLLMPipeline::generate(
position_ids_data[0] = m_kvcache_desc.num_stored_tokens;
attention_mask_data[m_kvcache_desc.num_stored_tokens - 1] = 1u;

// NB: Write KV-cache for the new token to the correct input position for the next iteration
TolyaTalamanov (Collaborator, Author)

Comment should be changed

@TolyaTalamanov Does set_tensor now support remote tensor data sharing across CPU and NPU?
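
For readers skimming the diff, here is a minimal sketch of the idea behind the change, assuming the KV-cache input is a pre-allocated [batch, heads, max_len, head_dim] tensor. All names below (bind_new_token_kv_slot, layer_in_name, layer_out_name, token_pos) are illustrative and not taken from the PR; whether the plugin honours a pre-set output view with zero copy, in particular on NPU, is exactly what the thread above is asking.

```cpp
// Sketch only: bind the "present" KV output of the kvcache model directly onto the
// correct slot of the "past" KV input buffer, so no explicit copy is needed between
// decoding iterations.
#include <openvino/openvino.hpp>

void bind_new_token_kv_slot(ov::InferRequest& request,
                            const std::string& layer_in_name,   // e.g. a past K/V input
                            const std::string& layer_out_name,  // e.g. the matching present K/V output
                            size_t token_pos) {
    // Full pre-allocated cache input, assumed layout: [batch, heads, max_len, head_dim].
    ov::Tensor kv_input = request.get_tensor(layer_in_name);
    ov::Shape shape = kv_input.get_shape();

    // View over the single-token slice [.., token_pos : token_pos + 1, ..].
    ov::Coordinate begin{0, 0, token_pos, 0};
    ov::Coordinate end{shape[0], shape[1], token_pos + 1, shape[3]};
    ov::Tensor slot(kv_input, begin, end);

    // Ask the plugin to write the freshly produced KV for the new token straight into
    // that slot instead of copying it there afterwards. Whether this works with
    // zero-copy data sharing across CPU and NPU is the open question in this thread.
    request.set_tensor(layer_out_name, slot);
}
```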

github-actions bot added the category: LLM (LLM pipeline (stateful, static)) and category: sampling (Sampling / Decoding algorithms) labels on Oct 30, 2024
dmatveev (Contributor)

@TolyaTalamanov did it change anything?

TolyaTalamanov (Collaborator, Author)

> @TolyaTalamanov did it change anything?

It broke accuracy, although it works fine on CPU. Will reconsider this optimization.

TolyaTalamanov marked this pull request as draft on November 1, 2024 09:59
ilya-lavrenov removed the category: sampling (Sampling / Decoding algorithms) label on Nov 5, 2024