StaticLLMPipeline: Use set_tensor for kvcache model #1106

Draft

TolyaTalamanov wants to merge 11 commits into master
Conversation

TolyaTalamanov (Collaborator)

No description provided.

@@ -704,6 +704,17 @@ EncodedResults StaticLLMPipeline::generate(
position_ids_data[0] = m_kvcache_desc.num_stored_tokens;
attention_mask_data[m_kvcache_desc.num_stored_tokens - 1] = 1u;

// NB: Write KV-cache for the new token to the correct input position for the next iteration
TolyaTalamanov (Collaborator, Author)

Comment should be changed

@TolyaTalamanov Does set_tensor now support remote tensor data sharing across CPU and NPU?
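
For readers skimming the diff, here is a minimal sketch of the idea behind the change, assuming the KV-cache input is a pre-allocated [batch, heads, max_len, head_dim] tensor. All names below (bind_new_token_kv_slot, layer_in_name, layer_out_name, token_pos) are illustrative and not taken from the PR; whether the plugin honours a pre-set output view with zero copy, in particular on NPU, is exactly what the thread above is asking.

```cpp
// Sketch only: bind the "present" KV output of the kvcache model directly onto the
// correct slot of the "past" KV input buffer, so no explicit copy is needed between
// decoding iterations.
#include <openvino/openvino.hpp>

void bind_new_token_kv_slot(ov::InferRequest& request,
                            const std::string& layer_in_name,   // e.g. a past K/V input
                            const std::string& layer_out_name,  // e.g. the matching present K/V output
                            size_t token_pos) {
    // Full pre-allocated cache input, assumed layout: [batch, heads, max_len, head_dim].
    ov::Tensor kv_input = request.get_tensor(layer_in_name);
    ov::Shape shape = kv_input.get_shape();

    // View over the single-token slice [.., token_pos : token_pos + 1, ..].
    ov::Coordinate begin{0, 0, token_pos, 0};
    ov::Coordinate end{shape[0], shape[1], token_pos + 1, shape[3]};
    ov::Tensor slot(kv_input, begin, end);

    // Ask the plugin to write the freshly produced KV for the new token straight into
    // that slot instead of copying it there afterwards. Whether this works with
    // zero-copy data sharing across CPU and NPU is the open question in this thread.
    request.set_tensor(layer_out_name, slot);
}
```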

github-actions bot added the category: LLM (LLM pipeline (stateful, static)) and category: sampling (Sampling / Decoding algorithms) labels on Oct 30, 2024
dmatveev (Contributor)

@TolyaTalamanov did it change anything?

TolyaTalamanov (Collaborator, Author)

> @TolyaTalamanov did it change anything?

It broke accuracy, although it works fine on CPU. Will reconsider this optimization.

TolyaTalamanov marked this pull request as draft on November 1, 2024 09:59
ilya-lavrenov removed the category: sampling (Sampling / Decoding algorithms) label on Nov 5, 2024