Static llm pipeline dynamic shape model #1240

Draft
AsyaPronina wants to merge 3 commits into master from at/static-llm-pipeline-dynamic-shape-model

Conversation

@AsyaPronina AsyaPronina commented Nov 20, 2024

@github-actions github-actions bot added the "category: LLM" and "category: samples" labels Nov 20, 2024
@AsyaPronina AsyaPronina marked this pull request as draft November 20, 2024 19:26
Comment on lines 806 to 784
-    int64_t position_ids_data = prompt_len -1;
-    std::vector<int64_t> attention_mask_data(1, prompt_len);
+    int64_t position_ids_data = prompt_len - 1;
+    std::vector<int64_t> attention_mask_data(prompt_len - 1, 1);
LOL, @TolyaTalamanov !!
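
Worth spelling out what the swap above changes: `std::vector<int64_t>(count, value)` is the fill constructor, so reordering the arguments changes both the size and the contents of the mask. A tiny standalone illustration, nothing pipeline-specific assumed:

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
    const int64_t prompt_len = 5;

    // Old line: the fill constructor builds ONE element whose value is prompt_len.
    std::vector<int64_t> old_mask(1, prompt_len);       // {5}

    // New line: prompt_len - 1 elements, each equal to 1 (an all-ones mask).
    std::vector<int64_t> new_mask(prompt_len - 1, 1);   // {1, 1, 1, 1}

    std::cout << old_mask.size() << " vs " << new_mask.size() << '\n';  // 1 vs 4
}
```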

@AsyaPronina AsyaPronina force-pushed the at/static-llm-pipeline-dynamic-shape-model branch from 6cdd518 to cc34616 Compare November 27, 2024 15:41
@@ -10,7 +10,7 @@ int main(int argc, char* argv[]) try {
     std::string prompt;
     std::string models_path = argv[1];
 
-    std::string device = "CPU"; // GPU, NPU can be used as well
+    std::string device = "NPU"; // GPU, NPU can be used as well

I believe the default device should remain CPU
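
One way to keep CPU as the sample's default while still making NPU easy to try is an optional device argument. This is only a hedged sketch with a hypothetical second CLI parameter, not the sample's current interface:

```cpp
#include <cstdlib>
#include <string>

int main(int argc, char* argv[]) try {
    if (argc < 2) {
        return EXIT_FAILURE;  // the model directory is required, as in the original sample
    }
    std::string models_path = argv[1];
    // CPU stays the default; GPU or NPU can still be requested explicitly, e.g. `sample model_dir NPU`.
    std::string device = (argc > 2) ? argv[2] : "CPU";
    // ... construct ov::genai::LLMPipeline(models_path, device) exactly as the sample already does ...
    return EXIT_SUCCESS;
} catch (...) {
    return EXIT_FAILURE;
}
```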

@@ -472,7 +480,7 @@ std::optional<NPUDesc> extract_npu_descriptor(ov::Core& core) {
 ov::AnyMap get_baseline_common_config() {
     ov::AnyMap config = {
         { "NPU_COMPILATION_MODE_PARAMS", "compute-layers-with-higher-precision=Sqrt,Power,ReduceMean,Add_RMSNorm" },
-        { "NPUW_DEVICES", "NPU" },
+        { "NPUW_DEVICES", "NPU,CPU" },

I believe this shouldn't be changed.
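
If the intent is to let NPUW spill some subgraphs to CPU for this experiment, that override could arguably be passed per run through the pipeline properties instead of editing the baseline config. A hedged sketch, assuming the LLMPipeline constructor overload that takes an ov::AnyMap and that NPUW_DEVICES is forwarded to the plugin (worth double-checking), with the generate call mirroring the README usage:

```cpp
#include <iostream>
#include "openvino/genai/llm_pipeline.hpp"

int main() {
    // Per-run override; get_baseline_common_config() stays untouched.
    ov::AnyMap properties = { { "NPUW_DEVICES", "NPU,CPU" } };
    ov::genai::LLMPipeline pipe("path/to/model_dir", "NPU", properties);  // hypothetical model path

    std::cout << pipe.generate("The Sun is yellow because", ov::genai::max_new_tokens(20));
}
```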

}

ov::genai::TokenizedInputs tokenized_input;
//if (m_is_chat_conversation) {

Why is it commented out?
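
For context, the guard being asked about typically drives chat-history handling before tokenization. A hedged, free-standing sketch of that logic (the history/is_chat_conversation names are illustrative, not necessarily the pipeline's actual members):

```cpp
#include <string>
#include "openvino/genai/tokenizer.hpp"

ov::genai::TokenizedInputs tokenize_prompt(ov::genai::Tokenizer& tokenizer,
                                           ov::genai::ChatHistory& history,
                                           bool is_chat_conversation,
                                           const std::string& prompt) {
    if (is_chat_conversation) {
        // In chat mode the new user turn joins the running history and the whole
        // conversation is re-templated before tokenization.
        history.push_back({{"role", "user"}, {"content", prompt}});
        constexpr bool add_generation_prompt = true;
        std::string templated = tokenizer.apply_chat_template(history, add_generation_prompt);
        return tokenizer.encode(templated);
    }
    // Outside of a chat, the raw prompt is tokenized directly.
    return tokenizer.encode(prompt);
}
```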

DecodedResults decoded_results = {m_tokenizer.decode(encoded_results.tokens), encoded_results.scores};
auto decode_stop_time = std::chrono::steady_clock::now();

//if (m_is_chat_conversation) {

Why is it commented out?
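
Likewise, the post-decode guard usually records the model's reply so the next turn sees the whole conversation. A minimal hedged sketch (names illustrative):

```cpp
#include <string>
#include "openvino/genai/tokenizer.hpp"

void append_answer_to_history(ov::genai::ChatHistory& history,
                              bool is_chat_conversation,
                              const std::string& answer) {
    if (is_chat_conversation) {
        // The decoded answer becomes the assistant turn for the next round of chat.
        history.push_back({{"role", "assistant"}, {"content", answer}});
    }
}
```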

}

template <typename T>
void print_tensor(ov::Tensor t) {

Not needed

const ov::genai::Tokenizer& tokenizer,
const std::string& device,
const ov::AnyMap& config) {
//return std::make_unique<StaticLLMPipeline>(models_path, tokenizer, device, config);

Remove
