Static llm pipeline dynamic shape model #1240
base: master

Conversation
src/cpp/src/llm_pipeline_static.cpp
Outdated
-    int64_t position_ids_data = prompt_len -1;
-    std::vector<int64_t> attention_mask_data(1, prompt_len);
+    int64_t position_ids_data = prompt_len - 1;
+    std::vector<int64_t> attention_mask_data(prompt_len - 1, 1);
LOL, @TolyaTalamanov !!
6cdd518 to cc34616
@@ -10,7 +10,7 @@ int main(int argc, char* argv[]) try {
     std::string prompt;
     std::string models_path = argv[1];

-    std::string device = "CPU"; // GPU, NPU can be used as well
+    std::string device = "NPU"; // GPU, NPU can be used as well
I believe the default device should remain CPU
@@ -472,7 +480,7 @@ std::optional<NPUDesc> extract_npu_descriptor(ov::Core& core) {
 ov::AnyMap get_baseline_common_config() {
     ov::AnyMap config = {
         { "NPU_COMPILATION_MODE_PARAMS", "compute-layers-with-higher-precision=Sqrt,Power,ReduceMean,Add_RMSNorm" },
-        { "NPUW_DEVICES", "NPU" },
+        { "NPUW_DEVICES", "NPU,CPU" },
I believe this shouldn't be changed.
    }

    ov::genai::TokenizedInputs tokenized_input;
    //if (m_is_chat_conversation) {
Why is it commented out?
    DecodedResults decoded_results = {m_tokenizer.decode(encoded_results.tokens), encoded_results.scores};
    auto decode_stop_time = std::chrono::steady_clock::now();

    //if (m_is_chat_conversation) {
Why is it commented out?
    }

    template <typename T>
    void print_tensor(ov::Tensor t) {
Not needed
    const ov::genai::Tokenizer& tokenizer,
    const std::string& device,
    const ov::AnyMap& config) {
    //return std::make_unique<StaticLLMPipeline>(models_path, tokenizer, device, config);
Remove
Related PRs: