Android App to run Llama-v2-7B-Chat Quantized INT4 on my Android Device #62

taeyeonlee · 2024-07-03T06:02:33Z

Hi,
Could you share a sample Android app to run Llama-v2-7B-Chat Quantized INT4 on my Android device?

Your sample command "python -m qai_hub_models.models.llama_v2_7b_chat_quantized.export"
generated the files below:
Llama2_PromptProcessor_1_Quantized.onnx
Llama2_PromptProcessor_1_Quantized.data
Llama2_PromptProcessor_1_Quantized.encodings
along with job_jogk97en5_optimized_bin_m6qek5zyq.bin, which was downloaded from AI Hub.

How can I run these files on my Android device?
Can anyone help?

AndreaChiChengdu · 2024-07-16T09:52:35Z

Hi friend, Qualcomm released a tutorial for deploying Llama 2 on the 8 Gen 3 in the AI Stack; it may be helpful.

swb1234554321 · 2024-07-23T09:48:38Z

> Hi friend, Qualcomm released a tutorial for deploying Llama 2 on the 8 Gen 3 in the AI Stack; it may be helpful.

@AndreaChiChengdu I tried to follow the tutorial, but it turns out to need the "GenAI" feature in QNN.
However, there is no guide on how to get permission for it, so many GenAI-related executables and libraries are missing and the tutorial simply fails.

Do you have any clue on this?

bhushan23 · 2024-07-23T21:57:04Z

@swb1234554321 @taeyeonlee We are aware of this and are actively working on it with other groups within Qualcomm.
We will be able to release a sample app soon after the "GenAI" dependencies are released in the QNN SDK.

We will update this issue once we can release the sample app.

MenghuaZheng · 2024-07-25T06:49:41Z

> Hi friend, Qualcomm released a tutorial for deploying Llama 2 on the 8 Gen 3 in the AI Stack; it may be helpful.

Hi, can you share this tutorial? I can't find it.

Best regards.

dirtdust · 2024-08-02T05:00:55Z

> @swb1234554321 @taeyeonlee We are aware of this and are actively working on it with other groups within Qualcomm. We will be able to release a sample app soon after the "GenAI" dependencies are released in the QNN SDK.
>
> We will update this issue once we can release the sample app.

@bhushan23 Thanks for your great work. When can you release the sample app? We are all looking forward to it, especially to learning how to run the downloaded files using QNN.

yolanda1224git · 2024-08-04T09:00:33Z

Hi, I have a question.
In AI Hub, the Llama2-7B model is divided into 4 parts, and each part can run an inference job separately. As a result we get 4 .bin files, similar to job_jogk97en5_optimized_bin_m6qek5zyq.bin.

Can I run these 4 parts as a whole model using QNN?

yolanda1224git · 2024-08-04T09:15:09Z

> > Hi friend, Qualcomm released a tutorial for deploying Llama 2 on the 8 Gen 3 in the AI Stack; it may be helpful.
>
> @AndreaChiChengdu I tried to follow the tutorial, but it turns out to need the "GenAI" feature in QNN. However, there is no guide on how to get permission for it, so many GenAI-related executables and libraries are missing and the tutorial simply fails.
>
> Do you have any clue on this?

Hi, can you share the tutorial link? Thank you so much.

bhushan23 · 2024-09-11T05:25:27Z

Please refer to https://github.com/quic/ai-hub-models/tree/main/qai_hub_models/models/llama_v2_7b_chat_quantized/gen_ondevice_llama to run Llama 2 models on device with Genie.

We will keep this issue open until the Android / compute sample app with C++ APIs is released.

taeyeonlee · 2024-10-21T07:55:01Z

Hi @bhushan23,
Could you please share the plan for releasing the Android sample app with C++ APIs?

shubhamgupto · 2024-11-05T19:53:30Z

> Please refer to https://github.com/quic/ai-hub-models/tree/main/qai_hub_models/models/llama_v2_7b_chat_quantized/gen_ondevice_llama to run Llama 2 models on device with Genie.
>
> We will keep this issue open until the Android / compute sample app with C++ APIs is released.

Any update on this?

mestrona-3 · 2024-11-06T01:20:47Z

Hi All, we are actively working on this and will share via Slack once the sample app is released. Thanks!

shubhamgupto · 2024-11-06T13:55:57Z

> Hi All, we are actively working on this and will share via Slack once the sample app is released. Thanks!

That's amazing. Could I join the Slack channel? Thank you.

mestrona-3 · 2024-11-07T00:05:31Z

Yes! Please feel free to join, here is the invite link
