[Question] What's the groupsize of w4a16 + w8a16 #112

Open

xiguadong opened this issue Oct 31, 2024 · 1 comment
Labels
question Please ask any questions on Slack. This issue will be closed once responded to.

Comments


xiguadong commented Oct 31, 2024

Hello, the config at https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int4/blob/c34a4a91629f09f73a285f32dbd26106b033c654/config.json#L29 mentions that the group size is 128 for the 4-bit and 8-bit GPTQ checkpoints. Could you tell me the group size used for this model?

And if I want to deploy the official 4-bit model to QNN, how should I do that?

Thanks

@mestrona-3 added the question label Nov 6, 2024
@shreyajn commented

The Qwen on AI Hub Models is Qwen 2.0. The block group size is 64.
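To make "block group size" concrete: in group-wise weight quantization, each contiguous block of 64 weights along a layer's input dimension shares one scale, so a smaller group size means more scales and finer-grained quantization. Below is a minimal NumPy sketch of symmetric int4 group-wise quantization; it illustrates the scheme only and is not AI Hub Models' actual quantizer (the function name and shapes are made up for illustration):

```python
import numpy as np

def quantize_w4_groupwise(w: np.ndarray, group_size: int = 64):
    """Illustrative symmetric per-group int4 quantization of a 2-D weight matrix."""
    out_ch, in_ch = w.shape
    assert in_ch % group_size == 0, "input dim must be divisible by group_size"
    groups = w.reshape(out_ch, in_ch // group_size, group_size)
    # One scale per group: map each group's max magnitude to the int4 limit (7).
    scales = np.abs(groups).max(axis=-1, keepdims=True) / 7.0
    q = np.clip(np.round(groups / scales), -8, 7).astype(np.int8)
    return q.reshape(out_ch, in_ch), scales.squeeze(-1).astype(np.float16)

w = np.random.randn(8, 128).astype(np.float32)
q, scales = quantize_w4_groupwise(w, group_size=64)
print(q.shape, scales.shape)  # (8, 128) int4 values, (8, 2) scales: one per group of 64
```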

If using our provided model, you can deploy it using the tutorial: https://github.com/quic/ai-hub-apps/tree/main/tutorials/llm_on_genie
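For context, the linked Genie tutorial is the supported deployment path. As a rough sketch of what targeting QNN looks like at the AI Hub API level, here is a hedged example using the qai-hub client; the model file, device name, and output path are placeholders, not values from this thread:

```python
# Sketch only: compile an already-exported model for the QNN runtime
# via AI Hub. "model.onnx", the device name, and "model.bin" below are
# illustrative assumptions, not from this issue.
import qai_hub as hub

compile_job = hub.submit_compile_job(
    model="model.onnx",                                # placeholder exported model
    device=hub.Device("Samsung Galaxy S24 (Family)"),  # placeholder target device
    options="--target_runtime qnn_context_binary",     # compile for the QNN runtime
)
compiled = compile_job.get_target_model()
compiled.download("model.bin")                         # placeholder output path
```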
