[Question] What's the groupsize of w4a16 + w8a16 #112

Open

xiguadong opened this issue Oct 31, 2024 · 1 comment
Labels
question Please ask any questions on Slack. This issue will be closed once responded to.

Comments


xiguadong commented Oct 31, 2024

Hello, the config at https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int4/blob/c34a4a91629f09f73a285f32dbd26106b033c654/config.json#L29 mentions that the group size is 128 for the 4-bit and 8-bit GPTQ checkpoints. Could you tell me the group size used for this model?

And if I want to deploy the official 4-bit model to QNN, how should I do that?

Thanks

@mestrona-3 added the question label Nov 6, 2024
@shreyajn commented

The Qwen on AI Hub Models is Qwen 2.0. The block group size is 64.
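To make "block group size" concrete: in group-wise weight quantization, each contiguous block of 64 weights along a layer's input dimension shares one scale, so a smaller group size means more scales and finer-grained quantization. Below is a minimal NumPy sketch of symmetric int4 group-wise quantization; it illustrates the scheme only and is not AI Hub Models' actual quantizer (the function name and shapes are made up for illustration):

```python
import numpy as np

def quantize_w4_groupwise(w: np.ndarray, group_size: int = 64):
    """Illustrative symmetric per-group int4 quantization of a 2-D weight matrix."""
    out_ch, in_ch = w.shape
    assert in_ch % group_size == 0, "input dim must be divisible by group_size"
    groups = w.reshape(out_ch, in_ch // group_size, group_size)
    # One scale per group: map each group's max magnitude to the int4 limit (7).
    scales = np.abs(groups).max(axis=-1, keepdims=True) / 7.0
    q = np.clip(np.round(groups / scales), -8, 7).astype(np.int8)
    return q.reshape(out_ch, in_ch), scales.squeeze(-1).astype(np.float16)

w = np.random.randn(8, 128).astype(np.float32)
q, scales = quantize_w4_groupwise(w, group_size=64)
print(q.shape, scales.shape)  # (8, 128) int4 values, (8, 2) scales: one per group of 64
```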

If using our provided model, you can deploy it using the tutorial: https://github.com/quic/ai-hub-apps/tree/main/tutorials/llm_on_genie
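For context, the linked Genie tutorial is the supported deployment path. As a rough sketch of what targeting QNN looks like at the AI Hub API level, here is a hedged example using the qai-hub client; the model file, device name, and output path are placeholders, not values from this thread:

```python
# Sketch only: compile an already-exported model for the QNN runtime
# via AI Hub. "model.onnx", the device name, and "model.bin" below are
# illustrative assumptions, not from this issue.
import qai_hub as hub

compile_job = hub.submit_compile_job(
    model="model.onnx",                                # placeholder exported model
    device=hub.Device("Samsung Galaxy S24 (Family)"),  # placeholder target device
    options="--target_runtime qnn_context_binary",     # compile for the QNN runtime
)
compiled = compile_job.get_target_model()
compiled.download("model.bin")                         # placeholder output path
```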
