SqueezeAILab / SqueezeLLM Public

Issues: SqueezeAILab/SqueezeLLM

16 Open 12 Closed

Issues list

Minor bug for --include_sparse

#39 opened Aug 11, 2023 by vuiseng9 updated Aug 11, 2023

Vicuna-1.5?

#44 opened Oct 29, 2023 by mlinmg updated Oct 29, 2023

Future plan for this project

#45 opened Nov 1, 2023 by tjtanaa updated Nov 1, 2023

A question about LLaMA-2-7B and Mistral models only provide Dense-only (0%) quantized models

#56 opened Feb 4, 2024 by WeiMa01 updated Feb 4, 2024

sample_weight is negative when running kmeans clustering

#61 opened Feb 23, 2024 by MingLin-home updated Feb 23, 2024

D+S packing in vLLM seems buggy

#62 opened Feb 27, 2024 by MingLin-home updated Feb 27, 2024

On A100 card, speed-up effect does not show up.

#51 opened Nov 30, 2023 by leocnj updated Feb 29, 2024

Dense-only quantization bit precision

#63 opened Mar 5, 2024 by akarkim updated Mar 5, 2024

Support JAIS models

#65 opened Mar 24, 2024 by 7ossam81 updated Mar 24, 2024

Installation instructions did not lead to the local transformers version being selected, giving errors

#66 opened Apr 9, 2024 by RDouglasSharp updated Apr 9, 2024

Further speeding up the quantization process

#67 opened May 5, 2024 by SyphonArch updated May 10, 2024

Why do LLaMA-2-7B have s0 quantized models, but no s5 and s45 sparsity quantized models?

#68 opened May 26, 2024 by Evane5cence updated May 26, 2024

how can I get the models of 0.45% sparsity by myself?

#69 opened Jun 17, 2024 by LiMa-cas updated Jun 17, 2024

Can you update the version that can quant OPT family?

#70 opened Jun 26, 2024 by Chen-1031 updated Jun 26, 2024

When I use SqueezeLLM to quantize the LLaMA2-13B model and test it, the speed is extremely slow.

#71 opened Jul 3, 2024 by zhangfzR updated Jul 3, 2024

Request for Script to Reproduce Experiments in Figure 2

#75 opened Dec 2, 2024 by weizhepei updated Dec 2, 2024
